LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 2/4] powerpc/xive: fix hcall H_INT_RESET to support long busy delays
From: Cédric Le Goater @ 2018-05-08  7:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Cédric Le Goater
In-Reply-To: <20180508070517.947-1-clg@kaod.org>

The hcall H_INT_RESET can take some time to complete and in such cases
it returns H_LONG_BUSY_* codes requiring the machine to sleep for a
while before retrying.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v2:

 - replaced msleep() by mdelay() as some calling path are under lock. 

 arch/powerpc/sysdev/xive/spapr.c | 52 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 3cf5f8bf4c29..31dc73cacd45 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -19,6 +19,7 @@
 #include <linux/spinlock.h>
 #include <linux/cpumask.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 
 #include <asm/prom.h>
 #include <asm/io.h>
@@ -108,6 +109,51 @@ static void xive_irq_bitmap_free(int irq)
 	}
 }
 
+
+/* Based on the similar routines in RTAS */
+static unsigned int plpar_busy_delay_time(long rc)
+{
+	unsigned int ms = 0;
+
+	if (H_IS_LONG_BUSY(rc)) {
+		ms = get_longbusy_msecs(rc);
+	} else if (rc == H_BUSY) {
+		ms = 10; /* seems appropriate for XIVE hcalls */
+	}
+
+	return ms;
+}
+
+static unsigned int plpar_busy_delay(int rc)
+{
+	unsigned int ms;
+
+	ms = plpar_busy_delay_time(rc);
+	if (ms)
+		mdelay(ms);
+
+	return ms;
+}
+
+/*
+ * Note: this call has a partition wide scope and can take a while to
+ * complete. If it returns H_LONG_BUSY_* it should be retried
+ * periodically.
+ */
+static long plpar_int_reset(unsigned long flags)
+{
+	long rc;
+
+	do {
+		rc = plpar_hcall_norets(H_INT_RESET, flags);
+	} while (plpar_busy_delay(rc));
+
+	if (rc)
+		pr_err("H_INT_RESET failed %ld\n", rc);
+
+	return rc;
+}
+
 static long plpar_int_get_source_info(unsigned long flags,
 				      unsigned long lisn,
 				      unsigned long *src_flags,
@@ -433,11 +479,7 @@ static void xive_spapr_put_ipi(unsigned int cpu, struct xive_cpu *xc)
 
 static void xive_spapr_shutdown(void)
 {
-	long rc;
-
-	rc = plpar_hcall_norets(H_INT_RESET, 0);
-	if (rc)
-		pr_err("H_INT_RESET failed %ld\n", rc);
+	plpar_int_reset(0);
 }
 
 /*
-- 
2.13.6

^ permalink raw reply related

* [PATCH v2 0/4] powerpc/xive: add support for H_INT_RESET
From: Cédric Le Goater @ 2018-05-08  7:05 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Cédric Le Goater

Hello,

H_INT_RESET performs a reset of the Hypervisor internal interrupt
structures, removing all settings done with H_INT_SET_SOURCE_CONFIG
and H_INT_SET_QUEUE_CONFIG. This is most important for kdump and kexec
to be able to restart in a clean state.

First patch closes a window in the kexec sequence in which the
H_INT_RESET hcall can be run concurrently with other hcalls when CPUs
are brought down. The other patches wire up the reset hcall making
sure it supports H_LONG_BUSY_* returned by the Hypervisor.

Thanks,

C.

 Changes since v2:

 - replaced msleep() by mdelay() as some calling path are under lock. 

Cédric Le Goater (4):
  powerpc/64/kexec: fix race in kexec when XIVE is shutdown
  powerpc/xive: fix hcall H_INT_RESET to support long busy delays
  powerpc/xive: shutdown XIVE when kexec or kdump is performed
  powerpc/xive: prepare all hcalls to support long busy delays

 arch/powerpc/kernel/machine_kexec_64.c |  8 ++--
 arch/powerpc/platforms/pseries/kexec.c |  7 ++-
 arch/powerpc/sysdev/xive/spapr.c       | 88 +++++++++++++++++++++++++++++-----
 3 files changed, 84 insertions(+), 19 deletions(-)

-- 
2.13.6

^ permalink raw reply

* [PATCH] powerpc/perf: Update raw-event code encoding comment for power8
From: Madhavan Srinivasan @ 2018-05-08  5:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, Madhavan Srinivasan

Comment explanning the raw event code encoding for Power8 was
moved to isa207_common.h file when re-factoring the code to
support power9. But then Power9 pmu branched out due to changes
specific to power9. So move the encoding comment back to power8-pmu.c
Just comment movement and no logic change.

Fixes: 4d3576b20716 ('powerpc/perf: factor out power8 pmu macros and defines')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/perf/isa207-common.h | 64 ---------------------------------------
 arch/powerpc/perf/power8-pmu.c    | 64 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 6c737d675792..6a0b586c935a 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -17,70 +17,6 @@
 #include <asm/firmware.h>
 #include <asm/cputable.h>
 
-/*
- * Raw event encoding for PowerISA v2.07:
- *
- *        60        56        52        48        44        40        36        32
- * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *   | | [ ]                           [      thresh_cmp     ]   [  thresh_ctl   ]
- *   | |  |                                                              |
- *   | |  *- IFM (Linux)                 thresh start/stop OR FAB match -*
- *   | *- BHRB (Linux)
- *   *- EBB (Linux)
- *
- *        28        24        20        16        12         8         4         0
- * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *   [   ] [  sample ]   [cache]   [ pmc ]   [unit ]   c     m   [    pmcxsel    ]
- *     |        |           |                          |     |
- *     |        |           |                          |     *- mark
- *     |        |           *- L1/L2/L3 cache_sel      |
- *     |        |                                      |
- *     |        *- sampling mode for marked events     *- combine
- *     |
- *     *- thresh_sel
- *
- * Below uses IBM bit numbering.
- *
- * MMCR1[x:y] = unit    (PMCxUNIT)
- * MMCR1[x]   = combine (PMCxCOMB)
- *
- * if pmc == 3 and unit == 0 and pmcxsel[0:6] == 0b0101011
- *	# PM_MRK_FAB_RSP_MATCH
- *	MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
- * else if pmc == 4 and unit == 0xf and pmcxsel[0:6] == 0b0101001
- *	# PM_MRK_FAB_RSP_MATCH_CYC
- *	MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
- * else
- *	MMCRA[48:55] = thresh_ctl   (THRESH START/END)
- *
- * if thresh_sel:
- *	MMCRA[45:47] = thresh_sel
- *
- * if thresh_cmp:
- *	MMCRA[22:24] = thresh_cmp[0:2]
- *	MMCRA[25:31] = thresh_cmp[3:9]
- *
- * if unit == 6 or unit == 7
- *	MMCRC[53:55] = cache_sel[1:3]      (L2EVENT_SEL)
- * else if unit == 8 or unit == 9:
- *	if cache_sel[0] == 0: # L3 bank
- *		MMCRC[47:49] = cache_sel[1:3]  (L3EVENT_SEL0)
- *	else if cache_sel[0] == 1:
- *		MMCRC[50:51] = cache_sel[2:3]  (L3EVENT_SEL1)
- * else if cache_sel[1]: # L1 event
- *	MMCR1[16] = cache_sel[2]
- *	MMCR1[17] = cache_sel[3]
- *
- * if mark:
- *	MMCRA[63]    = 1		(SAMPLE_ENABLE)
- *	MMCRA[57:59] = sample[0:2]	(RAND_SAMP_ELIG)
- *	MMCRA[61:62] = sample[3:4]	(RAND_SAMP_MODE)
- *
- * if EBB and BHRB:
- *	MMCRA[32:33] = IFM
- *
- */
-
 #define EVENT_EBB_MASK		1ull
 #define EVENT_EBB_SHIFT		PERF_EVENT_CONFIG_EBB_SHIFT
 #define EVENT_BHRB_MASK		1ull
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index c9356955cab4..d12a2db26353 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -30,6 +30,70 @@ enum {
 #define	POWER8_MMCRA_IFM2		0x0000000080000000UL
 #define	POWER8_MMCRA_IFM3		0x00000000C0000000UL
 
+/*
+ * Raw event encoding for PowerISA v2.07 (Power8):
+ *
+ *        60        56        52        48        44        40        36        32
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   | | [ ]                           [      thresh_cmp     ]   [  thresh_ctl   ]
+ *   | |  |                                                              |
+ *   | |  *- IFM (Linux)                 thresh start/stop OR FAB match -*
+ *   | *- BHRB (Linux)
+ *   *- EBB (Linux)
+ *
+ *        28        24        20        16        12         8         4         0
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   [   ] [  sample ]   [cache]   [ pmc ]   [unit ]   c     m   [    pmcxsel    ]
+ *     |        |           |                          |     |
+ *     |        |           |                          |     *- mark
+ *     |        |           *- L1/L2/L3 cache_sel      |
+ *     |        |                                      |
+ *     |        *- sampling mode for marked events     *- combine
+ *     |
+ *     *- thresh_sel
+ *
+ * Below uses IBM bit numbering.
+ *
+ * MMCR1[x:y] = unit    (PMCxUNIT)
+ * MMCR1[x]   = combine (PMCxCOMB)
+ *
+ * if pmc == 3 and unit == 0 and pmcxsel[0:6] == 0b0101011
+ *	# PM_MRK_FAB_RSP_MATCH
+ *	MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else if pmc == 4 and unit == 0xf and pmcxsel[0:6] == 0b0101001
+ *	# PM_MRK_FAB_RSP_MATCH_CYC
+ *	MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else
+ *	MMCRA[48:55] = thresh_ctl   (THRESH START/END)
+ *
+ * if thresh_sel:
+ *	MMCRA[45:47] = thresh_sel
+ *
+ * if thresh_cmp:
+ *	MMCRA[22:24] = thresh_cmp[0:2]
+ *	MMCRA[25:31] = thresh_cmp[3:9]
+ *
+ * if unit == 6 or unit == 7
+ *	MMCRC[53:55] = cache_sel[1:3]      (L2EVENT_SEL)
+ * else if unit == 8 or unit == 9:
+ *	if cache_sel[0] == 0: # L3 bank
+ *		MMCRC[47:49] = cache_sel[1:3]  (L3EVENT_SEL0)
+ *	else if cache_sel[0] == 1:
+ *		MMCRC[50:51] = cache_sel[2:3]  (L3EVENT_SEL1)
+ * else if cache_sel[1]: # L1 event
+ *	MMCR1[16] = cache_sel[2]
+ *	MMCR1[17] = cache_sel[3]
+ *
+ * if mark:
+ *	MMCRA[63]    = 1		(SAMPLE_ENABLE)
+ *	MMCRA[57:59] = sample[0:2]	(RAND_SAMP_ELIG)
+ *	MMCRA[61:62] = sample[3:4]	(RAND_SAMP_MODE)
+ *
+ * if EBB and BHRB:
+ *	MMCRA[32:33] = IFM
+ *
+ */
+
 /* PowerISA v2.07 format attribute structure*/
 extern struct attribute_group isa207_pmu_format_group;
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2 6/7] ocxl: Add an IOCTL so userspace knows what CPU features are available
From: Alastair D'Silva @ 2018-05-08  3:54 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Frederic Barrat, linuxppc-dev, linux-kernel, linux-doc, mikey,
	vaibhav, aneesh.kumar, malat, felix, pombredanne, sukadev, gregkh,
	arnd, andrew.donnellan, fbarrat, corbet
In-Reply-To: <20180508135019.3e08d57a@roar.ozlabs.ibm.com>

On Tue, 2018-05-08 at 13:50 +1000, Nicholas Piggin wrote:
> On Tue, 08 May 2018 10:41:55 +1000
> "Alastair D'Silva" <alastair@au1.ibm.com> wrote:
> 
> > On Mon, 2018-05-07 at 20:14 +0200, Frederic Barrat wrote:
> > > 
> > > Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :  
> > > > From: Alastair D'Silva <alastair@d-silva.org>
> > > > 
> > > > In order for a userspace AFU driver to call the Power9 specific
> > > > OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can
> > > > actually
> > > > make that call.
> > > > 
> > > > Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> > > > ---
> > > >   Documentation/accelerators/ocxl.rst |  1 -
> > > >   drivers/misc/ocxl/file.c            | 25
> > > > +++++++++++++++++++++++++
> > > >   include/uapi/misc/ocxl.h            |  4 ++++
> > > >   3 files changed, 29 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/Documentation/accelerators/ocxl.rst
> > > > b/Documentation/accelerators/ocxl.rst
> > > > index ddcc58d01cfb..7904adcc07fd 100644
> > > > --- a/Documentation/accelerators/ocxl.rst
> > > > +++ b/Documentation/accelerators/ocxl.rst
> > > > @@ -157,7 +157,6 @@ OCXL_IOCTL_GET_METADATA:
> > > >     Obtains configuration information from the card, such at
> > > > the
> > > > size of
> > > >     MMIO areas, the AFU version, and the PASID for the current
> > > > context.
> > > > 
> > > > -  
> > > 
> > > 
> > > Intended?
> > > 
> > > Other than that,
> > > Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> > >   
> > 
> > Nope, I'll fix that, thanks.
> > 
> 
> Just to be clear, this is for OCXL features. I would just make that
> explicit in the title (s/CPU/OCXL).
> 
> Thanks,
> Nick
> 

OK, sounds reasonable.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819

^ permalink raw reply

* Re: [PATCH v2 6/7] ocxl: Add an IOCTL so userspace knows what CPU features are available
From: Nicholas Piggin @ 2018-05-08  3:50 UTC (permalink / raw)
  To: Alastair D'Silva
  Cc: Frederic Barrat, linuxppc-dev, linux-kernel, linux-doc, mikey,
	vaibhav, aneesh.kumar, malat, felix, pombredanne, sukadev, gregkh,
	arnd, andrew.donnellan, fbarrat, corbet
In-Reply-To: <1525740115.7796.46.camel@au1.ibm.com>

On Tue, 08 May 2018 10:41:55 +1000
"Alastair D'Silva" <alastair@au1.ibm.com> wrote:

> On Mon, 2018-05-07 at 20:14 +0200, Frederic Barrat wrote:
> > 
> > Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :  
> > > From: Alastair D'Silva <alastair@d-silva.org>
> > > 
> > > In order for a userspace AFU driver to call the Power9 specific
> > > OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
> > > make that call.
> > > 
> > > Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> > > ---
> > >   Documentation/accelerators/ocxl.rst |  1 -
> > >   drivers/misc/ocxl/file.c            | 25
> > > +++++++++++++++++++++++++
> > >   include/uapi/misc/ocxl.h            |  4 ++++
> > >   3 files changed, 29 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/accelerators/ocxl.rst
> > > b/Documentation/accelerators/ocxl.rst
> > > index ddcc58d01cfb..7904adcc07fd 100644
> > > --- a/Documentation/accelerators/ocxl.rst
> > > +++ b/Documentation/accelerators/ocxl.rst
> > > @@ -157,7 +157,6 @@ OCXL_IOCTL_GET_METADATA:
> > >     Obtains configuration information from the card, such at the
> > > size of
> > >     MMIO areas, the AFU version, and the PASID for the current
> > > context.
> > > 
> > > -  
> > 
> > 
> > Intended?
> > 
> > Other than that,
> > Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> >   
> 
> Nope, I'll fix that, thanks.
> 

Just to be clear, this is for OCXL features. I would just make that
explicit in the title (s/CPU/OCXL).

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 04/15] powerpc/powernv: opal-kmsg use flush fallback from console code
From: Nicholas Piggin @ 2018-05-08  3:40 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, Jiri Slaby, linux-kernel, Greg Kroah-Hartman
In-Reply-To: <87wowfbvm0.fsf@concordia.ellerman.id.au>

On Mon, 07 May 2018 20:36:39 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> > On Fri, 04 May 2018 15:16:37 +1000
> > Michael Ellerman <mpe@ellerman.id.au> wrote:  
> >> Nicholas Piggin <npiggin@gmail.com> writes:  
> >> > Use the more refined and tested event polling loop from opal_put_chars
> >> > as the fallback console flush in the opal-kmsg path. This loop is used
> >> > by the console driver today, whereas the opal-kmsg fallback is not
> >> > likely to have been used for years.
> >> >
> >> > Use WARN_ONCE rather than a printk when the fallback is invoked to
> >> > prepare for moving the console flush into a common function.    
> >> 
> >> Do we want to add a WARN in that path? If we're panicking things might
> >> get worse if we WARN (which takes a trap).  
> >
> > True, probably a good idea not to... oh there's a printk_once so
> > that'll work nicely.  
> 
> Cool.
> 
> I have this series in a tree so you can send me an incremental diff if
> it's reasonably small.

It's a one liner (also moved location of message back to where it was
originally).

The next patch will clash because it moves this over into opal.c, so
you'd have to fix that by hand.

Thanks,
Nick

---
 arch/powerpc/platforms/powernv/opal-kmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-kmsg.c b/arch/powerpc/platforms/powernv/opal-kmsg.c
index fd2bbf4fd6dc..c610ef3541aa 100644
--- a/arch/powerpc/platforms/powernv/opal-kmsg.c
+++ b/arch/powerpc/platforms/powernv/opal-kmsg.c
@@ -53,12 +53,12 @@ static void force_opal_console_flush(struct kmsg_dumper *dumper,
 	} else {
 		__be64 evt;
 
-		WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
 		/*
 		 * If OPAL_CONSOLE_FLUSH is not implemented in the firmware,
 		 * the console can still be flushed by calling the polling
 		 * function while it has OPAL_EVENT_CONSOLE_OUTPUT events.
 		 */
+		printk_once(KERN_NOTICE "opal: OPAL_CONSOLE_FLUSH missing.\n");
 		do {
 			opal_poll_events(&evt);
 		} while (be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_OUTPUT);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH 08/15] powerpc/powernv: implement opal_put_chars_atomic
From: Nicholas Piggin @ 2018-05-08  3:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Benjamin Herrenschmidt, Greg Kroah-Hartman, linuxppc-dev,
	linux-kernel, Jiri Slaby
In-Reply-To: <87zi1bbvnl.fsf@concordia.ellerman.id.au>

On Mon, 07 May 2018 20:35:42 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > On Tue, 01 May 2018 19:48:58 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >  
> >> On Tue, 2018-05-01 at 00:55 +1000, Nicholas Piggin wrote:  
> >> > The RAW console does not need writes to be atomic, so relax
> >> > opal_put_chars to be able to do partial writes, and implement an
> >> > _atomic variant which does not take a spinlock. This API is used
> >> > in xmon, so the less locking that is used, the better chance there
> >> > is that a crash can be debugged.    
> >> 
> >> Same comment I already had :-) "atomic" in Linux tends to mean
> >> something else (ie, atomic context), so I'd rather have something
> >> like opal_put_chars_sync() or such...  
> >
> > Oh yeah, I didn't ignore you, just... I thought atomic was okay.
> > atomic *also* tends to mean happens atomically. I think the in
> > atomic context meaning actually tends to be inatomic.
> >
> > Sync I actually thought could be more easily confused with
> > synchronous vs asynchronous here.  
> 
> I think we probably want opal_put_chars() to stay as it is.
> 
> And then add a variant for the call (just xmon?) that want lock free
> behaviour.

No it's not the lock which is important here, it is whether the
message goes to the console atomically versus other writes. The
raw console does not require this, only one which sends some
control characters, which is the hvterm-protocol compatible variant
of the vio console, and I think FSP console.

BMC consoles for example always use raw.

> opal_put_chars_unlocked() or something?

I prefer the _atomic as the special case. Ordinarily we don't have
a special requirement, but with the control characters then we do.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 1/7] powerpc: Add TIDR CPU feature for Power9
From: Alastair D'Silva @ 2018-05-08  3:13 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev
  Cc: mikey, arnd, linux-doc, malat, gregkh, corbet, vaibhav, npiggin,
	linux-kernel, fbarrat, aneesh.kumar, andrew.donnellan,
	pombredanne, felix, sukadev
In-Reply-To: <2f95cc6f-9843-e2b9-6fca-2f4153317a5f@linux.ibm.com>

On Mon, 2018-05-07 at 19:17 +0200, Frederic Barrat wrote:
> 
> Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> > From: Alastair D'Silva <alastair@d-silva.org>
> > 
> > This patch adds a CPU feature bit to show whether the CPU has
> > the TIDR register available, enabling as_notify/wait in userspace.
> > 
> > Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> > ---
> >   arch/powerpc/include/asm/cputable.h | 3 ++-
> >   arch/powerpc/kernel/dt_cpu_ftrs.c   | 1 +
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/include/asm/cputable.h
> > b/arch/powerpc/include/asm/cputable.h
> > index 4e332f3531c5..54c4cbbe57b4 100644
> > --- a/arch/powerpc/include/asm/cputable.h
> > +++ b/arch/powerpc/include/asm/cputable.h
> > @@ -215,6 +215,7 @@ static inline void cpu_feature_keys_init(void)
> > { }
> >   #define CPU_FTR_P9_TM_HV_ASSIST		LONG_ASM_CONST(0x0
> > 000100000000000)
> >   #define CPU_FTR_P9_TM_XER_SO_BUG	LONG_ASM_CONST(0x00002000
> > 00000000)
> >   #define CPU_FTR_P9_TLBIE_BUG		LONG_ASM_CONST(0x0000
> > 400000000000)
> > +#define CPU_FTR_P9_TIDR			LONG_ASM_CONST(0x00
> > 00800000000000)
> > 
> >   #ifndef __ASSEMBLY__
> > 
> > @@ -462,7 +463,7 @@ static inline void cpu_feature_keys_init(void)
> > { }
> >   	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
> >   	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S |
> > \
> >   	    CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
> > -	    CPU_FTR_P9_TLBIE_BUG)
> > +	    CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
> >   #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 |
> > CPU_FTR_POWER9_DD1) & \
> >   			     (~CPU_FTR_SAO))
> >   #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
> > diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c
> > b/arch/powerpc/kernel/dt_cpu_ftrs.c
> > index 11a3a4fed3fb..10f8b7f55637 100644
> > --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> > +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> > @@ -722,6 +722,7 @@ static __init void cpufeatures_cpu_quirks(void)
> >   	if ((version & 0xffff0000) == 0x004e0000) {
> >   		cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
> >   		cur_cpu_spec->cpu_features |=
> > CPU_FTR_P9_TLBIE_BUG; > +		cur_cpu_spec->cpu_features 
> > |= CPU_FTR_P9_TIDR;
> 
> 
> Isn't it redundant with adding the flag to CPU_FTRS_POWER9?
> 
>    Fred
> 

No, cpu_features is populated from device tree, not from
CPU_FTRS_POWER9. Since TIDR will not be explicitly requested in the
device tree, we need to handle it in quirks.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819

^ permalink raw reply

* Re: [PATCH] pkeys: Introduce PKEY_ALLOC_SIGNALINHERIT and change signal semantics
From: Andy Lutomirski @ 2018-05-08  2:49 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Lutomirski, linuxram, Dave Hansen, Linux-MM, Linux API,
	linux-x86_64, linux-arch, X86 ML, linuxppc-dev
In-Reply-To: <927c8325-4c98-d7af-b921-6aafcf8fe992@redhat.com>

On Mon, May 7, 2018 at 2:48 AM Florian Weimer <fweimer@redhat.com> wrote:

> On 05/03/2018 06:05 AM, Andy Lutomirski wrote:
> > On Wed, May 2, 2018 at 7:11 PM Ram Pai <linuxram@us.ibm.com> wrote:
> >
> >> On Wed, May 02, 2018 at 09:23:49PM +0000, Andy Lutomirski wrote:
> >>>
> >>>> If I recall correctly, the POWER maintainer did express a strong
> > desire
> >>>> back then for (what is, I believe) their current semantics, which my
> >>>> PKEY_ALLOC_SIGNALINHERIT patch implements for x86, too.
> >>>
> >>> Ram, I really really don't like the POWER semantics.  Can you give
some
> >>> justification for them?  Does POWER at least have an atomic way for
> >>> userspace to modify just the key it wants to modify or, even better,
> >>> special load and store instructions to use alternate keys?
> >
> >> I wouldn't call it POWER semantics. The way I implemented it on power
> >> lead to the semantics, given that nothing was explicitly stated
> >> about how the semantics should work within a signal handler.
> >
> > I think that this is further evidence that we should introduce a new
> > pkey_alloc() mode and deprecate the old.  To the extent possible, this
> > thing should work the same way on x86 and POWER.

> Do you propose to change POWER or to change x86?

Sorry for being slow to reply.  I propose to introduce a new
PKEY_ALLOC_something variant on x86 and POWER and to make the behavior
match on both.  It should at least update the values loaded when a signal
is delivered and it should probably also update it for new threads.

For glibc, for example, I assume that you want signals to be delivered with
write access disabled to the GOT.  Otherwise you would fail to protect
against exploits that occur in signal context.  Glibc controls thread
creation, so the initial state on thread startup doesn't really matter, but
there will be more users than just glibc.

--Andy

^ permalink raw reply

* Re: [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
From: Sam Bobroff @ 2018-05-08  1:12 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <87vac486nr.fsf@concordia.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 1368 bytes --]

On Fri, May 04, 2018 at 01:02:32PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index f63a01d336ee..b3edd0df04b8 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -210,6 +206,23 @@ static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
> >  				edev->pdev->error_state = s;
> >  }
> >  
> > +static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
> > +{
> > +	struct eeh_pe *pe;
> > +	struct eeh_dev *edev, *tmp;
> > +
> > +	eeh_for_each_pe(root, pe)
> > +		eeh_pe_for_each_dev(pe, edev, tmp)
> > +			if (eeh_edev_actionable(edev))
> > +				if (eeh_pcid_get(edev->pdev)) {
> > +					if (enable)
> > +						eeh_enable_irq(edev);
> > +					else
> > +						eeh_disable_irq(edev);
> > +					eeh_pcid_put(edev->pdev);
> > +				}
> 
> Yikes.
> 
> What about?
> 
> 	eeh_for_each_pe(root, pe) {
> 		eeh_pe_for_each_dev(pe, edev, tmp) {
> 			if (!eeh_edev_actionable(edev))
> 				continue;
> 
> 			if (!eeh_pcid_get(edev->pdev))
> 				continue;
> 
> 			if (enable)
> 				eeh_enable_irq(edev);
> 			else
> 				eeh_disable_irq(edev);
> 
> 			eeh_pcid_put(edev->pdev);
> 		}
> 	}
> 
> cheers

Sure, will do.

Cheers,

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
From: Sam Bobroff @ 2018-05-08  1:09 UTC (permalink / raw)
  To: Russell Currey; +Cc: linuxppc-dev
In-Reply-To: <1525417081.2560.10.camel@russell.cc>

[-- Attachment #1: Type: text/plain, Size: 4248 bytes --]

On Fri, May 04, 2018 at 04:58:01PM +1000, Russell Currey wrote:
> On Wed, 2018-05-02 at 16:36 +1000, Sam Bobroff wrote:
> > As EEH event handling progresses, a cumulative result of type
> > pci_ers_result is built up by (some of) the eeh_report_*() functions
> > using either:
> > 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > or:
> > 	if ((*res == PCI_ERS_RESULT_NONE) ||
> > 	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > 	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > 	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > (Where *res is the accumulator.)
> > 
> > However, the intent is not immediately clear and the result in some
> > situations is order dependent.
> > 
> > Address this by assigning a priority to each result value, and always
> > merging to the highest priority. This renders the intent clear, and
> > provides a stable value for all orderings.
> > 
> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 36 ++++++++++++++++++++++++++--
> > --------
> >  1 file changed, 26 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c
> > b/arch/powerpc/kernel/eeh_driver.c
> > index 188d15c4fe3a..f33dd68a9ca2 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -39,6 +39,29 @@ struct eeh_rmv_data {
> >  	int removed;
> >  };
> >  
> > +static int eeh_result_priority(enum pci_ers_result result)
> > +{
> > +	switch (result) {
> > +	case PCI_ERS_RESULT_NONE: return 0;
> > +	case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
> > +	case PCI_ERS_RESULT_RECOVERED: return 2;
> > +	case PCI_ERS_RESULT_CAN_RECOVER: return 3;
> > +	case PCI_ERS_RESULT_DISCONNECT: return 4;
> > +	case PCI_ERS_RESULT_NEED_RESET: return 5;
> > +	default:
> > +		WARN_ONCE(1, "Unknown pci_ers_result value");
> 
> Missing newline and also would be good to print the value was

Sounds good, will do! I'm not sure if it's mentioned elsewhere but I'll
fix the same issues in pci_ers_result_name() as well.

> > +		return 0;
> > +	}
> > +};
> > +
> > +static enum pci_ers_result merge_result(enum pci_ers_result old,
> > +					enum pci_ers_result new)
> 
> merge_result() sounds like something really generic, maybe call it
> pci_ers_merge_result() or something?

Sounds good, will do.

> Note: just learned that it stands for Error Recovery System and that's
> a thing (?)
> 
> > +{
> > +	if (eeh_result_priority(new) > eeh_result_priority(old))
> > +		return new;
> > +	return old;
> 
> MAX would be nicer as per mpe's suggestion

Sorry, I don't understand. The return value isn't MAX(new, old) so
how can MAX() do this?

> > +}
> > +
> >  /**
> >   * eeh_pcid_get - Get the PCI device driver
> >   * @pdev: PCI device
> > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev
> > *edev, void *userdata)
> >  
> >  	rc = driver->err_handler->error_detected(dev,
> > pci_channel_io_frozen);
> >  
> > -	/* A driver that needs a reset trumps all others */
> > -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +	*res = merge_result(*res, rc);
> >  
> >  	edev->in_error = true;
> >  	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct
> > eeh_dev *edev, void *userdata)
> >  
> >  	rc = driver->err_handler->mmio_enabled(dev);
> >  
> > -	/* A driver that needs a reset trumps all others */
> > -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +	*res = merge_result(*res, rc);
> >  
> >  out:
> >  	eeh_pcid_put(dev);
> > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev
> > *edev, void *userdata)
> >  		goto out;
> >  
> >  	rc = driver->err_handler->slot_reset(dev);
> > -	if ((*res == PCI_ERS_RESULT_NONE) ||
> > -	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > -	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > -	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > +	*res = merge_result(*res, rc);
> >  
> >  out:
> >  	eeh_pcid_put(dev);
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 6/7] ocxl: Add an IOCTL so userspace knows what CPU features are available
From: Alastair D'Silva @ 2018-05-08  0:41 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet
In-Reply-To: <eec7dceb-0b25-8644-7f32-3700c3c4ed46@linux.ibm.com>

On Mon, 2018-05-07 at 20:14 +0200, Frederic Barrat wrote:
> 
> Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> > From: Alastair D'Silva <alastair@d-silva.org>
> > 
> > In order for a userspace AFU driver to call the Power9 specific
> > OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
> > make that call.
> > 
> > Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> > ---
> >   Documentation/accelerators/ocxl.rst |  1 -
> >   drivers/misc/ocxl/file.c            | 25
> > +++++++++++++++++++++++++
> >   include/uapi/misc/ocxl.h            |  4 ++++
> >   3 files changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/accelerators/ocxl.rst
> > b/Documentation/accelerators/ocxl.rst
> > index ddcc58d01cfb..7904adcc07fd 100644
> > --- a/Documentation/accelerators/ocxl.rst
> > +++ b/Documentation/accelerators/ocxl.rst
> > @@ -157,7 +157,6 @@ OCXL_IOCTL_GET_METADATA:
> >     Obtains configuration information from the card, such at the
> > size of
> >     MMIO areas, the AFU version, and the PASID for the current
> > context.
> > 
> > -
> 
> 
> Intended?
> 
> Other than that,
> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> 

Nope, I'll fix that, thanks.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australiamob: 0423 762 819

^ permalink raw reply

* Re: [PATCH v2 3/7] powerpc: use task_pid_nr() for TID allocation
From: Alastair D'Silva @ 2018-05-08  0:40 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet
In-Reply-To: <b8aa28c4-9171-cb7f-9a43-22f2f15657b0@linux.ibm.com>

On Mon, 2018-05-07 at 19:37 +0200, Frederic Barrat wrote:
> 
> Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> > From: Alastair D'Silva <alastair@d-silva.org>
> > 
> > The current implementation of TID allocation, using a global IDR,
> > may
> > result in an errant process starving the system of available TIDs.
> > Instead, use task_pid_nr(), as mentioned by the original author.
> > The
> > scenario described which prevented it's use is not applicable, as
> > set_thread_tidr can only be called after the task struct has been
> > populated.
> 
> 
> Here is how I understand what's going to happen if 2 threads are
> using 
> the same TIDR value, which is possible with this patch (if unlikely):
> 
> 1. waking up the wrong thread is not really a problem, as threads
> have 
> to handle spurious wake up from the 'wait' instruction anyway, and
> must 
> be using some other condition to know when to loop around the 'wait' 
> instruction.
> 
> 2. missing the right thread: if the wrong thread is on a CPU, and a 
> wake_host_thread/as_notify is sent, the core will see a matching
> thread 
> and will accept the command. The (open)capi adapter won't send an 
> interrupt. The wrong thread is awaken, which is not a problem as 
> discussed above. As the right thread to notify is not running, no
> harm 
> is done either: as soon as the thread runs, it's supposed to check
> its 
> condition (which will be met) or call 'wait', but 'wait' immediately 
> returns when called the first time after a thread is scheduled.
> 
> So I believe we are ok. But I think it requires a huge comment with
> the 
> above (at the minimum) :-)
> 
> With a comment:
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> 
>    Fred
> 

Good point, I'll add this in the next revision.


-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australiamob: 0423 762 819

^ permalink raw reply

* Re: [PATCH] cxl: disable the lazy approach for irqs in POWERVM environment.
From: Benjamin Herrenschmidt @ 2018-05-07 21:01 UTC (permalink / raw)
  To: christophe lombard, linuxppc-dev, vaibhav, andrew.donnellan
In-Reply-To: <62dca845-827f-dd31-5cff-6eed3e5a5d86@linux.vnet.ibm.com>

On Mon, 2018-05-07 at 18:02 +0200, christophe lombard wrote:
> 
> To answer to your questions, here is the timeline in the cxl driver
> 
>    1. call disable_irq()
> 
>    2. call plpar_hcall9() to attach a process element
> 	During this phase, phyp (as described in CAPI PAPR document) 	
> 	disables the virtual interrupts provided in the process-element-
> 	struct.
> 	Then the partition must use ibm,set-xive to enable the virtual
> 	interrupt source after H_ATTACH_CA_PROCESS completes
> 	successfully.

Ok, this is the breakage. This PAPR API is badly defined, nothing other
than the existing interrupt management PAPR calls should affect the
enable/disable state of the interrupt The fact that attaching a process
element does is a broken design.

>    3. call enable_irq()
> 	Before the following patch (genirq: Avoid unnecessary low level
> 	irq function calls) be pushed in mainline, the unmask api
> 	(= ics_rtas_unmask_irq()) was called by default and everything
> 	was ok.
> 
> 	With this patch, the unmask api is now called only if 		
> 	IRQD_IRQ_MASKED is set and in our case, it's never done and the
> 	irqs remain masked.
> 
> 
>    So, there is a disconnect between the 'HW' (phyp) state of the
>    interrupt and the IRQD_IRQ_MASKED flag.
> 
>    In the timeline of the cxl driver, the IRQD_IRQ_MASKED flag is never
>    set.
>    This flag was not used in enable_irq(), before that the patch "genirq:
>    Avoid ..." was pushed. So ics_rtas_unmask_irq() was called and
>    everything was ok.
> 
> Thanks
> 
> > > 
> > > 
> > > 
> > > > > Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> > > > > ---
> > > > >    drivers/misc/cxl/guest.c | 1 +
> > > > >    1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
> > > > > index f58b4b6c..dc476e1 100644
> > > > > --- a/drivers/misc/cxl/guest.c
> > > > > +++ b/drivers/misc/cxl/guest.c
> > > > > @@ -389,6 +389,7 @@ static void disable_afu_irqs(struct cxl_context *ctx)
> > > > >    		hwirq = ctx->irqs.offset[r];
> > > > >    		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
> > > > >    			virq = irq_find_mapping(NULL, hwirq);
> > > > > +			irq_set_status_flags(virq, IRQ_DISABLE_UNLAZY);
> > > > >    			disable_irq(virq);
> > > > >    		}
> > > > >    	}

^ permalink raw reply

* Re: [PATCH v2 7/7] ocxl: Document new OCXL IOCTLs
From: Frederic Barrat @ 2018-05-07 18:15 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-8-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

   Fred


>   Documentation/accelerators/ocxl.rst | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
> index 7904adcc07fd..3b8d3b99795c 100644
> --- a/Documentation/accelerators/ocxl.rst
> +++ b/Documentation/accelerators/ocxl.rst
> @@ -157,6 +157,17 @@ OCXL_IOCTL_GET_METADATA:
>     Obtains configuration information from the card, such at the size of
>     MMIO areas, the AFU version, and the PASID for the current context.
> 
> +OCXL_IOCTL_ENABLE_P9_WAIT:
> +
> +  Allows the AFU to wake a userspace thread executing 'wait'. Returns
> +  information to userspace to allow it to configure the AFU. Note that
> +  this is only available on Power 9.
> +
> +OCXL_IOCTL_GET_FEATURES:
> +
> +  Reports on which CPU features that affect OpenCAPI are usable from
> +  userspace.
> +
>   mmap
>   ----
> 

^ permalink raw reply

* Re: [PATCH v2 6/7] ocxl: Add an IOCTL so userspace knows what CPU features are available
From: Frederic Barrat @ 2018-05-07 18:14 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-7-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> In order for a userspace AFU driver to call the Power9 specific
> OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
> make that call.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---
>   Documentation/accelerators/ocxl.rst |  1 -
>   drivers/misc/ocxl/file.c            | 25 +++++++++++++++++++++++++
>   include/uapi/misc/ocxl.h            |  4 ++++
>   3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
> index ddcc58d01cfb..7904adcc07fd 100644
> --- a/Documentation/accelerators/ocxl.rst
> +++ b/Documentation/accelerators/ocxl.rst
> @@ -157,7 +157,6 @@ OCXL_IOCTL_GET_METADATA:
>     Obtains configuration information from the card, such at the size of
>     MMIO areas, the AFU version, and the PASID for the current context.
> 
> -


Intended?

Other than that,
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>



>   mmap
>   ----
> 
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index eb409a469f21..33ae46ce0a8a 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -168,12 +168,32 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
>   }
>   #endif
> 
> +
> +static long afu_ioctl_get_features(struct ocxl_context *ctx,
> +		struct ocxl_ioctl_features __user *uarg)
> +{
> +	struct ocxl_ioctl_features arg;
> +
> +	memset(&arg, 0, sizeof(arg));
> +
> +#ifdef CONFIG_PPC64
> +	if (cpu_has_feature(CPU_FTR_P9_TIDR))
> +		arg.flags[0] |= OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT;
> +#endif
> +
> +	if (copy_to_user(uarg, &arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>   #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :			\
>   			x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :	\
>   			x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" :		\
>   			x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" :	\
>   			x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" :	\
>   			x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" :	\
> +			x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" :	\
>   			"UNKNOWN")
> 
>   static long afu_ioctl(struct file *file, unsigned int cmd,
> @@ -239,6 +259,11 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
>   		break;
>   #endif
> 
> +	case OCXL_IOCTL_GET_FEATURES:
> +		rc = afu_ioctl_get_features(ctx,
> +				(struct ocxl_ioctl_features __user *) args);
> +		break;
> +
>   	default:
>   		rc = -EINVAL;
>   	}
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> index 8d2748e69c84..bb80f294b429 100644
> --- a/include/uapi/misc/ocxl.h
> +++ b/include/uapi/misc/ocxl.h
> @@ -55,6 +55,9 @@ struct ocxl_ioctl_p9_wait {
>   	__u64 reserved3[3];
>   };
> 
> +#define OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT	0x01
> +struct ocxl_ioctl_features {
> +	__u64 flags[4];
>   };
> 
>   struct ocxl_ioctl_irq_fd {
> @@ -72,5 +75,6 @@ struct ocxl_ioctl_irq_fd {
>   #define OCXL_IOCTL_IRQ_SET_FD	_IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
>   #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
>   #define OCXL_IOCTL_ENABLE_P9_WAIT	_IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
> +#define OCXL_IOCTL_GET_FEATURES _IOR(OCXL_MAGIC, 0x16, struct ocxl_ioctl_platform)
> 
>   #endif /* _UAPI_MISC_OCXL_H */
> 

^ permalink raw reply

* Re: [PATCH v2 5/7] ocxl: Expose the thread_id needed for wait on p9
From: Frederic Barrat @ 2018-05-07 18:08 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-6-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> In order to successfully issue as_notify, an AFU needs to know the TID
> to notify, which in turn means that this information should be
> available in userspace so it can be communicated to the AFU.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---
>   drivers/misc/ocxl/context.c       |  5 +++-
>   drivers/misc/ocxl/file.c          | 53 +++++++++++++++++++++++++++++++++++++++
>   drivers/misc/ocxl/link.c          | 36 ++++++++++++++++++++++++++
>   drivers/misc/ocxl/ocxl_internal.h |  1 +
>   include/misc/ocxl.h               |  9 +++++++
>   include/uapi/misc/ocxl.h          | 10 ++++++++
>   6 files changed, 113 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index 909e8807824a..95f74623113e 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
>   	mutex_init(&ctx->xsl_error_lock);
>   	mutex_init(&ctx->irq_lock);
>   	idr_init(&ctx->irq_idr);
> +	ctx->tidr = 0;
> +
>   	/*
>   	 * Keep a reference on the AFU to make sure it's valid for the
>   	 * duration of the life of the context
> @@ -65,6 +67,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>   {
>   	int rc;
> 
> +	// Locks both status & tidr
>   	mutex_lock(&ctx->status_mutex);
>   	if (ctx->status != OPENED) {
>   		rc = -EIO;
> @@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
>   	}
> 
>   	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
> -			current->mm->context.id, 0, amr, current->mm,
> +			current->mm->context.id, ctx->tidr, amr, current->mm,
>   			xsl_fault_error, ctx);
>   	if (rc)
>   		goto out;
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index 038509e5d031..eb409a469f21 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -5,6 +5,8 @@
>   #include <linux/sched/signal.h>
>   #include <linux/uaccess.h>
>   #include <uapi/misc/ocxl.h>
> +#include <asm/reg.h>
> +#include <asm/switch_to.h>
>   #include "ocxl_internal.h"
> 
> 
> @@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct ocxl_context *ctx,
>   	return 0;
>   }
> 
> +#ifdef CONFIG_PPC64
> +static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
> +		struct ocxl_ioctl_p9_wait __user *uarg)
> +{
> +	struct ocxl_ioctl_p9_wait arg;
> +
> +	memset(&arg, 0, sizeof(arg));
> +
> +	if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
> +		enum ocxl_context_status status;
> +
> +		// Locks both status & tidr
> +		mutex_lock(&ctx->status_mutex);
> +		if (!ctx->tidr) {
> +			if (set_thread_tidr(current))
> +				return -ENOENT;
> +
> +			ctx->tidr = current->thread.tidr;
> +		}


Now that we don't have the TIDR limit problem, I'm wondering if we 
cannot relax our rule a bit and have:
- first thread to enable will become the default thread and update the 
Process element
- any subsequent enable would just allocate the TIDR for the calling thread.

That way, more than one thread could be used for 'wait'.
Thoughts?

    Fred


> +
> +		status = ctx->status;
> +		mutex_unlock(&ctx->status_mutex);
> +
> +		if (status == ATTACHED) {
> +			int rc;
> +			struct link *link = ctx->afu->fn->link;
> +
> +			rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
> +			if (rc)
> +				return rc;
> +		}
> +
> +		arg.thread_id = ctx->tidr;
> +	} else
> +		return -ENOENT;
> +
> +	if (copy_to_user(uarg, &arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +#endif
> +
>   #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :			\
>   			x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :	\
>   			x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" :		\
>   			x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" :	\
>   			x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" :	\
> +			x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" :	\
>   			"UNKNOWN")
> 
>   static long afu_ioctl(struct file *file, unsigned int cmd,
> @@ -186,6 +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
>   				(struct ocxl_ioctl_metadata __user *) args);
>   		break;
> 
> +#ifdef CONFIG_PPC64
> +	case OCXL_IOCTL_ENABLE_P9_WAIT:
> +		rc = afu_ioctl_enable_p9_wait(ctx,
> +				(struct ocxl_ioctl_p9_wait __user *) args);
> +		break;
> +#endif
> +
>   	default:
>   		rc = -EINVAL;
>   	}
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 656e8610eec2..88876ae8f330 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   }
>   EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
> 
> +int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
> +{
> +	struct link *link = (struct link *) link_handle;
> +	struct spa *spa = link->spa;
> +	struct ocxl_process_element *pe;
> +	int pe_handle, rc;
> +
> +	if (pasid > SPA_PASID_MAX)
> +		return -EINVAL;
> +
> +	pe_handle = pasid & SPA_PE_MASK;
> +	pe = spa->spa_mem + pe_handle;
> +
> +	mutex_lock(&spa->spa_lock);
> +
> +	pe->tid = tid;
> +
> +	/*
> +	 * The barrier makes sure the PE is updated
> +	 * before we clear the NPU context cache below, so that the
> +	 * old PE cannot be reloaded erroneously.
> +	 */
> +	mb();
> +
> +	/*
> +	 * hook to platform code
> +	 * On powerpc, the entry needs to be cleared from the context
> +	 * cache of the NPU.
> +	 */
> +	rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
> +	WARN_ON(rc);
> +
> +	mutex_unlock(&spa->spa_lock);
> +	return rc;
> +}
> +
>   int ocxl_link_remove_pe(void *link_handle, int pasid)
>   {
>   	struct link *link = (struct link *) link_handle;
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 5d421824afd9..6c6d4e61888e 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -77,6 +77,7 @@ struct ocxl_context {
>   	struct ocxl_xsl_error xsl_error;
>   	struct mutex irq_lock;
>   	struct idr irq_idr;
> +	__u16 tidr; // Thread ID used for P9 wait implementation
>   };
> 
>   struct ocxl_process_element {
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 51ccf76db293..9ff6ddc28e22 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data);
> 
> +/**
> + * Update values within a Process Element
> + *
> + * link_handle: the link handle associated with the process element
> + * pasid: the PASID for the AFU context
> + * tid: the new thread id for the process element
> + */
> +extern int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
> +
>   /*
>    * Remove a Process Element from the Shared Process Area for a link
>    */
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> index 0af83d80fb3e..8d2748e69c84 100644
> --- a/include/uapi/misc/ocxl.h
> +++ b/include/uapi/misc/ocxl.h
> @@ -48,6 +48,15 @@ struct ocxl_ioctl_metadata {
>   	__u64 reserved[13]; // Total of 16*u64
>   };
> 
> +struct ocxl_ioctl_p9_wait {
> +	__u16 thread_id; // The thread ID required to wake this thread
> +	__u16 reserved1;
> +	__u32 reserved2;
> +	__u64 reserved3[3];
> +};
> +
> +};
> +
>   struct ocxl_ioctl_irq_fd {
>   	__u64 irq_offset;
>   	__s32 eventfd;
> @@ -62,5 +71,6 @@ struct ocxl_ioctl_irq_fd {
>   #define OCXL_IOCTL_IRQ_FREE	_IOW(OCXL_MAGIC, 0x12, __u64)
>   #define OCXL_IOCTL_IRQ_SET_FD	_IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
>   #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
> +#define OCXL_IOCTL_ENABLE_P9_WAIT	_IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
> 
>   #endif /* _UAPI_MISC_OCXL_H */
> 

^ permalink raw reply

* Re: [PATCH v2 0/4] powerpc: wii_defconfig updates
From: Jonathan Neuschäfer @ 2018-05-07 17:47 UTC (permalink / raw)
  To: Jonathan Neuschäfer
  Cc: linuxppc-dev, Joel Stanley, linux-kernel, Benjamin Gilbert,
	Robin H. Johnson, Greg Kroah-Hartman, Michael Ellerman,
	Paul Mackerras, Benjamin Herrenschmidt
In-Reply-To: <20180507142019.32669-1-j.neuschaefer@gmx.net>

[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]

I forgot to CC the right set of people/mailing lists on the cover
letter. Sorry. Here it is:

On Mon, May 07, 2018 at 04:20:15PM +0200, Jonathan Neuschäfer wrote:
> v1: https://www.spinics.net/lists/kernel/msg2790389.html
>     https://www.spinics.net/lists/kernel/msg2790385.html
> 
> In the previous version of patch 2, I forgot to set CONFIG_NEW_LEDS and
> CONFIG_LEDS_TRIGGERS, so the more specific LED-related options weren't
> actually enabled, due to Kconfig dependencies. This is now fixed.
> 
> I took the opportunity of a v2 to add two more patches that I wanted to
> send anyway. The SDHCIs in the Wii are currently unusable due to bugs/
> problems in the flipper-pic/hlwd-pic drivers, but I know how to fix
> those, and will send patches.
> 
> Jonathan Neuschäfer (4):
>   powerpc: wii_defconfig: Disable Ethernet driver support code
>   powerpc: wii_defconfig: Enable GPIO-related options
>   powerpc: wii_defconfig: Enable Wii SDHCI driver
>   powerpc: wii_defconfig: Disable BCMA support
> 
>  arch/powerpc/configs/wii_defconfig | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> -- 
> 2.17.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
From: Frederic Barrat @ 2018-05-07 17:38 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-5-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The function removes the process element from NPU cache.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   arch/powerpc/include/asm/pnv-ocxl.h   | 2 +-
>   arch/powerpc/platforms/powernv/ocxl.c | 4 ++--
>   drivers/misc/ocxl/link.c              | 2 +-
>   3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index f6945d3bc971..208b5503f4ed 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -28,7 +28,7 @@ extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
>   extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
>   			void **platform_data);
>   extern void pnv_ocxl_spa_release(void *platform_data);
> -extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
> +extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
> 
>   extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
>   extern void pnv_ocxl_free_xive_irq(u32 irq);
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index fa9b53af3c7b..8c65aacda9c8 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -475,7 +475,7 @@ void pnv_ocxl_spa_release(void *platform_data)
>   }
>   EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
> 
> -int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
> +int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
>   {
>   	struct spa_data *data = (struct spa_data *) platform_data;
>   	int rc;
> @@ -483,7 +483,7 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
>   	rc = opal_npu_spa_clear_cache(data->phb_opal_id, data->bdfn, pe_handle);
>   	return rc;
>   }
> -EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
> +EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
> 
>   int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
>   {
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index f30790582dc0..656e8610eec2 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -599,7 +599,7 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>   	 * On powerpc, the entry needs to be cleared from the context
>   	 * cache of the NPU.
>   	 */
> -	rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
> +	rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
>   	WARN_ON(rc);
> 
>   	pe_data = radix_tree_delete(&spa->pe_tree, pe_handle);
> 

^ permalink raw reply

* Re: [PATCH v2 3/7] powerpc: use task_pid_nr() for TID allocation
From: Frederic Barrat @ 2018-05-07 17:37 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-4-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The current implementation of TID allocation, using a global IDR, may
> result in an errant process starving the system of available TIDs.
> Instead, use task_pid_nr(), as mentioned by the original author. The
> scenario described which prevented it's use is not applicable, as
> set_thread_tidr can only be called after the task struct has been
> populated.


Here is how I understand what's going to happen if 2 threads are using 
the same TIDR value, which is possible with this patch (if unlikely):

1. waking up the wrong thread is not really a problem, as threads have 
to handle spurious wake up from the 'wait' instruction anyway, and must 
be using some other condition to know when to loop around the 'wait' 
instruction.

2. missing the right thread: if the wrong thread is on a CPU, and a 
wake_host_thread/as_notify is sent, the core will see a matching thread 
and will accept the command. The (open)capi adapter won't send an 
interrupt. The wrong thread is awaken, which is not a problem as 
discussed above. As the right thread to notify is not running, no harm 
is done either: as soon as the thread runs, it's supposed to check its 
condition (which will be met) or call 'wait', but 'wait' immediately 
returns when called the first time after a thread is scheduled.

So I believe we are ok. But I think it requires a huge comment with the 
above (at the minimum) :-)

With a comment:
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

   Fred



> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---
>   arch/powerpc/include/asm/switch_to.h |  1 -
>   arch/powerpc/kernel/process.c        | 97 +-----------------------------------
>   2 files changed, 1 insertion(+), 97 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index be8c9fa23983..5b03d8a82409 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -94,6 +94,5 @@ static inline void clear_task_ebb(struct task_struct *t)
>   extern int set_thread_uses_vas(void);
> 
>   extern int set_thread_tidr(struct task_struct *t);
> -extern void clear_thread_tidr(struct task_struct *t);
> 
>   #endif /* _ASM_POWERPC_SWITCH_TO_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 3b00da47699b..87f047fd2762 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1496,103 +1496,12 @@ int set_thread_uses_vas(void)
>   }
> 
>   #ifdef CONFIG_PPC64
> -static DEFINE_SPINLOCK(vas_thread_id_lock);
> -static DEFINE_IDA(vas_thread_ida);
> -
> -/*
> - * We need to assign a unique thread id to each thread in a process.
> - *
> - * This thread id, referred to as TIDR, and separate from the Linux's tgid,
> - * is intended to be used to direct an ASB_Notify from the hardware to the
> - * thread, when a suitable event occurs in the system.
> - *
> - * One such event is a "paste" instruction in the context of Fast Thread
> - * Wakeup (aka Core-to-core wake up in the Virtual Accelerator Switchboard
> - * (VAS) in POWER9.
> - *
> - * To get a unique TIDR per process we could simply reuse task_pid_nr() but
> - * the problem is that task_pid_nr() is not yet available copy_thread() is
> - * called. Fixing that would require changing more intrusive arch-neutral
> - * code in code path in copy_process()?.
> - *
> - * Further, to assign unique TIDRs within each process, we need an atomic
> - * field (or an IDR) in task_struct, which again intrudes into the arch-
> - * neutral code. So try to assign globally unique TIDRs for now.
> - *
> - * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
> - *	 For now, only threads that expect to be notified by the VAS
> - *	 hardware need a TIDR value and we assign values > 0 for those.
> - */
> -#define MAX_THREAD_CONTEXT	((1 << 16) - 1)
> -static int assign_thread_tidr(void)
> -{
> -	int index;
> -	int err;
> -	unsigned long flags;
> -
> -again:
> -	if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
> -		return -ENOMEM;
> -
> -	spin_lock_irqsave(&vas_thread_id_lock, flags);
> -	err = ida_get_new_above(&vas_thread_ida, 1, &index);
> -	spin_unlock_irqrestore(&vas_thread_id_lock, flags);
> -
> -	if (err == -EAGAIN)
> -		goto again;
> -	else if (err)
> -		return err;
> -
> -	if (index > MAX_THREAD_CONTEXT) {
> -		spin_lock_irqsave(&vas_thread_id_lock, flags);
> -		ida_remove(&vas_thread_ida, index);
> -		spin_unlock_irqrestore(&vas_thread_id_lock, flags);
> -		return -ENOMEM;
> -	}
> -
> -	return index;
> -}
> -
> -static void free_thread_tidr(int id)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&vas_thread_id_lock, flags);
> -	ida_remove(&vas_thread_ida, id);
> -	spin_unlock_irqrestore(&vas_thread_id_lock, flags);
> -}
> -
> -/*
> - * Clear any TIDR value assigned to this thread.
> - */
> -void clear_thread_tidr(struct task_struct *t)
> -{
> -	if (!t->thread.tidr)
> -		return;
> -
> -	if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
> -		WARN_ON_ONCE(1);
> -		return;
> -	}
> -
> -	mtspr(SPRN_TIDR, 0);
> -	free_thread_tidr(t->thread.tidr);
> -	t->thread.tidr = 0;
> -}
> -
> -void arch_release_task_struct(struct task_struct *t)
> -{
> -	clear_thread_tidr(t);
> -}
> -
>   /*
>    * Assign a unique TIDR (thread id) for task @t and set it in the thread
>    * structure. For now, we only support setting TIDR for 'current' task.
>    */
>   int set_thread_tidr(struct task_struct *t)
>   {
> -	int rc;
> -
>   	if (!cpu_has_feature(CPU_FTR_P9_TIDR))
>   		return -EINVAL;
> 
> @@ -1602,11 +1511,7 @@ int set_thread_tidr(struct task_struct *t)
>   	if (t->thread.tidr)
>   		return 0;
> 
> -	rc = assign_thread_tidr();
> -	if (rc < 0)
> -		return rc;
> -
> -	t->thread.tidr = rc;
> +	t->thread.tidr = (u16)task_pid_nr(t);
>   	mtspr(SPRN_TIDR, t->thread.tidr);
> 
>   	return 0;
> 

^ permalink raw reply

* Re: [PATCH v2 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation
From: Frederic Barrat @ 2018-05-07 17:19 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-3-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> Switch the use of TIDR on it's CPU feature, rather than assuming it
> is available based on architecture.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>


>   arch/powerpc/kernel/process.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 1237f13fed51..3b00da47699b 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1154,7 +1154,7 @@ static inline void restore_sprs(struct thread_struct *old_thread,
>   			mtspr(SPRN_TAR, new_thread->tar);
>   	}
> 
> -	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> +	if (cpu_has_feature(CPU_FTR_P9_TIDR) &&
>   	    old_thread->tidr != new_thread->tidr)
>   		mtspr(SPRN_TIDR, new_thread->tidr);
>   #endif
> @@ -1570,7 +1570,7 @@ void clear_thread_tidr(struct task_struct *t)
>   	if (!t->thread.tidr)
>   		return;
> 
> -	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
> +	if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
>   		WARN_ON_ONCE(1);
>   		return;
>   	}
> @@ -1593,7 +1593,7 @@ int set_thread_tidr(struct task_struct *t)
>   {
>   	int rc;
> 
> -	if (!cpu_has_feature(CPU_FTR_ARCH_300))
> +	if (!cpu_has_feature(CPU_FTR_P9_TIDR))
>   		return -EINVAL;
> 
>   	if (t != current)
> 

^ permalink raw reply

* Re: [PATCH v2 1/7] powerpc: Add TIDR CPU feature for Power9
From: Frederic Barrat @ 2018-05-07 17:17 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd,
	andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180418010810.30937-2-alastair@au1.ibm.com>



Le 18/04/2018 à 03:08, Alastair D'Silva a écrit :
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> This patch adds a CPU feature bit to show whether the CPU has
> the TIDR register available, enabling as_notify/wait in userspace.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---
>   arch/powerpc/include/asm/cputable.h | 3 ++-
>   arch/powerpc/kernel/dt_cpu_ftrs.c   | 1 +
>   2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
> index 4e332f3531c5..54c4cbbe57b4 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -215,6 +215,7 @@ static inline void cpu_feature_keys_init(void) { }
>   #define CPU_FTR_P9_TM_HV_ASSIST		LONG_ASM_CONST(0x0000100000000000)
>   #define CPU_FTR_P9_TM_XER_SO_BUG	LONG_ASM_CONST(0x0000200000000000)
>   #define CPU_FTR_P9_TLBIE_BUG		LONG_ASM_CONST(0x0000400000000000)
> +#define CPU_FTR_P9_TIDR			LONG_ASM_CONST(0x0000800000000000)
> 
>   #ifndef __ASSEMBLY__
> 
> @@ -462,7 +463,7 @@ static inline void cpu_feature_keys_init(void) { }
>   	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>   	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
>   	    CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
> -	    CPU_FTR_P9_TLBIE_BUG)
> +	    CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
>   #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
>   			     (~CPU_FTR_SAO))
>   #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 11a3a4fed3fb..10f8b7f55637 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -722,6 +722,7 @@ static __init void cpufeatures_cpu_quirks(void)
>   	if ((version & 0xffff0000) == 0x004e0000) {
>   		cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
>   		cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; > +		cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR;


Isn't it redundant with adding the flag to CPU_FTRS_POWER9?

   Fred


>   	}
>   }
> 

^ permalink raw reply

* Re: [PATCH] cxl: disable the lazy approach for irqs in POWERVM environment.
From: christophe lombard @ 2018-05-07 16:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, vaibhav, andrew.donnellan
In-Reply-To: <1521879295.16434.377.camel@kernel.crashing.org>

Le 24/03/2018 à 09:14, Benjamin Herrenschmidt a écrit :
> On Fri, 2018-03-23 at 17:17 +0100, christophe lombard wrote:
>> Le 23/03/2018 à 03:14, Benjamin Herrenschmidt a écrit :
>>> On Thu, 2018-03-22 at 17:37 +0100, Christophe Lombard wrote:
>>>> The cxl driver cannot disable the interrupt at the device level and has
>>>> to use disable_irq[_nosync] instead.
>>>> To avoid the implementation of the lazy optimisation (the interrupt is
>>>> marked disabled, but the hardware is left unmasked), we can disable it,
>>>> for a particular irq line, by calling
>>>> 'irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY)'.
>>>
>>> Why do you need that ? What's wrong with the lazy approach ? It makes
>>> disable_irq/enable_irq faster...
>>>
>>> You shouldn't need that unless your device is generating a *LOT* of
>>> irqs while disabled.
>>>
>>
>> An issue on POWERVM (CAPI) has been introduced with the following patch
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bf22ff45bed664aefb5c4e43029057a199b7070c
>>
>> The PSL or AFU interrupts are never received by the cxl driver because
>> the interrupts are never unmasked.
>>
>> Without this patch (genirq: Avoid unnecessary low level irq function
>> calls), the callback desc->irq_data.chip->irq_unmask(&desc->irq_data);
>> (= ics_rtas_unmask_irq()) is called by default through irq_enable().
> 
> I don't see why this would change with the patch...
> 
>> The cxl driver disables the interrupts before attaching the process
>> element and enables the interrupts after that.
> 
> How ? Using disable_irq or something else ?
> 
>> In the current code, irq_enable() unmasks the irq only if the irq state
>> is IRQD_IRQ_MASKED but it does not.
> 
> Sorry I don't really parse your sentence. You mean there's a disconnect
> between the "HW" (or pHyp) state of the interrupt and the
> IRQD_IRQ_MASKED flag ? That shouldn't happen... if that's the case the
> bug is elsewhere, what is causing the disconnect in the first place ?
> 
>> Call irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY) allows forcing
>> irq_disable() to update the irq state to IRQD_IRQ_MASKED and by default
>> irq_enable() will unmask the irq through ics_rtas_unmask_irq().
> 
> Hrm. I don't quite understand. You shouldn't need that, I suspect you
> are papering over a different bug but I'm not 100% certain as I don't
> completely understand what's happening.
> 
> Cheers,
> Ben.

Hi Ben,

To answer to your questions, here is the timeline in the cxl driver

   1. call disable_irq()

   2. call plpar_hcall9() to attach a process element
	During this phase, phyp (as described in CAPI PAPR document) 	
	disables the virtual interrupts provided in the process-element-
	struct.
	Then the partition must use ibm,set-xive to enable the virtual
	interrupt source after H_ATTACH_CA_PROCESS completes
	successfully.

   3. call enable_irq()
	Before the following patch (genirq: Avoid unnecessary low level
	irq function calls) be pushed in mainline, the unmask api
	(= ics_rtas_unmask_irq()) was called by default and everything
	was ok.

	With this patch, the unmask api is now called only if 		
	IRQD_IRQ_MASKED is set and in our case, it's never done and the
	irqs remain masked.


   So, there is a disconnect between the 'HW' (phyp) state of the
   interrupt and the IRQD_IRQ_MASKED flag.

   In the timeline of the cxl driver, the IRQD_IRQ_MASKED flag is never
   set.
   This flag was not used in enable_irq(), before that the patch "genirq:
   Avoid ..." was pushed. So ics_rtas_unmask_irq() was called and
   everything was ok.

Thanks

>>
>>
>>
>>>> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
>>>> ---
>>>>    drivers/misc/cxl/guest.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
>>>> index f58b4b6c..dc476e1 100644
>>>> --- a/drivers/misc/cxl/guest.c
>>>> +++ b/drivers/misc/cxl/guest.c
>>>> @@ -389,6 +389,7 @@ static void disable_afu_irqs(struct cxl_context *ctx)
>>>>    		hwirq = ctx->irqs.offset[r];
>>>>    		for (i = 0; i < ctx->irqs.range[r]; hwirq++, i++) {
>>>>    			virq = irq_find_mapping(NULL, hwirq);
>>>> +			irq_set_status_flags(virq, IRQ_DISABLE_UNLAZY);
>>>>    			disable_irq(virq);
>>>>    		}
>>>>    	}
> 

^ permalink raw reply

* Re: [PATCH] cxl: Configure PSL to not use APC virtual machines
From: christophe lombard @ 2018-05-07 15:57 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev, Frederic Barrat
  Cc: Philippe Bergheaud, Alastair D'Silva, Andrew Donnellan,
	Christophe Lombard
In-Reply-To: <20180417051102.7577-1-vaibhav@linux.vnet.ibm.com>

Le 17/04/2018 à 07:11, Vaibhav Jain a écrit :
> APC virtual machines arent used on POWER-9 chips and are already
> disabled in on-chip CAPP. They also need to be disabled on the PSL via
> 'PSL Data Send Control Register' by setting bit(47). This forces the
> PSL to send commands to CAPP with queue.id == 0.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
> ---
>   drivers/misc/cxl/pci.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index c32432168e6b..af30ee848d35 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -516,9 +516,9 @@ static int init_implementation_adapter_regs_psl9(struct cxl *adapter,
>   	cxl_p1_write(adapter, CXL_PSL9_FIR_CNTL, psl_fircntl);
> 
>   	/* Setup the PSL to transmit packets on the PCIe before the
> -	 * CAPP is enabled
> +	 * CAPP is enabled. Make sure that CAPP virtual machines are disabled
>   	 */
> -	cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x0001001000002A10ULL);
> +	cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x0001001000012A10ULL);
> 
>   	/*
>   	 * A response to an ASB_Notify request is returned by the
> 

Thanks

Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>

^ permalink raw reply

* Re: [PATCH v3] On ppc64le we HAVE_RELIABLE_STACKTRACE
From: Josh Poimboeuf @ 2018-05-07 15:42 UTC (permalink / raw)
  To: Torsten Duwe
  Cc: Michael Ellerman, Jiri Kosina, linuxppc-dev, linux-kernel,
	Nicholas Piggin, live-patching
In-Reply-To: <20180504123834.GA16581@lst.de>

On Fri, May 04, 2018 at 02:38:34PM +0200, Torsten Duwe wrote:
> 
> The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3:
> 
> [...] There are several rules that must be adhered to in order to ensure
> reliable and consistent call chain backtracing:
> 
> * Before a function calls any other function, it shall establish its
>   own stack frame, whose size shall be a multiple of 16 bytes.
> 
>  – In instances where a function’s prologue creates a stack frame, the
>    back-chain word of the stack frame shall be updated atomically with
>    the value of the stack pointer (r1) when a back chain is implemented.
>    (This must be supported as default by all ELF V2 ABI-compliant
>    environments.)
> [...]
>  – The function shall save the link register that contains its return
>    address in the LR save doubleword of its caller’s stack frame before
>    calling another function.
> 
> To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE.
> This patch may be unneccessarily limited to ppc64le, but OTOH the only
> user of this flag so far is livepatching, which is only implemented on
> PPCs with 64-LE, a.k.a. ELF ABI v2.
> 
> Feel free to add other ppc variants, but so far only ppc64le got tested.
> 
> This change also implements save_stack_trace_tsk_reliable() for ppc64le
> that checks for the above conditions, where possible.
> 
> Signed-off-by: Torsten Duwe <duwe@suse.de>
> Signed-off-by: Nicolai Stange <nstange@suse.de>

The subject doesn't actively describe what the patch does, maybe change
it to something like:

  powerpc: Add support for HAVE_RELIABLE_STACKTRACE

or maybe

  powerpc: Add support for livepatch consistency model

Otherwise it looks great to me.

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox