LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 3/3] cpuidle-pseries : Fixup exit latency for CEDE(0)
From: Gautham R. Shenoy @ 2020-07-30  5:32 UTC (permalink / raw)
  To: Nicholas Piggin, Anton Blanchard, Nathan Lynch, Michael Ellerman,
	Michael Neuling, Vaidyanathan Srinivasan
  Cc: linuxppc-dev, Gautham R. Shenoy, linux-kernel, linux-pm
In-Reply-To: <1596087177-30329-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

We are currently assuming that CEDE(0) has exit latency 10us, since
there is no way for us to query from the platform.  However, if the
wakeup latency of an Extended CEDE state is smaller than 10us, then we
can be sure that the exit latency of CEDE(0) cannot be more than that.
that.

In this patch, we fix the exit latency of CEDE(0) if we discover an
Extended CEDE state with wakeup latency smaller than 10us.

Benchmark results:

On POWER8, this patch does not have any impact since the advertized
latency of Extended CEDE (1) is 30us which is higher than the default
latency of CEDE (0) which is 10us.

On POWER9 we see improvement the single-threaded performance of ebizzy,
and no regression in the wakeup latency or the number of
context-switches.

ebizzy:
2 ebizzy threads bound to the same big-core. 25% improvement in the
avg records/s with patch.
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x  10       2491089       5834307       5398375       4244335     1596244.9
*  10       2893813       5834474       5832448     5327281.3     1055941.4

context_switch2 :
There is no major regression observed with this patch as seen from the
context_switch2 benchmark.

context_switch2 across CPU0 CPU1 (Both belong to same big-core, but different
small cores). We observe a minor 0.14% regression in the number of
context-switches (higher is better).
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        348872        362236        354712     354745.69      2711.827
* 500        349422        361452        353942      354215.4     2576.9258
Difference at 99.0% confidence
    -530.288 +/- 430.963
    -0.149484% +/- 0.121485%
    (Student's t, pooled s = 2645.24)

context_switch2 across CPU0 CPU8 (Different big-cores). We observe a 0.37%
improvement in the number of context-switches (higher is better).
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        287956        294940        288896     288977.23     646.59295
* 500        288300        294646        289582     290064.76     1161.9992
Difference at 99.0% confidence
    1087.53 +/- 153.194
    0.376337% +/- 0.0530125%
    (Student's t, pooled s = 940.299)

schbench:
No major difference could be seen until the 99.9th percentile.

Without-patch
Latency percentiles (usec)
	50.0th: 29
	75.0th: 39
	90.0th: 49
	95.0th: 59
	*99.0th: 13104
	99.5th: 14672
	99.9th: 15824
	min=0, max=17993

With-patch:
Latency percentiles (usec)
	50.0th: 29
	75.0th: 40
	90.0th: 50
	95.0th: 61
	*99.0th: 13648
	99.5th: 14768
	99.9th: 15664
	min=0, max=29812

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
v2-->v3 : Made notation consistent with first two patches.
 drivers/cpuidle/cpuidle-pseries.c | 41 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index f528da7..8d19820 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -350,13 +350,50 @@ static int pseries_cpuidle_driver_init(void)
 	return 0;
 }
 
-static void __init parse_xcede_idle_states(void)
+static void __init fixup_cede0_latency(void)
 {
+	int i;
+	u64 min_latency_us = dedicated_states[1].exit_latency; /* CEDE latency */
+	struct xcede_latency_payload *payload;
+
 	if (parse_cede_parameters())
 		return;
 
 	pr_info("cpuidle : Skipping the %d Extended CEDE idle states\n",
 		nr_xcede_records);
+
+	payload = &xcede_latency_parameter.payload;
+	for (i = 0; i < nr_xcede_records; i++) {
+		struct xcede_latency_record *record = &payload->records[i];
+		u64 latency_tb = be64_to_cpu(record->latency_ticks);
+		u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+
+		if (latency_us < min_latency_us)
+			min_latency_us = latency_us;
+	}
+
+	/*
+	 * By default, we assume that CEDE(0) has exit latency 10us,
+	 * since there is no way for us to query from the platform.
+	 *
+	 * However, if the wakeup latency of an Extended CEDE state is
+	 * smaller than 10us, then we can be sure that CEDE(0)
+	 * requires no more than that.
+	 *
+	 * Perform the fix-up.
+	 */
+	if (min_latency_us < dedicated_states[1].exit_latency) {
+		u64 cede0_latency = min_latency_us - 1;
+
+		if (cede0_latency <= 0)
+			cede0_latency = min_latency_us;
+
+		dedicated_states[1].exit_latency = cede0_latency;
+		dedicated_states[1].target_residency = 10 * (cede0_latency);
+		pr_info("cpuidle : Fixed up CEDE exit latency to %llu us\n",
+			cede0_latency);
+	}
+
 }
 
 /*
@@ -380,7 +417,7 @@ static int pseries_idle_probe(void)
 			cpuidle_state_table = shared_states;
 			max_idle_state = ARRAY_SIZE(shared_states);
 		} else {
-			parse_xcede_idle_states();
+			fixup_cede0_latency();
 			cpuidle_state_table = dedicated_states;
 			max_idle_state = NR_DEDICATED_STATES;
 		}
-- 
1.9.4


^ permalink raw reply related

* Re: [PATCH v2] powerpc/vio: drop bus_type from parent device
From: Greg KH @ 2020-07-30  5:37 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Thadeu Lima de Souza Cascardo, Peter Rajnoha, linuxppc-dev
In-Reply-To: <87ime56bax.fsf@mpe.ellerman.id.au>

On Thu, Jul 30, 2020 at 11:28:38AM +1000, Michael Ellerman wrote:
> [ Added Peter & Greg to Cc ]
> 
> Thadeu Lima de Souza Cascardo <cascardo@canonical.com> writes:
> > Commit df44b479654f62b478c18ee4d8bc4e9f897a9844 ("kobject: return error
> > code if writing /sys/.../uevent fails") started returning failure when
> > writing to /sys/devices/vio/uevent.
> >
> > This causes an early udevadm trigger to fail. On some installer versions of
> > Ubuntu, this will cause init to exit, thus panicing the system very early
> > during boot.
> >
> > Removing the bus_type from the parent device will remove some of the extra
> > empty files from /sys/devices/vio/, but will keep the rest of the layout
> > for vio devices, keeping them under /sys/devices/vio/.
> 
> What exactly does it change?
> 
> I'm finding it hard to evaluate if this change is going to cause a
> regression somehow.
> 
> I'm also not clear on why removing the bus type is correct, apart from
> whether it fixes the bug you're seeing.
> 
> > It has been tested that uevents for vio devices don't change after this
> > fix, they still contain MODALIAS.
> >
> > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> > Fixes: df44b479654f ("kobject: return error code if writing /sys/.../uevent fails")
> 
> AFAICS there haven't been any other fixes for that commit. Do we know
> why it is only vio that was affected? (possibly because it's a fake bus
> to begin with?)

So there was an error previously, the core was ignoring it, and now it
isn't and to fix that you want to remove describing what bus a device is
on?

Huh???

> 
> cheers
> 
> > diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
> > index 37f1f25ba804..a94dab3972a0 100644
> > --- a/arch/powerpc/platforms/pseries/vio.c
> > +++ b/arch/powerpc/platforms/pseries/vio.c
> > @@ -36,7 +36,6 @@ static struct vio_dev vio_bus_device  = { /* fake "parent" device */
> >  	.name = "vio",
> >  	.type = "",
> >  	.dev.init_name = "vio",
> > -	.dev.bus = &vio_bus_type,
> >  };

Wait, a static 'struct device'?  You all are playing with fire there.
That's a reference counted object, and should never be declared like
that at all.

I see you register it, but never unregister it, why?  Why is it even
needed?

And if you remove the bus type of it, it will show up in a different
part of sysfs, so I think this patch will show a user-visable change,
right?

thanks,

greg k-h

^ permalink raw reply

* [PATCH v3 2/3] cpuidle-pseries: Add function to parse extended CEDE records
From: Gautham R. Shenoy @ 2020-07-30  5:32 UTC (permalink / raw)
  To: Nicholas Piggin, Anton Blanchard, Nathan Lynch, Michael Ellerman,
	Michael Neuling, Vaidyanathan Srinivasan
  Cc: linuxppc-dev, Gautham R. Shenoy, linux-kernel, linux-pm
In-Reply-To: <1596087177-30329-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Currently we use CEDE with latency-hint 0 as the only other idle state
on a dedicated LPAR apart from the polling "snooze" state.

The platform might support additional extended CEDE idle states, which
can be discovered through the "ibm,get-system-parameter" rtas-call
made with CEDE_LATENCY_TOKEN.

This patch adds a function to obtain information about the extended
CEDE idle states from the platform and parse the contents to populate
an array of extended CEDE states. These idle states thus discovered
will be added to the cpuidle framework in the next patch.

dmesg on a POWER8 and POWER9 LPAR, demonstrating the output of parsing
the extended CEDE latency parameters are as follows

POWER8
[   10.093279] xcede : xcede_record_size = 10
[   10.093285] xcede : Record 0 : hint = 1, latency = 0x3c00 tb ticks, Wake-on-irq = 1
[   10.093291] xcede : Record 1 : hint = 2, latency = 0x4e2000 tb ticks, Wake-on-irq = 0
[   10.093297] cpuidle : Skipping the 2 Extended CEDE idle states

POWER9
[    5.913180] xcede : xcede_record_size = 10
[    5.913183] xcede : Record 0 : hint = 1, latency = 0x400 tb ticks, Wake-on-irq = 1
[    5.913188] xcede : Record 1 : hint = 2, latency = 0x3e8000 tb ticks, Wake-on-irq = 0
[    5.913193] cpuidle : Skipping the 2 Extended CEDE idle states

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
v2-->v3 : Cleaned up parse_cede_parameters(). Silenced some sparse warnings.
drivers/cpuidle/cpuidle-pseries.c | 142 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index f5865a2..f528da7 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -21,6 +21,7 @@
 #include <asm/runlatch.h>
 #include <asm/idle.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/rtas.h>
 
 static struct cpuidle_driver pseries_idle_driver = {
 	.name             = "pseries_idle",
@@ -87,6 +88,137 @@ static void check_and_cede_processor(void)
 }
 
 #define NR_DEDICATED_STATES	2 /* snooze, CEDE */
+/*
+ * XCEDE : Extended CEDE states discovered through the
+ *         "ibm,get-systems-parameter" rtas-call with the token
+ *         CEDE_LATENCY_TOKEN
+ */
+#define MAX_XCEDE_STATES		4
+#define	XCEDE_LATENCY_RECORD_SIZE	10
+#define XCEDE_LATENCY_PARAM_MAX_LENGTH	(2 + 2 + \
+					(MAX_XCEDE_STATES * XCEDE_LATENCY_RECORD_SIZE))
+
+/*
+ * Section 7.3.16 System Parameters Option of PAPR version 2.8.1 has a
+ * table with all the parameters to ibm,get-system-parameters.
+ * CEDE_LATENCY_TOKEN corresponds to the token value for Cede Latency
+ * Settings Information.
+ */
+#define CEDE_LATENCY_TOKEN		45
+
+/*
+ * If the platform supports the cede latency settings
+ * information system parameter it must provide the following
+ * information in the NULL terminated parameter string:
+ *
+ * a. The first byte is the length “N” of each cede
+ *    latency setting record minus one (zero indicates a length
+ *    of 1 byte).
+ *
+ * b. For each supported cede latency setting a cede latency
+ *    setting record consisting of the first “N” bytes as per
+ *    the following table.
+ *
+ *	-----------------------------
+ *	| Field           | Field  |
+ *	| Name            | Length |
+ *	-----------------------------
+ *	| Cede Latency    | 1 Byte |
+ *	| Specifier Value |        |
+ *	-----------------------------
+ *	| Maximum wakeup  |        |
+ *	| latency in      | 8 Bytes|
+ *	| tb-ticks        |        |
+ *	-----------------------------
+ *	| Responsive to   |        |
+ *	| external        | 1 Byte |
+ *	| interrupts      |        |
+ *	-----------------------------
+ *
+ * This version has cede latency record size = 10.
+ *
+ * The structure xcede_latency_payload represents a) and b) with
+ * xcede_latency_record representing the table in b).
+ *
+ * xcede_latency_parameter is what gets returned by
+ * ibm,get-systems-parameter rtas-call when made with
+ * CEDE_LATENCY_TOKEN.
+ *
+ * These structures are only used to represent the data sent obtained
+ * by the rtas-call. The data is in Big-Endian.
+ */
+struct xcede_latency_record {
+	u8	hint;
+	__be64	latency_ticks;
+	u8	wake_on_irqs;
+} __packed;
+
+struct xcede_latency_payload {
+	u8     record_size;
+	struct xcede_latency_record records[MAX_XCEDE_STATES];
+} __packed;
+
+struct xcede_latency_parameter {
+	__be16  payload_size;
+	struct xcede_latency_payload payload;
+	u8 null_char;
+} __packed;
+
+static unsigned int nr_xcede_records;
+static struct xcede_latency_parameter xcede_latency_parameter __initdata;
+
+static int __init parse_cede_parameters(void)
+{
+	int ret, i;
+	u16 payload_size;
+	u8 xcede_record_size;
+	u32 total_xcede_records_size;
+	struct xcede_latency_payload *payload;
+
+	memset(&xcede_latency_parameter, 0, sizeof(xcede_latency_parameter));
+
+	ret = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+			NULL, CEDE_LATENCY_TOKEN, __pa(&xcede_latency_parameter),
+			sizeof(xcede_latency_parameter));
+
+	if (ret) {
+		pr_err("xcede: Error parsing CEDE_LATENCY_TOKEN\n");
+		return ret;
+	}
+
+	payload_size = be16_to_cpu(xcede_latency_parameter.payload_size);
+	payload = &xcede_latency_parameter.payload;
+
+	xcede_record_size = payload->record_size + 1;
+
+	if (xcede_record_size != XCEDE_LATENCY_RECORD_SIZE) {
+		pr_err("xcede : Expected record-size %d. Observed size %d.\n",
+		       XCEDE_LATENCY_RECORD_SIZE, xcede_record_size);
+		return -EINVAL;
+	}
+
+	pr_info("xcede : xcede_record_size = %d\n", xcede_record_size);
+
+	/*
+	 * Since the payload_length includes the last NULL byte and
+	 * the xcede_record_size, the remaining bytes correspond to
+	 * array of all cede_latency settings.
+	 */
+	total_xcede_records_size = payload_size - 2;
+	nr_xcede_records = total_xcede_records_size / xcede_record_size;
+
+	for (i = 0; i < nr_xcede_records; i++) {
+		struct xcede_latency_record *record = &payload->records[i];
+		u8 hint = record->hint;
+		u8 wake_on_irqs = record->wake_on_irqs;
+		u64 latency_ticks = be64_to_cpu(record->latency_ticks);
+
+		pr_info("xcede : Record %d : hint = %u, latency = 0x%llx tb ticks, Wake-on-irq = %u\n",
+			i, hint, latency_ticks, wake_on_irqs);
+	}
+
+	return 0;
+}
 
 u8 cede_latency_hint[NR_DEDICATED_STATES];
 static int dedicated_cede_loop(struct cpuidle_device *dev,
@@ -218,6 +350,15 @@ static int pseries_cpuidle_driver_init(void)
 	return 0;
 }
 
+static void __init parse_xcede_idle_states(void)
+{
+	if (parse_cede_parameters())
+		return;
+
+	pr_info("cpuidle : Skipping the %d Extended CEDE idle states\n",
+		nr_xcede_records);
+}
+
 /*
  * pseries_idle_probe()
  * Choose state table for shared versus dedicated partition
@@ -239,6 +380,7 @@ static int pseries_idle_probe(void)
 			cpuidle_state_table = shared_states;
 			max_idle_state = ARRAY_SIZE(shared_states);
 		} else {
+			parse_xcede_idle_states();
 			cpuidle_state_table = dedicated_states;
 			max_idle_state = NR_DEDICATED_STATES;
 		}
-- 
1.9.4


^ permalink raw reply related

* [PATCH v3 0/3] cpuidle-pseries: Parse extended CEDE information for idle.
From: Gautham R. Shenoy @ 2020-07-30  5:32 UTC (permalink / raw)
  To: Nicholas Piggin, Anton Blanchard, Nathan Lynch, Michael Ellerman,
	Michael Neuling, Vaidyanathan Srinivasan
  Cc: linuxppc-dev, Gautham R. Shenoy, linux-kernel, linux-pm

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

This is a v3 of the patch series to parse the extended CEDE
information in the pseries-cpuidle driver.

The previous two versions of the patches can be found here:

v2: https://lore.kernel.org/lkml/1596005254-25753-1-git-send-email-ego@linux.vnet.ibm.com/

v1: https://lore.kernel.org/linuxppc-dev/1594120299-31389-1-git-send-email-ego@linux.vnet.ibm.com/

The change from v2 --> v1 :

 * Patch 1: Got rid of some #define-s which were needed mainly for Patches4 and
   5 of v1, but were retained in v2.

 * Patch 2:

  	* Based on feedback from Michael Ellerman, rewrote the
	  function to parse the extended idle states by explicitly
	  defining the structure of the object that is returned by
	  ibm,get-system-parameters(CEDE_LATENCY_TOKEN) rtas-call. In
	  the previous versions we were passing a character array and
	  subsequently parsing the individual elements which can be
	  bug-prone. This also gets rid of the excessive (cast *)ing
	  that was in the previous versions.

	  * Marked some of the functions static and annotated some of
	  the functions with __init and data with __initdata. This
	  makes Sparse happy.

	  * Added comments for CEDE_LATENCY_TOKEN.

	  * Renamed add_pseries_idle_states() to
            parse_xcede_idle_states(). Again, this is because Patch 4
            and 5 from v1 are no longer there.

 * Patch 3: No functional changes, but minor changes to be consistent
   with Patch 1 and 2 of this series.

I have additionally tested the code on POWER8 dedicated LPAR and found
that it has no impact, since the wakeup latency of CEDE(1) is 30us
which is greater that default latency that we are assuming for
CEDE(0). So we do not need to fixup CEDE(0) latency on POWER8.

Vaidy, I have removed your Reviewed-by for v1, since the code has
changed a little bit.

Gautham R. Shenoy (3):
  cpuidle-pseries: Set the latency-hint before entering CEDE
  cpuidle-pseries: Add function to parse extended CEDE records
  cpuidle-pseries : Fixup exit latency for CEDE(0)

 drivers/cpuidle/cpuidle-pseries.c | 190 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 188 insertions(+), 2 deletions(-)

-- 
1.9.4

^ permalink raw reply

* [PATCH v3 1/3] cpuidle-pseries: Set the latency-hint before entering CEDE
From: Gautham R. Shenoy @ 2020-07-30  5:32 UTC (permalink / raw)
  To: Nicholas Piggin, Anton Blanchard, Nathan Lynch, Michael Ellerman,
	Michael Neuling, Vaidyanathan Srinivasan
  Cc: linuxppc-dev, Gautham R. Shenoy, linux-kernel, linux-pm
In-Reply-To: <1596087177-30329-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

As per the PAPR, each H_CEDE call is associated with a latency-hint to
be passed in the VPA field "cede_latency_hint". The CEDE states that
we were implicitly entering so far is CEDE with latency-hint = 0.

This patch explicitly sets the latency hint corresponding to the CEDE
state that we are currently entering. While at it, we save the
previous hint, to be restored once we wakeup from CEDE. This will be
required in the future when we expose extended-cede states through the
cpuidle framework, where each of them will have a different
cede-latency hint.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
v2-->v3 : Got rid of the usused NR_CEDE_STATES definition

 drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 3e058ad2..f5865a2 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -86,19 +86,26 @@ static void check_and_cede_processor(void)
 	}
 }
 
+#define NR_DEDICATED_STATES	2 /* snooze, CEDE */
+
+u8 cede_latency_hint[NR_DEDICATED_STATES];
 static int dedicated_cede_loop(struct cpuidle_device *dev,
 				struct cpuidle_driver *drv,
 				int index)
 {
+	u8 old_latency_hint;
 
 	pseries_idle_prolog();
 	get_lppaca()->donate_dedicated_cpu = 1;
+	old_latency_hint = get_lppaca()->cede_latency_hint;
+	get_lppaca()->cede_latency_hint = cede_latency_hint[index];
 
 	HMT_medium();
 	check_and_cede_processor();
 
 	local_irq_disable();
 	get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->cede_latency_hint = old_latency_hint;
 
 	pseries_idle_epilog();
 
@@ -130,7 +137,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
 /*
  * States for dedicated partition case.
  */
-static struct cpuidle_state dedicated_states[] = {
+static struct cpuidle_state dedicated_states[NR_DEDICATED_STATES] = {
 	{ /* Snooze */
 		.name = "snooze",
 		.desc = "snooze",
@@ -233,7 +240,7 @@ static int pseries_idle_probe(void)
 			max_idle_state = ARRAY_SIZE(shared_states);
 		} else {
 			cpuidle_state_table = dedicated_states;
-			max_idle_state = ARRAY_SIZE(dedicated_states);
+			max_idle_state = NR_DEDICATED_STATES;
 		}
 	} else
 		return -ENODEV;
-- 
1.9.4


^ permalink raw reply related

* [PATCH v3] selftests: powerpc: Fix online CPU selection
From: Sandipan Das @ 2020-07-30  5:08 UTC (permalink / raw)
  To: mpe; +Cc: srikar, kamalesh, shiganta, nasastry, harish, linuxppc-dev

The size of the CPU affinity mask must be large enough for
systems with a very large number of CPUs. Otherwise, tests
which try to determine the first online CPU by calling
sched_getaffinity() will fail. This makes sure that the size
of the allocated affinity mask is dependent on the number of
CPUs as reported by get_nprocs_conf().

Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs")
Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
---
Previous versions can be found at:
v2: https://lore.kernel.org/linuxppc-dev/20200609073733.997643-1-sandipan@linux.ibm.com/
v1: https://lore.kernel.org/linuxppc-dev/20200608144212.985144-1-sandipan@linux.ibm.com/

Changes in v3:
- Use get_nprocs_conf() instead of get_nprocs() as suggested by
  Srikar to cope with test failures that Michael reported for
  sparse SMT setups.

Changes in v2:
- Added NULL check for the affinity mask as suggested by Kamalesh.
- Changed "cpu set" to "CPU affinity mask" in the commit message.

---
 tools/testing/selftests/powerpc/utils.c | 37 +++++++++++++++++--------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/powerpc/utils.c b/tools/testing/selftests/powerpc/utils.c
index 933678f1ed0a0..18b6a773d5c73 100644
--- a/tools/testing/selftests/powerpc/utils.c
+++ b/tools/testing/selftests/powerpc/utils.c
@@ -16,6 +16,7 @@
 #include <string.h>
 #include <sys/ioctl.h>
 #include <sys/stat.h>
+#include <sys/sysinfo.h>
 #include <sys/types.h>
 #include <sys/utsname.h>
 #include <unistd.h>
@@ -88,28 +89,40 @@ void *get_auxv_entry(int type)
 
 int pick_online_cpu(void)
 {
-	cpu_set_t mask;
-	int cpu;
+	int ncpus, cpu = -1;
+	cpu_set_t *mask;
+	size_t size;
+
+	ncpus = get_nprocs_conf();
+	size = CPU_ALLOC_SIZE(ncpus);
+	mask = CPU_ALLOC(ncpus);
+	if (!mask) {
+		perror("malloc");
+		return -1;
+	}
 
-	CPU_ZERO(&mask);
+	CPU_ZERO_S(size, mask);
 
-	if (sched_getaffinity(0, sizeof(mask), &mask)) {
+	if (sched_getaffinity(0, size, mask)) {
 		perror("sched_getaffinity");
-		return -1;
+		goto done;
 	}
 
 	/* We prefer a primary thread, but skip 0 */
-	for (cpu = 8; cpu < CPU_SETSIZE; cpu += 8)
-		if (CPU_ISSET(cpu, &mask))
-			return cpu;
+	for (cpu = 8; cpu < ncpus; cpu += 8)
+		if (CPU_ISSET_S(cpu, size, mask))
+			goto done;
 
 	/* Search for anything, but in reverse */
-	for (cpu = CPU_SETSIZE - 1; cpu >= 0; cpu--)
-		if (CPU_ISSET(cpu, &mask))
-			return cpu;
+	for (cpu = ncpus - 1; cpu >= 0; cpu--)
+		if (CPU_ISSET_S(cpu, size, mask))
+			goto done;
 
 	printf("No cpus in affinity mask?!\n");
-	return -1;
+
+done:
+	CPU_FREE(mask);
+	return cpu;
 }
 
 bool is_ppc64le(void)
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v2] selftests: powerpc: Fix online CPU selection
From: Sandipan Das @ 2020-07-30  4:59 UTC (permalink / raw)
  To: Srikar Dronamraju, mpe; +Cc: shiganta, nasastry, harish, linuxppc-dev, kamalesh
In-Reply-To: <20200729160344.GB14603@linux.vnet.ibm.com>

Hi Srikar, Michael,

On 29/07/20 9:33 pm, Srikar Dronamraju wrote:
> * Sandipan Das <sandipan@linux.ibm.com> [2020-06-09 13:07:33]:
> 
>> The size of the CPU affinity mask must be large enough for
>> systems with a very large number of CPUs. Otherwise, tests
>> which try to determine the first online CPU by calling
>> sched_getaffinity() will fail. This makes sure that the size
>> of the allocated affinity mask is dependent on the number of
>> CPUs as reported by get_nprocs().
>>
>> Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs")
>> Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
>> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
>> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>> ---
>> Previous versions can be found at:
>> v1: https://lore.kernel.org/linuxppc-dev/20200608144212.985144-1-sandipan@linux.ibm.com/
>>
>> @@ -88,28 +89,40 @@ void *get_auxv_entry(int type)
>>
>>  int pick_online_cpu(void)
>>  {
>> -	cpu_set_t mask;
>> -	int cpu;
>> +	int ncpus, cpu = -1;
>> +	cpu_set_t *mask;
>> +	size_t size;
>> +
>> +	ncpus = get_nprocs();
> 
> Please use get_nprocs_conf or sysconf(_SC_NPROCESSORS_CONF). The manpage
> seems to suggest the latter. Not sure how accurate the manpage is.
> 
> get_nprocs is returning online cpus and when smt is off, the cpu numbers
> would be sparse and hence the result from get_nprocs wouldn't be ideal for
> allocating cpumask. However get_nprocs_conf would return the max configured
> cpus and would be able to handle it. 
> 
> I think this was the same situation hit by Michael Ellerman.
> 

Yes, that seems to be the case. Thanks for testing this.
Will fix this in v3.

- Sandipan

^ permalink raw reply

* Re: [PATCH 11/15] memblock: reduce number of parameters in for_each_mem_range()
From: Baoquan He @ 2020-07-30  2:22 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-sh, Peter Zijlstra, Dave Hansen, linux-mips, Max Filippov,
	Paul Mackerras, sparclinux, linux-riscv, Will Deacon,
	Stafford Horne, Marek Szyprowski, linux-s390, linux-c6x-dev,
	Yoshinori Sato, x86, Russell King, Mike Rapoport,
	clang-built-linux, Ingo Molnar, Catalin Marinas, uclinux-h8-devel,
	linux-xtensa, openrisc, Borislav Petkov, Andy Lutomirski,
	Paul Walmsley, Thomas Gleixner, linux-arm-kernel, Michal Simek,
	linux-mm, linuxppc-dev, linux-kernel, iommu, Palmer Dabbelt,
	Andrew Morton, Christoph Hellwig
In-Reply-To: <20200728051153.1590-12-rppt@kernel.org>

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Currently for_each_mem_range() iterator is the most generic way to traverse
> memblock regions. As such, it has 8 parameters and it is hardly convenient
> to users. Most users choose to utilize one of its wrappers and the only
> user that actually needs most of the parameters outside memblock is s390
> crash dump implementation.
> 
> To avoid yet another naming for memblock iterators, rename the existing
> for_each_mem_range() to __for_each_mem_range() and add a new
> for_each_mem_range() wrapper with only index, start and end parameters.
> 
> The new wrapper nicely fits into init_unavailable_mem() and will be used in
> upcoming changes to simplify memblock traversals.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  .clang-format                          |  1 +
>  arch/arm64/kernel/machine_kexec_file.c |  6 ++----
>  arch/s390/kernel/crash_dump.c          |  8 ++++----
>  include/linux/memblock.h               | 18 ++++++++++++++----
>  mm/page_alloc.c                        |  3 +--
>  5 files changed, 22 insertions(+), 14 deletions(-)

Reviewed-by: Baoquan He <bhe@redhat.com>

> 
> diff --git a/.clang-format b/.clang-format
> index a0a96088c74f..52ededab25ce 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -205,6 +205,7 @@ ForEachMacros:
>    - 'for_each_memblock_type'
>    - 'for_each_memcg_cache_index'
>    - 'for_each_mem_pfn_range'
> +  - '__for_each_mem_range'
>    - 'for_each_mem_range'
>    - 'for_each_mem_range_rev'
>    - 'for_each_migratetype_order'
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 361a1143e09e..5b0e67b93cdc 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -215,8 +215,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>  	phys_addr_t start, end;
>  
>  	nr_ranges = 1; /* for exclusion of crashkernel region */
> -	for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> -					MEMBLOCK_NONE, &start, &end, NULL)
> +	for_each_mem_range(i, &start, &end)
>  		nr_ranges++;
>  
>  	cmem = kmalloc(struct_size(cmem, ranges, nr_ranges), GFP_KERNEL);
> @@ -225,8 +224,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>  
>  	cmem->max_nr_ranges = nr_ranges;
>  	cmem->nr_ranges = 0;
> -	for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> -					MEMBLOCK_NONE, &start, &end, NULL) {
> +	for_each_mem_range(i, &start, &end) {
>  		cmem->ranges[cmem->nr_ranges].start = start;
>  		cmem->ranges[cmem->nr_ranges].end = end - 1;
>  		cmem->nr_ranges++;
> diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
> index f96a5857bbfd..e28085c725ff 100644
> --- a/arch/s390/kernel/crash_dump.c
> +++ b/arch/s390/kernel/crash_dump.c
> @@ -549,8 +549,8 @@ static int get_mem_chunk_cnt(void)
>  	int cnt = 0;
>  	u64 idx;
>  
> -	for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> -			   MEMBLOCK_NONE, NULL, NULL, NULL)
> +	__for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> +			     MEMBLOCK_NONE, NULL, NULL, NULL)
>  		cnt++;
>  	return cnt;
>  }
> @@ -563,8 +563,8 @@ static void loads_init(Elf64_Phdr *phdr, u64 loads_offset)
>  	phys_addr_t start, end;
>  	u64 idx;
>  
> -	for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> -			   MEMBLOCK_NONE, &start, &end, NULL) {
> +	__for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> +			     MEMBLOCK_NONE, &start, &end, NULL) {
>  		phdr->p_filesz = end - start;
>  		phdr->p_type = PT_LOAD;
>  		phdr->p_offset = start;
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index e6a23b3db696..d70c2835e913 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -142,7 +142,7 @@ void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
>  void __memblock_free_late(phys_addr_t base, phys_addr_t size);
>  
>  /**
> - * for_each_mem_range - iterate through memblock areas from type_a and not
> + * __for_each_mem_range - iterate through memblock areas from type_a and not
>   * included in type_b. Or just type_a if type_b is NULL.
>   * @i: u64 used as loop variable
>   * @type_a: ptr to memblock_type to iterate
> @@ -153,7 +153,7 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
>   * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
>   * @p_nid: ptr to int for nid of the range, can be %NULL
>   */
> -#define for_each_mem_range(i, type_a, type_b, nid, flags,		\
> +#define __for_each_mem_range(i, type_a, type_b, nid, flags,		\
>  			   p_start, p_end, p_nid)			\
>  	for (i = 0, __next_mem_range(&i, nid, flags, type_a, type_b,	\
>  				     p_start, p_end, p_nid);		\
> @@ -182,6 +182,16 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
>  	     __next_mem_range_rev(&i, nid, flags, type_a, type_b,	\
>  				  p_start, p_end, p_nid))
>  
> +/**
> + * for_each_mem_range - iterate through memory areas.
> + * @i: u64 used as loop variable
> + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
> + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> + */
> +#define for_each_mem_range(i, p_start, p_end) \
> +	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,	\
> +			     MEMBLOCK_NONE, p_start, p_end, NULL)
> +
>  /**
>   * for_each_reserved_mem_region - iterate over all reserved memblock areas
>   * @i: u64 used as loop variable
> @@ -287,8 +297,8 @@ int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask);
>   * soon as memblock is initialized.
>   */
>  #define for_each_free_mem_range(i, nid, flags, p_start, p_end, p_nid)	\
> -	for_each_mem_range(i, &memblock.memory, &memblock.reserved,	\
> -			   nid, flags, p_start, p_end, p_nid)
> +	__for_each_mem_range(i, &memblock.memory, &memblock.reserved,	\
> +			     nid, flags, p_start, p_end, p_nid)
>  
>  /**
>   * for_each_free_mem_range_reverse - rev-iterate through free memblock areas
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e028b87ce294..95af111d69d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6972,8 +6972,7 @@ static void __init init_unavailable_mem(void)
>  	 * Loop through unavailable ranges not covered by memblock.memory.
>  	 */
>  	pgcnt = 0;
> -	for_each_mem_range(i, &memblock.memory, NULL,
> -			NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
> +	for_each_mem_range(i, &start, &end) {
>  		if (next < start)
>  			pgcnt += init_unavailable_range(PFN_DOWN(next),
>  							PFN_UP(start));
> -- 
> 2.26.2
> 
> 


^ permalink raw reply

* Re: [PATCH 10/15] memblock: make memblock_debug and related functionality private
From: Baoquan He @ 2020-07-30  1:54 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-sh, Peter Zijlstra, Dave Hansen, linux-mips, Max Filippov,
	Paul Mackerras, sparclinux, linux-riscv, Will Deacon,
	Stafford Horne, Marek Szyprowski, linux-s390, linux-c6x-dev,
	Yoshinori Sato, x86, Russell King, Mike Rapoport,
	clang-built-linux, Ingo Molnar, Catalin Marinas, uclinux-h8-devel,
	linux-xtensa, openrisc, Borislav Petkov, Andy Lutomirski,
	Paul Walmsley, Thomas Gleixner, linux-arm-kernel, Michal Simek,
	linux-mm, linuxppc-dev, linux-kernel, iommu, Palmer Dabbelt,
	Andrew Morton, Christoph Hellwig
In-Reply-To: <20200728051153.1590-11-rppt@kernel.org>

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The only user of memblock_dbg() outside memblock was s390 setup code and it
> is converted to use pr_debug() instead.
> This allows to stop exposing memblock_debug and memblock_dbg() to the rest
> of the kernel.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/s390/kernel/setup.c |  4 ++--
>  include/linux/memblock.h | 12 +-----------
>  mm/memblock.c            | 13 +++++++++++--
>  3 files changed, 14 insertions(+), 15 deletions(-)

Nice clean up.

Reviewed-by: Baoquan He <bhe@redhat.com>

> 
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index 07aa15ba43b3..8b284cf6e199 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -776,8 +776,8 @@ static void __init memblock_add_mem_detect_info(void)
>  	unsigned long start, end;
>  	int i;
>  
> -	memblock_dbg("physmem info source: %s (%hhd)\n",
> -		     get_mem_info_source(), mem_detect.info_source);
> +	pr_debug("physmem info source: %s (%hhd)\n",
> +		 get_mem_info_source(), mem_detect.info_source);
>  	/* keep memblock lists close to the kernel */
>  	memblock_set_bottom_up(true);
>  	for_each_mem_detect_block(i, &start, &end) {
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 220b5f0dad42..e6a23b3db696 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -90,7 +90,6 @@ struct memblock {
>  };
>  
>  extern struct memblock memblock;
> -extern int memblock_debug;
>  
>  #ifndef CONFIG_ARCH_KEEP_MEMBLOCK
>  #define __init_memblock __meminit
> @@ -102,9 +101,6 @@ void memblock_discard(void);
>  static inline void memblock_discard(void) {}
>  #endif
>  
> -#define memblock_dbg(fmt, ...) \
> -	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> -
>  phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
>  				   phys_addr_t size, phys_addr_t align);
>  void memblock_allow_resize(void);
> @@ -456,13 +452,7 @@ bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
>  bool memblock_is_reserved(phys_addr_t addr);
>  bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
>  
> -extern void __memblock_dump_all(void);
> -
> -static inline void memblock_dump_all(void)
> -{
> -	if (memblock_debug)
> -		__memblock_dump_all();
> -}
> +void memblock_dump_all(void);
>  
>  /**
>   * memblock_set_current_limit - Set the current allocation limit to allow
> diff --git a/mm/memblock.c b/mm/memblock.c
> index a5b9b3df81fc..824938849f6d 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -134,7 +134,10 @@ struct memblock memblock __initdata_memblock = {
>  	     i < memblock_type->cnt;					\
>  	     i++, rgn = &memblock_type->regions[i])
>  
> -int memblock_debug __initdata_memblock;
> +#define memblock_dbg(fmt, ...) \
> +	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> +
> +static int memblock_debug __initdata_memblock;
>  static bool system_has_some_mirror __initdata_memblock = false;
>  static int memblock_can_resize __initdata_memblock;
>  static int memblock_memory_in_slab __initdata_memblock = 0;
> @@ -1919,7 +1922,7 @@ static void __init_memblock memblock_dump(struct memblock_type *type)
>  	}
>  }
>  
> -void __init_memblock __memblock_dump_all(void)
> +static void __init_memblock __memblock_dump_all(void)
>  {
>  	pr_info("MEMBLOCK configuration:\n");
>  	pr_info(" memory size = %pa reserved size = %pa\n",
> @@ -1933,6 +1936,12 @@ void __init_memblock __memblock_dump_all(void)
>  #endif
>  }
>  
> +void __init_memblock memblock_dump_all(void)
> +{
> +	if (memblock_debug)
> +		__memblock_dump_all();
> +}
> +
>  void __init memblock_allow_resize(void)
>  {
>  	memblock_can_resize = 1;
> -- 
> 2.26.2
> 
> 


^ permalink raw reply

* Re: [PATCH 09/15] memblock: make for_each_memblock_type() iterator private
From: Baoquan He @ 2020-07-30  1:52 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-sh, Peter Zijlstra, Dave Hansen, linux-mips, Max Filippov,
	Paul Mackerras, sparclinux, linux-riscv, Will Deacon,
	Stafford Horne, Marek Szyprowski, linux-s390, linux-c6x-dev,
	Yoshinori Sato, x86, Russell King, Mike Rapoport,
	clang-built-linux, Ingo Molnar, Catalin Marinas, uclinux-h8-devel,
	linux-xtensa, openrisc, Borislav Petkov, Andy Lutomirski,
	Paul Walmsley, Thomas Gleixner, linux-arm-kernel, Michal Simek,
	linux-mm, linuxppc-dev, linux-kernel, iommu, Palmer Dabbelt,
	Andrew Morton, Christoph Hellwig
In-Reply-To: <20200728051153.1590-10-rppt@kernel.org>

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> for_each_memblock_type() is not used outside mm/memblock.c, move it there
> from include/linux/memblock.h
> 
> ---
>  include/linux/memblock.h | 5 -----
>  mm/memblock.c            | 5 +++++
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 017fae833d4a..220b5f0dad42 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -532,11 +532,6 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
>  	     region < (memblock.memblock_type.regions + memblock.memblock_type.cnt);	\
>  	     region++)
>  
> -#define for_each_memblock_type(i, memblock_type, rgn)			\
> -	for (i = 0, rgn = &memblock_type->regions[0];			\
> -	     i < memblock_type->cnt;					\
> -	     i++, rgn = &memblock_type->regions[i])
> -
>  extern void *alloc_large_system_hash(const char *tablename,
>  				     unsigned long bucketsize,
>  				     unsigned long numentries,
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 39aceafc57f6..a5b9b3df81fc 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -129,6 +129,11 @@ struct memblock memblock __initdata_memblock = {
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
>  };
>  
> +#define for_each_memblock_type(i, memblock_type, rgn)			\
> +	for (i = 0, rgn = &memblock_type->regions[0];			\
> +	     i < memblock_type->cnt;					\
> +	     i++, rgn = &memblock_type->regions[i])
> +

Reviewed-by: Baoquan He <bhe@redhat.com>


^ permalink raw reply

* Re: [PATCH v2] powerpc/vio: drop bus_type from parent device
From: Michael Ellerman @ 2020-07-30  1:28 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, linuxppc-dev
  Cc: cascardo, Peter Rajnoha, gregkh
In-Reply-To: <20200406155748.6761-1-cascardo@canonical.com>

[ Added Peter & Greg to Cc ]

Thadeu Lima de Souza Cascardo <cascardo@canonical.com> writes:
> Commit df44b479654f62b478c18ee4d8bc4e9f897a9844 ("kobject: return error
> code if writing /sys/.../uevent fails") started returning failure when
> writing to /sys/devices/vio/uevent.
>
> This causes an early udevadm trigger to fail. On some installer versions of
> Ubuntu, this will cause init to exit, thus panicing the system very early
> during boot.
>
> Removing the bus_type from the parent device will remove some of the extra
> empty files from /sys/devices/vio/, but will keep the rest of the layout
> for vio devices, keeping them under /sys/devices/vio/.

What exactly does it change?

I'm finding it hard to evaluate if this change is going to cause a
regression somehow.

I'm also not clear on why removing the bus type is correct, apart from
whether it fixes the bug you're seeing.

> It has been tested that uevents for vio devices don't change after this
> fix, they still contain MODALIAS.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Fixes: df44b479654f ("kobject: return error code if writing /sys/.../uevent fails")

AFAICS there haven't been any other fixes for that commit. Do we know
why it is only vio that was affected? (possibly because it's a fake bus
to begin with?)

cheers

> diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
> index 37f1f25ba804..a94dab3972a0 100644
> --- a/arch/powerpc/platforms/pseries/vio.c
> +++ b/arch/powerpc/platforms/pseries/vio.c
> @@ -36,7 +36,6 @@ static struct vio_dev vio_bus_device  = { /* fake "parent" device */
>  	.name = "vio",
>  	.type = "",
>  	.dev.init_name = "vio",
> -	.dev.bus = &vio_bus_type,
>  };
>  
>  #ifdef CONFIG_PPC_SMLPAR
> -- 
> 2.20.1

^ permalink raw reply

* Re: [PATCH] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal
From: Michael Ellerman @ 2020-07-30  0:57 UTC (permalink / raw)
  To: Nathan Lynch, Laurent Dufour; +Cc: tyreld, cheloha, linuxppc-dev
In-Reply-To: <878sf31m8k.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Laurent Dufour <ldufour@linux.ibm.com> writes:
>> Le 28/07/2020 à 19:37, Nathan Lynch a écrit :
>>> The drmem lmb list can have hundreds of thousands of entries, and
>>> unfortunately lookups take the form of linear searches. As long as
>>> this is the case, traversals have the potential to monopolize the CPU
>>> and provoke lockup reports, workqueue stalls, and the like unless
>>> they explicitly yield.
>>> 
>>> Rather than placing cond_resched() calls within various
>>> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
>>> expression of the loop macro itself so users can't omit it.
>>
>> Is that not too much to call cond_resched() on every LMB?
>>
>> Could that be less frequent, every 10, or 100, I don't really know ?
>
> Everything done within for_each_drmem_lmb is relatively heavyweight
> already. E.g. calling dlpar_remove_lmb()/dlpar_add_lmb() can take dozens
> of milliseconds. I don't think cond_resched() is an expensive check in
> this context.

Hmm, mostly.

But there are quite a few cases like drmem_update_dt_v1():

	for_each_drmem_lmb(lmb) {
		dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
		dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
		dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
		dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));

		dr_cell++;
	}

Which will compile to a pretty tight loop at the moment.

Or drmem_update_dt_v2() which has two loops over all lmbs.

And although the actual TIF check is cheap the function call to do it is
not free.

So I worry this is going to make some of those long loops take even longer.

At the same time I don't see an easy way to batch the calls to
cond_resched() without more intrusive changes.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
From: Segher Boessenkool @ 2020-07-29 22:44 UTC (permalink / raw)
  To: Vladis Dronov
  Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Paul Mackerras
In-Reply-To: <584129967.9672326.1596051896801.JavaMail.zimbra@redhat.com>

On Wed, Jul 29, 2020 at 03:44:56PM -0400, Vladis Dronov wrote:
> > > Certain warnings are emitted for powerpc code when building with a gcc-10
> > > toolset:
> > > 
> > >     WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in
> > >     reference from the function remove_pmd_table() to the function
> > >     .meminit.text:split_kernel_mapping()
> > >     The function remove_pmd_table() references
> > >     the function __meminit split_kernel_mapping().
> > >     This is often because remove_pmd_table lacks a __meminit
> > >     annotation or the annotation of split_kernel_mapping is wrong.
> > > 
> > > Add the appropriate __init and __meminit annotations to make modpost not
> > > complain. In all the cases there are just a single callsite from another
> > > __init or __meminit function:
> > > 
> > > __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table()
> > > __init prom_init() -> setup_secure_guest()
> > > __init xive_spapr_init() -> xive_spapr_disabled()
> > 
> > So what changed?  These functions were inlined with older compilers, but
> > not anymore?
> 
> Yes, exactly. Gcc-10 does not inline them anymore. If this is because of my
> build system, this can happen to others also.
> 
> The same thing was fixed by Linus in e99332e7b4cd ("gcc-10: mark more functions
> __init to avoid section mismatch warnings").

It sounds like this is part of "-finline-functions was retuned" on
<https://gcc.gnu.org/gcc-10/changes.html>?  So everyone should see it
(no matter what config or build system), and it is a good thing too :-)

Thanks for the confirmation,


Segher

^ permalink raw reply

* Re: [PATCH net] ibmvnic: Fix IRQ mapping disposal in error path
From: Thomas Falcon @ 2020-07-29 22:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: drt, netdev, linuxppc-dev
In-Reply-To: <20200729152820.79c00fe7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>


On 7/29/20 5:28 PM, Jakub Kicinski wrote:
> On Wed, 29 Jul 2020 16:36:32 -0500 Thomas Falcon wrote:
>> RX queue IRQ mappings are disposed in both the TX IRQ and RX IRQ
>> error paths. Fix this and dispose of TX IRQ mappings correctly in
>> case of an error.
>>
>> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> Thomas, please remember about Fixes tags (for networking patches,
> at least):
>
> Fixes: ea22d51a7831 ("ibmvnic: simplify and improve driver probe function")

Sorry, Jakub, I will try not to forget next time. Thanks.


^ permalink raw reply

* Re: [PATCH net] ibmvnic: Fix IRQ mapping disposal in error path
From: David Miller @ 2020-07-29 22:36 UTC (permalink / raw)
  To: tlfalcon; +Cc: drt, netdev, linuxppc-dev
In-Reply-To: <1596058592-12025-1-git-send-email-tlfalcon@linux.ibm.com>

From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Wed, 29 Jul 2020 16:36:32 -0500

> RX queue IRQ mappings are disposed in both the TX IRQ and RX IRQ
> error paths. Fix this and dispose of TX IRQ mappings correctly in
> case of an error.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>

Applied with Fixes: tag added and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] ibmvnic: Fix IRQ mapping disposal in error path
From: Jakub Kicinski @ 2020-07-29 22:28 UTC (permalink / raw)
  To: Thomas Falcon; +Cc: drt, netdev, linuxppc-dev
In-Reply-To: <1596058592-12025-1-git-send-email-tlfalcon@linux.ibm.com>

On Wed, 29 Jul 2020 16:36:32 -0500 Thomas Falcon wrote:
> RX queue IRQ mappings are disposed in both the TX IRQ and RX IRQ
> error paths. Fix this and dispose of TX IRQ mappings correctly in
> case of an error.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>

Thomas, please remember about Fixes tags (for networking patches, 
at least):

Fixes: ea22d51a7831 ("ibmvnic: simplify and improve driver probe function")

^ permalink raw reply

* Re: [PATCH v2] pseries/drmem: don't cache node id in drmem_lmb struct
From: Laurent Dufour @ 2020-07-29 15:32 UTC (permalink / raw)
  To: Scott Cheloha, linuxppc-dev
  Cc: Nathan Lynch, Michal Suchanek, David Hildenbrand, Rick Lindsley
In-Reply-To: <20200728165339.3031126-1-cheloha@linux.ibm.com>

Hi Scott,

Le 28/07/2020 à 18:53, Scott Cheloha a écrit :
> At memory hot-remove time we can retrieve an LMB's nid from its
> corresponding memory_block.  There is no need to store the nid
> in multiple locations.
> 
> Note that lmb_to_memblock() uses find_memory_block() to get the
> corresponding memory_block.  As find_memory_block() runs in sub-linear
> time this approach is negligibly slower than what we do at present.
> 
> In exchange for this lookup at hot-remove time we no longer need to
> call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
> On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
> spares us an O(n^2) initialization during boot.
> 
> On systems with many LMBs that initialization overhead is palpable and
> disruptive.  For example, on a box with 249854 LMBs we're seeing
> drmem_init() take upwards of 30 seconds to complete:
> 
> [   53.721639] drmem: initializing drmem v2
> [   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
> [   80.604377] Modules linked in:
> [   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
> [   80.604397] NIP:  c0000000000a4980 LR: c0000000000a4940 CTR: 0000000000000000
> [   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
> [   80.604412] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000248  XER: 0000000d
> [   80.604431] CFAR: c0000000000a4a38 IRQMASK: 0
> [   80.604431] GPR00: c0000000000a4940 c0002dbff8493ac0 c000000001904400 c0003cfffffede30
> [   80.604431] GPR04: 0000000000000000 c000000000f4095a 000000000000002f 0000000010000000
> [   80.604431] GPR08: c0000bf7ecdb7fb8 c0000bf7ecc2d3c8 0000000000000008 c00c0002fdfb2001
> [   80.604431] GPR12: 0000000000000000 c00000001e8ec200
> [   80.604477] NIP [c0000000000a4980] hot_add_scn_to_nid+0xa0/0x3e0
> [   80.604486] LR [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0
> [   80.604492] Call Trace:
> [   80.604498] [c0002dbff8493ac0] [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
> [   80.604509] [c0002dbff8493b20] [c000000000087c10] memory_add_physaddr_to_nid+0x20/0x60
> [   80.604521] [c0002dbff8493b40] [c0000000010d4880] drmem_init+0x25c/0x2f0
> [   80.604530] [c0002dbff8493c10] [c000000000010154] do_one_initcall+0x64/0x2c0
> [   80.604540] [c0002dbff8493ce0] [c0000000010c4aa0] kernel_init_freeable+0x2d8/0x3a0
> [   80.604550] [c0002dbff8493db0] [c000000000010824] kernel_init+0x2c/0x148
> [   80.604560] [c0002dbff8493e20] [c00000000000b648] ret_from_kernel_thread+0x5c/0x74
> [   80.604567] Instruction dump:
> [   80.604574] 392918e8 e9490000 e90a000a e92a0000 80ea000c 1d080018 3908ffe8 7d094214
> [   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e9490000 7fbe5040
> [   89.047390] drmem: 249854 LMB(s)
> 
> With a patched kernel on the same machine we're no longer seeing the
> soft lockup.  drmem_init() now completes in negligible time, even when
> the LMB count is large.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/drmem.h              | 21 -------------------
>   arch/powerpc/mm/drmem.c                       |  6 +-----
>   .../platforms/pseries/hotplug-memory.c        | 19 ++++++++++-------
>   3 files changed, 13 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
> index 414d209f45bb..34e4e9b257f5 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -13,9 +13,6 @@ struct drmem_lmb {
>   	u32     drc_index;
>   	u32     aa_index;
>   	u32     flags;
> -#ifdef CONFIG_MEMORY_HOTPLUG
> -	int	nid;
> -#endif
>   };
> 
>   struct drmem_lmb_info {
> @@ -104,22 +101,4 @@ static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
>   	lmb->aa_index = 0xffffffff;
>   }
> 
> -#ifdef CONFIG_MEMORY_HOTPLUG
> -static inline void lmb_set_nid(struct drmem_lmb *lmb)
> -{
> -	lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
> -}
> -static inline void lmb_clear_nid(struct drmem_lmb *lmb)
> -{
> -	lmb->nid = -1;
> -}
> -#else
> -static inline void lmb_set_nid(struct drmem_lmb *lmb)
> -{
> -}
> -static inline void lmb_clear_nid(struct drmem_lmb *lmb)
> -{
> -}
> -#endif
> -
>   #endif /* _ASM_POWERPC_LMB_H */
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 59327cefbc6a..873fcfc7b875 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
>   	if (!drmem_info->lmbs)
>   		return;
> 
> -	for_each_drmem_lmb(lmb) {
> +	for_each_drmem_lmb(lmb)
>   		read_drconf_v1_cell(lmb, &prop);
> -		lmb_set_nid(lmb);
> -	}
>   }
> 
>   static void __init init_drmem_v2_lmbs(const __be32 *prop)
> @@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
> 
>   			lmb->aa_index = dr_cell.aa_index;
>   			lmb->flags = dr_cell.flags;
> -
> -			lmb_set_nid(lmb);
>   		}
>   	}
>   }
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 5ace2f9a277e..7bf66fdcf916 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -356,25 +356,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
> 
>   static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>   {
> +	struct memory_block *mem_block;
>   	unsigned long block_sz;
>   	int rc;
> 
>   	if (!lmb_is_removable(lmb))
>   		return -EINVAL;
> 
> +	mem_block = lmb_to_memblock(lmb);
> +	if (mem_block == NULL)
> +		return -EINVAL;
> +

lmb_to_memblock is 'getting' the memblock's device, so you should call 
put_device(mem_block->device) to reverse that operation.

Something like:

--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -367,7 +367,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)

  	rc = dlpar_offline_lmb(lmb);
  	if (rc)
-		return rc;
+		goto out;

  	block_sz = pseries_memory_block_size();

@@ -379,7 +379,9 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
  	invalidate_lmb_associativity_index(lmb);
  	lmb->flags &= ~DRCONF_MEM_ASSIGNED;

-	return 0;
+out:
+	put_device(&mem_block->dev);
+	return rc;
  }

  static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)



>   	rc = dlpar_offline_lmb(lmb);
>   	if (rc)
>   		return rc;
> 
>   	block_sz = pseries_memory_block_size();
> 
> -	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
> +	__remove_memory(mem_block->nid, lmb->base_addr, block_sz);
> 
>   	/* Update memory regions for memory remove */
>   	memblock_remove(lmb->base_addr, block_sz);
> 
>   	invalidate_lmb_associativity_index(lmb);
> -	lmb_clear_nid(lmb);
>   	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> 
>   	return 0;
> @@ -631,7 +635,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
>   static int dlpar_add_lmb(struct drmem_lmb *lmb)
>   {
>   	unsigned long block_sz;
> -	int rc;
> +	int nid, rc;
> 
>   	if (lmb->flags & DRCONF_MEM_ASSIGNED)
>   		return -EINVAL;
> @@ -642,11 +646,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>   		return rc;
>   	}
> 
> -	lmb_set_nid(lmb);
>   	block_sz = memory_block_size_bytes();
> 
> +	/* Find the node id for this address. */
> +	nid = memory_add_physaddr_to_nid(lmb->base_addr);

This function is calling hot_add_drconf_scn_to_nid() which is walking all the 
LMBs linearly, so we may spend a lot of time here.

I think it be nice to take some breath in the caller's loop in 
dlpar_memory_add_by_count() to avoid soft lockup to be raised there when adding 
a large number of LMBs, isn't it?

Nathan sent a patch to add such a breath in for_each_drmem_lmb_in_range() may be 
it would be nice to add one in dlpar_memory_add_by_count() through that patch.

> +
>   	/* Add the memory */
> -	rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
> +	rc = __add_memory(nid, lmb->base_addr, block_sz);
>   	if (rc) {
>   		invalidate_lmb_associativity_index(lmb);
>   		return rc;
> @@ -654,9 +660,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> 
>   	rc = dlpar_online_lmb(lmb);
>   	if (rc) {
> -		__remove_memory(lmb->nid, lmb->base_addr, block_sz);
> +		__remove_memory(nid, lmb->base_addr, block_sz);
>   		invalidate_lmb_associativity_index(lmb);
> -		lmb_clear_nid(lmb);
>   	} else {
>   		lmb->flags |= DRCONF_MEM_ASSIGNED;
>   	}
> 


^ permalink raw reply

* linux-next: Fixes tag needs some work in the powerpc tree
From: Stephen Rothwell @ 2020-07-29 21:55 UTC (permalink / raw)
  To: Michael Ellerman, PowerPC
  Cc: Athira Rajeev, Linux Next Mailing List, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 390 bytes --]

Hi all,

In commit

  443359aebce0 ("powerpc/perf: Fix MMCRA_BHRB_DISABLE define for binutils < 2.28")

Fixes tag

  Fixes: 9908c826d5ed ("Add Power10 PMU feature to DT CPU features")

has these problem(s):

  - Subject does not match target commit subject
    Just use
	git log -1 --format='Fixes: %h ("%s")'

Just a hint for the future.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH 2/2] spi: mpc512x-psc: Convert to use GPIO descriptors
From: Linus Walleij @ 2020-07-29 21:48 UTC (permalink / raw)
  To: Mark Brown, linux-spi
  Cc: Linus Walleij, Anatolij Gustschin, linuxppc-dev,
	Uwe Kleine-König
In-Reply-To: <20200729214817.478834-1-linus.walleij@linaro.org>

This driver is already relying on the core to provide
valid GPIO numbers in spi->cs_gpio through
of_spi_get_gpio_numbers(), so we can switch to letting
the core use GPIO descriptors instead.

Make sure that the .set_cs() is always called, as some controller
set-up is also needed.

Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/spi/spi-mpc512x-psc.c | 33 +++++++--------------------------
 1 file changed, 7 insertions(+), 26 deletions(-)

diff --git a/drivers/spi/spi-mpc512x-psc.c b/drivers/spi/spi-mpc512x-psc.c
index 35313a77f977..dd8bba408301 100644
--- a/drivers/spi/spi-mpc512x-psc.c
+++ b/drivers/spi/spi-mpc512x-psc.c
@@ -23,7 +23,6 @@
 #include <linux/clk.h>
 #include <linux/spi/spi.h>
 #include <linux/fsl_devices.h>
-#include <linux/gpio.h>
 #include <asm/mpc52xx_psc.h>
 
 enum {
@@ -99,7 +98,7 @@ static void mpc512x_psc_spi_set_cs(struct spi_device *spi, bool enable)
 	u16 bclkdiv;
 
 	if (!enable) {
-		if (mps->cs_control && gpio_is_valid(spi->cs_gpio))
+		if (mps->cs_control && spi->cs_gpiod)
 			mps->cs_control(spi, (spi->mode & SPI_CS_HIGH) ? 0 : 1);
 		return;
 	}
@@ -134,7 +133,7 @@ static void mpc512x_psc_spi_set_cs(struct spi_device *spi, bool enable)
 	out_be32(psc_addr(mps, ccr), ccr);
 	mps->bits_per_word = cs->bits_per_word;
 
-	if (mps->cs_control && gpio_is_valid(spi->cs_gpio))
+	if (mps->cs_control && spi->cs_gpiod)
 		mps->cs_control(spi, (spi->mode & SPI_CS_HIGH) ? 1 : 0);
 }
 
@@ -358,18 +357,6 @@ static int mpc512x_psc_spi_setup(struct spi_device *spi)
 		if (!cs)
 			return -ENOMEM;
 
-		if (gpio_is_valid(spi->cs_gpio)) {
-			ret = gpio_request(spi->cs_gpio, dev_name(&spi->dev));
-			if (ret) {
-				dev_err(&spi->dev, "can't get CS gpio: %d\n",
-					ret);
-				kfree(cs);
-				return ret;
-			}
-			gpio_direction_output(spi->cs_gpio,
-					spi->mode & SPI_CS_HIGH ? 0 : 1);
-		}
-
 		spi->controller_state = cs;
 	}
 
@@ -381,8 +368,6 @@ static int mpc512x_psc_spi_setup(struct spi_device *spi)
 
 static void mpc512x_psc_spi_cleanup(struct spi_device *spi)
 {
-	if (gpio_is_valid(spi->cs_gpio))
-		gpio_free(spi->cs_gpio);
 	kfree(spi->controller_state);
 }
 
@@ -461,11 +446,6 @@ static irqreturn_t mpc512x_psc_spi_isr(int irq, void *dev_id)
 	return IRQ_NONE;
 }
 
-static void mpc512x_spi_cs_control(struct spi_device *spi, bool onoff)
-{
-	gpio_set_value(spi->cs_gpio, onoff);
-}
-
 static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr,
 					      u32 size, unsigned int irq)
 {
@@ -485,10 +465,8 @@ static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr,
 	mps->type = (int)of_device_get_match_data(dev);
 	mps->irq = irq;
 
-	if (pdata == NULL) {
-		mps->cs_control = mpc512x_spi_cs_control;
-	} else {
-		mps->cs_control = pdata->cs_control;
+	if (pdata) {
+		mps->cs_control = pdata->cs_control;x
 		master->bus_num = pdata->bus_num;
 		master->num_chipselect = pdata->max_chipselect;
 	}
@@ -499,6 +477,9 @@ static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr,
 	master->transfer_one_message = mpc512x_psc_spi_msg_xfer;
 	master->unprepare_transfer_hardware = mpc512x_psc_spi_unprep_xfer_hw;
 	master->set_cs = mpc512x_psc_spi_set_cs;
+	/* This makes sure our custom .set_cs() is always called */
+	master->flags = SPI_MASTER_GPIO_SS;
+	master->use_gpio_descriptors = true;
 	master->cleanup = mpc512x_psc_spi_cleanup;
 	master->dev.of_node = dev->of_node;
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 1/2] spi: mpc512x-psc: Use the framework .set_cs()
From: Linus Walleij @ 2020-07-29 21:48 UTC (permalink / raw)
  To: Mark Brown, linux-spi
  Cc: Linus Walleij, Anatolij Gustschin, linuxppc-dev,
	Uwe Kleine-König

The mpc512x-psc is rolling its own chip select control code,
but the SPI master framework can handle this. It was also
evaluating the CS status for each transfer but the CS change
should be per-message not per-transfer.

Switch to use the core .set_cs() to control the chip select.

Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/spi/spi-mpc512x-psc.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/drivers/spi/spi-mpc512x-psc.c b/drivers/spi/spi-mpc512x-psc.c
index ea1b07953d38..35313a77f977 100644
--- a/drivers/spi/spi-mpc512x-psc.c
+++ b/drivers/spi/spi-mpc512x-psc.c
@@ -89,7 +89,7 @@ static int mpc512x_psc_spi_transfer_setup(struct spi_device *spi,
 	return 0;
 }
 
-static void mpc512x_psc_spi_activate_cs(struct spi_device *spi)
+static void mpc512x_psc_spi_set_cs(struct spi_device *spi, bool enable)
 {
 	struct mpc512x_psc_spi_cs *cs = spi->controller_state;
 	struct mpc512x_psc_spi *mps = spi_master_get_devdata(spi->master);
@@ -98,6 +98,12 @@ static void mpc512x_psc_spi_activate_cs(struct spi_device *spi)
 	int speed;
 	u16 bclkdiv;
 
+	if (!enable) {
+		if (mps->cs_control && gpio_is_valid(spi->cs_gpio))
+			mps->cs_control(spi, (spi->mode & SPI_CS_HIGH) ? 0 : 1);
+		return;
+	}
+
 	sicr = in_be32(psc_addr(mps, sicr));
 
 	/* Set clock phase and polarity */
@@ -132,15 +138,6 @@ static void mpc512x_psc_spi_activate_cs(struct spi_device *spi)
 		mps->cs_control(spi, (spi->mode & SPI_CS_HIGH) ? 1 : 0);
 }
 
-static void mpc512x_psc_spi_deactivate_cs(struct spi_device *spi)
-{
-	struct mpc512x_psc_spi *mps = spi_master_get_devdata(spi->master);
-
-	if (mps->cs_control && gpio_is_valid(spi->cs_gpio))
-		mps->cs_control(spi, (spi->mode & SPI_CS_HIGH) ? 0 : 1);
-
-}
-
 /* extract and scale size field in txsz or rxsz */
 #define MPC512x_PSC_FIFO_SZ(sz) ((sz & 0x7ff) << 2);
 
@@ -290,40 +287,28 @@ static int mpc512x_psc_spi_msg_xfer(struct spi_master *master,
 				    struct spi_message *m)
 {
 	struct spi_device *spi;
-	unsigned cs_change;
 	int status;
 	struct spi_transfer *t;
 
 	spi = m->spi;
-	cs_change = 1;
 	status = 0;
 	list_for_each_entry(t, &m->transfers, transfer_list) {
 		status = mpc512x_psc_spi_transfer_setup(spi, t);
 		if (status < 0)
 			break;
 
-		if (cs_change)
-			mpc512x_psc_spi_activate_cs(spi);
-		cs_change = t->cs_change;
-
 		status = mpc512x_psc_spi_transfer_rxtx(spi, t);
 		if (status)
 			break;
 		m->actual_length += t->len;
 
 		spi_transfer_delay_exec(t);
-
-		if (cs_change)
-			mpc512x_psc_spi_deactivate_cs(spi);
 	}
 
 	m->status = status;
 	if (m->complete)
 		m->complete(m->context);
 
-	if (status || !cs_change)
-		mpc512x_psc_spi_deactivate_cs(spi);
-
 	mpc512x_psc_spi_transfer_setup(spi, NULL);
 
 	spi_finalize_current_message(master);
@@ -513,6 +498,7 @@ static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr,
 	master->prepare_transfer_hardware = mpc512x_psc_spi_prep_xfer_hw;
 	master->transfer_one_message = mpc512x_psc_spi_msg_xfer;
 	master->unprepare_transfer_hardware = mpc512x_psc_spi_unprep_xfer_hw;
+	master->set_cs = mpc512x_psc_spi_set_cs;
 	master->cleanup = mpc512x_psc_spi_cleanup;
 	master->dev.of_node = dev->of_node;
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH net] ibmvnic: Fix IRQ mapping disposal in error path
From: Thomas Falcon @ 2020-07-29 21:36 UTC (permalink / raw)
  To: netdev; +Cc: drt, Thomas Falcon, linuxppc-dev

RX queue IRQ mappings are disposed in both the TX IRQ and RX IRQ
error paths. Fix this and dispose of TX IRQ mappings correctly in
case of an error.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 0fd7eae..5afb3c9 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3206,7 +3206,7 @@ static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter)
 req_tx_irq_failed:
 	for (j = 0; j < i; j++) {
 		free_irq(adapter->tx_scrq[j]->irq, adapter->tx_scrq[j]);
-		irq_dispose_mapping(adapter->rx_scrq[j]->irq);
+		irq_dispose_mapping(adapter->tx_scrq[j]->irq);
 	}
 	release_sub_crqs(adapter, 1);
 	return rc;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
From: Vladis Dronov @ 2020-07-29 19:44 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Paul Mackerras
In-Reply-To: <20200729144949.GF17447@gate.crashing.org>

Hello,

----- Original Message -----
> From: "Segher Boessenkool" <segher@kernel.crashing.org>
> To: "Vladis Dronov" <vdronov@redhat.com>
> Cc: linuxppc-dev@lists.ozlabs.org, "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>, linux-kernel@vger.kernel.org,
> "Paul Mackerras" <paulus@samba.org>
> Sent: Wednesday, July 29, 2020 4:49:49 PM
> Subject: Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
> 
> On Wed, Jul 29, 2020 at 03:37:41PM +0200, Vladis Dronov wrote:
> > Certain warnings are emitted for powerpc code when building with a gcc-10
> > toolset:
> > 
> >     WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in
> >     reference from the function remove_pmd_table() to the function
> >     .meminit.text:split_kernel_mapping()
> >     The function remove_pmd_table() references
> >     the function __meminit split_kernel_mapping().
> >     This is often because remove_pmd_table lacks a __meminit
> >     annotation or the annotation of split_kernel_mapping is wrong.
> > 
> > Add the appropriate __init and __meminit annotations to make modpost not
> > complain. In all the cases there are just a single callsite from another
> > __init or __meminit function:
> > 
> > __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table()
> > __init prom_init() -> setup_secure_guest()
> > __init xive_spapr_init() -> xive_spapr_disabled()
> 
> So what changed?  These functions were inlined with older compilers, but
> not anymore?

Yes, exactly. Gcc-10 does not inline them anymore. If this is because of my
build system, this can happen to others also.

The same thing was fixed by Linus in e99332e7b4cd ("gcc-10: mark more functions
__init to avoid section mismatch warnings").

> 
> Segher

Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer


^ permalink raw reply

* Re: [PATCH v2] selftests: powerpc: Fix online CPU selection
From: Srikar Dronamraju @ 2020-07-29 16:03 UTC (permalink / raw)
  To: Sandipan Das; +Cc: kamalesh, shiganta, nasastry, harish, linuxppc-dev
In-Reply-To: <20200609073733.997643-1-sandipan@linux.ibm.com>

* Sandipan Das <sandipan@linux.ibm.com> [2020-06-09 13:07:33]:

> The size of the CPU affinity mask must be large enough for
> systems with a very large number of CPUs. Otherwise, tests
> which try to determine the first online CPU by calling
> sched_getaffinity() will fail. This makes sure that the size
> of the allocated affinity mask is dependent on the number of
> CPUs as reported by get_nprocs().
> 
> Fixes: 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs")
> Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> ---
> Previous versions can be found at:
> v1: https://lore.kernel.org/linuxppc-dev/20200608144212.985144-1-sandipan@linux.ibm.com/
> 
> @@ -88,28 +89,40 @@ void *get_auxv_entry(int type)
> 
>  int pick_online_cpu(void)
>  {
> -	cpu_set_t mask;
> -	int cpu;
> +	int ncpus, cpu = -1;
> +	cpu_set_t *mask;
> +	size_t size;
> +
> +	ncpus = get_nprocs();

Please use get_nprocs_conf or sysconf(_SC_NPROCESSORS_CONF). The manpage
seems to suggest the latter. Not sure how accurate the manpage is.

get_nprocs is returning online cpus and when smt is off, the cpu numbers
would be sparse and hence the result from get_nprocs wouldn't be ideal for
allocating cpumask. However get_nprocs_conf would return the max configured
cpus and would be able to handle it. 

I think this was the same situation hit by Michael Ellerman.

> +	size = CPU_ALLOC_SIZE(ncpus);
> +	mask = CPU_ALLOC(ncpus);
> +	if (!mask) {
> +		perror("malloc");
> +		return -1;
> +	}
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH] powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y
From: Gustavo Romero @ 2020-07-29 15:03 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20200727070341.595634-1-mpe@ellerman.id.au>

On 7/27/20 4:03 AM, Michael Ellerman wrote:
> skiroot_defconfig fails:
> 
> arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used
>     48 | static atomic_t cpus_in_fadump;
> 
> Fix it by moving the definition into the #ifdef where it's used.
> 
> Fixes: ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger")
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>   arch/powerpc/kernel/fadump.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 1858896d6809..10ebb4bf71ad 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -45,10 +45,12 @@ static struct fw_dump fw_dump;
>   static void __init fadump_reserve_crash_area(u64 base);
> 
>   struct kobject *fadump_kobj;
> -static atomic_t cpus_in_fadump;
> 
>   #ifndef CONFIG_PRESERVE_FA_DUMP
> +
> +static atomic_t cpus_in_fadump;
>   static DEFINE_MUTEX(fadump_mutex);
> +
>   struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0, false };
> 
>   #define RESERVED_RNGS_SZ	16384 /* 16K - 128 entries */
> 

Tested-by: Gustavo Romero <gromero@linux.ibm.com>


Thanks,
Gustavo

^ permalink raw reply

* Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
From: Segher Boessenkool @ 2020-07-29 14:49 UTC (permalink / raw)
  To: Vladis Dronov
  Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Paul Mackerras
In-Reply-To: <20200729133741.62789-1-vdronov@redhat.com>

On Wed, Jul 29, 2020 at 03:37:41PM +0200, Vladis Dronov wrote:
> Certain warnings are emitted for powerpc code when building with a gcc-10
> toolset:
> 
>     WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in
>     reference from the function remove_pmd_table() to the function
>     .meminit.text:split_kernel_mapping()
>     The function remove_pmd_table() references
>     the function __meminit split_kernel_mapping().
>     This is often because remove_pmd_table lacks a __meminit
>     annotation or the annotation of split_kernel_mapping is wrong.
> 
> Add the appropriate __init and __meminit annotations to make modpost not
> complain. In all the cases there are just a single callsite from another
> __init or __meminit function:
> 
> __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table()
> __init prom_init() -> setup_secure_guest()
> __init xive_spapr_init() -> xive_spapr_disabled()

So what changed?  These functions were inlined with older compilers, but
not anymore?


Segher

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox