* Re: [PATCH v6 03/23] powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.
From: Aneesh Kumar K.V @ 2020-07-20 6:15 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: linuxram, bauerman
In-Reply-To: <87h7u2u3ep.fsf@mpe.ellerman.id.au>
On 7/20/20 11:35 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> Move them to hash specific file and add BUG() for radix path.
>> ---
>> .../powerpc/include/asm/book3s/64/hash-pkey.h | 32 ++++++++++++++++
>> arch/powerpc/include/asm/book3s/64/pkeys.h | 25 +++++++++++++
>> arch/powerpc/include/asm/pkeys.h | 37 ++++---------------
>> 3 files changed, 64 insertions(+), 30 deletions(-)
>> create mode 100644 arch/powerpc/include/asm/book3s/64/hash-pkey.h
>> create mode 100644 arch/powerpc/include/asm/book3s/64/pkeys.h
>
> This isn't signed-off.
>
Not sure how i missed that.
> I assume you meant to, so I added it.
>
yes. Thanks for taking care of that.
-aneesh
^ permalink raw reply
* Re: [FIX PATCH] powerpc/prom: Enable Radix GTSE in cpu pa-features
From: Bharata B Rao @ 2020-07-20 6:17 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Qian Cai, aneesh.kumar, linuxppc-dev
In-Reply-To: <1595223290.jz1cmk38dz.astroid@bobo.none>
On Mon, Jul 20, 2020 at 03:38:29PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of July 20, 2020 2:42 pm:
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index 9cc49f265c86..a9594bad572a 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -163,7 +163,8 @@ static struct ibm_pa_feature {
> > { .pabyte = 0, .pabit = 6, .cpu_features = CPU_FTR_NOEXECUTE },
> > { .pabyte = 1, .pabit = 2, .mmu_features = MMU_FTR_CI_LARGE_PAGE },
> > #ifdef CONFIG_PPC_RADIX_MMU
> > - { .pabyte = 40, .pabit = 0, .mmu_features = MMU_FTR_TYPE_RADIX },
> > + { .pabyte = 40, .pabit = 0,
> > + .mmu_features = (MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE) },
>
> It might look better like this:
>
> { .pabyte = 1, .pabit = 2, .mmu_features = MMU_FTR_CI_LARGE_PAGE },
> #ifdef CONFIG_PPC_RADIX_MMU
> { .pabyte = 40, .pabit = 0, .mmu_features = MMU_FTR_TYPE_RADIX },
> { .pabyte = 40, .pabit = 0, .mmu_features = MMU_FTR_TYPE_RADIX |
> MMU_FTR_GTSE },
> #endif
> { .pabyte = 1, .pabit = 1, .invert = 1, .cpu_features = CPU_FTR_NODSISRALIGN },
>
> But that's bikeshedding a bit and the optional bits already put it out
> of alignment.
Here it is...
From 1be7f3f8b43503740431b7bdf585e488ecdeb48f Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin@gmail.com>
Date: Mon, 20 Jul 2020 09:05:05 +0530
Subject: [FIX PATCH] powerpc/prom: Enable Radix GTSE in cpu pa-features
When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")'
made GTSE an MMU feature, it was enabled by default in
powerpc-cpu-features but was missed in pa-features. This causes
random memory corruption during boot of PowerNV kernels if
CONFIG_PPC_DT_CPU_FTRS isn't enabled.
Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
arch/powerpc/kernel/prom.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..dae30e805e42 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -163,7 +163,8 @@ static struct ibm_pa_feature {
{ .pabyte = 0, .pabit = 6, .cpu_features = CPU_FTR_NOEXECUTE },
{ .pabyte = 1, .pabit = 2, .mmu_features = MMU_FTR_CI_LARGE_PAGE },
#ifdef CONFIG_PPC_RADIX_MMU
- { .pabyte = 40, .pabit = 0, .mmu_features = MMU_FTR_TYPE_RADIX },
+ { .pabyte = 40, .pabit = 0, .mmu_features = MMU_FTR_TYPE_RADIX |
+ MMU_FTR_GTSE },
#endif
{ .pabyte = 1, .pabit = 1, .invert = 1, .cpu_features = CPU_FTR_NODSISRALIGN },
{ .pabyte = 5, .pabit = 0, .cpu_features = CPU_FTR_REAL_LE,
--
2.26.2
^ permalink raw reply related
* Re: [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain
From: Srikar Dronamraju @ 2020-07-20 6:19 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling, Michael Ellerman,
Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200717063755.GA32531@in.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 12:07:55]:
> On Tue, Jul 14, 2020 at 10:06:19AM +0530, Srikar Dronamraju wrote:
> > Currently "CACHE" domain happens to be the 2nd sched domain as per
> > powerpc_topology. This domain will collapse if cpumask of l2-cache is
> > same as SMT domain. However we could generalize this domain such that it
> > could mean either be a "CACHE" domain or a "BIGCORE" domain.
> >
> > While setting up the "CACHE" domain, check if shared_cache is already
> > set.
> >
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> > @@ -867,11 +869,16 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> > }
> > #endif
> >
> > +static const struct cpumask *cpu_bigcore_mask(int cpu)
> > +{
> > + return cpu_core_mask(cpu);
>
> It should be cpu_smt_mask() if we want the redundant big-core to be
> degenerated in favour of the SMT level on P8, no? Because
> cpu_core_mask refers to all the CPUs that are in the same chip.
>
Right, but it cant be cpu_smt_mask since cpu_smt_mask is only enabled in
CONFIG_SCHED_SMT. I was looking at using sibling_map, but we have to careful
for power9 / PowerNV mode. Guess that should be fine.
> > +}
> > +
> > static struct sched_domain_topology_level powerpc_topology[] = {
> > #ifdef CONFIG_SCHED_SMT
> > { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > #endif
> > - { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> > + { cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
> > { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > { NULL, },
> > };
> > @@ -1319,7 +1326,6 @@ static void add_cpu_to_masks(int cpu)
> > void start_secondary(void *unused)
> > {
> > unsigned int cpu = smp_processor_id();
> > - struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> >
> > mmgrab(&init_mm);
> > current->active_mm = &init_mm;
> > @@ -1345,14 +1351,20 @@ void start_secondary(void *unused)
> > /* Update topology CPU masks */
> > add_cpu_to_masks(cpu);
> >
> > - if (has_big_cores)
> > - sibling_mask = cpu_smallcore_mask;
> > /*
> > * Check for any shared caches. Note that this must be done on a
> > * per-core basis because one core in the pair might be disabled.
> > */
> > - if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> > - shared_caches = true;
> > + if (!shared_caches) {
> > + struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > + struct cpumask *mask = cpu_l2_cache_mask(cpu);
> > +
> > + if (has_big_cores)
> > + sibling_mask = cpu_smallcore_mask;
> > +
> > + if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> > + shared_caches = true;
>
> Shouldn't we use cpumask_subset() here ?
Wouldn't cpumask_subset should return 1 if both are same?
We dont want to have shared_caches set if both the masks are equal.
>
> > + }
> >
> > set_numa_node(numa_cpu_lookup_table[cpu]);
> > set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> > @@ -1390,6 +1402,14 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > smp_ops->bringup_done();
> >
> > dump_numa_cpu_topology();
> > + if (shared_caches) {
> > + pr_info("Using shared cache scheduler topology\n");
> > + powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> > +#ifdef CONFIG_SCHED_DEBUG
> > + powerpc_topology[bigcore_idx].name = "CACHE";
> > +#endif
> > + powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> > + }
>
>
> I would much rather that we have all the topology-fixups done in one
> function.
>
> fixup_topology(void) {
> if (has_big_core)
> powerpc_topology[smt_idx].mask = smallcore_smt_mask;
>
> if (shared_caches) {
> const char *name = "CACHE";
> powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> strlcpy(powerpc_topology[bigcore_idx].name, name,
> strlen(name));
> powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> }
>
> /* Any other changes to the topology structure here */
We could do this.
>
> And also as an optimization, get rid of degenerate structures here
> itself so that we don't pay additional penalty while building the
> sched-domains each time.
>
Yes this is definitely in plan, but slightly later in time.
Thanks for the review and comments.
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH 3/5] dma-mapping: make support for dma ops optional
From: Christoph Hellwig @ 2020-07-20 6:20 UTC (permalink / raw)
To: Guenter Roeck
Cc: Daniel Borkmann, Alexey Kardashevskiy, Greg Kroah-Hartman,
linuxppc-dev, linux-kernel, iommu, Jesper Dangaard Brouer,
Robin Murphy, Christoph Hellwig
In-Reply-To: <20200718171714.GA237687@roeck-us.net>
On Sat, Jul 18, 2020 at 10:17:14AM -0700, Guenter Roeck wrote:
> On Wed, Jul 08, 2020 at 05:24:47PM +0200, Christoph Hellwig wrote:
> > Avoid the overhead of the dma ops support for tiny builds that only
> > use the direct mapping.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> For ppc:pmac32_defconfig and other configurations, this patch results in:
Fixed and force pushed.
^ permalink raw reply
* Re: [PATCH 3/5] cpuidle-pseries : Fixup exit latency for CEDE(0)
From: Vaidyanathan Srinivasan @ 2020-07-20 6:24 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: Nathan Lynch, Michael Neuling, linux-pm, linux-kernel,
Nicholas Piggin, linuxppc-dev
In-Reply-To: <1594120299-31389-4-git-send-email-ego@linux.vnet.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-07 16:41:37]:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> We are currently assuming that CEDE(0) has exit latency 10us, since
> there is no way for us to query from the platform. However, if the
> wakeup latency of an Extended CEDE state is smaller than 10us, then we
> can be sure that the exit latency of CEDE(0) cannot be more than that.
> that.
>
> In this patch, we fix the exit latency of CEDE(0) if we discover an
> Extended CEDE state with wakeup latency smaller than 10us. The new
> value is 1us lesser than the smallest wakeup latency among the
> Extended CEDE states.
>
> Benchmark results:
>
> ebizzy:
> 2 ebizzy threads bound to the same big-core. 25% improvement in the
> avg records/s with patch.
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 10 2491089 5834307 5398375 4244335 1596244.9
> * 10 2893813 5834474 5832448 5327281.3 1055941.4
>
> context_switch2 :
> There is no major regression observed with this patch as seen from the
> context_switch2 benchmark.
>
> context_switch2 across CPU0 CPU1 (Both belong to same big-core, but different
> small cores). We observe a minor 0.14% regression in the number of
> context-switches (higher is better).
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 500 348872 362236 354712 354745.69 2711.827
> * 500 349422 361452 353942 354215.4 2576.9258
>
> context_switch2 across CPU0 CPU8 (Different big-cores). We observe a 0.37%
> improvement in the number of context-switches (higher is better).
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 500 287956 294940 288896 288977.23 646.59295
> * 500 288300 294646 289582 290064.76 1161.9992
>
> schbench:
> No major difference could be seen until the 99.9th percentile.
>
> Without-patch
> Latency percentiles (usec)
> 50.0th: 29
> 75.0th: 39
> 90.0th: 49
> 95.0th: 59
> *99.0th: 13104
> 99.5th: 14672
> 99.9th: 15824
> min=0, max=17993
>
> With-patch:
> Latency percentiles (usec)
> 50.0th: 29
> 75.0th: 40
> 90.0th: 50
> 95.0th: 61
> *99.0th: 13648
> 99.5th: 14768
> 99.9th: 15664
> min=0, max=29812
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> ---
> drivers/cpuidle/cpuidle-pseries.c | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index c13549b..502f906 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -353,12 +353,42 @@ static int pseries_cpuidle_driver_init(void)
> static int add_pseries_idle_states(void)
> {
> int nr_states = 2; /* By default we have snooze, CEDE */
> + int i;
> + u64 min_latency_us = dedicated_states[1].exit_latency; /* CEDE latency */
>
> if (parse_cede_parameters())
> return nr_states;
>
> - pr_info("cpuidle : Skipping the %d Extended CEDE idle states\n",
> - nr_xcede_records);
> + for (i = 0; i < nr_xcede_records; i++) {
> + u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
> + u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> +
> + if (latency_us < min_latency_us)
> + min_latency_us = latency_us;
> + }
> +
> + /*
> + * We are currently assuming that CEDE(0) has exit latency
> + * 10us, since there is no way for us to query from the
> + * platform.
> + *
> + * However, if the wakeup latency of an Extended CEDE state is
> + * smaller than 10us, then we can be sure that CEDE(0)
> + * requires no more than that.
> + *
> + * Perform the fix-up.
> + */
> + if (min_latency_us < dedicated_states[1].exit_latency) {
> + u64 cede0_latency = min_latency_us - 1;
> +
> + if (cede0_latency <= 0)
> + cede0_latency = min_latency_us;
> +
> + dedicated_states[1].exit_latency = cede0_latency;
> + dedicated_states[1].target_residency = 10 * (cede0_latency);
> + pr_info("cpuidle : Fixed up CEDE exit latency to %llu us\n",
> + cede0_latency);
> + }
As per PAPR spec the CEDE hints are in increasing order of exit
latency. Hence a given state's exit latency cannot exceed the one
following it. The quirk is such that the first one (hint 0) is
implicit and hence we have to use the above logic to extract its
characteristics.
--Vaidy
^ permalink raw reply
* Re: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework
From: Vaidyanathan Srinivasan @ 2020-07-20 6:31 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: Nathan Lynch, Michael Neuling, linux-pm, linux-kernel,
Nicholas Piggin, linuxppc-dev
In-Reply-To: <1594120299-31389-5-git-send-email-ego@linux.vnet.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-07 16:41:38]:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> This patch exposes those extended CEDE states to the cpuidle framework
> which are responsive to external interrupts and do not need an H_PROD.
>
> Since as per the PAPR, all the extended CEDE states are non-responsive
> to timers, we indicate this to the cpuidle subsystem via the
> CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
> can wake up on external interrupts.
>
> With the patch, we are able to see the extended CEDE state with
> latency hint = 1 exposed via the cpuidle framework.
>
> $ cpupower idle-info
> CPUidle driver: pseries_idle
> CPUidle governor: menu
> analyzing CPU 0:
>
> Number of idle states: 3
> Available idle states: snooze CEDE XCEDE1
> snooze:
> Flags/Description: snooze
> Latency: 0
> Usage: 33429446
> Duration: 27006062
> CEDE:
> Flags/Description: CEDE
> Latency: 1
> Usage: 10272
> Duration: 110786770
> XCEDE1:
> Flags/Description: XCEDE1
> Latency: 12
> Usage: 26445
> Duration: 1436433815
>
> Benchmark results:
> TLDR: Over all we do not see any additional benefit from having XCEDE1 over
> CEDE.
>
> ebizzy :
> 2 threads bound to a big-core. With this patch, we see a 3.39%
> regression compared to with only CEDE0 latency fixup.
> x With only CEDE0 latency fixup
> * With CEDE0 latency fixup + CEDE1
> N Min Max Median Avg Stddev
> x 10 2893813 5834474 5832448 5327281.3 1055941.4
> * 10 2907329 5834923 5831398 5146614.6 1193874.8
>
> context_switch2:
> With the context_switch2 there are no observable regressions in the
> results.
>
> context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
> No difference with and without patch.
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 500 343644 348778 345444 345584.02 1035.1658
> * 500 344310 347646 345776 345877.22 802.19501
>
> context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
> with patch
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 500 287562 288756 288162 288134.76 262.24328
> * 500 287874 288960 288306 288274.66 187.57034
>
> schbench:
> No regressions observed with schbench
>
> Without Patch:
> Latency percentiles (usec)
> 50.0th: 29
> 75.0th: 40
> 90.0th: 50
> 95.0th: 61
> *99.0th: 13648
> 99.5th: 14768
> 99.9th: 15664
> min=0, max=29812
>
> With Patch:
> Latency percentiles (usec)
> 50.0th: 30
> 75.0th: 40
> 90.0th: 51
> 95.0th: 59
> *99.0th: 13616
> 99.5th: 14512
> 99.9th: 15696
> min=0, max=15996
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> ---
> drivers/cpuidle/cpuidle-pseries.c | 50 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 502f906..6f893cd 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
> for (i = 0; i < nr_xcede_records; i++) {
> u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
> u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> + char name[CPUIDLE_NAME_LEN];
> + unsigned int latency_hint = xcede_records[i].latency_hint;
> + u64 residency_us;
> +
> + if (!xcede_records[i].responsive_to_irqs) {
> + pr_info("cpuidle : Skipping XCEDE%d. Not responsive to IRQs\n",
> + latency_hint);
> + continue;
> + }
>
> if (latency_us < min_latency_us)
> min_latency_us = latency_us;
> + snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
> +
> + /*
> + * As per the section 14.14.1 of PAPR version 2.8.1
> + * says that alling H_CEDE with the value of the cede
> + * latency specifier set greater than zero allows the
> + * processor timer facility to be disabled (so as not
> + * to cause gratuitous wake-ups - the use of H_PROD,
> + * or other external interrupt is required to wake the
> + * processor in this case).
> + *
> + * So, inform the cpuidle-subsystem that the timer
> + * will be stopped for these states.
> + *
> + * Also, bump up the latency by 10us, since cpuidle
> + * would use timer-offload framework which will need
> + * to send an IPI to wakeup a CPU whose timer has
> + * expired.
> + */
> + if (latency_hint > 0) {
> + dedicated_states[nr_states].flags = CPUIDLE_FLAG_TIMER_STOP;
> + latency_us += 10;
> + }
> +
> + /*
> + * Thumb rule : Reside in the XCEDE state for at least
> + * 10x the time required to enter and exit that state.
> + */
> + residency_us = latency_us * 10;
> +
> + strlcpy(dedicated_states[nr_states].name, (const char *)name,
> + CPUIDLE_NAME_LEN);
> + strlcpy(dedicated_states[nr_states].desc, (const char *)name,
> + CPUIDLE_NAME_LEN);
> + dedicated_states[nr_states].exit_latency = latency_us;
> + dedicated_states[nr_states].target_residency = residency_us;
> + dedicated_states[nr_states].enter = &dedicated_cede_loop;
> + cede_latency_hint[nr_states] = latency_hint;
> + pr_info("cpuidle : Added %s. latency-hint = %d\n",
> + name, latency_hint);
> + nr_states++;
This patch demonstrates the various use cases of the previous patches
in the series that helps interface with the platform firmware better.
On current platforms these benefits are very limited, but the
framework built by the previous patches helps Linux exploit new and
enhanced idle states that will be available on newer platform and
firmware.
--Vaidy
^ permalink raw reply
* Re: [PATCH 5/5] cpuidle-pseries: Block Extended CEDE(1) which adds no additional value.
From: Vaidyanathan Srinivasan @ 2020-07-20 6:34 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: Nathan Lynch, Michael Neuling, linux-pm, linux-kernel,
Nicholas Piggin, linuxppc-dev
In-Reply-To: <1594120299-31389-6-git-send-email-ego@linux.vnet.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-07 16:41:39]:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> The Extended CEDE state with latency-hint = 1 is only different from
> normal CEDE (with latency-hint = 0) in that a CPU in Extended CEDE(1)
> does not wakeup on timer events. Both CEDE and Extended CEDE(1) map to
> the same hardware idle state. Since we already get SMT folding from
> the normal CEDE, the Extended CEDE(1) doesn't provide any additional
> value. This patch blocks Extended CEDE(1).
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> ---
> drivers/cpuidle/cpuidle-pseries.c | 57 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 6f893cd..be0b8b2 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -350,6 +350,43 @@ static int pseries_cpuidle_driver_init(void)
> return 0;
> }
>
> +#define XCEDE1_HINT 1
> +#define ERR_NO_VALUE_ADD (-1)
> +#define ERR_NO_EE_WAKEUP (-2)
> +
> +/*
> + * Returns 0 if the Extende CEDE state with @hint is not blocked in
> + * cpuidle framework.
> + *
> + * Returns ERR_NO_EE_WAKEUP if the Extended CEDE state is blocked due
> + * to not being responsive to external interrupts.
> + *
> + * Returns ERR_NO_VALUE_ADD if the Extended CEDE state does not provide
> + * added value addition over the normal CEDE.
> + */
> +static int cpuidle_xcede_blocked(u8 hint, u64 latency_us, u8 responsive_to_irqs)
> +{
> +
> + /*
> + * We will only allow extended CEDE states that are responsive
> + * to irqs do not require an H_PROD to be woken up.
> + */
> + if (!responsive_to_irqs)
> + return ERR_NO_EE_WAKEUP;
> +
> + /*
> + * We already obtain SMT folding benefits from CEDE (which is
> + * CEDE with hint 0). Furthermore, CEDE is also responsive to
> + * timer-events, while XCEDE1 requires an external
> + * interrupt/H_PROD to be woken up. Hence, block XCEDE1 since
> + * it adds no further value.
> + */
> + if (hint == XCEDE1_HINT)
> + return ERR_NO_VALUE_ADD;
> +
> + return 0;
> +}
> +
> static int add_pseries_idle_states(void)
> {
> int nr_states = 2; /* By default we have snooze, CEDE */
> @@ -365,15 +402,29 @@ static int add_pseries_idle_states(void)
> char name[CPUIDLE_NAME_LEN];
> unsigned int latency_hint = xcede_records[i].latency_hint;
> u64 residency_us;
> + int rc;
> +
> + if (latency_us < min_latency_us)
> + min_latency_us = latency_us;
> +
> + rc = cpuidle_xcede_blocked(latency_hint, latency_us,
> + xcede_records[i].responsive_to_irqs);
>
> - if (!xcede_records[i].responsive_to_irqs) {
> + if (rc) {
> + switch (rc) {
> + case ERR_NO_VALUE_ADD:
> + pr_info("cpuidle : Skipping XCEDE%d. No additional value-add\n",
> + latency_hint);
> + break;
> + case ERR_NO_EE_WAKEUP:
> pr_info("cpuidle : Skipping XCEDE%d. Not responsive to IRQs\n",
> latency_hint);
> + break;
> + }
> +
> continue;
> }
>
> - if (latency_us < min_latency_us)
> - min_latency_us = latency_us;
> snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
>
> /*
We need these heuristics to select/reject idle states exposed by
platform firmware to Linux primarily because not all states are really
useful to Linux on a given setup.
--Vaidy
^ permalink raw reply
* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
From: Srikar Dronamraju @ 2020-07-20 6:45 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling, Michael Ellerman,
Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200717060011.GE25851@in.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:30:11]:
> Hi Srikar,
>
> On Tue, Jul 14, 2020 at 10:06:18AM +0530, Srikar Dronamraju wrote:
> > Current code assumes that cpumask of cpus sharing a l2-cache mask will
> > always be a superset of cpu_sibling_mask.
> >
> > Lets stop that assumption.
> >
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> > arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
> > 1 file changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 7d430fc536cc..875f57e41355 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
> > struct device_node *l2_cache, *np;
> > int i;
> >
> > + cpumask_set_cpu(cpu, mask_fn(cpu));
>
> It would be good to comment why do we need to do set the CPU in the
> l2-mask if we don't have a l2cache domain.
>
Good Catch,
We should move this after the cpu_to_l2cache.
> > l2_cache = cpu_to_l2cache(cpu);
> > if (!l2_cache)
> > return false;
> > @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
> > * add it to it's own thread sibling mask.
> > */
> > cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> > + cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> >
> > for (i = first_thread; i < first_thread + threads_per_core; i++)
> > if (cpu_online(i))
> > set_cpus_related(i, cpu, cpu_sibling_mask);
> >
> > add_cpu_to_smallcore_masks(cpu);
> > - /*
> > - * Copy the thread sibling mask into the cache sibling mask
> > - * and mark any CPUs that share an L2 with this CPU.
> > - */
> > - for_each_cpu(i, cpu_sibling_mask(cpu))
> > - set_cpus_related(cpu, i, cpu_l2_cache_mask);
> > update_mask_by_l2(cpu, cpu_l2_cache_mask);
> >
> > - /*
> > - * Copy the cache sibling mask into core sibling mask and mark
> > - * any CPUs on the same chip as this CPU.
> > - */
> > - for_each_cpu(i, cpu_l2_cache_mask(cpu))
> > - set_cpus_related(cpu, i, cpu_core_mask);
> > + if (pkg_id == -1) {
> > + struct cpumask *(*mask)(int) = cpu_sibling_mask;
> > +
> > + /*
> > + * Copy the sibling mask into core sibling mask and
> > + * mark any CPUs on the same chip as this CPU.
> > + */
> > + if (shared_caches)
> > + mask = cpu_l2_cache_mask;
> > +
>
>
> Now that we decoupling the containment relationship between
> sibling_mask and l2-cache mask, should we set all the CPUs that are
> both in cpu_sibling_mask(cpu) as well as cpu_l2_mask(cpu) in
> cpu_core_mask ?
>
Are you saying instead of setting this cpu in this cpu_core_mask, can we set
all the cpus in the mask in cpu_core_mask?
Currently we dont know if any of the cpus of the mask were already set or
not. Plus we need to anyway update cpumask of all other cpus to says they
are related. So setting a mask instead of cpu at a time will not change
anything for our side.
> > + for_each_cpu(i, mask(cpu))
> > + set_cpus_related(cpu, i, cpu_core_mask);
> >
> > - if (pkg_id == -1)
> > return;
> > + }
> >
> > for_each_cpu(i, cpu_online_mask)
> > if (get_physical_package_id(i) == pkg_id)
> > --
> > 2.17.1
> >
> --
> Thanks and Regards
> gautham.
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary
From: Jordan Niethe @ 2020-07-20 6:54 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Christophe Leroy, apopple, mikey, miltonm, peterz, oleg,
Nicholas Piggin, linux-kernel, Paul Mackerras, jolsa, fweisbec,
pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-11-ravi.bangoria@linux.ibm.com>
On Fri, Jul 17, 2020 at 2:11 PM Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
>
> Power10 has removed 512 bytes boundary from match criteria. i.e. The watch
> range can cross 512 bytes boundary.
It looks like this change is not mentioned in ISA v3.1 Book III 9.4
Data Address Watchpoint. It could be useful to mention that in the
commit message.
Also I wonder if could add a test for this to the ptrace-hwbreak selftest?
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
> arch/powerpc/kernel/hw_breakpoint.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
> index c55e67bab271..1f4a1efa0074 100644
> --- a/arch/powerpc/kernel/hw_breakpoint.c
> +++ b/arch/powerpc/kernel/hw_breakpoint.c
> @@ -418,8 +418,9 @@ static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw)
>
> if (dawr_enabled()) {
> max_len = DAWR_MAX_LEN;
> - /* DAWR region can't cross 512 bytes boundary */
> - if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512))
> + /* DAWR region can't cross 512 bytes boundary on p10 predecessors */
> + if (!cpu_has_feature(CPU_FTR_ARCH_31) &&
> + (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512)))
> return -EINVAL;
> } else if (IS_ENABLED(CONFIG_PPC_8xx)) {
> /* 8xx can setup a range without limitation */
> --
> 2.26.2
>
^ permalink raw reply
* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
From: Srikar Dronamraju @ 2020-07-20 7:20 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling, Michael Ellerman,
Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200717054821.GD25851@in.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:18:21]:
> On Tue, Jul 14, 2020 at 10:06:17AM +0530, Srikar Dronamraju wrote:
> > Enable small core scheduling as soon as we detect that we are in a
> > system that supports thread group. Doing so would avoid a redundant
> > check.
> >
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>
> I don't see a problem with this.
>
> However, since we are now going to be maintaining a single topology
> structure, wouldn't it be better to collate all the changes being made
> to the mask_functions/flags/names of this structure within a single
> function so that it becomes easier to keep track of what all changes
> are going into the topology and why are we doing it?
>
My intent was to move the topology updates early as soon as they are
detected. Currently the shared_cache want cannot be detected early.
But I think its possible to detect shared_cache early with some cleanups.
And if we do, then we should be able to call this up pretty early.
>
> > ---
> > arch/powerpc/kernel/smp.c | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 24529f6134aa..7d430fc536cc 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -892,6 +892,12 @@ static int init_big_cores(void)
> > }
> >
> > has_big_cores = true;
> > +
> > +#ifdef CONFIG_SCHED_SMT
> > + pr_info("Big cores detected. Using small core scheduling\n");
> > + powerpc_topology[0].mask = smallcore_smt_mask;
> > +#endif
> > +
> > return 0;
> > }
> >
> > @@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
> >
> > dump_numa_cpu_topology();
> >
> > -#ifdef CONFIG_SCHED_SMT
> > - if (has_big_cores) {
> > - pr_info("Big cores detected but using small core scheduling\n");
> > - powerpc_topology[0].mask = smallcore_smt_mask;
> > - }
> > -#endif
> > set_sched_topology(powerpc_topology);
> > }
> >
> > --
> > 2.17.1
> >
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
From: Jordan Niethe @ 2020-07-20 7:47 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200714043624.5648-5-srikar@linux.vnet.ibm.com>
On Tue, Jul 14, 2020 at 2:44 PM Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> Enable small core scheduling as soon as we detect that we are in a
> system that supports thread group. Doing so would avoid a redundant
> check.
>
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/smp.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 24529f6134aa..7d430fc536cc 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -892,6 +892,12 @@ static int init_big_cores(void)
> }
>
> has_big_cores = true;
> +
> +#ifdef CONFIG_SCHED_SMT
> + pr_info("Big cores detected. Using small core scheduling\n");
Why change the wording from "Big cores detected but using small core
scheduling\n"?
> + powerpc_topology[0].mask = smallcore_smt_mask;
> +#endif
> +
> return 0;
> }
>
> @@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
>
> dump_numa_cpu_topology();
>
> -#ifdef CONFIG_SCHED_SMT
> - if (has_big_cores) {
> - pr_info("Big cores detected but using small core scheduling\n");
> - powerpc_topology[0].mask = smallcore_smt_mask;
> - }
> -#endif
> set_sched_topology(powerpc_topology);
> }
>
> --
> 2.17.1
>
^ permalink raw reply
* [PATCH v3] powerpc/pseries: Avoid using addr_to_pfn in realmode
From: Ganesh Goudar @ 2020-07-20 8:03 UTC (permalink / raw)
To: mpe, linuxppc-dev; +Cc: mahesh, Ganesh Goudar, npiggin, aneesh.kumar
When an UE or memory error exception is encountered the MCE handler
tries to find the pfn using addr_to_pfn() which takes effective
address as an argument, later pfn is used to poison the page where
memory error occurred, recent rework in this area made addr_to_pfn
to run in realmode, which can be fatal as it may try to access
memory outside RMO region.
To fix this have separate functions for realmode and virtual mode
handling and let addr_to_pfn to run in virtual mode.
Without this fix following kernel crash is seen on hitting UE.
[ 485.128036] Oops: Kernel access of bad area, sig: 11 [#1]
[ 485.128040] LE SMP NR_CPUS=2048 NUMA pSeries
[ 485.128047] Modules linked in:
[ 485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22
[ 485.128074] NIP: c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0
[ 485.128078] REGS: c000000003f1f970 TRAP: 0300 Tainted: G OE (5.7.0)
[ 485.128082] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008284 XER: 00000001
[ 485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1
[ 485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000
[ 485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022
[ 485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14
[ 485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000
[ 485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100
[ 485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50
[ 485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a
[ 485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a
[ 485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310
[ 485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170
[ 485.128133] Call Trace:
[ 485.128135] Instruction dump:
[ 485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff
[ 485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040
[ 485.128152] ---[ end trace d34b27e29ae0e340 ]---
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
V2: Leave bare metal code and save_mce_event as is.
V3: Have separate functions for realmode and virtual mode handling.
---
arch/powerpc/platforms/pseries/ras.c | 119 ++++++++++++++++-----------
1 file changed, 70 insertions(+), 49 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index f3736fcd98fc..32fe3fad86b8 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -522,18 +522,55 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
return 0; /* need to perform reset */
}
+static int mce_handle_err_realmode(int disposition, u8 error_type)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+ if (disposition == RTAS_DISP_NOT_RECOVERED) {
+ switch (error_type) {
+ case MC_ERROR_TYPE_SLB:
+ case MC_ERROR_TYPE_ERAT:
+ /*
+ * Store the old slb content in paca before flushing.
+ * Print this when we go to virtual mode.
+ * There are chances that we may hit MCE again if there
+ * is a parity error on the SLB entry we trying to read
+ * for saving. Hence limit the slb saving to single
+ * level of recursion.
+ */
+ if (local_paca->in_mce == 1)
+ slb_save_contents(local_paca->mce_faulty_slbs);
+ flush_and_reload_slb();
+ disposition = RTAS_DISP_FULLY_RECOVERED;
+ break;
+ default:
+ break;
+ }
+ } else if (disposition == RTAS_DISP_LIMITED_RECOVERY) {
+ /* Platform corrected itself but could be degraded */
+ pr_err("MCE: limited recovery, system may be degraded\n");
+ disposition = RTAS_DISP_FULLY_RECOVERED;
+ }
+#endif
+ return disposition;
+}
-static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
+static int mce_handle_err_virtmode(struct pt_regs *regs,
+ struct rtas_error_log *errp,
+ struct pseries_mc_errorlog *mce_log,
+ int disposition)
{
struct mce_error_info mce_err = { 0 };
- unsigned long eaddr = 0, paddr = 0;
- struct pseries_errorlog *pseries_log;
- struct pseries_mc_errorlog *mce_log;
- int disposition = rtas_error_disposition(errp);
int initiator = rtas_error_initiator(errp);
int severity = rtas_error_severity(errp);
+ unsigned long eaddr = 0, paddr = 0;
u8 error_type, err_sub_type;
+ if (!mce_log)
+ goto out;
+
+ error_type = mce_log->error_type;
+ err_sub_type = rtas_mc_error_sub_type(mce_log);
+
if (initiator == RTAS_INITIATOR_UNKNOWN)
mce_err.initiator = MCE_INITIATOR_UNKNOWN;
else if (initiator == RTAS_INITIATOR_CPU)
@@ -572,18 +609,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
mce_err.error_class = MCE_ECLASS_UNKNOWN;
- if (!rtas_error_extended(errp))
- goto out;
-
- pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
- if (pseries_log == NULL)
- goto out;
-
- mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
- error_type = mce_log->error_type;
- err_sub_type = rtas_mc_error_sub_type(mce_log);
-
- switch (mce_log->error_type) {
+ switch (error_type) {
case MC_ERROR_TYPE_UE:
mce_err.error_type = MCE_ERROR_TYPE_UE;
mce_common_process_ue(regs, &mce_err);
@@ -683,37 +709,32 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
break;
}
+out:
+ save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
+ &mce_err, regs->nip, eaddr, paddr);
+ return disposition;
+}
-#ifdef CONFIG_PPC_BOOK3S_64
- if (disposition == RTAS_DISP_NOT_RECOVERED) {
- switch (error_type) {
- case MC_ERROR_TYPE_SLB:
- case MC_ERROR_TYPE_ERAT:
- /*
- * Store the old slb content in paca before flushing.
- * Print this when we go to virtual mode.
- * There are chances that we may hit MCE again if there
- * is a parity error on the SLB entry we trying to read
- * for saving. Hence limit the slb saving to single
- * level of recursion.
- */
- if (local_paca->in_mce == 1)
- slb_save_contents(local_paca->mce_faulty_slbs);
- flush_and_reload_slb();
- disposition = RTAS_DISP_FULLY_RECOVERED;
- break;
- default:
- break;
- }
- } else if (disposition == RTAS_DISP_LIMITED_RECOVERY) {
- /* Platform corrected itself but could be degraded */
- printk(KERN_ERR "MCE: limited recovery, system may "
- "be degraded\n");
- disposition = RTAS_DISP_FULLY_RECOVERED;
- }
-#endif
+static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
+{
+ struct pseries_errorlog *pseries_log;
+ struct pseries_mc_errorlog *mce_log = NULL;
+ int disposition = rtas_error_disposition(errp);
+ u8 error_type, err_sub_type;
+
+ if (!rtas_error_extended(errp))
+ goto out;
+
+ pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+ if (!pseries_log)
+ goto out;
+
+ mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+ error_type = mce_log->error_type;
+ err_sub_type = rtas_mc_error_sub_type(mce_log);
+
+ disposition = mce_handle_err_realmode(disposition, error_type);
-out:
/*
* Enable translation as we will be accessing per-cpu variables
* in save_mce_event() which may fall outside RMO region, also
@@ -724,10 +745,10 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
* Note: All the realmode handling like flushing SLB entries for
* SLB multihit is done by now.
*/
+out:
mtmsr(mfmsr() | MSR_IR | MSR_DR);
- save_mce_event(regs, disposition == RTAS_DISP_FULLY_RECOVERED,
- &mce_err, regs->nip, eaddr, paddr);
-
+ disposition = mce_handle_err_virtmode(regs, errp, mce_log,
+ disposition);
return disposition;
}
--
2.17.2
^ permalink raw reply related
* Re: [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology
From: Srikar Dronamraju @ 2020-07-20 8:10 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling, Michael Ellerman,
Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200717054436.GB25851@in.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:14:36]:
> Hi Srikar,
>
> On Tue, Jul 14, 2020 at 10:06:15AM +0530, Srikar Dronamraju wrote:
> > A new sched_domain_topology_level was added just for Power9. However the
> > same can be achieved by merging powerpc_topology with power9_topology
> > and makes the code more simpler especially when adding a new sched
> > domain.
> >
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> > arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
> > 1 file changed, 10 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 680c0edcc59d..069ea4b21c6d 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
> > }
> >
> > #ifdef CONFIG_SCHED_SMT
> > -/* cpumask of CPUs with asymetric SMT dependancy */
> > +/* cpumask of CPUs with asymmetric SMT dependency */
> > static int powerpc_smt_flags(void)
> > {
> > int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> > @@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
> > }
> > #endif
> >
> > -static struct sched_domain_topology_level powerpc_topology[] = {
> > -#ifdef CONFIG_SCHED_SMT
> > - { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > -#endif
> > - { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > - { NULL, },
> > -};
> > -
> > /*
> > * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> > * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> > @@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
> > */
> > static const struct cpumask *shared_cache_mask(int cpu)
> > {
> > - return cpu_l2_cache_mask(cpu);
> > + if (shared_caches)
> > + return cpu_l2_cache_mask(cpu);
> > +
> > + if (has_big_cores)
> > + return cpu_smallcore_mask(cpu);
> > +
> > + return cpu_smt_mask(cpu);
> > }
> >
> > #ifdef CONFIG_SCHED_SMT
> > @@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> > }
> > #endif
> >
> > -static struct sched_domain_topology_level power9_topology[] = {
> > +static struct sched_domain_topology_level powerpc_topology[] = {
>
>
> > #ifdef CONFIG_SCHED_SMT
> > { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > #endif
> > @@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > #ifdef CONFIG_SCHED_SMT
> > if (has_big_cores) {
> > pr_info("Big cores detected but using small core scheduling\n");
> I> - power9_topology[0].mask = smallcore_smt_mask;
> > powerpc_topology[0].mask = smallcore_smt_mask;
> > }
> > #endif
> > - /*
> > - * If any CPU detects that it's sharing a cache with another CPU then
> > - * use the deeper topology that is aware of this sharing.
> > - */
> > - if (shared_caches) {
> > - pr_info("Using shared cache scheduler topology\n");
> > - set_sched_topology(power9_topology);
> > - } else {
> > - pr_info("Using standard scheduler topology\n");
> > - set_sched_topology(powerpc_topology);
>
>
> Ok, so we will go with the three level topology by default (SMT,
> CACHE, DIE) and will rely on the sched-domain creation code to
> degenerate CACHE domain in case SMT and CACHE have the same set of
> CPUs (POWER8 for eg).
>
Right.
> From a cleanup perspective this is better, since we won't have to
> worry about defining multiple topology structures, but from a
> performance point of view, wouldn't we now pay an extra penalty of
> degenerating the CACHE domains on POWER8 kind of systems, each time
> when a CPU comes online ?
>
So if we end up either adding a topology definition for each of the new
topologies we support or we have to take the extra penalty.
But going ahead
> Do we know how bad it is ? If the degeneration takes a few extra
> microseconds, that should be ok I suppose.
>
It certainly will add to the penalty, I haven't captured per degeneration
statistics. However I ran an experiment where I run ppc64_cpu --smt=8 ,
followed by ppc64_cpu --smt=1 in a loop of 100 iterations.
On a Power8 System with 256 cpus 8 nodes.
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 8
Core(s) per socket: 4
Socket(s): 8
NUMA node(s): 8
Model: 2.1 (pvr 004b 0201)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127
NUMA node4 CPU(s): 128-159
NUMA node5 CPU(s): 160-191
NUMA node6 CPU(s): 192-223
NUMA node7 CPU(s): 224-255
ppc64_cpu --smt=1
N Min Max Median Avg Stddev
x 100 38.17 53.78 46.81 46.6766 2.8421603
x 100 41.34 58.24 48.35 47.9649 3.6866087
ppc64_cpu --smt=8
N Min Max Median Avg Stddev
x 100 57.43 75.88 60.61 61.0246 2.418685
x 100 58.21 79.24 62.59 63.3326 3.4094558
But once we cleanup, we could add ways to fixup topologies so that we
reverse the overhead.
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [v3 12/15] powerpc/perf: Add support for outputting extended regs in perf intr_regs
From: Athira Rajeev @ 2020-07-20 8:09 UTC (permalink / raw)
To: kernel test robot
Cc: ego, Michael Neuling, maddy, kbuild-all, kvm, kvm-ppc, svaidyan,
clang-built-linux, acme, jolsa, linuxppc-dev
In-Reply-To: <202007191932.4rHY8FD8%lkp@intel.com>
[-- Attachment #1: Type: text/plain, Size: 9901 bytes --]
> On 19-Jul-2020, at 4:47 PM, kernel test robot <lkp@intel.com> wrote:
>
> Hi Athira,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on tip/perf/core v5.8-rc5 next-20200717]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Athira-Rajeev/powerpc-perf-Add-support-for-power10-PMU-Hardware/20200717-224353
> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc64-randconfig-r024-20200719 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project ed6b578040a85977026c93bf4188f996148f3218)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install powerpc64 cross compiling tool for clang build
> # apt-get install binutils-powerpc64-linux-gnu
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
> __do_##name al; \
> ^~~~~~~~~~~~~~
> <scratch space>:221:1: note: expanded from here
> __do_insw
> ^
> arch/powerpc/include/asm/io.h:542:56: note: expanded from macro '__do_insw'
> #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
> ~~~~~~~~~~~~~~~~~~~~~^
> In file included from arch/powerpc/perf/perf_regs.c:10:
> In file included from include/linux/perf_event.h:57:
> In file included from include/linux/cgroup.h:26:
> In file included from include/linux/kernel_stat.h:9:
> In file included from include/linux/interrupt.h:11:
> In file included from include/linux/hardirq.h:10:
> In file included from arch/powerpc/include/asm/hardirq.h:6:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:13:
> In file included from arch/powerpc/include/asm/io.h:604:
> arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
> __do_##name al; \
> ^~~~~~~~~~~~~~
> <scratch space>:223:1: note: expanded from here
> __do_insl
> ^
> arch/powerpc/include/asm/io.h:543:56: note: expanded from macro '__do_insl'
> #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
> ~~~~~~~~~~~~~~~~~~~~~^
> In file included from arch/powerpc/perf/perf_regs.c:10:
> In file included from include/linux/perf_event.h:57:
> In file included from include/linux/cgroup.h:26:
> In file included from include/linux/kernel_stat.h:9:
> In file included from include/linux/interrupt.h:11:
> In file included from include/linux/hardirq.h:10:
> In file included from arch/powerpc/include/asm/hardirq.h:6:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:13:
> In file included from arch/powerpc/include/asm/io.h:604:
> arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
> __do_##name al; \
> ^~~~~~~~~~~~~~
> <scratch space>:225:1: note: expanded from here
> __do_outsb
> ^
> arch/powerpc/include/asm/io.h:544:58: note: expanded from macro '__do_outsb'
> #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
> ~~~~~~~~~~~~~~~~~~~~~^
> In file included from arch/powerpc/perf/perf_regs.c:10:
> In file included from include/linux/perf_event.h:57:
> In file included from include/linux/cgroup.h:26:
> In file included from include/linux/kernel_stat.h:9:
> In file included from include/linux/interrupt.h:11:
> In file included from include/linux/hardirq.h:10:
> In file included from arch/powerpc/include/asm/hardirq.h:6:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:13:
> In file included from arch/powerpc/include/asm/io.h:604:
> arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
> __do_##name al; \
> ^~~~~~~~~~~~~~
> <scratch space>:227:1: note: expanded from here
> __do_outsw
> ^
> arch/powerpc/include/asm/io.h:545:58: note: expanded from macro '__do_outsw'
> #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
> ~~~~~~~~~~~~~~~~~~~~~^
> In file included from arch/powerpc/perf/perf_regs.c:10:
> In file included from include/linux/perf_event.h:57:
> In file included from include/linux/cgroup.h:26:
> In file included from include/linux/kernel_stat.h:9:
> In file included from include/linux/interrupt.h:11:
> In file included from include/linux/hardirq.h:10:
> In file included from arch/powerpc/include/asm/hardirq.h:6:
> In file included from include/linux/irq.h:20:
> In file included from include/linux/io.h:13:
> In file included from arch/powerpc/include/asm/io.h:604:
> arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
> DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
> __do_##name al; \
> ^~~~~~~~~~~~~~
> <scratch space>:229:1: note: expanded from here
> __do_outsl
> ^
> arch/powerpc/include/asm/io.h:546:58: note: expanded from macro '__do_outsl'
> #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
> ~~~~~~~~~~~~~~~~~~~~~^
>>> arch/powerpc/perf/perf_regs.c:16:5: error: expected identifier or '('
> u64 PERF_REG_EXTENDED_MASK;
> ^
> include/linux/perf_regs.h:16:32: note: expanded from macro 'PERF_REG_EXTENDED_MASK'
> #define PERF_REG_EXTENDED_MASK 0
> ^
> 12 warnings and 1 error generated.
>
> vim +16 arch/powerpc/perf/perf_regs.c
>
> 15
>> 16 u64 PERF_REG_EXTENDED_MASK;
> 17
Hi,
This patch defines PERF_REG_EXTENDED_MASK
in arch/powerpc/include/asm/perf_event_server.h and this header file is included conditionally based on
CONFIG_PPC_PERF_CTRS in arch/powerpc/include/asm/perf_event.h.
So build breaks happens with config having CONFIG_PERF_EVENTS set
and without CONFIG_PPC_PERF_CTRS.
This will be fixed by defining PERF_REG_EXTENDED_MASK in perf_event.h as below:
—
diff --git a/arch/powerpc/include/asm/perf_event.h b/arch/powerpc/include/asm/perf_event.h
index eed3954082fa..b1c3a91aa6fa 100644
--- a/arch/powerpc/include/asm/perf_event.h
+++ b/arch/powerpc/include/asm/perf_event.h
@@ -38,4 +38,6 @@
/* To support perf_regs sier update */
extern bool is_sier_available(void);
+extern u64 PERF_REG_EXTENDED_MASK;
+#define PERF_REG_EXTENDED_MASK PERF_REG_EXTENDED_MASK
#endif
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index bf85d1a0b59e..5d368e81445f 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -15,9 +15,6 @@
#define MAX_EVENT_ALTERNATIVES 8
#define MAX_LIMITED_HWCOUNTERS 2
-extern u64 PERF_REG_EXTENDED_MASK;
-#define PERF_REG_EXTENDED_MASK PERF_REG_EXTENDED_MASK
-
struct perf_event;
struct mmcr_regs {
—
We also need this patch by Madhavan Sirinivasan : https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=183203
to solve similar build break with `is_sier_available`
Thanks
Athira
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> <.config.gz>
[-- Attachment #2: Type: text/html, Size: 22169 bytes --]
^ permalink raw reply related
* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
From: Srikar Dronamraju @ 2020-07-20 8:52 UTC (permalink / raw)
To: Jordan Niethe
Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <CACzsE9qoyh++qfTrNq5BXc5eAJeg8Gz2WxrWs4spG40t3c+MBg@mail.gmail.com>
* Jordan Niethe <jniethe5@gmail.com> [2020-07-20 17:47:27]:
> On Tue, Jul 14, 2020 at 2:44 PM Srikar Dronamraju
> <srikar@linux.vnet.ibm.com> wrote:
> >
> > Enable small core scheduling as soon as we detect that we are in a
> > system that supports thread group. Doing so would avoid a redundant
> > check.
> >
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> > arch/powerpc/kernel/smp.c | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 24529f6134aa..7d430fc536cc 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -892,6 +892,12 @@ static int init_big_cores(void)
> > }
> >
> > has_big_cores = true;
> > +
> > +#ifdef CONFIG_SCHED_SMT
> > + pr_info("Big cores detected. Using small core scheduling\n");
> Why change the wording from "Big cores detected but using small core
> scheduling\n"?
> > + powerpc_topology[0].mask = smallcore_smt_mask;
> > +#endif
> > +
To me, "but" in the message sounded as if we detected Big cores *but* we want
to continue using smallcore scheduling. However the code was always meaning
to say, "Since we detected big core, i.e thread grouping within a core, System would
benefit by using small core scheduling".
If you think, "but" was adding more info and not misleading, then I can add it back.
> > return 0;
> > }
> >
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
From: Gautham R Shenoy @ 2020-07-20 8:58 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200720064504.GD21103@linux.vnet.ibm.com>
Hi Srikar,
On Mon, Jul 20, 2020 at 12:15:04PM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:30:11]:
>
> > Hi Srikar,
> >
> > On Tue, Jul 14, 2020 at 10:06:18AM +0530, Srikar Dronamraju wrote:
> > > Current code assumes that cpumask of cpus sharing a l2-cache mask will
> > > always be a superset of cpu_sibling_mask.
> > >
> > > Lets stop that assumption.
> > >
> > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > ---
> > > arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
> > > 1 file changed, 15 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > > index 7d430fc536cc..875f57e41355 100644
> > > --- a/arch/powerpc/kernel/smp.c
> > > +++ b/arch/powerpc/kernel/smp.c
> > > @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
> > > struct device_node *l2_cache, *np;
> > > int i;
> > >
> > > + cpumask_set_cpu(cpu, mask_fn(cpu));
> >
> > It would be good to comment why do we need to do set the CPU in the
> > l2-mask if we don't have a l2cache domain.
> >
>
> Good Catch,
> We should move this after the cpu_to_l2cache.
>
> > > l2_cache = cpu_to_l2cache(cpu);
> > > if (!l2_cache)
> > > return false;
> > > @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
> > > * add it to it's own thread sibling mask.
> > > */
> > > cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> > > + cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> > >
> > > for (i = first_thread; i < first_thread + threads_per_core; i++)
> > > if (cpu_online(i))
> > > set_cpus_related(i, cpu, cpu_sibling_mask);
> > >
> > > add_cpu_to_smallcore_masks(cpu);
> > > - /*
> > > - * Copy the thread sibling mask into the cache sibling mask
> > > - * and mark any CPUs that share an L2 with this CPU.
> > > - */
> > > - for_each_cpu(i, cpu_sibling_mask(cpu))
> > > - set_cpus_related(cpu, i, cpu_l2_cache_mask);
> > > update_mask_by_l2(cpu, cpu_l2_cache_mask);
> > >
> > > - /*
> > > - * Copy the cache sibling mask into core sibling mask and mark
> > > - * any CPUs on the same chip as this CPU.
> > > - */
> > > - for_each_cpu(i, cpu_l2_cache_mask(cpu))
> > > - set_cpus_related(cpu, i, cpu_core_mask);
> > > + if (pkg_id == -1) {
> > > + struct cpumask *(*mask)(int) = cpu_sibling_mask;
> > > +
> > > + /*
> > > + * Copy the sibling mask into core sibling mask and
> > > + * mark any CPUs on the same chip as this CPU.
> > > + */
> > > + if (shared_caches)
> > > + mask = cpu_l2_cache_mask;
> > > +
> >
> >
> > Now that we decoupling the containment relationship between
> > sibling_mask and l2-cache mask, should we set all the CPUs that are
> > both in cpu_sibling_mask(cpu) as well as cpu_l2_mask(cpu) in
> > cpu_core_mask ?
> >
>
> Are you saying instead of setting this cpu in this cpu_core_mask, can we set
> all the cpus in the mask in cpu_core_mask?
No. What I am referring to is in the for-loop below, you are setting
the CPUs that are set in mask(cpu) in the cpu_core_mask.
Now, the above code sets
mask(cpu) == cpu_sibling_mask(cpu) in the absence of shared_caches, and
== cpu_l2_cache_mask(cpu) in the presence of shared_cache.
Since we have decoupled the assumption that cpu_sibling_mask(cpu) may not
be contained within cpu_l2_cache_mask(cpu), in the presence of a
shared-cache, why are we only picking the CPUs in
cpu_l2_cache_mask(cpu) to be set in cpu_core_maks(cpu) ? It should
ideally be the superset whose CPUs should be set in
cpu_core_mask(cpu). And the correct cpuset is
cpumask_or(cpu_sibling_mask(cpu), cpu_l2_cache_mask(cpu))
> Currently we dont know if any of the cpus of the mask were already set or
> not. Plus we need to anyway update cpumask of all other cpus to says they
> are related. So setting a mask instead of cpu at a time will not change
> anything for our side.
>
> > > + for_each_cpu(i, mask(cpu))
> > > + set_cpus_related(cpu, i, cpu_core_mask);
> > >
> > > - if (pkg_id == -1)
> > > return;
> > > + }
> > >
> > > for_each_cpu(i, cpu_online_mask)
> > > if (get_physical_package_id(i) == pkg_id)
> > > --
> > > 2.17.1
> > >
> > --
> > Thanks and Regards
> > gautham.
>
> --
> Thanks and Regards
> Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain
From: Gautham R Shenoy @ 2020-07-20 9:07 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200720061911.GC21103@linux.vnet.ibm.com>
On Mon, Jul 20, 2020 at 11:49:11AM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 12:07:55]:
>
> > On Tue, Jul 14, 2020 at 10:06:19AM +0530, Srikar Dronamraju wrote:
> > > Currently "CACHE" domain happens to be the 2nd sched domain as per
> > > powerpc_topology. This domain will collapse if cpumask of l2-cache is
> > > same as SMT domain. However we could generalize this domain such that it
> > > could mean either be a "CACHE" domain or a "BIGCORE" domain.
> > >
> > > While setting up the "CACHE" domain, check if shared_cache is already
> > > set.
> > >
> > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > ---
> > > @@ -867,11 +869,16 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> > > }
> > > #endif
> > >
> > > +static const struct cpumask *cpu_bigcore_mask(int cpu)
> > > +{
> > > + return cpu_core_mask(cpu);
> >
> > It should be cpu_smt_mask() if we want the redundant big-core to be
> > degenerated in favour of the SMT level on P8, no? Because
> > cpu_core_mask refers to all the CPUs that are in the same chip.
> >
>
> Right, but it cant be cpu_smt_mask since cpu_smt_mask is only enabled in
> CONFIG_SCHED_SMT. I was looking at using sibling_map, but we have to careful
> for power9 / PowerNV mode. Guess that should be fine.
Ok.
>
> > > +}
> > > +
> > > static struct sched_domain_topology_level powerpc_topology[] = {
> > > #ifdef CONFIG_SCHED_SMT
> > > { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > > #endif
> > > - { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> > > + { cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
> > > { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > { NULL, },
> > > };
> > > @@ -1319,7 +1326,6 @@ static void add_cpu_to_masks(int cpu)
> > > void start_secondary(void *unused)
> > > {
> > > unsigned int cpu = smp_processor_id();
> > > - struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > >
> > > mmgrab(&init_mm);
> > > current->active_mm = &init_mm;
> > > @@ -1345,14 +1351,20 @@ void start_secondary(void *unused)
> > > /* Update topology CPU masks */
> > > add_cpu_to_masks(cpu);
> > >
> > > - if (has_big_cores)
> > > - sibling_mask = cpu_smallcore_mask;
> > > /*
> > > * Check for any shared caches. Note that this must be done on a
> > > * per-core basis because one core in the pair might be disabled.
> > > */
> > > - if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> > > - shared_caches = true;
> > > + if (!shared_caches) {
> > > + struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > > + struct cpumask *mask = cpu_l2_cache_mask(cpu);
> > > +
> > > + if (has_big_cores)
> > > + sibling_mask = cpu_smallcore_mask;
> > > +
> > > + if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> > > + shared_caches = true;
> >
> > Shouldn't we use cpumask_subset() here ?
>
> Wouldn't cpumask_subset should return 1 if both are same?
When are caches shared ? When the sibling_mask(cpu) is a
strict-subset of cpu_l2_cache_mask(cpu). cpumask_weight() only
checks if the number of CPUs in cpu_l2_cache_mask(cpu) is greater than
sibling_mask(cpu) but not if constituent CPUs of the former forms
a strict superset of the latter.
We are better off using
if (!cpumask_equal(sibling_mask(cpu), mask) &&
cpumask_subset(sibling_mask(cpu), mask))
which is accurate.
> We dont want to have shared_caches set if both the masks are equal.
>
> >
> > > + }
> > >
> > > set_numa_node(numa_cpu_lookup_table[cpu]);
> > > set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> > > @@ -1390,6 +1402,14 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > smp_ops->bringup_done();
> > >
> > > dump_numa_cpu_topology();
> > > + if (shared_caches) {
> > > + pr_info("Using shared cache scheduler topology\n");
> > > + powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> > > +#ifdef CONFIG_SCHED_DEBUG
> > > + powerpc_topology[bigcore_idx].name = "CACHE";
> > > +#endif
> > > + powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> > > + }
> >
> >
> > I would much rather that we have all the topology-fixups done in one
> > function.
> >
> > fixup_topology(void) {
> > if (has_big_core)
> > powerpc_topology[smt_idx].mask = smallcore_smt_mask;
> >
> > if (shared_caches) {
> > const char *name = "CACHE";
> > powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> > strlcpy(powerpc_topology[bigcore_idx].name, name,
> > strlen(name));
> > powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> > }
> >
> > /* Any other changes to the topology structure here */
>
> We could do this.
>
> >
> > And also as an optimization, get rid of degenerate structures here
> > itself so that we don't pay additional penalty while building the
> > sched-domains each time.
> >
>
> Yes this is definitely in plan, but slightly later in time.
>
Ah, ok. Fine in that case.
> Thanks for the review and comments.
>
> --
> Thanks and Regards
> Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
From: Gautham R Shenoy @ 2020-07-20 9:10 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200720054816.GA21103@linux.vnet.ibm.com>
Hi Srikar,
On Mon, Jul 20, 2020 at 11:18:16AM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 13:56:53]:
>
> > On Tue, Jul 14, 2020 at 10:06:23AM +0530, Srikar Dronamraju wrote:
> > > Lookup the coregroup id from the associativity array.
> > >
> > > If unable to detect the coregroup id, fallback on the core id.
> > > This way, ensure sched_domain degenerates and an extra sched domain is
> > > not created.
> > >
> > > Ideally this function should have been implemented in
> > > arch/powerpc/kernel/smp.c. However if its implemented in mm/numa.c, we
> > > don't need to find the primary domain again.
> > >
> > > If the device-tree mentions more than one coregroup, then kernel
> > > implements only the last or the smallest coregroup, which currently
> > > corresponds to the penultimate domain in the device-tree.
> > >
> > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > ---
> > > arch/powerpc/mm/numa.c | 17 +++++++++++++++++
> > > 1 file changed, 17 insertions(+)
> > >
> > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > index d9ab9da85eab..4e85564ef62a 100644
> > > --- a/arch/powerpc/mm/numa.c
> > > +++ b/arch/powerpc/mm/numa.c
> > > @@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
> > >
> > > int cpu_to_coregroup_id(int cpu)
> > > {
> > > + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > > + int index;
> > > +
> >
> > It would be good to have an assert here to ensure that we are calling
> > this function only when coregroups are enabled.
> >
> > Else, we may end up returning the penultimate index which maps to the
> > chip-id.
> >
>
> We have a check below exactly for the same reason. Please look
below.
I saw that. However, it would be better to assert within the function
so that we don't call it from any other context without ascertaining
first that core_groups are enabled. Or at least a comment in the
function saying that we should call this only after ascertaining that
core_groups are enabled.
>
> >
> >
> > > + if (cpu < 0 || cpu > nr_cpu_ids)
> > > + return -1;
> > > +
> > > + if (!firmware_has_feature(FW_FEATURE_VPHN))
> > > + goto out;
> > > +
> > > + if (vphn_get_associativity(cpu, associativity))
> > > + goto out;
> > > +
> > > + index = of_read_number(associativity, 1);
> > > + if ((index > min_common_depth + 1) && coregroup_enabled)
> > > + return of_read_number(&associativity[index - 1], 1);
>
> See ^above.
>
> index would be the all the domains in the associativity array,
> min_common_depth would be where the primary domain or the chip-id is
> defined. So we are reading the penultimate domain if and only if the
> min_common_depth isn't the primary domain aka chip-id.
>
> What other check /assertions can we add?
>
>
> > > +
> > > +out:
> > > return cpu_to_core_id(cpu);
> > > }
> > >
> > > --
> > > 2.17.1
> > >
>
> --
> Thanks and Regards
> Srikar Dronamraju
^ permalink raw reply
* Re: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register
From: Laurent Dufour @ 2020-07-20 9:39 UTC (permalink / raw)
To: Ram Pai, kvm-ppc, linuxppc-dev
Cc: aik, bharata, sathnaga, sukadev, bauerman, david
In-Reply-To: <1594888333-9370-1-git-send-email-linuxram@us.ibm.com>
Le 16/07/2020 à 10:32, Ram Pai a écrit :
> An instruction accessing a mmio address, generates a HDSI fault. This fault is
> appropriately handled by the Hypervisor. However in the case of secureVMs, the
> fault is delivered to the ultravisor.
>
> Unfortunately the Ultravisor has no correct-way to fetch the faulting
> instruction. The PEF architecture does not allow Ultravisor to enable MMU
> translation. Walking the two level page table to read the instruction can race
> with other vcpus modifying the SVM's process scoped page table.
>
> This problem can be correctly solved with some help from the kernel.
>
> Capture the faulting instruction in SPRG0 register, before executing the
> faulting instruction. This enables the ultravisor to easily procure the
> faulting instruction and emulate it.
>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
> arch/powerpc/include/asm/io.h | 85 ++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 75 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index 635969b..7ef663d 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -35,6 +35,7 @@
> #include <asm/mmu.h>
> #include <asm/ppc_asm.h>
> #include <asm/pgtable.h>
> +#include <asm/svm.h>
>
> #define SIO_CONFIG_RA 0x398
> #define SIO_CONFIG_RD 0x399
> @@ -105,34 +106,98 @@
> static inline u##size name(const volatile u##size __iomem *addr) \
> { \
> u##size ret; \
> - __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \
> - : "=r" (ret) : "Z" (*addr) : "memory"); \
> + if (is_secure_guest()) { \
> + __asm__ __volatile__("mfsprg0 %3;" \
> + "lnia %2;" \
> + "ld %2,12(%2);" \
> + "mtsprg0 %2;" \
> + "sync;" \
> + #insn" %0,%y1;" \
> + "twi 0,%0,0;" \
> + "isync;" \
> + "mtsprg0 %3" \
> + : "=r" (ret) \
> + : "Z" (*addr), "r" (0), "r" (0) \
I'm wondering if SPRG0 is restored to its original value.
You're using the same register (r0) for parameters 2 and 3, so when doing lnia
%2, you're overwriting the SPRG0 value you saved in r0 just earlier.
It may be clearer to use explicit registers for %2 and %3 and to mark them as
modified for the compiler.
This applies to the other macros.
Cheers,
Laurent.
> + : "memory"); \
> + } else { \
> + __asm__ __volatile__("sync;" \
> + #insn" %0,%y1;" \
> + "twi 0,%0,0;" \
> + "isync" \
> + : "=r" (ret) : "Z" (*addr) : "memory"); \
> + } \
> return ret; \
> }
>
> #define DEF_MMIO_OUT_X(name, size, insn) \
> static inline void name(volatile u##size __iomem *addr, u##size val) \
> { \
> - __asm__ __volatile__("sync;"#insn" %1,%y0" \
> - : "=Z" (*addr) : "r" (val) : "memory"); \
> - mmiowb_set_pending(); \
> + if (is_secure_guest()) { \
> + __asm__ __volatile__("mfsprg0 %3;" \
> + "lnia %2;" \
> + "ld %2,12(%2);" \
> + "mtsprg0 %2;" \
> + "sync;" \
> + #insn" %1,%y0;" \
> + "mtsprg0 %3" \
> + : "=Z" (*addr) \
> + : "r" (val), "r" (0), "r" (0) \
> + : "memory"); \
> + } else { \
> + __asm__ __volatile__("sync;" \
> + #insn" %1,%y0" \
> + : "=Z" (*addr) : "r" (val) : "memory"); \
> + mmiowb_set_pending(); \
> + } \
> }
>
> #define DEF_MMIO_IN_D(name, size, insn) \
> static inline u##size name(const volatile u##size __iomem *addr) \
> { \
> u##size ret; \
> - __asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
> - : "=r" (ret) : "m" (*addr) : "memory"); \
> + if (is_secure_guest()) { \
> + __asm__ __volatile__("mfsprg0 %3;" \
> + "lnia %2;" \
> + "ld %2,12(%2);" \
> + "mtsprg0 %2;" \
> + "sync;" \
> + #insn"%U1%X1 %0,%1;" \
> + "twi 0,%0,0;" \
> + "isync;" \
> + "mtsprg0 %3" \
> + : "=r" (ret) \
> + : "m" (*addr), "r" (0), "r" (0) \
> + : "memory"); \
> + } else { \
> + __asm__ __volatile__("sync;" \
> + #insn"%U1%X1 %0,%1;" \
> + "twi 0,%0,0;" \
> + "isync" \
> + : "=r" (ret) : "m" (*addr) : "memory"); \
> + } \
> return ret; \
> }
>
> #define DEF_MMIO_OUT_D(name, size, insn) \
> static inline void name(volatile u##size __iomem *addr, u##size val) \
> { \
> - __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \
> - : "=m" (*addr) : "r" (val) : "memory"); \
> - mmiowb_set_pending(); \
> + if (is_secure_guest()) { \
> + __asm__ __volatile__("mfsprg0 %3;" \
> + "lnia %2;" \
> + "ld %2,12(%2);" \
> + "mtsprg0 %2;" \
> + "sync;" \
> + #insn"%U0%X0 %1,%0;" \
> + "mtsprg0 %3" \
> + : "=m" (*addr) \
> + : "r" (val), "r" (0), "r" (0) \
> + : "memory"); \
> + } else { \
> + __asm__ __volatile__("sync;" \
> + #insn"%U0%X0 %1,%0" \
> + : "=m" (*addr) : "r" (val) : "memory"); \
> + mmiowb_set_pending(); \
> + } \
> }
>
> DEF_MMIO_IN_D(in_8, 8, lbz);
>
^ permalink raw reply
* Re: [v3 11/15] powerpc/perf: BHRB control to disable BHRB logic when not used
From: Gautham R Shenoy @ 2020-07-20 10:05 UTC (permalink / raw)
To: Athira Rajeev
Cc: ego, mikey, maddy, kvm, kvm-ppc, svaidyan, acme, jolsa,
linuxppc-dev
In-Reply-To: <1594996707-3727-12-git-send-email-atrajeev@linux.vnet.ibm.com>
On Fri, Jul 17, 2020 at 10:38:23AM -0400, Athira Rajeev wrote:
> PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
>
> BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
> bit, namely "BHRB Recording Disable (BHRBRD)". This field controls
> whether BHRB entries are written when BHRB recording is enabled by other
> bits. This patch implements support for this BHRB disable bit.
>
> By setting 0b1 to this bit will disable the BHRB and by setting 0b0
> to this bit will have BHRB enabled. This addresses backward
> compatibility (for older OS), since this bit will be cleared and
> hardware will be writing to BHRB by default.
>
> This patch addresses changes to set MMCRA (BHRBRD) at boot for power10
> ( there by the core will run faster) and enable this feature only on
> runtime ie, on explicit need from user. Also save/restore MMCRA in the
> restore path of state-loss idle state to make sure we keep BHRB disabled
> if it was not enabled on request at runtime.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
For arch/powerpc/platforms/powernv/idle.c
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> arch/powerpc/perf/core-book3s.c | 20 ++++++++++++++++----
> arch/powerpc/perf/isa207-common.c | 12 ++++++++++++
> arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++++++--
> 3 files changed, 48 insertions(+), 6 deletions(-)
[..snip..]
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 2dd4673..1c9d0a9 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -611,6 +611,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> unsigned long srr1;
> unsigned long pls;
> unsigned long mmcr0 = 0;
> + unsigned long mmcra = 0;
> struct p9_sprs sprs = {}; /* avoid false used-uninitialised */
> bool sprs_saved = false;
>
> @@ -657,6 +658,21 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> */
> mmcr0 = mfspr(SPRN_MMCR0);
> }
> +
> + if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> + /*
> + * POWER10 uses MMCRA (BHRBRD) as BHRB disable bit.
> + * If the user hasn't asked for the BHRB to be
> + * written, the value of MMCRA[BHRBRD] is 1.
> + * On wakeup from stop, MMCRA[BHRBD] will be 0,
> + * since it is previleged resource and will be lost.
> + * Thus, if we do not save and restore the MMCRA[BHRBD],
> + * hardware will be needlessly writing to the BHRB
> + * in problem mode.
> + */
> + mmcra = mfspr(SPRN_MMCRA);
> + }
> +
> if ((psscr & PSSCR_RL_MASK) >= pnv_first_spr_loss_level) {
> sprs.lpcr = mfspr(SPRN_LPCR);
> sprs.hfscr = mfspr(SPRN_HFSCR);
> @@ -700,8 +716,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
>
> if ((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS) {
> - unsigned long mmcra;
> -
> /*
> * We don't need an isync after the mtsprs here because the
> * upcoming mtmsrd is execution synchronizing.
> @@ -721,6 +735,10 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> mtspr(SPRN_MMCR0, mmcr0);
> }
>
> + /* Reload MMCRA to restore BHRB disable bit for POWER10 */
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + mtspr(SPRN_MMCRA, mmcra);
> +
> /*
> * DD2.2 and earlier need to set then clear bit 60 in MMCRA
> * to ensure the PMU starts running.
> --
> 1.8.3.1
>
^ permalink raw reply
* Re: [PATCH] HID: udraw-ps3: Replace HTTP links with HTTPS ones
From: Jiri Kosina @ 2020-07-20 10:25 UTC (permalink / raw)
To: Bastien Nocera
Cc: linux-kernel, benjamin.tissoires, paulus, Alexander A. Klimov,
linux-input, linuxppc-dev
In-Reply-To: <c1ce6d1eaeed9e239742c12f4f82c84ad3f36fd4.camel@hadess.net>
On Sat, 18 Jul 2020, Bastien Nocera wrote:
> > Rationale:
> > Reduces attack surface on kernel devs opening the links for MITM
> > as HTTPS traffic is much harder to manipulate.
> >
> > Deterministic algorithm:
> > For each file:
> > If not .svg:
> > For each line:
> > If doesn't contain `\bxmlns\b`:
> > For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
> > If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
> > If both the HTTP and HTTPS versions
> > return 200 OK and serve the same content:
> > Replace HTTP with HTTPS.
> >
> > Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
>
> Looks good!
>
> Acked-by: Bastien Nocera <hadess@hadess.net>
Applied, thanks.
--
Jiri Kosina
SUSE Labs
^ permalink raw reply
* Re: [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
From: Srikar Dronamraju @ 2020-07-20 10:26 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling, Michael Ellerman,
Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200720091025.GC6680@in.ibm.com>
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-20 14:40:25]:
> Hi Srikar,
>
>
> On Mon, Jul 20, 2020 at 11:18:16AM +0530, Srikar Dronamraju wrote:
> > * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 13:56:53]:
> >
> > > On Tue, Jul 14, 2020 at 10:06:23AM +0530, Srikar Dronamraju wrote:
> > > > Lookup the coregroup id from the associativity array.
> > > >
> > > > If unable to detect the coregroup id, fallback on the core id.
> > > > This way, ensure sched_domain degenerates and an extra sched domain is
> > > > not created.
> > > >
> > > > Ideally this function should have been implemented in
> > > > arch/powerpc/kernel/smp.c. However if its implemented in mm/numa.c, we
> > > > don't need to find the primary domain again.
> > > >
> > > > If the device-tree mentions more than one coregroup, then kernel
> > > > implements only the last or the smallest coregroup, which currently
> > > > corresponds to the penultimate domain in the device-tree.
> > > >
> > > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > > ---
> > > > arch/powerpc/mm/numa.c | 17 +++++++++++++++++
> > > > 1 file changed, 17 insertions(+)
> > > >
> > > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > > index d9ab9da85eab..4e85564ef62a 100644
> > > > --- a/arch/powerpc/mm/numa.c
> > > > +++ b/arch/powerpc/mm/numa.c
> > > > @@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
> > > >
> > > > int cpu_to_coregroup_id(int cpu)
> > > > {
> > > > + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > > > + int index;
> > > > +
> > >
> > > It would be good to have an assert here to ensure that we are calling
> > > this function only when coregroups are enabled.
> > >
> > > Else, we may end up returning the penultimate index which maps to the
> > > chip-id.
> > >
> >
> > We have a check below exactly for the same reason. Please look
> below.
>
> I saw that. However, it would be better to assert within the function
> so that we don't call it from any other context without ascertaining
> first that core_groups are enabled. Or at least a comment in the
> function saying that we should call this only after ascertaining that
> core_groups are enabled.
>
>
Okay, I can move the coregroup_enabled check above.
>
> >
> > >
> > >
> > > > + if (cpu < 0 || cpu > nr_cpu_ids)
> > > > + return -1;
> > > > +
> > > > + if (!firmware_has_feature(FW_FEATURE_VPHN))
> > > > + goto out;
> > > > +
> > > > + if (vphn_get_associativity(cpu, associativity))
> > > > + goto out;
> > > > +
> > > > + index = of_read_number(associativity, 1);
> > > > + if ((index > min_common_depth + 1) && coregroup_enabled)
> > > > + return of_read_number(&associativity[index - 1], 1);
> >
> > See ^above.
> >
> > index would be the all the domains in the associativity array,
> > min_common_depth would be where the primary domain or the chip-id is
> > defined. So we are reading the penultimate domain if and only if the
> > min_common_depth isn't the primary domain aka chip-id.
> >
> > What other check /assertions can we add?
> >
> >
> > > > +
> > > > +out:
> > > > return cpu_to_core_id(cpu);
> > > > }
> > > >
> > > > --
> > > > 2.17.1
> > > >
> >
> > --
> > Thanks and Regards
> > Srikar Dronamraju
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH] powerpc/fault: kernel can extend a user process's stack
From: Michal Suchánek @ 2020-07-20 10:51 UTC (permalink / raw)
To: Daniel Axtens; +Cc: Tom Lane, linuxppc-dev, Daniel Black
In-Reply-To: <8736drciem.fsf@dja-thinkpad.axtens.net>
Hello,
On Wed, Dec 11, 2019 at 08:37:21PM +1100, Daniel Axtens wrote:
> > Fixes: 14cf11af6cf6 ("powerpc: Merge enough to start building in
> > arch/powerpc.")
>
> Wow, that's pretty ancient! I'm also not sure it's right - in that same
> patch, arch/ppc64/mm/fault.c contains:
>
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 213) if (address + 2048 < uregs->gpr[1]
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 214) && (!user_mode(regs) || !store_updates_sp(regs)))
> ^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 215) goto bad_area;
>
> Which is the same as the new arch/powerpc/mm/fault.c code:
>
> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 234) if (address + 2048 < uregs->gpr[1]
> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 235) && (!user_mode(regs) || !store_updates_sp(regs)))
> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 236) goto bad_area;
>
> So either they're both right or they're both wrong, either way I'm not
> sure how this patch is to blame.
Is there any progress on resolving this?
I did not notice any followup patch nor this one being merged/refuted.
Thanks
Michal
>
> I guess we should also cc stable@...
>
> Regards,
> Daniel
>
> >> Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
> >> Cc: Daniel Black <daniel@linux.ibm.com>
> >> Signed-off-by: Daniel Axtens <dja@axtens.net>
> >> ---
> >> arch/powerpc/mm/fault.c | 10 ++++++++++
> >> 1 file changed, 10 insertions(+)
> >>
> >> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> >> index b5047f9b5dec..00183731ea22 100644
> >> --- a/arch/powerpc/mm/fault.c
> >> +++ b/arch/powerpc/mm/fault.c
> >> @@ -287,7 +287,17 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> >> if (!res)
> >> return !store_updates_sp(inst);
> >> *must_retry = true;
> >> + } else if ((flags & FAULT_FLAG_WRITE) &&
> >> + !(flags & FAULT_FLAG_USER)) {
> >> + /*
> >> + * the kernel can also attempt to write beyond the end
> >> + * of a process's stack - for example setting up a
> >> + * signal frame. We assume this is valid, subject to
> >> + * the checks in expand_stack() later.
> >> + */
> >> + return false;
> >> }
> >> +
> >> return true;
> >> }
> >> return false;
> >> --
> >> 2.20.1
> >>
^ permalink raw reply
* [PATCH net-next] net: fs_enet: remove redundant null check
From: Zhang Changzhong @ 2020-07-20 11:12 UTC (permalink / raw)
To: pantelis.antoniou, davem, kuba; +Cc: netdev, linuxppc-dev, linux-kernel
Because clk_prepare_enable and clk_disable_unprepare already
checked NULL clock parameter, so the additional checks are
unnecessary, just remove them.
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
---
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index b0d4b198..bf846b4 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -1043,8 +1043,7 @@ static int fs_enet_probe(struct platform_device *ofdev)
out_free_dev:
free_netdev(ndev);
out_put:
- if (fpi->clk_per)
- clk_disable_unprepare(fpi->clk_per);
+ clk_disable_unprepare(fpi->clk_per);
out_deregister_fixed_link:
of_node_put(fpi->phy_node);
if (of_phy_is_fixed_link(ofdev->dev.of_node))
@@ -1065,8 +1064,7 @@ static int fs_enet_remove(struct platform_device *ofdev)
fep->ops->cleanup_data(ndev);
dev_set_drvdata(fep->dev, NULL);
of_node_put(fep->fpi->phy_node);
- if (fep->fpi->clk_per)
- clk_disable_unprepare(fep->fpi->clk_per);
+ clk_disable_unprepare(fep->fpi->clk_per);
if (of_phy_is_fixed_link(ofdev->dev.of_node))
of_phy_deregister_fixed_link(ofdev->dev.of_node);
free_netdev(ndev);
--
1.8.3.1
^ permalink raw reply related
* [PATCH v4 00/12] ppc64: enable kdump support for kexec_file_load syscall
From: Hari Bathini @ 2020-07-20 12:49 UTC (permalink / raw)
To: Michael Ellerman, Andrew Morton
Cc: Pingfan Liu, Kexec-ml, Nayna Jain, Petr Tesarik,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Vivek Goyal, Dave Young, Thiago Jung Bauermann, Eric Biederman
This patch series enables kdump support for kexec_file_load system
call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools
code but heavily modified for kernel consumption.
The first patch adds a weak arch_kexec_locate_mem_hole() function to
override locate memory hole logic suiting arch needs. There are some
special regions in ppc64 which should be avoided while loading buffer
& there are multiple callers to kexec_add_buffer making it complicated
to maintain range sanity and using generic lookup at the same time.
The second patch marks ppc64 specific code within arch/powerpc/kexec
and arch/powerpc/purgatory to make the subsequent code changes easy
to understand.
The next patch adds helper function to setup different memory ranges
needed for loading kdump kernel, booting into it and exporting the
crashing kernel's elfcore.
The fourth patch overrides arch_kexec_locate_mem_hole() function to
locate memory hole for kdump segments by accounting for the special
memory regions, referred to as excluded memory ranges, and sets
kbuf->mem when a suitable memory region is found.
The fifth patch moves walk_drmem_lmbs() out of .init section with
a few changes to reuse it for setting up kdump kernel's usable memory
ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs
and set linux,drconf-usable-memory & linux,usable-memory properties
in order to restrict kdump kernel's memory usage.
The seventh patch adds relocation support for the purgatory. Patch 8
helps setup the stack for the purgatory. The next patch setups up
backup region as a segment while loading kdump kernel and teaches
purgatory to copy it from source to destination.
Patch 10 builds the elfcore header for the running kernel & passes
the info to kdump kernel via "elfcorehdr=" parameter to export as
/proc/vmcore file. The next patch sets up the memory reserve map
for the kexec kernel and also claims kdump support for kdump as
all the necessary changes are added.
The last patch fixes a lookup issue for `kexec -l -s` case when
memory is reserved for crashkernel.
There is scope to improve purgatory to print messages, verify sha256,
move code common across archs - like arch_kexec_apply_relocations_add
and sha256 digest verification, build purgatory as position independent
code & other Makefile improvements in purgatory which can be dealt with
in a separate patch series as a follow-up.
Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER
boxes, one with secureboot enabled and a simulator.
v3 -> v4:
* Updated get_node_path() to be an iterative function instead of a
recursive one.
* Added comment explaining why low memory is added to kdump kernel's
usable memory ranges though it doesn't fall in crashkernel region.
* Fixed stack_buf to be quadword aligned in accordance with ABI.
* Added missing of_node_put() in setup_purgatory_ppc64().
* Added a FIXME tag to indicate issue in adding opal/rtas regions to
core image.
v2 -> v3:
* Fixed TOC pointer calculation for purgatory by using section info
that has relocations applied.
* Fixed arch_kexec_locate_mem_hole() function to fallback to generic
kexec_locate_mem_hole() lookup if exclude ranges list is empty.
* Dropped check for backup_start in trampoline_64.S as purgatory()
function takes care of it anyway.
v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
weak arch_kexec_add_buffer().
* Addressed warnings reported by lkp.
* Added patch to address kexec load issue when memory is reserved
for crashkernel.
* Used the appropriate license header for the new files added.
* Added an option to merge ranges to minimize reallocations while
adding memory ranges.
* Dropped within_crashkernel parameter for add_opal_mem_range() &
add_rtas_mem_range() functions as it is not really needed.
---
Hari Bathini (12):
kexec_file: allow archs to handle special regions while locating memory hole
powerpc/kexec_file: mark PPC64 specific code
powerpc/kexec_file: add helper functions for getting memory ranges
ppc64/kexec_file: avoid stomping memory used by special regions
powerpc/drmem: make lmb walk a bit more flexible
ppc64/kexec_file: restrict memory usage of kdump kernel
ppc64/kexec_file: add support to relocate purgatory
ppc64/kexec_file: setup the stack for purgatory
ppc64/kexec_file: setup backup region for kdump kernel
ppc64/kexec_file: prepare elfcore header for crashing kernel
ppc64/kexec_file: add appropriate regions for memory reserve map
ppc64/kexec_file: fix kexec load failure with lack of memory hole
arch/powerpc/include/asm/crashdump-ppc64.h | 10
arch/powerpc/include/asm/drmem.h | 9
arch/powerpc/include/asm/kexec.h | 33 +
arch/powerpc/include/asm/kexec_ranges.h | 25
arch/powerpc/include/asm/purgatory.h | 11
arch/powerpc/kernel/prom.c | 13
arch/powerpc/kexec/Makefile | 2
arch/powerpc/kexec/elf_64.c | 36 +
arch/powerpc/kexec/file_load.c | 60 +
arch/powerpc/kexec/file_load_64.c | 1554 ++++++++++++++++++++++++++++
arch/powerpc/kexec/ranges.c | 410 +++++++
arch/powerpc/mm/drmem.c | 87 +-
arch/powerpc/mm/numa.c | 13
arch/powerpc/purgatory/Makefile | 28 -
arch/powerpc/purgatory/purgatory_64.c | 36 +
arch/powerpc/purgatory/trampoline.S | 117 --
arch/powerpc/purgatory/trampoline_64.S | 170 +++
include/linux/kexec.h | 29 -
kernel/kexec_file.c | 16
19 files changed, 2464 insertions(+), 195 deletions(-)
create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
create mode 100644 arch/powerpc/include/asm/kexec_ranges.h
create mode 100644 arch/powerpc/include/asm/purgatory.h
create mode 100644 arch/powerpc/kexec/file_load_64.c
create mode 100644 arch/powerpc/kexec/ranges.c
create mode 100644 arch/powerpc/purgatory/purgatory_64.c
delete mode 100644 arch/powerpc/purgatory/trampoline.S
create mode 100644 arch/powerpc/purgatory/trampoline_64.S
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox