LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology
From: Gautham R Shenoy @ 2020-07-17  5:44 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200714043624.5648-3-srikar@linux.vnet.ibm.com>

Hi Srikar,

On Tue, Jul 14, 2020 at 10:06:15AM +0530, Srikar Dronamraju wrote:
> A new sched_domain_topology_level was added just for Power9. However the
> same can be achieved by merging powerpc_topology with power9_topology
> and makes the code more simpler especially when adding a new sched
> domain.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
>  1 file changed, 10 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 680c0edcc59d..069ea4b21c6d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> -/* cpumask of CPUs with asymetric SMT dependancy */
> +/* cpumask of CPUs with asymmetric SMT dependency */
>  static int powerpc_smt_flags(void)
>  {
>  	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> @@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
>  }
>  #endif
> 
> -static struct sched_domain_topology_level powerpc_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> -#endif
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
>  /*
>   * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
>   * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> @@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
>   */
>  static const struct cpumask *shared_cache_mask(int cpu)
>  {
> -	return cpu_l2_cache_mask(cpu);
> +	if (shared_caches)
> +		return cpu_l2_cache_mask(cpu);
> +
> +	if (has_big_cores)
> +		return cpu_smallcore_mask(cpu);
> +
> +	return cpu_smt_mask(cpu);
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> @@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
>  }
>  #endif
> 
> -static struct sched_domain_topology_level power9_topology[] = {
> +static struct sched_domain_topology_level powerpc_topology[] = {


>  #ifdef CONFIG_SCHED_SMT
>  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
> @@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  #ifdef CONFIG_SCHED_SMT
>  	if (has_big_cores) {
>  		pr_info("Big cores detected but using small core scheduling\n");
I> -		power9_topology[0].mask = smallcore_smt_mask;
>  		powerpc_topology[0].mask = smallcore_smt_mask;
>  	}
>  #endif
> -	/*
> -	 * If any CPU detects that it's sharing a cache with another CPU then
> -	 * use the deeper topology that is aware of this sharing.
> -	 */
> -	if (shared_caches) {
> -		pr_info("Using shared cache scheduler topology\n");
> -		set_sched_topology(power9_topology);
> -	} else {
> -		pr_info("Using standard scheduler topology\n");
> -		set_sched_topology(powerpc_topology);


Ok, so we will go with the three level topology by default (SMT,
CACHE, DIE) and will rely on the sched-domain creation code to
degenerate CACHE domain in case SMT and CACHE have the same set of
CPUs (POWER8 for eg).

From a cleanup perspective this is better, since we won't have to
worry about defining multiple topology structures, but from a
performance point of view, wouldn't we now pay an extra penalty of
degenerating the CACHE domains on POWER8 kind of systems, each time
when a CPU comes online ?

Do we know how bad it is ? If the degeneration takes a few extra
microseconds, that should be ok I suppose.

--
Thanks and Regards
gautham.

^ permalink raw reply

* Re: [PATCH 01/11] powerpc/smp: Cache node for reuse
From: Gautham R Shenoy @ 2020-07-17  4:51 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200714043624.5648-2-srikar@linux.vnet.ibm.com>

On Tue, Jul 14, 2020 at 10:06:14AM +0530, Srikar Dronamraju wrote:
> While cpu_to_node is inline function with access to per_cpu variable.
> However when using repeatedly, it may be cleaner to cache it in a local
> variable.
> 
> Also fix a build error in a some weird config.
> "error: _numa_cpu_lookup_table_ undeclared"
> 
> No functional change
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


LGTM.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/smp.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 73199470c265..680c0edcc59d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -843,7 +843,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> 
>  	DBG("smp_prepare_cpus\n");
> 
> -	/* 
> +	/*
>  	 * setup_cpu may need to be called on the boot cpu. We havent
>  	 * spun any cpus up but lets be paranoid.
>  	 */
> @@ -854,20 +854,24 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	cpu_callin_map[boot_cpuid] = 1;
> 
>  	for_each_possible_cpu(cpu) {
> +		int node = cpu_to_node(cpu);
> +
>  		zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
> -					GFP_KERNEL, cpu_to_node(cpu));
> +					GFP_KERNEL, node);
>  		zalloc_cpumask_var_node(&per_cpu(cpu_l2_cache_map, cpu),
> -					GFP_KERNEL, cpu_to_node(cpu));
> +					GFP_KERNEL, node);
>  		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
> -					GFP_KERNEL, cpu_to_node(cpu));
> +					GFP_KERNEL, node);
> +#ifdef CONFIG_NEED_MULTIPLE_NODES
>  		/*
>  		 * numa_node_id() works after this.
>  		 */
>  		if (cpu_present(cpu)) {
> -			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> -			set_cpu_numa_mem(cpu,
> -				local_memory_node(numa_cpu_lookup_table[cpu]));
> +			node = numa_cpu_lookup_table[cpu];
> +			set_cpu_numa_node(cpu, node);
> +			set_cpu_numa_mem(cpu, local_memory_node(node));
>  		}
> +#endif
>  	}
> 
>  	/* Init the cpumasks so the boot CPU is related to itself */
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH v3 02/12] powerpc/kexec_file: mark PPC64 specific code
From: Hari Bathini @ 2020-07-17  4:46 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
	Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
	Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <87v9io8c13.fsf@morokweng.localdomain>



On 16/07/20 7:19 am, Thiago Jung Bauermann wrote:
> 
> I didn't forget about this patch. I just wanted to see more of the
> changes before comenting on it.
> 
> Hari Bathini <hbathini@linux.ibm.com> writes:
> 
>> Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
>> specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
>> rename purgatory/trampoline.S to purgatory/trampoline_64.S in the
>> same spirit.
> 
> There's only a 64 bit implementation of kexec_file_load() so this is a
> somewhat theoretical exercise, but there's no harm in getting the code
> organized, so:
> 
> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> 
> I have just one question below.

<snip>

>> +/**
>> + * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel
>> + *                       being loaded.
>> + * @image:               kexec image being loaded.
>> + * @fdt:                 Flattened device tree for the next kernel.
>> + * @initrd_load_addr:    Address where the next initrd will be loaded.
>> + * @initrd_len:          Size of the next initrd, or 0 if there will be none.
>> + * @cmdline:             Command line for the next kernel, or NULL if there will
>> + *                       be none.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>> +			unsigned long initrd_load_addr,
>> +			unsigned long initrd_len, const char *cmdline)
>> +{
>> +	int chosen_node, ret;
>> +
>> +	/* Remove memory reservation for the current device tree. */
>> +	ret = delete_fdt_mem_rsv(fdt, __pa(initial_boot_params),
>> +				 fdt_totalsize(initial_boot_params));
>> +	if (ret == 0)
>> +		pr_debug("Removed old device tree reservation.\n");
>> +	else if (ret != -ENOENT) {
>> +		pr_err("Failed to remove old device-tree reservation.\n");
>> +		return ret;
>> +	}
>> +
>> +	ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len,
>> +			    cmdline, &chosen_node);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = fdt_setprop(fdt, chosen_node, "linux,booted-from-kexec", NULL, 0);
>> +	if (ret)
>> +		pr_err("Failed to update device-tree with linux,booted-from-kexec\n");
>> +
>> +	return ret;
>> +}
> 
> For setup_purgatory_ppc64() you start with an empty function and build
> from there, but for setup_new_fdt_ppc64() you moved some code here. Is
> the code above 64 bit specific?

Actually, I was not quiet sure if fdt updates like in patch 6 & patch 9 can be
done after setup_ima_buffer() call. If you can confirm, I will move them back
to setup_purgatory()

Thanks
Hari

^ permalink raw reply

* Re: [PATCH -next] cpuidle/pseries: Make symbol 'pseries_idle_driver' static
From: Daniel Lezcano @ 2020-07-17  4:41 UTC (permalink / raw)
  To: Michael Ellerman, Hulk Robot, Wei Yongjun, Rafael J. Wysocki,
	Michael Ellerman
  Cc: linuxppc-dev, linux-pm
In-Reply-To: <159490401706.3805857.7133480973769495238.b4-ty@ellerman.id.au>

On 16/07/2020 14:56, Michael Ellerman wrote:
> On Tue, 14 Jul 2020 22:24:24 +0800, Wei Yongjun wrote:
>> The sparse tool complains as follows:
>>
>> drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
>>  symbol 'pseries_idle_driver' was not declared. Should it be static?
>>
>> 'pseries_idle_driver' is not used outside of this file, so marks
>> it static.
> 
> Applied to powerpc/next.
> 
> [1/1] cpuidle/pseries: Make symbol 'pseries_idle_driver' static
>       https://git.kernel.org/powerpc/c/92fe8483b1660feaa602d8be6ca7efe95ae4789b

Rafael already picked the patch.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply

* Re: [PATCH v3 03/12] powerpc/kexec_file: add helper functions for getting memory ranges
From: Hari Bathini @ 2020-07-17  4:32 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
	Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
	Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <874kq98xo4.fsf@morokweng.localdomain>



On 15/07/20 5:19 am, Thiago Jung Bauermann wrote:
> 
> Hello Hari,
> 
> Hari Bathini <hbathini@linux.ibm.com> writes:
> 
>> In kexec case, the kernel to be loaded uses the same memory layout as
>> the running kernel. So, passing on the DT of the running kernel would
>> be good enough.
>>
>> But in case of kdump, different memory ranges are needed to manage
>> loading the kdump kernel, booting into it and exporting the elfcore
>> of the crashing kernel. The ranges are exlude memory ranges, usable
> 
> s/exlude/exclude/
> 
>> memory ranges, reserved memory ranges and crash memory ranges.
>>
>> Exclude memory ranges specify the list of memory ranges to avoid while
>> loading kdump segments. Usable memory ranges list the memory ranges
>> that could be used for booting kdump kernel. Reserved memory ranges
>> list the memory regions for the loading kernel's reserve map. Crash
>> memory ranges list the memory ranges to be exported as the crashing
>> kernel's elfcore.
>>
>> Add helper functions for setting up the above mentioned memory ranges.
>> This helpers facilitate in understanding the subsequent changes better
>> and make it easy to setup the different memory ranges listed above, as
>> and when appropriate.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> Tested-by: Pingfan Liu <piliu@redhat.com>
> 

<snip>

>> +/**
>> + * add_reserved_ranges - Adds "/reserved-ranges" regions exported by f/w
>> + *                       to the given memory ranges list.
>> + * @mem_ranges:          Range list to add the memory ranges to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +int add_reserved_ranges(struct crash_mem **mem_ranges)
>> +{
>> +	int i, len, ret = 0;
>> +	const __be32 *prop;
>> +
>> +	prop = of_get_property(of_root, "reserved-ranges", &len);
>> +	if (!prop)
>> +		return 0;
>> +
>> +	/*
>> +	 * Each reserved range is an (address,size) pair, 2 cells each,
>> +	 * totalling 4 cells per range.
> 
> Can you assume that, or do you need to check the #address-cells and
> #size-cells properties of the root node?

Taken from early_reserve_mem_dt() which did not seem to care.
Should we be doing any different here?

Thanks
Hari

^ permalink raw reply

* Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE
From: Bharata B Rao @ 2020-07-17  4:28 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: sfr, linux-kernel, linux-next, Qian Cai, aneesh.kumar,
	linuxppc-dev
In-Reply-To: <1594953143.b8px5ir35m.astroid@bobo.none>

On Fri, Jul 17, 2020 at 12:44:00PM +1000, Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of July 17, 2020 12:08 pm:
> > Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
> >> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
> >>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
> >>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
> >>> permitted to use instructions like tblie and tlbsync directly, but is
> >>> expected to make hypervisor calls to get the TLB flushed.
> >>> 
> >>> This series enables the TLB flush routines in the radix code to
> >>> off-load TLB flushing to hypervisor via the newly proposed hcall
> >>> H_RPT_INVALIDATE. 
> >>> 
> >>> To easily check the availability of GTSE, it is made an MMU feature.
> >>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
> >>> handle GTSE as an optionally available feature and to not assume GTSE
> >>> when radix support is available.
> >>> 
> >>> The actual hcall implementation for KVM isn't included in this
> >>> patchset and will be posted separately.
> >>> 
> >>> Changes in v3
> >>> =============
> >>> - Fixed a bug in the hcall wrapper code where we were missing setting
> >>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
> >>>   a full flush for the nested case.
> >>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
> >>> 
> >>> v2: https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bharata@linux.ibm.com/T/#t
> >>> 
> >>> Bharata B Rao (2):
> >>>   powerpc/mm: Enable radix GTSE only if supported.
> >>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
> >>>     enabled
> >>> 
> >>> Nicholas Piggin (1):
> >>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
> >>>     !GTSE
> >> 
> >> Reverting the whole series fixed random memory corruptions during boot on
> >> POWER9 PowerNV systems below.
> > 
> > If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
> > disasm is the same as reverting my patch.
> > 
> > Feature bits not being set right? PowerNV should be pretty simple, seems
> > to do the same as FTR_TYPE_RADIX.
> 
> Might need this fix
> 
> ---
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 9cc49f265c86..54c9bcea9d4e 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -163,7 +163,7 @@ static struct ibm_pa_feature {
>  	{ .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
>  	{ .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
>  #ifdef CONFIG_PPC_RADIX_MMU
> -	{ .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
> +	{ .pabyte = 40, .pabit = 0, .mmu_features  = (MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE) },
>  #endif
>  	{ .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = CPU_FTR_NODSISRALIGN },
>  	{ .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,

Michael - Let me know if this should be folded into 1/3 and the complete
series resent.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH v4 04/10] powerpc/watchpoint: Enable watchpoint functionality on power10 guest
From: Jordan Niethe @ 2020-07-17  4:23 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Christophe Leroy, apopple, mikey, miltonm, peterz, oleg,
	Nicholas Piggin, linux-kernel, Paul Mackerras, jolsa, fweisbec,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-5-ravi.bangoria@linux.ibm.com>

On Fri, Jul 17, 2020 at 2:10 PM Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
>
> CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE
> (controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree
> node is not PAPR compatible and thus not yet used by kvm or pHyp
> guests. Enable watchpoint functionality on power10 guest (both kvm
> and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that
> this change does not enable 2nd DAWR support.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
I ran the ptrace-hwbreak selftest successfully within a power10 kvm guest.
Tested-by: Jordan Niethe <jniethe5@gmail.com>
> ---
>  arch/powerpc/include/asm/cputable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
> index bac2252c839e..e506d429b1af 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -478,7 +478,7 @@ static inline void cpu_feature_keys_init(void) { }
>             CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>             CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
>             CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
> -           CPU_FTR_ARCH_31)
> +           CPU_FTR_ARCH_31 | CPU_FTR_DAWR)
>  #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
>             CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
>             CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
> --
> 2.26.2
>

^ permalink raw reply

* Re: [PATCH v3 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
From: Hari Bathini @ 2020-07-17  4:17 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
	Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
	Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <875zance3n.fsf@morokweng.localdomain>



On 17/07/20 3:33 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini <hbathini@linux.ibm.com> writes:
> 
>> On 16/07/20 4:22 am, Thiago Jung Bauermann wrote:
>>>
>>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>>

<snip>
 
>>>> +	 * each representing a memory range.
>>>> +	 */
>>>> +	ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
>>>> +
>>>> +	for (i = 0; i < ranges; i++) {
>>>> +		base = of_read_number(prop, n_mem_addr_cells);
>>>> +		prop += n_mem_addr_cells;
>>>> +		end = base + of_read_number(prop, n_mem_size_cells) - 1;
>>
>> prop is not used after the above.
>>
>>> You need to `prop += n_mem_size_cells` here.
>>
>> But yeah, adding it would make it look complete in some sense..
> 
> Isn't it used in the next iteration of the loop?

Memory@XXX/reg typically has only one range. I was looking at it
from that perspective which is not right. Will update.

Thanks
Hari

^ permalink raw reply

* Re: [powerpc:next-test 125/127] arch/powerpc/mm/book3s64/pkeys.c:392:7: error: implicit declaration of function 'is_pkey_enabled'; did you mean
From: Aneesh Kumar K.V @ 2020-07-17  4:16 UTC (permalink / raw)
  To: kernel test robot; +Cc: linuxppc-dev, kbuild-all
In-Reply-To: <202007170943.BGjGSyKg%lkp@intel.com>

On 7/17/20 7:29 AM, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
> head:   0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9
> commit: ed411c66eea2ccf93a634ae661a1f79c2bc63d88 [125/127] powerpc/book3s64/pkeys: Remove is_pkey_enabled()
> config: powerpc-allmodconfig (attached as .config)
> compiler: powerpc64-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          git checkout ed411c66eea2ccf93a634ae661a1f79c2bc63d88
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>     arch/powerpc/mm/book3s64/pkeys.c: In function 'pkey_access_permitted':
>>> arch/powerpc/mm/book3s64/pkeys.c:392:7: error: implicit declaration of function 'is_pkey_enabled'; did you mean 'arch_pkeys_enabled'? [-Werror=implicit-function-declaration]
>       392 |  if (!is_pkey_enabled(pkey))
>           |       ^~~~~~~~~~~~~~~
>           |       arch_pkeys_enabled
>     cc1: some warnings being treated as errors
> 
> vim +392 arch/powerpc/mm/book3s64/pkeys.c
> 


We removed that upstream in

19ab500edb5d6020010caba48ce3b4ce4182ab63     powerpc/mm/pkeys: Make pkey 
access check work on execute_only_key


next-test need to be rebased?

-aneesh

^ permalink raw reply

* ASMedia USB 3.x host controllers triggering EEH on POWER9
From: Forest Crossman @ 2020-07-17  4:13 UTC (permalink / raw)
  To: linuxppc-dev, linux-pci, linux-usb

Hi, all,

I have several ASMedia USB 3.x host controllers (ASM2142 and ASM3142,
both share the same Vendor ID/Device ID pair) that I'd like to use
with a POWER9 system (a Raptor Computing Systems Talos II).
Unfortunately, while the kernel recognizes the controllers just fine,
as soon as I plug in a device, an EEH error occurs and the host
controller gets repeatedly reset until it eventually gets disabled. An
example of one of these errors can be seen here:
https://paste.debian.net/hidden/e39698eb

Based on the "PHB4 Diag-data" reported by the kernel, it seems that
LEM_WOF_R bit 35, PHB_FESR bit 20, and RXE_ARB_FESR bit 28 have been
set. According to the PHB4 specification
(https://ibm.ent.box.com/s/jftnfhceul07qjh9jtn91xwjmclabc71), they
respectively mean the following:
 - ARB: IODA TVT Errors - "TCE Validation Table error occurred. The
entry is invalid, or the PCI Address was out of range as defined by
the TTA bounds in the TVE entry."
 - RXE_ARB OR Error Status - "RXE_ARB error bits, ... OR of all error
status bits."
 - IODA TVT Address Range Error - "IODA Error: The PCI Address was out
of range as defined by the TTA bounds in the TVE entry."

In other words, the ASMedia USB controllers seem to be trying to write
to addresses they're not supposed to, and thankfully the PHB4 is
catching these bad writes before they can cause any corruption of my
system's memory. Of course, this has the unfortunate side-effect that
these devices are completely unable to operate with my computer, and
since it seems to be possible to use these controllers on x86 systems
(presumably because of the less-strict/disabled-by-default IOMMU), I
wonder if maybe it would be possible to work around these errors in
either the kernel or the OPAL firmware? My thinking is that instead of
disconnecting the misbehaving devices, maybe the errors could be
"forgiven" (but still blocked) and the device permitted to continue
operating, possibly with some USB data loss from "writes to nowhere"
or retries that may reduce performance. Or maybe if the issue is
caused by some high address bits being set to random values, those
bits could be masked-off so as to not trigger the errors and even
avoid data loss.

So, my question is, is any of this possible? I know the simple
solution for me is to just RMA the cards and avoid purchasing
ASMedia-based USB host controllers in the future, but the fact that
they still seem to work "mostly ok" on x86 systems (with the
occasional kernel panics and BSODs reported by users) piques my
curiosity and makes me wonder if maybe there's a way for me to have my
cheap, buggy hardware cake and eat it, too.

Now, I'm a novice at kernel hacking, so I don't really know what I'm
doing, but just for fun I did try to paper over the issue by adding an
EEH handler to the xhci driver
(https://paste.debian.net/hidden/16081515), but as you might expect,
that didn't do anything but prevent further communication with the
device. I also read a bunch of the PHB4 and IODA2 specs to see if
maybe there'd be a way to implement that bit-masking thing I
mentioned, but both of those documents are, uh, rather dry reading, so
I haven't read them in their entirety, and I don't know enough about
how this all works to try to search the text for what I need.

All that said, if anyone has any suggestions or comments, I'd be
really interested to hear them, even if it's just to question why I'd
go to such ridiculous lengths to try to get software to account for
buggy hardware.

All the best,

Forest

^ permalink raw reply

* [PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

Power10 has removed 512 bytes boundary from match criteria. i.e. The watch
range can cross 512 bytes boundary.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/kernel/hw_breakpoint.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index c55e67bab271..1f4a1efa0074 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -418,8 +418,9 @@ static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw)
 
 	if (dawr_enabled()) {
 		max_len = DAWR_MAX_LEN;
-		/* DAWR region can't cross 512 bytes boundary */
-		if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512))
+		/* DAWR region can't cross 512 bytes boundary on p10 predecessors */
+		if (!cpu_has_feature(CPU_FTR_ARCH_31) &&
+		    (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512)))
 			return -EINVAL;
 	} else if (IS_ENABLED(CONFIG_PPC_8xx)) {
 		/* 8xx can setup a range without limitation */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 09/10] powerpc/watchpoint: Return available watchpoints dynamically
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

So far Book3S Powerpc supported only one watchpoint. Power10 is
introducing 2nd DAWR. Enable 2nd DAWR support for Power10.
Availability of 2nd DAWR will depend on CPU_FTR_DAWR1.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/cputable.h      | 4 +++-
 arch/powerpc/include/asm/hw_breakpoint.h | 5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 3445c86e1f6f..36a0851a7a9b 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -633,7 +633,9 @@ enum {
  * Maximum number of hw breakpoint supported on powerpc. Number of
  * breakpoints supported by actual hw might be less than this.
  */
-#define HBP_NUM_MAX	1
+#define HBP_NUM_MAX	2
+#define HBP_NUM_ONE	1
+#define HBP_NUM_TWO	2
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index cb424799da0d..d4eab1694bcd 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -5,10 +5,11 @@
  * Copyright 2010, IBM Corporation.
  * Author: K.Prasad <prasad@linux.vnet.ibm.com>
  */
-
 #ifndef _PPC_BOOK3S_64_HW_BREAKPOINT_H
 #define _PPC_BOOK3S_64_HW_BREAKPOINT_H
 
+#include <asm/cpu_has_feature.h>
+
 #ifdef	__KERNEL__
 struct arch_hw_breakpoint {
 	unsigned long	address;
@@ -46,7 +47,7 @@ struct arch_hw_breakpoint {
 
 static inline int nr_wp_slots(void)
 {
-	return HBP_NUM_MAX;
+	return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_TWO : HBP_NUM_ONE;
 }
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 08/10] powerpc/watchpoint: Guest support for 2nd DAWR hcall
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5.
Enable powervm guest support with that. This has no effect on kvm guest
because kvm will return error if guest does hcall with resource value 5.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h         | 1 +
 arch/powerpc/include/asm/machdep.h        | 2 +-
 arch/powerpc/include/asm/plpar_wrappers.h | 5 +++++
 arch/powerpc/kernel/dawr.c                | 2 +-
 arch/powerpc/platforms/pseries/setup.c    | 7 +++++--
 5 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index b785e9f0071c..33793444144c 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -358,6 +358,7 @@
 #define H_SET_MODE_RESOURCE_SET_DAWR0		2
 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE	3
 #define H_SET_MODE_RESOURCE_LE			4
+#define H_SET_MODE_RESOURCE_SET_DAWR1		5
 
 /* Values for argument to H_SIGNAL_SYS_RESET */
 #define H_SIGNAL_SYS_RESET_ALL			-1
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 7bcb64444a39..a90b892f0bfe 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -131,7 +131,7 @@ struct machdep_calls {
 				    unsigned long dabrx);
 
 	/* Set DAWR for this platform, leave empty for default implementation */
-	int		(*set_dawr)(unsigned long dawr,
+	int		(*set_dawr)(int nr, unsigned long dawr,
 				    unsigned long dawrx);
 
 #ifdef CONFIG_PPC32	/* XXX for now */
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index d12c3680d946..ece84a430701 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -315,6 +315,11 @@ static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long dawr
 	return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0);
 }
 
+static inline long plpar_set_watchpoint1(unsigned long dawr1, unsigned long dawrx1)
+{
+	return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR1, dawr1, dawrx1);
+}
+
 static inline long plpar_signal_sys_reset(long cpu)
 {
 	return plpar_hcall_norets(H_SIGNAL_SYS_RESET, cpu);
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 500f52fa4711..cdc2dccb987d 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -37,7 +37,7 @@ int set_dawr(int nr, struct arch_hw_breakpoint *brk)
 	dawrx |= (mrd & 0x3f) << (63 - 53);
 
 	if (ppc_md.set_dawr)
-		return ppc_md.set_dawr(dawr, dawrx);
+		return ppc_md.set_dawr(nr, dawr, dawrx);
 
 	if (nr == 0) {
 		mtspr(SPRN_DAWR0, dawr);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 2db8469e475f..d516ee8eb7fc 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -831,12 +831,15 @@ static int pseries_set_xdabr(unsigned long dabr, unsigned long dabrx)
 	return plpar_hcall_norets(H_SET_XDABR, dabr, dabrx);
 }
 
-static int pseries_set_dawr(unsigned long dawr, unsigned long dawrx)
+static int pseries_set_dawr(int nr, unsigned long dawr, unsigned long dawrx)
 {
 	/* PAPR says we can't set HYP */
 	dawrx &= ~DAWRX_HYP;
 
-	return  plpar_set_watchpoint0(dawr, dawrx);
+	if (nr == 0)
+		return plpar_set_watchpoint0(dawr, dawrx);
+	else
+		return plpar_set_watchpoint1(dawr, dawrx);
 }
 
 #define CMO_CHARACTERISTICS_TOKEN 44
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 07/10] powerpc/watchpoint: Rename current H_SET_MODE DAWR macro
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

Current H_SET_MODE hcall macro name for setting/resetting DAWR0 is
H_SET_MODE_RESOURCE_SET_DAWR. Add suffix 0 to macro name as well.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h         | 2 +-
 arch/powerpc/include/asm/plpar_wrappers.h | 2 +-
 arch/powerpc/kvm/book3s_hv.c              | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 43486e773bd6..b785e9f0071c 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -355,7 +355,7 @@
 
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR		1
-#define H_SET_MODE_RESOURCE_SET_DAWR		2
+#define H_SET_MODE_RESOURCE_SET_DAWR0		2
 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE	3
 #define H_SET_MODE_RESOURCE_LE			4
 
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 4293c5d2ddf4..d12c3680d946 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -312,7 +312,7 @@ static inline long plpar_set_ciabr(unsigned long ciabr)
 
 static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long dawrx0)
 {
-	return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR, dawr0, dawrx0);
+	return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0);
 }
 
 static inline long plpar_signal_sys_reset(long cpu)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6bf66649ab92..7ad692c2d7c7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -764,7 +764,7 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 			return H_P3;
 		vcpu->arch.ciabr  = value1;
 		return H_SUCCESS;
-	case H_SET_MODE_RESOURCE_SET_DAWR:
+	case H_SET_MODE_RESOURCE_SET_DAWR0:
 		if (!kvmppc_power8_compatible(vcpu))
 			return H_P2;
 		if (!ppc_breakpoint_available())
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 06/10] powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

As per the PAPR, bit 0 of byte 64 in pa-features property indicates
availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Host generally uses "cpu-features",
which masks "pa-features". But "cpu-features" are still not used for
guests and thus this change is mostly applicable for guests only.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/kernel/prom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..c76c09b97bc8 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -175,6 +175,8 @@ static struct ibm_pa_feature {
 	 */
 	{ .pabyte = 22, .pabit = 0, .cpu_features = CPU_FTR_TM_COMP,
 	  .cpu_user_ftrs2 = PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_HTM_NOSC_COMP },
+
+	{ .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 },
 };
 
 static void __init scan_features(unsigned long node, const unsigned char *ftrs,
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

Add new device-tree feature for 2nd DAWR. If this feature is present,
2nd DAWR is supported, otherwise not.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/cputable.h | 7 +++++--
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 7 +++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index e506d429b1af..3445c86e1f6f 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_TLBIE_ERAT_BUG	LONG_ASM_CONST(0x0001000000000000)
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG	LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_ARCH_31			LONG_ASM_CONST(0x0004000000000000)
+#define CPU_FTR_DAWR1			LONG_ASM_CONST(0x0008000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_POSSIBLE	\
 	    (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \
 	     CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \
-	     CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+	     CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \
+	     CPU_FTR_DAWR1)
 #else
 #define CPU_FTRS_POSSIBLE	\
 	    (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 	     CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 	     CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
 	     CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \
-	     CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+	     CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \
+	     CPU_FTR_DAWR1)
 #endif /* CONFIG_CPU_LITTLE_ENDIAN */
 #endif
 #else
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index ac650c233cd9..c78cd3596ec4 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -574,6 +574,12 @@ static int __init feat_enable_mma(struct dt_cpu_feature *f)
 	return 1;
 }
 
+static int __init feat_enable_debug_facilities_v31(struct dt_cpu_feature *f)
+{
+	cur_cpu_spec->cpu_features |= CPU_FTR_DAWR1;
+	return 1;
+}
+
 struct dt_cpu_feature_match {
 	const char *name;
 	int (*enable)(struct dt_cpu_feature *f);
@@ -649,6 +655,7 @@ static struct dt_cpu_feature_match __initdata
 	{"wait-v3", feat_enable, 0},
 	{"prefix-instructions", feat_enable, 0},
 	{"matrix-multiply-assist", feat_enable_mma, 0},
+	{"debug-facilities-v31", feat_enable_debug_facilities_v31, 0},
 };
 
 static bool __initdata using_dt_cpu_ftrs;
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 04/10] powerpc/watchpoint: Enable watchpoint functionality on power10 guest
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE
(controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree
node is not PAPR compatible and thus not yet used by kvm or pHyp
guests. Enable watchpoint functionality on power10 guest (both kvm
and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that
this change does not enable 2nd DAWR support.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/cputable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index bac2252c839e..e506d429b1af 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -478,7 +478,7 @@ static inline void cpu_feature_keys_init(void) { }
 	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
 	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
 	    CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-	    CPU_FTR_ARCH_31)
+	    CPU_FTR_ARCH_31 | CPU_FTR_DAWR)
 #define CPU_FTRS_CELL	(CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 03/10] powerpc/watchpoint: Fix DAWR exception for CACHEOP
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

'ea' returned by analyse_instr() needs to be aligned down to cache
block size for CACHEOP instructions. analyse_instr() does not set
size for CACHEOP, thus size also needs to be calculated manually.

Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/kernel/hw_breakpoint.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index a971e22aea81..c55e67bab271 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -538,7 +538,12 @@ static bool check_dawrx_constraints(struct pt_regs *regs, int type,
 	if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ))
 		return false;
 
-	if (OP_IS_STORE(type) && !(info->type & HW_BRK_TYPE_WRITE))
+	/*
+	 * The Cache Management instructions other than dcbz never
+	 * cause a match. i.e. if type is CACHEOP, the instruction
+	 * is dcbz, and dcbz is treated as Store.
+	 */
+	if ((OP_IS_STORE(type) || type == CACHEOP) && !(info->type & HW_BRK_TYPE_WRITE))
 		return false;
 
 	if (is_kernel_addr(regs->nip) && !(info->type & HW_BRK_TYPE_KERNEL))
@@ -601,6 +606,15 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
 	return false;
 }
 
+static int cache_op_size(void)
+{
+#ifdef __powerpc64__
+	return ppc64_caches.l1d.block_size;
+#else
+	return L1_CACHE_BYTES;
+#endif
+}
+
 static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
 			     int *type, int *size, unsigned long *ea)
 {
@@ -616,7 +630,12 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
 	if (!(regs->msr & MSR_64BIT))
 		*ea &= 0xffffffffUL;
 #endif
+
 	*size = GETSIZE(op.type);
+	if (*type == CACHEOP) {
+		*size = cache_op_size();
+		*ea &= ~(*size - 1);
+	}
 }
 
 static bool is_larx_stcx_instr(int type)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 02/10] powerpc/watchpoint: Fix DAWR exception constraint
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

Pedro Miraglia Franco de Carvalho noticed that on p8/p9, DAR value is
inconsistent with different type of load/store. Like for byte,word
etc. load/stores, DAR is set to the address of the first byte of
overlap between watch range and real access. But for quadword load/
store it's sometime set to the address of the first byte of real
access whereas sometime set to the address of the first byte of
overlap. This issue has been fixed in p10. In p10(ISA 3.1), DAR is
always set to the address of the first byte of overlap. Commit 27985b2a640e
("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
wrongly assumes that DAR is set to the address of the first byte of
overlap for all load/stores on p8/p9 as well. Fix that. With the fix,
we now rely on 'ea' provided by analyse_instr(). If analyse_instr()
fails, generate event unconditionally on p8/p9, and on p10 generate
event only if DAR is within a DAWR range.

Note: 8xx is not affected.

Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/kernel/hw_breakpoint.c | 72 ++++++++++++++++-------------
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 031e6defc08e..a971e22aea81 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -498,11 +498,11 @@ static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint *info
 	return ((info->address <= dar) && (dar - info->address < info->len));
 }
 
-static bool dar_user_range_overlaps(unsigned long dar, int size,
-				    struct arch_hw_breakpoint *info)
+static bool ea_user_range_overlaps(unsigned long ea, int size,
+				   struct arch_hw_breakpoint *info)
 {
-	return ((dar < info->address + info->len) &&
-		(dar + size > info->address));
+	return ((ea < info->address + info->len) &&
+		(ea + size > info->address));
 }
 
 static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info)
@@ -515,20 +515,22 @@ static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info)
 	return ((hw_start_addr <= dar) && (hw_end_addr > dar));
 }
 
-static bool dar_hw_range_overlaps(unsigned long dar, int size,
-				  struct arch_hw_breakpoint *info)
+static bool ea_hw_range_overlaps(unsigned long ea, int size,
+				 struct arch_hw_breakpoint *info)
 {
 	unsigned long hw_start_addr, hw_end_addr;
 
 	hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE);
 	hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE);
 
-	return ((dar < hw_end_addr) && (dar + size > hw_start_addr));
+	return ((ea < hw_end_addr) && (ea + size > hw_start_addr));
 }
 
 /*
  * If hw has multiple DAWR registers, we also need to check all
  * dawrx constraint bits to confirm this is _really_ a valid event.
+ * If type is UNKNOWN, but privilege level matches, consider it as
+ * a positive match.
  */
 static bool check_dawrx_constraints(struct pt_regs *regs, int type,
 				    struct arch_hw_breakpoint *info)
@@ -553,7 +555,8 @@ static bool check_dawrx_constraints(struct pt_regs *regs, int type,
  * including extraneous exception. Otherwise return false.
  */
 static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
-			      int type, int size, struct arch_hw_breakpoint *info)
+			      unsigned long ea, int type, int size,
+			      struct arch_hw_breakpoint *info)
 {
 	bool in_user_range = dar_in_user_range(regs->dar, info);
 	bool dawrx_constraints;
@@ -569,22 +572,27 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
 	}
 
 	if (unlikely(ppc_inst_equal(instr, ppc_inst(0)))) {
-		if (in_user_range)
-			return true;
+		if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+		    !dar_in_hw_range(regs->dar, info))
+			return false;
 
-		if (dar_in_hw_range(regs->dar, info)) {
-			info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
-			return true;
-		}
-		return false;
+		return true;
 	}
 
 	dawrx_constraints = check_dawrx_constraints(regs, type, info);
 
-	if (dar_user_range_overlaps(regs->dar, size, info))
+	if (type == UNKNOWN) {
+		if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+		    !dar_in_hw_range(regs->dar, info))
+			return false;
+
 		return dawrx_constraints;
+	}
 
-	if (dar_hw_range_overlaps(regs->dar, size, info)) {
+	if (ea_user_range_overlaps(ea, size, info))
+		return dawrx_constraints;
+
+	if (ea_hw_range_overlaps(ea, size, info)) {
 		if (dawrx_constraints) {
 			info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
 			return true;
@@ -594,7 +602,7 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
 }
 
 static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
-			     int *type, int *size, bool *larx_stcx)
+			     int *type, int *size, unsigned long *ea)
 {
 	struct instruction_op op;
 
@@ -602,16 +610,18 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
 		return;
 
 	analyse_instr(&op, regs, *instr);
-
-	/*
-	 * Set size = 8 if analyse_instr() fails. If it's a userspace
-	 * watchpoint(valid or extraneous), we can notify user about it.
-	 * If it's a kernel watchpoint, instruction  emulation will fail
-	 * in stepping_handler() and watchpoint will be disabled.
-	 */
 	*type = GETTYPE(op.type);
-	*size = !(*type == UNKNOWN) ? GETSIZE(op.type) : 8;
-	*larx_stcx = (*type == LARX || *type == STCX);
+	*ea = op.ea;
+#ifdef __powerpc64__
+	if (!(regs->msr & MSR_64BIT))
+		*ea &= 0xffffffffUL;
+#endif
+	*size = GETSIZE(op.type);
+}
+
+static bool is_larx_stcx_instr(int type)
+{
+	return type == LARX || type == STCX;
 }
 
 /*
@@ -678,7 +688,7 @@ int hw_breakpoint_handler(struct die_args *args)
 	struct ppc_inst instr = ppc_inst(0);
 	int type = 0;
 	int size = 0;
-	bool larx_stcx = false;
+	unsigned long ea;
 
 	/* Disable breakpoints during exception handling */
 	hw_breakpoint_disable();
@@ -692,7 +702,7 @@ int hw_breakpoint_handler(struct die_args *args)
 	rcu_read_lock();
 
 	if (!IS_ENABLED(CONFIG_PPC_8xx))
-		get_instr_detail(regs, &instr, &type, &size, &larx_stcx);
+		get_instr_detail(regs, &instr, &type, &size, &ea);
 
 	for (i = 0; i < nr_wp_slots(); i++) {
 		bp[i] = __this_cpu_read(bp_per_reg[i]);
@@ -702,7 +712,7 @@ int hw_breakpoint_handler(struct die_args *args)
 		info[i] = counter_arch_bp(bp[i]);
 		info[i]->type &= ~HW_BRK_TYPE_EXTRANEOUS_IRQ;
 
-		if (check_constraints(regs, instr, type, size, info[i])) {
+		if (check_constraints(regs, instr, ea, type, size, info[i])) {
 			if (!IS_ENABLED(CONFIG_PPC_8xx) &&
 			    ppc_inst_equal(instr, ppc_inst(0))) {
 				handler_error(bp[i], info[i]);
@@ -744,7 +754,7 @@ int hw_breakpoint_handler(struct die_args *args)
 	}
 
 	if (!IS_ENABLED(CONFIG_PPC_8xx)) {
-		if (larx_stcx) {
+		if (is_larx_stcx_instr(type)) {
 			for (i = 0; i < nr_wp_slots(); i++) {
 				if (!hit[i])
 					continue;
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 01/10] powerpc/watchpoint: Fix 512 byte boundary limit
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo
In-Reply-To: <20200717040958.70561-1-ravi.bangoria@linux.ibm.com>

Milton Miller reported that we are aligning start and end address to
wrong size SZ_512M. It should be SZ_512. Fix that.

While doing this change I also found a case where ALIGN() comparison
fails. Within a given aligned range, ALIGN() of two addresses does not
match when start address is pointing to the first byte and end address
is pointing to any other byte except the first one. But that's not true
for ALIGN_DOWN(). ALIGN_DOWN() of any two addresses within that range
will always point to the first byte. So use ALIGN_DOWN() instead of
ALIGN().

Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros")
Reported-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Jordan Niethe <jniethe5@gmail.com>
---
 arch/powerpc/kernel/hw_breakpoint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 0000daf0e1da..031e6defc08e 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -419,7 +419,7 @@ static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw)
 	if (dawr_enabled()) {
 		max_len = DAWR_MAX_LEN;
 		/* DAWR region can't cross 512 bytes boundary */
-		if (ALIGN(start_addr, SZ_512M) != ALIGN(end_addr - 1, SZ_512M))
+		if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512))
 			return -EINVAL;
 	} else if (IS_ENABLED(CONFIG_PPC_8xx)) {
 		/* 8xx can setup a range without limitation */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 00/10] powerpc/watchpoint: Enable 2nd DAWR on baremetal and powervm
From: Ravi Bangoria @ 2020-07-17  4:09 UTC (permalink / raw)
  To: mpe, mikey
  Cc: christophe.leroy, ravi.bangoria, apopple, peterz, fweisbec,
	miltonm, oleg, npiggin, linux-kernel, paulus, jolsa, jniethe5,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo

Last series[1] was to add basic infrastructure support for more than
one watchpoint on Book3S powerpc. This series actually enables the 2nd 
DAWR for baremetal and powervm. Kvm guest is still not supported.

v3: https://lore.kernel.org/lkml/20200708045046.135702-1-ravi.bangoria@linux.ibm.com

v3->v4:
 - v3 patch #2 is split into two v4 patches: #2 and #3
 - Few other minor neats suggested by Jordan Niethe
 - Rebased to powerpc/next

[1]: https://lore.kernel.org/linuxppc-dev/20200514111741.97993-1-ravi.bangoria@linux.ibm.com/

Ravi Bangoria (10):
  powerpc/watchpoint: Fix 512 byte boundary limit
  powerpc/watchpoint: Fix DAWR exception constraint
  powerpc/watchpoint: Fix DAWR exception for CACHEOP
  powerpc/watchpoint: Enable watchpoint functionality on power10 guest
  powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
  powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit
  powerpc/watchpoint: Rename current H_SET_MODE DAWR macro
  powerpc/watchpoint: Guest support for 2nd DAWR hcall
  powerpc/watchpoint: Return available watchpoints dynamically
  powerpc/watchpoint: Remove 512 byte boundary

 arch/powerpc/include/asm/cputable.h       | 13 ++-
 arch/powerpc/include/asm/hvcall.h         |  3 +-
 arch/powerpc/include/asm/hw_breakpoint.h  |  5 +-
 arch/powerpc/include/asm/machdep.h        |  2 +-
 arch/powerpc/include/asm/plpar_wrappers.h |  7 +-
 arch/powerpc/kernel/dawr.c                |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c         |  7 ++
 arch/powerpc/kernel/hw_breakpoint.c       | 98 +++++++++++++++--------
 arch/powerpc/kernel/prom.c                |  2 +
 arch/powerpc/kvm/book3s_hv.c              |  2 +-
 arch/powerpc/platforms/pseries/setup.c    |  7 +-
 11 files changed, 101 insertions(+), 47 deletions(-)

-- 
2.26.2


^ permalink raw reply

* Re: [PATCH V5 1/4] mm/debug_vm_pgtable: Add tests validating arch helpers for core MM features
From: Anshuman Khandual @ 2020-07-17  3:20 UTC (permalink / raw)
  To: Steven Price, linux-mm
  Cc: Heiko Carstens, Paul Mackerras, H. Peter Anvin, agordeev,
	Will Deacon, linux-riscv, linux-arch, linux-s390, x86,
	Mike Rapoport, Christian Borntraeger, Ingo Molnar,
	gerald.schaefer, ziy, Catalin Marinas, linux-snps-arc,
	Vasily Gorbik, cai, Paul Walmsley, Kirill A . Shutemov,
	Thomas Gleixner, linux-arm-kernel, christophe.leroy, Vineet Gupta,
	linux-kernel, Palmer Dabbelt, aneesh.kumar, Borislav Petkov,
	Andrew Morton, linuxppc-dev, rppt
In-Reply-To: <2ff756c5-28e2-b64a-3788-260ba30c6409@arm.com>



On 07/16/2020 07:44 PM, Steven Price wrote:
> On 13/07/2020 04:23, Anshuman Khandual wrote:
>> This adds new tests validating arch page table helpers for these following
>> core memory features. These tests create and test specific mapping types at
>> various page table levels.
>>
>> 1. SPECIAL mapping
>> 2. PROTNONE mapping
>> 3. DEVMAP mapping
>> 4. SOFTDIRTY mapping
>> 5. SWAP mapping
>> 6. MIGRATION mapping
>> 7. HUGETLB mapping
>> 8. THP mapping
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Vineet Gupta <vgupta@synopsys.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Cc: Paul Walmsley <paul.walmsley@sifive.com>
>> Cc: Palmer Dabbelt <palmer@dabbelt.com>
>> Cc: linux-snps-arc@lists.infradead.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s390@vger.kernel.org
>> Cc: linux-riscv@lists.infradead.org
>> Cc: x86@kernel.org
>> Cc: linux-mm@kvack.org
>> Cc: linux-arch@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Tested-by: Vineet Gupta <vgupta@synopsys.com>    #arc
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   mm/debug_vm_pgtable.c | 302 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 301 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 61ab16fb2e36..2fac47db3eb7 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
> [...]
>> +
>> +static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    swp_entry_t swp;
>> +    pte_t pte;
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    swp = __pte_to_swp_entry(pte);
> 
> Minor issue: this doesn't look necessarily valid - there's no reason a normal PTE can be turned into a swp_entry. In practise this is likely to work on all architectures because there's no reason not to use (at least) all the PFN bits for the swap entry, but it doesn't exactly seem correct.

Agreed, that it is a simple test but nonetheless a valid one which
makes sure that PFN value remained unchanged during pte <---> swp
conversion.

> 
> Can we start with a swp_entry_t (from __swp_entry()) and check the round trip of that?
> 
> It would also seem sensible to have a check that is_swap_pte(__swp_entry_to_pte(__swp_entry(x,y))) is true.

From past experiences, getting any these new tests involving platform
helpers, working on all existing enabled archs is neither trivial nor
going to be quick. Existing tests here are known to succeed in enabled
platforms. Nonetheless, proposed tests as in the above suggestions do
make sense but will try to accommodate them in a later patch.

^ permalink raw reply

* Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
From: Alan Stern @ 2020-07-16 21:24 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-arch, paulmck, Arnd Bergmann, Peter Zijlstra, x86,
	linux-kernel, Nicholas Piggin, linux-mm, Andy Lutomirski,
	linuxppc-dev
In-Reply-To: <595582123.17106.1594925921537.JavaMail.zimbra@efficios.com>

On Thu, Jul 16, 2020 at 02:58:41PM -0400, Mathieu Desnoyers wrote:
> ----- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> 
> > ----- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
> > mathieu.desnoyers@efficios.com wrote:
> > 
> >> ----- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npiggin@gmail.com wrote:
> >>> I should be more complete here, especially since I was complaining
> >>> about unclear barrier comment :)
> >>> 
> >>> 
> >>> CPU0                     CPU1
> >>> a. user stuff            1. user stuff
> >>> b. membarrier()          2. enter kernel
> >>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
> >>> d. read rq->curr         4. rq->curr switched to kthread
> >>> e. is kthread, skip IPI  5. switch_to kthread
> >>> f. return to user        6. rq->curr switched to user thread
> >>> g. user stuff            7. switch_to user thread
> >>>                         8. exit kernel
> >>>                         9. more user stuff
> >>> 
> >>> What you're really ordering is a, g vs 1, 9 right?
> >>> 
> >>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
> >>> etc.
> >>> 
> >>> Userspace does not care where the barriers are exactly or what kernel
> >>> memory accesses might be being ordered by them, so long as there is a
> >>> mb somewhere between a and g, and 1 and 9. Right?
> >> 
> >> This is correct.
> > 
> > Actually, sorry, the above is not quite right. It's been a while
> > since I looked into the details of membarrier.
> > 
> > The smp_mb() at the beginning of membarrier() needs to be paired with a
> > smp_mb() _after_ rq->curr is switched back to the user thread, so the
> > memory barrier is between store to rq->curr and following user-space
> > accesses.
> > 
> > The smp_mb() at the end of membarrier() needs to be paired with the
> > smp_mb__after_spinlock() at the beginning of schedule, which is
> > between accesses to userspace memory and switching rq->curr to kthread.
> > 
> > As to *why* this ordering is needed, I'd have to dig through additional
> > scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?
> 
> Thinking further about this, I'm beginning to consider that maybe we have been
> overly cautious by requiring memory barriers before and after store to rq->curr.
> 
> If CPU0 observes a CPU1's rq->curr->mm which differs from its own process (current)
> while running the membarrier system call, it necessarily means that CPU1 had
> to issue smp_mb__after_spinlock when entering the scheduler, between any user-space
> loads/stores and update of rq->curr.
> 
> Requiring a memory barrier between update of rq->curr (back to current process's
> thread) and following user-space memory accesses does not seem to guarantee
> anything more than what the initial barrier at the beginning of __schedule already
> provides, because the guarantees are only about accesses to user-space memory.
> 
> Therefore, with the memory barrier at the beginning of __schedule, just observing that
> CPU1's rq->curr differs from current should guarantee that a memory barrier was issued
> between any sequentially consistent instructions belonging to the current process on
> CPU1.
> 
> Or am I missing/misremembering an important point here ?

Is it correct to say that the switch_to operations in 5 and 7 include 
memory barriers?  If they do, then skipping the IPI should be okay.

The reason is as follows: The guarantee you need to enforce is that 
anything written by CPU0 before the membarrier() will be visible to CPU1 
after it returns to user mode.  Let's say that a writes to X and 9 
reads from X.

Then we have an instance of the Store Buffer pattern:

	CPU0			CPU1
	a. Write X		6. Write rq->curr for user thread
	c. smp_mb()		7. switch_to memory barrier
	d. Read rq->curr	9. Read X

In this pattern, the memory barriers make it impossible for both reads 
to miss their corresponding writes.  Since d does fail to read 6 (it 
sees the earlier value stored by 4), 9 must read a.

The other guarantee you need is that g on CPU0 will observe anything 
written by CPU1 in 1.  This is easier to see, using the fact that 3 is a 
memory barrier and d reads from 4.

Alan Stern

^ permalink raw reply

* Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE
From: Nicholas Piggin @ 2020-07-17  2:44 UTC (permalink / raw)
  To: Bharata B Rao, Qian Cai
  Cc: sfr, aneesh.kumar, linux-kernel, linux-next, linuxppc-dev
In-Reply-To: <1594950229.jn9ipe6td1.astroid@bobo.none>

Excerpts from Nicholas Piggin's message of July 17, 2020 12:08 pm:
> Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
>> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
>>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
>>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
>>> permitted to use instructions like tblie and tlbsync directly, but is
>>> expected to make hypervisor calls to get the TLB flushed.
>>> 
>>> This series enables the TLB flush routines in the radix code to
>>> off-load TLB flushing to hypervisor via the newly proposed hcall
>>> H_RPT_INVALIDATE. 
>>> 
>>> To easily check the availability of GTSE, it is made an MMU feature.
>>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
>>> handle GTSE as an optionally available feature and to not assume GTSE
>>> when radix support is available.
>>> 
>>> The actual hcall implementation for KVM isn't included in this
>>> patchset and will be posted separately.
>>> 
>>> Changes in v3
>>> =============
>>> - Fixed a bug in the hcall wrapper code where we were missing setting
>>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>>>   a full flush for the nested case.
>>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
>>> 
>>> v2: https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bharata@linux.ibm.com/T/#t
>>> 
>>> Bharata B Rao (2):
>>>   powerpc/mm: Enable radix GTSE only if supported.
>>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
>>>     enabled
>>> 
>>> Nicholas Piggin (1):
>>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
>>>     !GTSE
>> 
>> Reverting the whole series fixed random memory corruptions during boot on
>> POWER9 PowerNV systems below.
> 
> If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
> disasm is the same as reverting my patch.
> 
> Feature bits not being set right? PowerNV should be pretty simple, seems
> to do the same as FTR_TYPE_RADIX.

Might need this fix

---

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..54c9bcea9d4e 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -163,7 +163,7 @@ static struct ibm_pa_feature {
 	{ .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
 	{ .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
 #ifdef CONFIG_PPC_RADIX_MMU
-	{ .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
+	{ .pabyte = 40, .pabit = 0, .mmu_features  = (MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE) },
 #endif
 	{ .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = CPU_FTR_NODSISRALIGN },
 	{ .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,

^ permalink raw reply related

* Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE
From: Nicholas Piggin @ 2020-07-17  2:08 UTC (permalink / raw)
  To: Bharata B Rao, Qian Cai
  Cc: sfr, aneesh.kumar, linux-kernel, linux-next, linuxppc-dev
In-Reply-To: <20200716172713.GA4565@lca.pw>

Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
>> permitted to use instructions like tblie and tlbsync directly, but is
>> expected to make hypervisor calls to get the TLB flushed.
>> 
>> This series enables the TLB flush routines in the radix code to
>> off-load TLB flushing to hypervisor via the newly proposed hcall
>> H_RPT_INVALIDATE. 
>> 
>> To easily check the availability of GTSE, it is made an MMU feature.
>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
>> handle GTSE as an optionally available feature and to not assume GTSE
>> when radix support is available.
>> 
>> The actual hcall implementation for KVM isn't included in this
>> patchset and will be posted separately.
>> 
>> Changes in v3
>> =============
>> - Fixed a bug in the hcall wrapper code where we were missing setting
>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>>   a full flush for the nested case.
>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
>> 
>> v2: https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bharata@linux.ibm.com/T/#t
>> 
>> Bharata B Rao (2):
>>   powerpc/mm: Enable radix GTSE only if supported.
>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
>>     enabled
>> 
>> Nicholas Piggin (1):
>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
>>     !GTSE
> 
> Reverting the whole series fixed random memory corruptions during boot on
> POWER9 PowerNV systems below.

If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
disasm is the same as reverting my patch.

Feature bits not being set right? PowerNV should be pretty simple, seems
to do the same as FTR_TYPE_RADIX.

So... test being done before static keys are set up? Shouldn't be. Must
be something obvious I just can't see it.

Thanks,
Nick


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox