LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Remove default DMA window before creating DDW
From: Leonardo Bras @ 2020-06-19  5:06 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Leonardo Bras, Thiago Jung Bauermann,
	Ram Pai
  Cc: linuxppc-dev, linux-kernel

There are some devices that only allow 1 DMA window to exist at a time,
and in those cases, a DDW is never created to them, since the default DMA
window keeps using this resource.

LoPAR recommends this procedure:
1. Remove the default DMA window,
2. Query for which configs the DDW can be created,
3. Create a DDW.

Patch #1:
- After LoPAR level 2.8, there is an extension that can make
  ibm,query-pe-dma-windows to have 6 outputs instead of 5. This changes the
  order of the outputs, and that can cause some trouble. 
- query_ddw() was updated to check how many outputs the 
  ibm,query-pe-dma-windows is supposed to have, update the rtas_call() and
  deal correctly with the outputs in both cases.
- This patch looks somehow unrelated to the series, but it can avoid future
  problems on DDW creation.

Patch #2 implements a new rtas call to recover the default DMA window,
in case anything fails after it was removed, and a DDW couldn't be created.

Patch #3 moves the window-removing code from remove_ddw() to
remove_dma_window(), creating a way to delete any DMA window, so it can be
used to delete the default DMA window.

Patch #4 makes use of the remove_dma_window() from patch #3 to remove the
default DMA window before query_ddw() and the rtas call from patch #2
to recover it if something goes wrong.

All patches were tested into an LPAR with an Ethernet VF:
4005:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4 Virtual Function]

Leonardo Bras (4):
  powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
  powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call
  powerpc/pseries/iommu: Move window-removing part of remove_ddw into
    remove_dma_window
  powerpc/pseries/iommu: Remove default DMA window before creating DDW

 arch/powerpc/platforms/pseries/iommu.c | 163 +++++++++++++++++++------
 1 file changed, 127 insertions(+), 36 deletions(-)

-- 
2.25.4


^ permalink raw reply

* Re: [PATCH 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: Gautham R Shenoy @ 2020-06-19  5:05 UTC (permalink / raw)
  To: Kajol Jain; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200618122713.9030-3-kjain@linux.ibm.com>

On Thu, Jun 18, 2020 at 05:57:13PM +0530, Kajol Jain wrote:
> Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation.
> 
> command:# cat /sys/devices/hv_24x7/cpumask
> 0
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  .../sysfs-bus-event_source-devices-hv_24x7    |  6 ++++
>  arch/powerpc/perf/hv-24x7.c                   | 31 ++++++++++++++++++-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> index e8698afcd952..281e7b367733 100644
> --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> @@ -43,6 +43,12 @@ Description:	read only
>  		This sysfs interface exposes the number of cores per chip
>  		present in the system.
> 
> +What:		/sys/devices/hv_24x7/cpumask
> +Date:		June 2020
> +Contact:	Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
> +Description:	read only
> +		This sysfs file exposes cpumask.

Could you please describe this in little more detail as to what the
cpumask is ?

> +
>  What:		/sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
>  Date:		February 2014
>  Contact:	Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index fdc4ae155d60..03d870a9fc36 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -448,6 +448,12 @@ static ssize_t device_show_string(struct device *dev,
>  	return sprintf(buf, "%s\n", (char *)d->var);
>  }
> 
> +static ssize_t cpumask_get_attr(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	return cpumap_print_to_pagebuf(true, buf, &hv_24x7_cpumask);
> +}
> +
>  static ssize_t sockets_show(struct device *dev,
>  			    struct device_attribute *attr, char *buf)
>  {
> @@ -1116,6 +1122,17 @@ static DEVICE_ATTR_RO(sockets);
>  static DEVICE_ATTR_RO(chipspersocket);
>  static DEVICE_ATTR_RO(coresperchip);
> 
> +static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_get_attr, NULL);
> +
> +static struct attribute *cpumask_attrs[] = {
> +	&dev_attr_cpumask.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group cpumask_attr_group = {
> +	.attrs = cpumask_attrs,
> +};
> +
>  static struct bin_attribute *if_bin_attrs[] = {
>  	&bin_attr_catalog,
>  	NULL,
> @@ -1143,6 +1160,11 @@ static const struct attribute_group *attr_groups[] = {
>  	&event_desc_group,
>  	&event_long_desc_group,
>  	&if_group,
> +	/*
> +	 * This NULL is a placeholder for the cpumask attr which will update
> +	 * onlyif cpuhotplug registration is successful
> +	 */
> +	NULL,
>  	NULL,
>  };
> 
> @@ -1727,8 +1749,15 @@ static int hv_24x7_init(void)
> 
>  	/* init cpuhotplug */
>  	r = hv_24x7_cpu_hotplug_init();
> -	if (r)
> +	if (r) {
>  		pr_err("hv_24x7: CPU hotplug init failed\n");
> +	} else {
> +		/*
> +		 * Cpu hotplug init is successful, add the
> +		 * cpumask file as part of pmu attr group
> +		 */
> +		attr_groups[5] = &cpumask_attr_group;

Since this is only a one-time initialization, wouldn't it be safer to
iterate through attr_groups[] and assin cpumask_attr_group to the
first NULL location ?

> +	}
> 
>  	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>  	if (r)
> -- 
> 2.18.2
> 

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
From: Gautham R Shenoy @ 2020-06-19  4:58 UTC (permalink / raw)
  To: Kajol Jain; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200618122713.9030-2-kjain@linux.ibm.com>

Hello Kajol,

On Thu, Jun 18, 2020 at 05:57:12PM +0530, Kajol Jain wrote:
> Patch here adds cpu hotplug functions to hv_24x7 pmu.
> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
> is added.
> 
> The online function update the cpumask only if its NULL.
> As the primary intention for adding hotplug support
> is to desiginate a CPU to make HCALL to collect the
> count data.
> 
> The offline function test and clear corresponding cpu in a cpumask
> and update cpumask to any other active cpu.
> 
> With this patchset, perf tool side does not need "-C <cpu>"
> to be added.
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  arch/powerpc/perf/hv-24x7.c | 45 +++++++++++++++++++++++++++++++++++++
>  include/linux/cpuhotplug.h  |  1 +
>  2 files changed, 46 insertions(+)
> 
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index db213eb7cb02..fdc4ae155d60 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -31,6 +31,8 @@ static int interface_version;
>  /* Whether we have to aggregate result data for some domains. */
>  static bool aggregate_result_elements;
> 
> +static cpumask_t hv_24x7_cpumask;
> +
>  static bool domain_is_valid(unsigned domain)
>  {
>  	switch (domain) {
> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>  	.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>  };
> 
> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
> +{
> +	/* Make this CPU the designated target for counter collection */
> +	if (cpumask_empty(&hv_24x7_cpumask))
> +		cpumask_set_cpu(cpu, &hv_24x7_cpumask);
> +
> +	return 0;
> +}
> +
> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
> +{
> +	int target = -1;
> +
> +	/* Check if exiting cpu is used for collecting 24x7 events */
> +	if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
> +		return 0;
> +
> +	/* Find a new cpu to collect 24x7 events */
> +	target = cpumask_any_but(cpu_active_mask, cpu);

cpumask_any_but() typically picks the first CPU in cpu_active_mask
that is not @cpu.


> +
> +	if (target < 0 || target >= nr_cpu_ids)
> +		return -1;
> +
> +	/* Migrate 24x7 events to the new target */
> +	cpumask_set_cpu(target, &hv_24x7_cpumask);
> +	perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);


On a system with N CPUs numbered [O..N-1], can you please verify if
the time required to sequentially offline CPUs [0..N-2] ,in that
order, increase with this patch ?

I am asking this because we have encountered this problem once before
at a customer site and the commit 9c9f8fb71fee ("powerpc/perf: Use
cpumask_last() to determine the designated cpu for nest/core units.")
was introduced to fix that problem.

> +
> +	return 0;
> +}
> +
> +static int hv_24x7_cpu_hotplug_init(void)
> +{
> +	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
> +			  "perf/powerpc/hv_24x7:online",
> +			  ppc_hv_24x7_cpu_online,
> +			  ppc_hv_24x7_cpu_offline);
> +}
> +
>  static int hv_24x7_init(void)
>  {
>  	int r;
> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>  	if (r)
>  		return r;
> 
> +	/* init cpuhotplug */
> +	r = hv_24x7_cpu_hotplug_init();
> +	if (r)
> +		pr_err("hv_24x7: CPU hotplug init failed\n");
> +
>  	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>  	if (r)
>  		return r;
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 8377afef8806..16ed8f6f8774 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -180,6 +180,7 @@ enum cpuhp_state {
>  	CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
>  	CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
>  	CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
> +	CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>  	CPUHP_AP_WATCHDOG_ONLINE,
>  	CPUHP_AP_WORKQUEUE_ONLINE,
>  	CPUHP_AP_RCUTREE_ONLINE,
> -- 
> 2.18.2
> 

^ permalink raw reply

* Re: [PATCHv2] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes
From: Jerry Snitselaar @ 2020-06-19  3:42 UTC (permalink / raw)
  To: David Gibson
  Cc: nayna, linux-kernel, jarkko.sakkinen, jgg, paulus, peterhuewe,
	linuxppc-dev, linux-integrity, stefanb
In-Reply-To: <20200619033040.121412-1-david@gibson.dropbear.id.au>

On Fri Jun 19 20, David Gibson wrote:
>The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
>which will need the use of the internal command/response buffer.  But,
>we're issuing this *before* we've waited to make sure that buffer is
>allocated.
>
>This can result in intermittent failures to probe if the hypervisor / TPM
>implementation doesn't respond quickly enough.  I find it fails almost
>every time with an 8 vcpu guest under KVM with software emulated TPM.
>
>To fix it, just move the tpm2_get_cc_attrs_tlb() call after the
>existing code to wait for initialization, which will ensure the buffer
>is allocated.
>
>Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
>Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>---

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>

>
>Changes from v1:
> * Fixed a formatting error in the commit message
> * Added some more detail to the commit message
>
>drivers/char/tpm/tpm_ibmvtpm.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
>index 09fe45246b8cc..994385bf37c0c 100644
>--- a/drivers/char/tpm/tpm_ibmvtpm.c
>+++ b/drivers/char/tpm/tpm_ibmvtpm.c
>@@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
> 	if (rc)
> 		goto init_irq_cleanup;
>
>-	if (!strcmp(id->compat, "IBM,vtpm20")) {
>-		chip->flags |= TPM_CHIP_FLAG_TPM2;
>-		rc = tpm2_get_cc_attrs_tbl(chip);
>-		if (rc)
>-			goto init_irq_cleanup;
>-	}
>-
> 	if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
> 				ibmvtpm->rtce_buf != NULL,
> 				HZ)) {
>@@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
> 		goto init_irq_cleanup;
> 	}
>
>+	if (!strcmp(id->compat, "IBM,vtpm20")) {
>+		chip->flags |= TPM_CHIP_FLAG_TPM2;
>+		rc = tpm2_get_cc_attrs_tbl(chip);
>+		if (rc)
>+			goto init_irq_cleanup;
>+	}
>+
> 	return tpm_chip_register(chip);
> init_irq_cleanup:
> 	do {
>-- 
>2.26.2
>


^ permalink raw reply

* [PATCHv2] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes
From: David Gibson @ 2020-06-19  3:30 UTC (permalink / raw)
  To: jarkko.sakkinen, stefanb
  Cc: nayna, linux-kernel, jgg, paulus, peterhuewe, linuxppc-dev,
	linux-integrity, David Gibson

The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
which will need the use of the internal command/response buffer.  But,
we're issuing this *before* we've waited to make sure that buffer is
allocated.

This can result in intermittent failures to probe if the hypervisor / TPM
implementation doesn't respond quickly enough.  I find it fails almost
every time with an 8 vcpu guest under KVM with software emulated TPM.

To fix it, just move the tpm2_get_cc_attrs_tlb() call after the
existing code to wait for initialization, which will ensure the buffer
is allocated.

Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---

Changes from v1:
 * Fixed a formatting error in the commit message
 * Added some more detail to the commit message
 
drivers/char/tpm/tpm_ibmvtpm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 09fe45246b8cc..994385bf37c0c 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 	if (rc)
 		goto init_irq_cleanup;
 
-	if (!strcmp(id->compat, "IBM,vtpm20")) {
-		chip->flags |= TPM_CHIP_FLAG_TPM2;
-		rc = tpm2_get_cc_attrs_tbl(chip);
-		if (rc)
-			goto init_irq_cleanup;
-	}
-
 	if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
 				ibmvtpm->rtce_buf != NULL,
 				HZ)) {
@@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 		goto init_irq_cleanup;
 	}
 
+	if (!strcmp(id->compat, "IBM,vtpm20")) {
+		chip->flags |= TPM_CHIP_FLAG_TPM2;
+		rc = tpm2_get_cc_attrs_tbl(chip);
+		if (rc)
+			goto init_irq_cleanup;
+	}
+
 	return tpm_chip_register(chip);
 init_irq_cleanup:
 	do {
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH V3 (RESEND) 0/3] arm64: Enable vmemmap mapping from device memory
From: Anshuman Khandual @ 2020-06-19  1:34 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, Michal Hocko, linux-ia64, David Hildenbrand,
	Peter Zijlstra, Dave Hansen, linux-mm, Paul Mackerras,
	linux-riscv, Will Deacon, Thomas Gleixner, x86,
	Matthew Wilcox (Oracle), Ingo Molnar, Catalin Marinas, Fenghua Yu,
	Pavel Tatashin, Andy Lutomirski, Paul Walmsley, Dan Williams,
	linux-arm-kernel, Tony Luck, linux-kernel, Palmer Dabbelt,
	Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <20200618085641.GE6493@linux.ibm.com>



On 06/18/2020 02:26 PM, Mike Rapoport wrote:
> On Thu, Jun 18, 2020 at 06:45:27AM +0530, Anshuman Khandual wrote:
>> This series enables vmemmap backing memory allocation from device memory
>> ranges on arm64. But before that, it enables vmemmap_populate_basepages()
>> and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
>> alocation requests.
>>
>> This series applies on 5.8-rc1.
>>
>> Pending Question:
>>
>> altmap_alloc_block_buf() does not have any other remaining users in
>> the tree after this change. Should it be converted into a static
>> function and it's declaration be dropped from the header
>> (include/linux/mm.h). Avoided doing so because I was not sure if there
>> are any off-tree users or not.
> 
> Well, off-tree users probably have an active fork anyway so they could
> switch to vmemmap_alloc_block_buf()...

Sure, will make the function a static and remove it's declaration
from the header.

> 
> Regardless, can you please update Documentation/vm/memory-model.rst to
> keep it in sync with the code?
Sure, will do.

^ permalink raw reply

* Re: [PATCH v2 2/4] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
From: Ram Pai @ 2020-06-19  1:16 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: cclaudio, kvm-ppc, bharata, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <7f5aea68-0cc5-6ae5-c30e-eee60eff5a92@linux.ibm.com>

On Thu, Jun 18, 2020 at 03:31:06PM +0200, Laurent Dufour wrote:
> Le 18/06/2020 à 11:19, Ram Pai a écrit :
> >

.snip..

> >************************************************************************
> >  1. States of a GFN
> >     ---------------
> >  The GFN can be in one of the following states.
> >diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c

...snip...

> >index 803940d..3448459 100644
> >--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >@@ -1100,7 +1100,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
> >  	unsigned int shift;
> >  	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
> >-		kvmppc_uvmem_drop_pages(memslot, kvm, true);
> >+		kvmppc_uvmem_drop_pages(memslot, kvm, true, false);
> 
> When reviewing the v1 of this series, I asked you the question about
> the fact that the call here is made with purge_gfn = false. Your
> answer was:
> 
> >This function does not know, under what context it is called. Since
> >its job is to just flush the memslot, it cannot assume anything
> >about purging the pages in the memslot.
> 
> Indeed in the case of the memory hotplug operation, this function is
> called to wipe the page from the secure device in the case the pages
> are secured. In that case the purge is required. Indeed, I checked
> the other call to kvmppc_radix_flush_memslot() in
> kvmppc_core_flush_memslot_hv() and I cannot see why in that case too
> purge_gfn should be false, especially when the memslot is reused as
> detailed in __kvm_set_memory_region() around the call to
> kvm_arch_flush_shadow_memslot().
> 
> I'm sorry to not have ask this earlier, but could you please elaborate on this?

You are right. kvmppc_radix_flush_memslot() is getting called everytime with
the intention of disassociating the memslot from that VM. Which implies,
the memslot is intended to be deleted and possibly reused.

I should be calling kvmppc_uvmem_drop_pages() with purge_gfn=true, here
aswell.

I expect some form of problem showing up in memhot-plug/unplug path.

RP


^ permalink raw reply

* Re: [PATCH v5 01/13] powerpc: Remove Xilinx PPC405/PPC440 support
From: Nick Desaulniers @ 2020-06-18 23:33 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Robert Lippert, Arnd Bergmann, Michal Simek, LKML,
	clang-built-linux, Paul Mackerras, Greg Thelen, linuxppc-dev
In-Reply-To: <20200618031622.GA195@Ryzen-9-3900X.localdomain>

On Wed, Jun 17, 2020 at 8:16 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> Admittedly, we really do not have many PowerPC experts in our
> organization

Not with that attitude. :P /s
https://www.youtube.com/watch?v=Kck_upSlx60

> so we are supporting it on a "best effort" basis, which
> often involves using whatever knowledge is floating around or can be
> gained from interactions such as this :) so thank you for that!

That said, Google does have a significant amount of POWER machines in
the fleet used by our "prodkernel" teams.  Anecdotally, I've been
getting cc'ed on more and more internal bugs regarding ppc+clang+linux
kernels.  "prodkernel" is still in the process of moving a massive
fleet over to clang built kernels, but ppc is very important to
Google.  ppc32...I'm not so sure...let me ask some of our ppc folks
more about which uarch's and word sizes they really care about...

--
Thanks,
~Nick Desaulniers

^ permalink raw reply

* [PATCH net v2] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 21:55 UTC (permalink / raw)
  To: kuba; +Cc: netdev, Thomas Falcon, linuxppc-dev

The max MTU limit defined for ibmveth is not accounting for
virtual ethernet buffer overhead, which is twenty-two additional
bytes set aside for the ethernet header and eight additional bytes
of an opaque handle reserved for use by the hypervisor. Update the
max MTU to reflect this overhead.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")
---
v2: Include Fixes tags suggested by Jakub Kicisnki
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 96d36ae5049e..c5c732601e35 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1715,7 +1715,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	}
 
 	netdev->min_mtu = IBMVETH_MIN_MTU;
-	netdev->max_mtu = ETH_MAX_MTU;
+	netdev->max_mtu = ETH_MAX_MTU - IBMVETH_BUFF_OH;
 
 	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH net] ibmvnic: continue to init in CRQ reset returns H_CLOSED
From: Dany Madden @ 2020-06-18 19:24 UTC (permalink / raw)
  To: davem; +Cc: Dany Madden, netdev, tlfalcon, linuxppc-dev

Continue the reset path when partner adapter is not ready or H_CLOSED is
returned from reset crq. This patch allows the CRQ init to proceed to
establish a valid CRQ for traffic to flow after reset.

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 2baf7b3ff4cb..4b7cb483c47f 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1971,13 +1971,18 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			release_sub_crqs(adapter, 1);
 		} else {
 			rc = ibmvnic_reset_crq(adapter);
-			if (!rc)
+			if (rc == H_CLOSED || rc == H_SUCCESS) {
 				rc = vio_enable_interrupts(adapter->vdev);
+				if (rc)
+					netdev_err(adapter->netdev,
+						   "Reset failed to enable interrupts. rc=%d\n",
+						   rc);
+			}
 		}
 
 		if (rc) {
 			netdev_err(adapter->netdev,
-				   "Couldn't initialize crq. rc=%d\n", rc);
+				   "Reset couldn't initialize crq. rc=%d\n", rc);
 			goto out;
 		}
 
-- 
2.18.2


^ permalink raw reply related

* [PATCH] pci: pcie: AER: Fix logging of Correctable errors
From: Matt Jolly @ 2020-06-18 15:55 UTC (permalink / raw)
  To: Russell Currey, Sam Bobroff, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev, linux-pci, linux-kernel
  Cc: Matt Jolly

The AER documentation indicates that correctable (severity=Corrected)
errors should be output as a warning so that users can filter these
errors if they choose to; This functionality does not appear to have been implemented.

This patch modifies the functions aer_print_error and __aer_print_error
to send correctable errors as a warning (pci_warn), rather than as an error (pci_err). It
partially addresses several bugs in relation to kernel message buffer
spam for misbehaving devices - the root cause (possibly device firmware?) isn't
addressed, but the dmesg output is less alarming for end users, and can
be filtered separately from uncorrectable errors. This should hopefully
reduce the need for users to disable AER to suppress corrected errors.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183

Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
---
 drivers/pci/pcie/aer.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3acf56683915..131ecc0df2cb 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -662,12 +662,18 @@ static void __aer_print_error(struct pci_dev *dev,
 			errmsg = i < ARRAY_SIZE(aer_uncorrectable_error_string) ?
 				aer_uncorrectable_error_string[i] : NULL;
 
-		if (errmsg)
-			pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
-				info->first_error == i ? " (First)" : "");
-		else
+		if (errmsg) {
+			if (info->severity == AER_CORRECTABLE) {
+				pci_warn(dev, "   [%2d] %-22s%s\n", i, errmsg,
+					info->first_error == i ? " (First)" : "");
+			} else {
+				pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
+					info->first_error == i ? " (First)" : "");
+			}
+		} else {
 			pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
 				i, info->first_error == i ? " (First)" : "");
+		}
 	}
 	pci_dev_aer_stats_incr(dev, info);
 }
@@ -686,13 +692,23 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
 	agent = AER_GET_AGENT(info->severity, info->status);
 
-	pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
-		aer_error_severity_string[info->severity],
-		aer_error_layer[layer], aer_agent_string[agent]);
+	if  (info->severity == AER_CORRECTABLE) {
+		pci_warn(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+			aer_error_severity_string[info->severity],
+			aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-		dev->vendor, dev->device,
-		info->status, info->mask);
+		pci_warn(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
+			dev->vendor, dev->device,
+			info->status, info->mask);
+	} else {
+		pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+			aer_error_severity_string[info->severity],
+			aer_error_layer[layer], aer_agent_string[agent]);
+
+		pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
+			dev->vendor, dev->device,
+			info->status, info->mask);
+	}
 
 	__aer_print_error(dev, info);
 
-- 
2.26.2


^ permalink raw reply related

* Re: rename probe_kernel_* and probe_user_*
From: Helge Deller @ 2020-06-18 20:51 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Russell King, Tony Luck,
	Michael Ellerman
  Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <CAHk-=wjpnu=882iD9ck9Ywt6R1LYX_Hv-oS7dBMsWZwDRGZ5jA@mail.gmail.com>

On 18.06.20 21:48, Linus Torvalds wrote:
> [ Explicitly added architecture lists and developers to the cc to make
> this more visible ]
>
> On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> Andrew and I decided to drop the patches implementing your suggested
>> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
>> were way to many conflicts.  After -rc1 might be a good time for this as
>> all the conflicts are resolved now.
>
> So I've merged this renaming now, together with my changes to make
> 'get_kernel_nofault()' look and act a lot more like 'get_user()'.
>
> It just felt wrong (and potentially dangerous) to me to have a
> 'get_kernel_nofault()' naming that implied semantics that we're all
> familiar with from 'get_user()', but acting very differently.
>
> But part of the fixups I made for the type checking are for
> architectures where I didn't even compile-test the end result. I
> looked at every case individually, and the patch looks sane, but I
> could have screwed something up.
>
> Basically, 'get_kernel_nofault()' doesn't do the same automagic type
> munging from the pointer to the target that 'get_user()' does, but at
> least now it checks that the types are superficially compatible.
> There should be build failures if they aren't, but I hopefully fixed
> everything up properly for all architectures.
>
> This email is partly to ask people to double-check, but partly just as
> a heads-up so that _if_ I screwed something up, you'll have the
> background and it won't take you by surprise.

Linus. thanks for the heads-up!
With your change it compiles cleanly on 32- and 64-bit parisc.

Helge

^ permalink raw reply

* Re: [PATCH net] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 20:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, linuxppc-dev
In-Reply-To: <20200618085722.110f3702@kicinski-fedora-PC1C0HJN>


On 6/18/20 10:57 AM, Jakub Kicinski wrote:
> On Thu, 18 Jun 2020 10:43:46 -0500 Thomas Falcon wrote:
>> The max MTU limit defined for ibmveth is not accounting for
>> virtual ethernet buffer overhead, which is twenty-two additional
>> bytes set aside for the ethernet header and eight additional bytes
>> of an opaque handle reserved for use by the hypervisor. Update the
>> max MTU to reflect this overhead.
>>
>> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> How about
>
> Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
> Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")
>
> ?

Thanks, do you need me to send a v2 with those tags?

Tom


^ permalink raw reply

* Re: [PATCH v2 0/4] Migrate non-migrated pages of a SVM.
From: Ram Pai @ 2020-06-18 20:37 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, cclaudio, bharata, sathnaga, aneesh.kumar, sukadev,
	bauerman, david
In-Reply-To: <1592471945-24786-1-git-send-email-linuxram@us.ibm.com>

I should have elaborated on the problem and the need for these patches.

Explaining it here. Will add it to the series in next version.

-------------------------------------------------------------

The time taken to switch a VM to Secure-VM, increases by the size of
the VM.  A 100GB VM takes about 7minutes. This is unacceptable.  This
linear increase is caused by a suboptimal behavior by the Ultravisor and
the Hypervisor.  The Ultravisor unnecessarily migrates all the GFN of
the VM from normal-memory to secure-memory. It has to just migrate the
necessary and sufficient GFNs.

However when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has a inbuilt assumption
that the Ultravisor will explicitly request to migrate, each and every
GFN of the VM. If only necessary and sufficient GFNs are requested for
migration, the Hypervisor continues to manage the rest of the GFNs are
normal GFNs. This leads of memory corruption, manifested consistently
when the SVM reboots.

The same is true, when a memory slot is hotplugged into a SVM. The
Hypervisor expects the ultravisor to request migration of all GFNs to
secure-GFN.  But at the same time the hypervisor is unable to handle any
H_SVM_PAGE_IN requests from the Ultravisor, done in the context of
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors in
the SVM, when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a SVM,
and thus solves the problem.

------------------------------------------------------------------



On Thu, Jun 18, 2020 at 02:19:01AM -0700, Ram Pai wrote:
> This patch series migrates the non-migrated pages of a SVM.
> This is required when the UV calls H_SVM_INIT_DONE, and
> when a memory-slot is hotplugged to a Secure VM.
> 
> Testing: Passed rigorous SVM reboot test using different
> 	sized SVMs.
> 
> Changelog:
> 	. fixed a bug observed by Bharata. Pages that
> 	where paged-in and later paged-out must also be
> 	skipped from migration during H_SVM_INIT_DONE.
> 
> Laurent Dufour (1):
>   KVM: PPC: Book3S HV: migrate hot plugged memory
> 
> Ram Pai (3):
>   KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
>   KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
>   KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
>     H_SVM_INIT_DONE
> 
>  Documentation/powerpc/ultravisor.rst        |   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   8 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c      |   2 +-
>  arch/powerpc/kvm/book3s_hv.c                |  12 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c          | 449 ++++++++++++++++++++++------
>  5 files changed, 368 insertions(+), 105 deletions(-)
> 
> -- 
> 1.8.3.1

-- 
Ram Pai

^ permalink raw reply

* Re: rename probe_kernel_* and probe_user_*
From: Linus Torvalds @ 2020-06-18 19:48 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King, Tony Luck, Helge Deller,
	Michael Ellerman
  Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <20200617073755.8068-1-hch@lst.de>

[ Explicitly added architecture lists and developers to the cc to make
this more visible ]

On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Andrew and I decided to drop the patches implementing your suggested
> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
> were way to many conflicts.  After -rc1 might be a good time for this as
> all the conflicts are resolved now.

So I've merged this renaming now, together with my changes to make
'get_kernel_nofault()' look and act a lot more like 'get_user()'.

It just felt wrong (and potentially dangerous) to me to have a
'get_kernel_nofault()' naming that implied semantics that we're all
familiar with from 'get_user()', but acting very differently.

But part of the fixups I made for the type checking are for
architectures where I didn't even compile-test the end result. I
looked at every case individually, and the patch looks sane, but I
could have screwed something up.

Basically, 'get_kernel_nofault()' doesn't do the same automagic type
munging from the pointer to the target that 'get_user()' does, but at
least now it checks that the types are superficially compatible.
There should be build failures if they aren't, but I hopefully fixed
everything up properly for all architectures.

This email is partly to ask people to double-check, but partly just as
a heads-up so that _if_ I screwed something up, you'll have the
background and it won't take you by surprise.

               Linus

^ permalink raw reply

* Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int
From: Frederic Barrat @ 2020-06-18 16:56 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, David Woodhouse, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, linuxppc-dev, Lu Baolu
In-Reply-To: <20200618153747.GE15763@romley-ivt3.sc.intel.com>



Le 18/06/2020 à 17:37, Fenghua Yu a écrit :
> The first 3 patches clean up pasid and flag defitions to prepare for
> following patches.
> 
> If you think this patch can be dropped, we will drop it.

Yes, I think that's the case.

Thanks,

  Fred

^ permalink raw reply

* [PATCH v1 5/5] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>

In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       |  4 +++-
 arch/powerpc/kvm/book3s_64_mmu_radix.c    | 27 ++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_nested.c       | 13 +++++++++--
 arch/powerpc/platforms/pseries/firmware.c |  1 +
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 6003c2e533a0..aa6a5ef5d483 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -52,6 +52,7 @@
 #define FW_FEATURE_PAPR_SCM 	ASM_CONST(0x0000002000000000)
 #define FW_FEATURE_ULTRAVISOR	ASM_CONST(0x0000004000000000)
 #define FW_FEATURE_STUFF_TCE	ASM_CONST(0x0000008000000000)
+#define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0000010000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -71,7 +72,8 @@ enum {
 		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
 		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
 		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
+		FW_FEATURE_RPT_INVALIDATE,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 84acb4769487..fcf8b031a32e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include <asm/pte-walk.h>
 #include <asm/ultravisor.h>
 #include <asm/kvm_book3s_uvmem.h>
+#include <asm/plpar_wrappers.h>
 
 /*
  * Supported radix tree geometry.
@@ -313,10 +314,17 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
 		return;
 	}
 
-	psi = shift_to_mmu_psize(pshift);
-	rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-				lpid, rb);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+		psi = shift_to_mmu_psize(pshift);
+		rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(0, 0, 1), lpid, rb);
+	} else {
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB, H_RPTI_PAGE_ALL,
+					    addr, addr + psize);
+	}
 	if (rc)
 		pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -330,8 +338,15 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(1, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+					    0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 75993f44519b..81f903284d34 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include <asm/pgalloc.h>
 #include <asm/pte-walk.h>
 #include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,16 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(2, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+					    H_RPTI_TYPE_PAT,
+					    H_RPTI_PAGE_ALL, 0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 3e49cc23a97a..4c7b7f5a2ebc 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
 	{FW_FEATURE_BLOCK_REMOVE,	"hcall-block-remove"},
 	{FW_FEATURE_PAPR_SCM,		"hcall-scm"},
+	{FW_FEATURE_RPT_INVALIDATE,	"hcall-rpt-invalidate"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.21.3


^ permalink raw reply related

* [PATCH v1 4/5] powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>

From: Nicholas Piggin <npiggin@gmail.com>

When platform doesn't support GTSE, let TLB invalidation requests
for radix guests be off-loaded to the host using H_RPT_INVALIDATE
hcall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
	[hcall wrapper, error path handling and renames]
---
 arch/powerpc/include/asm/hvcall.h         | 27 ++++++-
 arch/powerpc/include/asm/plpar_wrappers.h | 52 +++++++++++++
 arch/powerpc/mm/book3s64/radix_tlb.c      | 95 +++++++++++++++++++++--
 3 files changed, 166 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e90c073e437e..3f9bc7ad1cdd 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -305,7 +305,8 @@
 #define H_SCM_UNBIND_ALL        0x3FC
 #define H_SCM_HEALTH            0x400
 #define H_SCM_PERFORMANCE_STATS 0x418
-#define MAX_HCALL_OPCODE	H_SCM_PERFORMANCE_STATS
+#define H_RPT_INVALIDATE	0x448
+#define MAX_HCALL_OPCODE	H_RPT_INVALIDATE
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
@@ -389,6 +390,30 @@
 #define PROC_TABLE_RADIX	0x04
 #define PROC_TABLE_GTSE		0x01
 
+/*
+ * Defines for
+ * H_RPT_INVALIDATE - Invalidate RPT translation lookaside information.
+ */
+
+/* Type of translation to invalidate (type) */
+#define H_RPTI_TYPE_NESTED	0x0001	/* Invalidate nested guest partition-scope */
+#define H_RPTI_TYPE_TLB		0x0002	/* Invalidate TLB */
+#define H_RPTI_TYPE_PWC		0x0004	/* Invalidate Page Walk Cache */
+#define H_RPTI_TYPE_PRT		0x0008	/* Invalidate Process Table Entries if H_RPTI_TYPE_NESTED is clear */
+#define H_RPTI_TYPE_PAT		0x0008	/* Invalidate Partition Table Entries if H_RPTI_TYPE_NESTED is set */
+
+/* Invalidation targets (target) */
+#define H_RPTI_TARGET_CMMU		0x01 /* All virtual processors in the partition */
+#define H_RPTI_TARGET_CMMU_LOCAL	0x02 /* Current virtual processor */
+#define H_RPTI_TARGET_NMMU		0x04 /* All nest/accelerator agents in use by the partition */
+
+/* Page size mask (page sizes) */
+#define H_RPTI_PAGE_4K	0x01
+#define H_RPTI_PAGE_64K	0x02
+#define H_RPTI_PAGE_2M	0x04
+#define H_RPTI_PAGE_1G	0x08
+#define H_RPTI_PAGE_ALL (-1UL)
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 4497c8afb573..92320bb309c7 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -334,6 +334,51 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
 	return rc;
 }
 
+/*
+ * Wrapper to H_RPT_INVALIDATE hcall that handles return values appropriately
+ *
+ * - Returns H_SUCCESS on success
+ * - For H_BUSY return value, we retry the hcall.
+ * - For any other hcall failures, attempt a full flush once before
+ *   resorting to BUG().
+ *
+ * Note: This hcall is expected to fail only very rarely. The correct
+ * error recovery of killing the process/guest will be eventually
+ * needed.
+ */
+static inline long pseries_rpt_invalidate(u32 pid, u64 target, u64 type,
+					  u64 page_sizes, u64 start, u64 end)
+{
+	long rc;
+	unsigned long all = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC;
+
+	while (true) {
+		rc = plpar_hcall_norets(H_RPT_INVALIDATE, pid, target, type,
+					page_sizes, start, end);
+		if (rc == H_BUSY) {
+			cpu_relax();
+			continue;
+		} else if (rc == H_SUCCESS)
+			return rc;
+
+		/* Flush request failed, try with a full flush once */
+		if (type & H_RPTI_TYPE_NESTED)
+			all |= H_RPTI_TYPE_PAT;
+		else
+			all |= H_RPTI_TYPE_PRT;
+retry:
+		rc = plpar_hcall_norets(H_RPT_INVALIDATE, pid, target,
+					all, page_sizes, 0, -1UL);
+		if (rc == H_BUSY) {
+			cpu_relax();
+			goto retry;
+		} else if (rc == H_SUCCESS)
+			return rc;
+
+		BUG();
+	}
+}
+
 #else /* !CONFIG_PPC_PSERIES */
 
 static inline long plpar_set_ciabr(unsigned long ciabr)
@@ -346,6 +391,13 @@ static inline long plpar_pte_read_4(unsigned long flags, unsigned long ptex,
 {
 	return 0;
 }
+
+static inline long pseries_rpt_invalidate(u32 pid, u64 target, u64 type,
+					  u64 page_sizes, u64 start, u64 end)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PPC_PSERIES */
 
 #endif /* _ASM_POWERPC_PLPAR_WRAPPERS_H */
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index b5cc9b23cf02..733935b68f37 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -16,11 +16,25 @@
 #include <asm/tlbflush.h>
 #include <asm/trace.h>
 #include <asm/cputhreads.h>
+#include <asm/plpar_wrappers.h>
 
 #define RIC_FLUSH_TLB 0
 #define RIC_FLUSH_PWC 1
 #define RIC_FLUSH_ALL 2
 
+static inline u64 psize_to_h_rpti(unsigned long psize)
+{
+	if (psize == MMU_PAGE_4K)
+		return H_RPTI_PAGE_4K;
+	if (psize == MMU_PAGE_64K)
+		return H_RPTI_PAGE_64K;
+	if (psize == MMU_PAGE_2M)
+		return H_RPTI_PAGE_2M;
+	if (psize == MMU_PAGE_1G)
+		return H_RPTI_PAGE_1G;
+	return H_RPTI_PAGE_ALL;
+}
+
 /*
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
@@ -694,7 +708,14 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
 			goto local;
 		}
 
-		if (cputlb_use_tlbie()) {
+		if (!mmu_has_feature(MMU_FTR_GTSE)) {
+			unsigned long tgt = H_RPTI_TARGET_CMMU;
+
+			if (atomic_read(&mm->context.copros) > 0)
+				tgt |= H_RPTI_TARGET_NMMU;
+			pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB,
+					       H_RPTI_PAGE_ALL, 0, -1UL);
+		} else if (cputlb_use_tlbie()) {
 			if (mm_needs_flush_escalation(mm))
 				_tlbie_pid(pid, RIC_FLUSH_ALL);
 			else
@@ -727,7 +748,16 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
 				goto local;
 			}
 		}
-		if (cputlb_use_tlbie())
+		if (!mmu_has_feature(MMU_FTR_GTSE)) {
+			unsigned long tgt = H_RPTI_TARGET_CMMU;
+			unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+					     H_RPTI_TYPE_PRT;
+
+			if (atomic_read(&mm->context.copros) > 0)
+				tgt |= H_RPTI_TARGET_NMMU;
+			pseries_rpt_invalidate(pid, tgt, type,
+					       H_RPTI_PAGE_ALL, 0, -1UL);
+		} else if (cputlb_use_tlbie())
 			_tlbie_pid(pid, RIC_FLUSH_ALL);
 		else
 			_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
@@ -760,7 +790,19 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 			exit_flush_lazy_tlbs(mm);
 			goto local;
 		}
-		if (cputlb_use_tlbie())
+		if (!mmu_has_feature(MMU_FTR_GTSE)) {
+			unsigned long tgt, page_sizes, size;
+
+			tgt = H_RPTI_TARGET_CMMU;
+			page_sizes = psize_to_h_rpti(psize);
+			size = 1UL << mmu_psize_to_shift(psize);
+
+			if (atomic_read(&mm->context.copros) > 0)
+				tgt |= H_RPTI_TARGET_NMMU;
+			pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB,
+					       page_sizes, vmaddr,
+					       vmaddr + size);
+		} else if (cputlb_use_tlbie())
 			_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
 		else
 			_tlbiel_va_multicast(mm, vmaddr, pid, psize, RIC_FLUSH_TLB);
@@ -810,7 +852,14 @@ static inline void _tlbiel_kernel_broadcast(void)
  */
 void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	if (cputlb_use_tlbie())
+	if (!mmu_has_feature(MMU_FTR_GTSE)) {
+		unsigned long tgt = H_RPTI_TARGET_CMMU | H_RPTI_TARGET_NMMU;
+		unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+				     H_RPTI_TYPE_PRT;
+
+		pseries_rpt_invalidate(0, tgt, type, H_RPTI_PAGE_ALL,
+				       start, end);
+	} else if (cputlb_use_tlbie())
 		_tlbie_pid(0, RIC_FLUSH_ALL);
 	else
 		_tlbiel_kernel_broadcast();
@@ -864,7 +913,17 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 				nr_pages > tlb_local_single_page_flush_ceiling);
 	}
 
-	if (full) {
+	if (!mmu_has_feature(MMU_FTR_GTSE) && !local) {
+		unsigned long tgt = H_RPTI_TARGET_CMMU;
+		unsigned long page_sizes = psize_to_h_rpti(mmu_virtual_psize);
+
+		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+			page_sizes |= psize_to_h_rpti(MMU_PAGE_2M);
+		if (atomic_read(&mm->context.copros) > 0)
+			tgt |= H_RPTI_TARGET_NMMU;
+		pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB, page_sizes,
+				       start, end);
+	} else if (full) {
 		if (local) {
 			_tlbiel_pid(pid, RIC_FLUSH_TLB);
 		} else {
@@ -1046,7 +1105,17 @@ static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 				nr_pages > tlb_local_single_page_flush_ceiling);
 	}
 
-	if (full) {
+	if (!mmu_has_feature(MMU_FTR_GTSE) && !local) {
+		unsigned long tgt = H_RPTI_TARGET_CMMU;
+		unsigned long type = H_RPTI_TYPE_TLB;
+		unsigned long page_sizes = psize_to_h_rpti(psize);
+
+		if (also_pwc)
+			type |= H_RPTI_TYPE_PWC;
+		if (atomic_read(&mm->context.copros) > 0)
+			tgt |= H_RPTI_TARGET_NMMU;
+		pseries_rpt_invalidate(pid, tgt, type, page_sizes, start, end);
+	} else if (full) {
 		if (local) {
 			_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
 		} else {
@@ -1111,7 +1180,19 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 			exit_flush_lazy_tlbs(mm);
 			goto local;
 		}
-		if (cputlb_use_tlbie())
+		if (!mmu_has_feature(MMU_FTR_GTSE)) {
+			unsigned long tgt, type, page_sizes;
+
+			tgt = H_RPTI_TARGET_CMMU;
+			type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+			       H_RPTI_TYPE_PRT;
+			page_sizes = psize_to_h_rpti(mmu_virtual_psize);
+
+			if (atomic_read(&mm->context.copros) > 0)
+				tgt |= H_RPTI_TARGET_NMMU;
+			pseries_rpt_invalidate(pid, tgt, type, page_sizes,
+					       addr, end);
+		} else if (cputlb_use_tlbie())
 			_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
 		else
 			_tlbiel_va_range_multicast(mm,
-- 
2.21.3


^ permalink raw reply related

* [PATCH v1 3/5] powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if enabled
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>

H_REGISTER_PROC_TBL asks for GTSE by default. GTSE flag bit should
be set only when GTSE is supported.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index e4ed5317f117..58ba76bc1964 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1680,9 +1680,11 @@ static int pseries_lpar_register_process_table(unsigned long base,
 
 	if (table_size)
 		flags |= PROC_TABLE_NEW;
-	if (radix_enabled())
-		flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
-	else
+	if (radix_enabled()) {
+		flags |= PROC_TABLE_RADIX;
+		if (mmu_has_feature(MMU_FTR_GTSE))
+			flags |= PROC_TABLE_GTSE;
+	} else
 		flags |= PROC_TABLE_HPT_SLB;
 	for (;;) {
 		rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
-- 
2.21.3


^ permalink raw reply related

* [PATCH v1 2/5] powerpc/prom_init: Ask for Radix GTSE only if supported.
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>

In the case of radix, don't ask for GTSE by default but ask
only if GTSE is enabled.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/kernel/prom_init.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 5f15b10eb007..16dd14f58ba6 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1336,12 +1336,15 @@ static void __init prom_check_platform_support(void)
 		}
 	}
 
-	if (supported.radix_mmu && supported.radix_gtse &&
-	    IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
-		/* Radix preferred - but we require GTSE for now */
-		prom_debug("Asking for radix with GTSE\n");
+	if (supported.radix_mmu && IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
+		/* Radix preferred - Check if GTSE is also supported */
+		prom_debug("Asking for radix\n");
 		ibm_architecture_vec.vec5.mmu = OV5_FEAT(OV5_MMU_RADIX);
-		ibm_architecture_vec.vec5.radix_ext = OV5_FEAT(OV5_RADIX_GTSE);
+		if (supported.radix_gtse)
+			ibm_architecture_vec.vec5.radix_ext =
+					OV5_FEAT(OV5_RADIX_GTSE);
+		else
+			prom_debug("Radix GTSE isn't supported\n");
 	} else if (supported.hash_mmu) {
 		/* Default to hash mmu (if we can) */
 		prom_debug("Asking for hash\n");
-- 
2.21.3


^ permalink raw reply related

* [PATCH v1 1/5] powerpc/mm: Make GTSE an MMU FTR
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>

Make GTSE an MMU feature and enable it by default for radix.
However for guest, conditionally enable it if hypervisor supports
it via OV5 vector.

Having GTSE as an MMU feature will make it easy to enable radix
without GTSE. Currently radix assumes GTSE is enabled by default.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/mmu.h    | 4 ++++
 arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
 arch/powerpc/mm/init_64.c         | 5 ++++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index f4ac25d4df05..884d51995934 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -28,6 +28,9 @@
  * Individual features below.
  */
 
+/* Guest Translation Shootdown Enable */
+#define MMU_FTR_GTSE			ASM_CONST(0x00001000)
+
 /*
  * Support for 68 bit VA space. We added that from ISA 2.05
  */
@@ -173,6 +176,7 @@ enum {
 #endif
 #ifdef CONFIG_PPC_RADIX_MMU
 		MMU_FTR_TYPE_RADIX |
+		MMU_FTR_GTSE |
 #ifdef CONFIG_PPC_KUAP
 		MMU_FTR_RADIX_KUAP |
 #endif /* CONFIG_PPC_KUAP */
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 3a409517c031..fcb815b3a84d 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -337,6 +337,7 @@ static int __init feat_enable_mmu_radix(struct dt_cpu_feature *f)
 #ifdef CONFIG_PPC_RADIX_MMU
 	cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
 	cur_cpu_spec->mmu_features |= MMU_FTRS_HASH_BASE;
+	cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
 	cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_MMU;
 
 	return 1;
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index c7ce4ec5060e..a7b571c60e90 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -408,12 +408,15 @@ static void __init early_check_vec5(void)
 		if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
 						OV5_FEAT(OV5_RADIX_GTSE))) {
 			pr_warn("WARNING: Hypervisor doesn't support RADIX with GTSE\n");
-		}
+			cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
+		} else
+			cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
 		/* Do radix anyway - the hypervisor said we had to */
 		cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
 	} else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
 		/* Hypervisor only supports hash - disable radix */
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+		cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
 	}
 }
 
-- 
2.21.3


^ permalink raw reply related

* [PATCH v1 0/5] Off-load TLB invalidations to host for !GTSE
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin

Hypervisor may choose not to enable Guest Translation Shootdown Enable
(GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
permitted to use instructions like tblie and tlbsync directly, but is
expected to make hypervisor calls to get the TLB flushed.

This series enables the TLB flush routines in the radix code to
off-load TLB flushing to hypervisor via the newly proposed hcall
H_RPT_INVALIDATE. The specification of this hcall is still evolving
while the patchset is posted here for any early comments.

To easily check the availability of GTSE, it is made an MMU feature.
The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
handle GTSE as an optionally available feature and to not assume GTSE
when radix support is available.

The actual hcall implementation for KVM isn't included in this
patchset.

H_RPT_INVALIDATE
================
Syntax:
int64   /* H_Success: Return code on successful completion */
        /* H_Busy - repeat the call with the same */
        /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
        hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
              uint64 pid,       /* PID/LPID to invalidate */
              uint64 target,    /* Invalidation target */
              uint64 type,      /* Type of lookaside information */
              uint64 pageSizes,     /* Page sizes */
              uint64 start,     /* Start of Effective Address (EA) range (inclusive) */
              uint64 end)       /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU        0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU        0x04 /* All nest/accelerator agents in use by the partition */

A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type)
---------------------------------------
NESTED       0x0001  /* invalidate nested guest partition-scope */
TLB          0x0002  /* Invalidate TLB */
PWC          0x0004  /* Invalidate Page Walk Cache */
PRT          0x0008  /* Invalidate Process Table Entries if NESTED is clear*/
PAT          0x0008  /* Invalidate Partition Table Entries  if NESTED is set*/

A combination of the above can be specified.

Page size mask (pages)
----------------------
4K              0x01
64K             0x02
2M              0x04
1G              0x08
All sizes       (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
           matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
  different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
  should be a valid Quadrant address and  end > start.
* Return H_NotSupported if the partition is not in running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
  Else start and end should be aligned to 4kB (lower 11 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside information.
  Else pid specifies a nested LPID, and the invalidation is performed
  on nested guest partition table and nested guest partition scope real
  addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
  quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside invalidation
  range, which allows a caller to optimally invalidate ranges that may
  contain mixed page sizes.
* Return H_SUCCESS on success.

Bharata B Rao (4):
  powerpc/mm: Make GTSE an MMU FTR
  powerpc/prom_init: Ask for Radix GTSE only if supported.
  powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
    enabled
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

Nicholas Piggin (1):
  powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
    !GTSE

 arch/powerpc/include/asm/firmware.h       |  4 +-
 arch/powerpc/include/asm/hvcall.h         | 27 ++++++-
 arch/powerpc/include/asm/mmu.h            |  4 +
 arch/powerpc/include/asm/plpar_wrappers.h | 52 +++++++++++++
 arch/powerpc/kernel/dt_cpu_ftrs.c         |  1 +
 arch/powerpc/kernel/prom_init.c           | 13 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c    | 27 +++++--
 arch/powerpc/kvm/book3s_hv_nested.c       | 13 +++-
 arch/powerpc/mm/book3s64/radix_tlb.c      | 95 +++++++++++++++++++++--
 arch/powerpc/mm/init_64.c                 |  5 +-
 arch/powerpc/platforms/pseries/firmware.c |  1 +
 arch/powerpc/platforms/pseries/lpar.c     |  8 +-
 12 files changed, 224 insertions(+), 26 deletions(-)

-- 
2.21.3


^ permalink raw reply

* Re: [PATCH net] ibmveth: Fix max MTU limit
From: Jakub Kicinski @ 2020-06-18 15:57 UTC (permalink / raw)
  To: Thomas Falcon; +Cc: netdev, linuxppc-dev
In-Reply-To: <1592495026-27202-1-git-send-email-tlfalcon@linux.ibm.com>

On Thu, 18 Jun 2020 10:43:46 -0500 Thomas Falcon wrote:
> The max MTU limit defined for ibmveth is not accounting for
> virtual ethernet buffer overhead, which is twenty-two additional
> bytes set aside for the ethernet header and eight additional bytes
> of an opaque handle reserved for use by the hypervisor. Update the
> max MTU to reflect this overhead.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>

How about

Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")

?

^ permalink raw reply

* [PATCH net] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 15:43 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Falcon, linuxppc-dev

The max MTU limit defined for ibmveth is not accounting for
virtual ethernet buffer overhead, which is twenty-two additional
bytes set aside for the ethernet header and eight additional bytes
of an opaque handle reserved for use by the hypervisor. Update the
max MTU to reflect this overhead.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 96d36ae5049e..c5c732601e35 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1715,7 +1715,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	}
 
 	netdev->min_mtu = IBMVETH_MIN_MTU;
-	netdev->max_mtu = ETH_MAX_MTU;
+	netdev->max_mtu = ETH_MAX_MTU - IBMVETH_BUFF_OH;
 
 	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
 
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int
From: Fenghua Yu @ 2020-06-18 15:37 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, David Woodhouse, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, linuxppc-dev, Lu Baolu
In-Reply-To: <972dc2cb-9643-53af-b11d-ebb56d96053d@linux.ibm.com>

Hi, Frederic,

On Thu, Jun 18, 2020 at 10:05:19AM +0200, Frederic Barrat wrote:
> 
> 
> Le 13/06/2020 à 02:41, Fenghua Yu a écrit :
> >PASID is defined as "int" although it's a 20-bit value and shouldn't be
> >negative int. To be consistent with type defined in iommu, define PASID
> >as "unsigned int".
> 
> 
> It looks like this patch was considered because of the use of 'pasid' in
> variable or function names. The ocxl driver only makes sense on powerpc and
> shouldn't compile on anything else, so it's probably useless in the context
> of that series.
> The pasid here is defined by the opencapi specification
> (https://opencapi.org), it is borrowed from the PCI world and you could
> argue it could be an unsigned int. But then I think the patch doesn't go far
> enough. But considering it's not used on x86, I think this patch can be
> dropped.

The first 3 patches clean up pasid and flag defitions to prepare for
following patches.

If you think this patch can be dropped, we will drop it.

Thanks.

-Fenghua

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox