LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size
From: Satheesh Rajendran @ 2020-06-19  7:01 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Laurent Dufour, Ram Pai, linux-kernel, Satheesh Rajendran,
	Sukadev Bhattiprolu, Thiago Jung Bauermann

Early secure guest boot hits the below crash while booting with
vcpus numbers aligned with page boundary for PAGE size of 64k
and LPPACA size of 1k i.e 64, 128 etc, due to the BUG_ON assert
for shared_lppaca_total_size equal to shared_lppaca_size,

 [    0.000000] Partition configured for 64 cpus.
 [    0.000000] CPU maps initialized for 1 thread per core
 [    0.000000] ------------[ cut here ]------------
 [    0.000000] kernel BUG at arch/powerpc/kernel/paca.c:89!
 [    0.000000] Oops: Exception in kernel mode, sig: 5 [#1]
 [    0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

which is not necessary, let's remove it.

Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA structures")
Cc: linux-kernel@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
---

V2:
Added Reviewed by Thiago and Laurent.
Added Fixes tag as per Thiago suggest.

V1: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200609105731.14032-1-sathnaga@linux.vnet.ibm.com/ 
---
 arch/powerpc/kernel/paca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 2168372b792d..74da65aacbc9 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -87,7 +87,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, unsigned long align,
 	 * This is very early in boot, so no harm done if the kernel crashes at
 	 * this point.
 	 */
-	BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+	BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
 
 	return ptr;
 }
-- 
2.26.2


^ permalink raw reply related

* Re: rename probe_kernel_* and probe_user_*
From: Michael Ellerman @ 2020-06-19  6:21 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Russell King, Tony Luck,
	Helge Deller
  Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <CAHk-=wjpnu=882iD9ck9Ywt6R1LYX_Hv-oS7dBMsWZwDRGZ5jA@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:
> [ Explicitly added architecture lists and developers to the cc to make
> this more visible ]
>
> On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> Andrew and I decided to drop the patches implementing your suggested
>> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
>> were way to many conflicts.  After -rc1 might be a good time for this as
>> all the conflicts are resolved now.
>
> So I've merged this renaming now, together with my changes to make
> 'get_kernel_nofault()' look and act a lot more like 'get_user()'.
>
> It just felt wrong (and potentially dangerous) to me to have a
> 'get_kernel_nofault()' naming that implied semantics that we're all
> familiar with from 'get_user()', but acting very differently.
>
> But part of the fixups I made for the type checking are for
> architectures where I didn't even compile-test the end result. I
> looked at every case individually, and the patch looks sane, but I
> could have screwed something up.
>
> Basically, 'get_kernel_nofault()' doesn't do the same automagic type
> munging from the pointer to the target that 'get_user()' does, but at
> least now it checks that the types are superficially compatible.
> There should be build failures if they aren't, but I hopefully fixed
> everything up properly for all architectures.
>
> This email is partly to ask people to double-check, but partly just as
> a heads-up so that _if_ I screwed something up, you'll have the
> background and it won't take you by surprise.

The powerpc changes look right, compile cleanly and seem to work
correctly.

cheers

^ permalink raw reply

* Re: powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset
From: Oliver O'Halloran @ 2020-06-19  6:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Brian King, Wen Xiong, linuxppc-dev, wenxiong
In-Reply-To: <87ftaudx1x.fsf@mpe.ellerman.id.au>

On Wed, Jun 17, 2020 at 4:29 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> "Oliver O'Halloran" <oohall@gmail.com> writes:
> > On Tue, Jun 16, 2020 at 9:55 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >> wenxiong@linux.vnet.ibm.com writes:
> >> > From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> >> >
> >> > Several device drivers hit EEH(Extended Error handling) when triggering
> >> > kdump on Pseries PowerVM. This patch implemented a reset of the PHBs
> >> > in pci general code when triggering kdump.
> >>
> >> Actually it's in pseries specific PCI code, and the reset is done in the
> >> 2nd kernel as it boots, not when triggering the kdump.
> >>
> >> You're doing it as a:
> >>
> >>   machine_postcore_initcall(pseries, pseries_phb_reset);
> >>
> >> But we do the EEH initialisation in:
> >>
> >>   core_initcall_sync(eeh_init);
> >>
> >> Which happens first.
> >>
> >> So it seems to me that this should be called from pseries_eeh_init().
> >
> > This happens to use some of the same RTAS calls as EEH, but it's
> > entirely orthogonal to it.
>
> I don't agree. I mean it's literally calling EEH_RESET_FUNDAMENTAL etc.
> Those RTAS calls are all documented in the EEH section of PAPR.
>
> I guess you're saying it's orthogonal to the kernel handling an EEH and
> doing the recovery process etc, which I can kind of see.
>
> > Wedging the two together doesn't make any real sense IMO since this
> > should be usable even with !CONFIG_EEH.
>
> You can't turn CONFIG_EEH off for pseries or powernv.

Not yet :)

> And if you could this patch wouldn't compile because it uses EEH
> constants that are behind #ifdef CONFIG_EEH.

That's fixable.

> If you could turn CONFIG_EEH off it would presumably be because you were
> on a platform that didn't support EEH, in which case you wouldn't need
> this code.

I think there's an argument to be made for disabling EEH in some
situations. A lot of drivers do a pretty poor job of recovering in the
first place so it's conceivable that someone might want to disable it
in say, a kdump kernel. That said, the real reason is mostly for the
sake of code organisation. EEH is an optional platform feature but you
wouldn't know it looking at the implementation and I'd like to stop it
bleeding into odd places. Making it buildable without !CONFIG_EEH
would probably help.

> So IMO this is EEH code, and should be with the other EEH code and
> should be behind CONFIG_EEH.

*shrug*

I wanted it to follow the model of the powernv implementation of the
same feature which is done immediately after initialising the
pci_controller and independent of all of the EEH setup. Although,
looking at it again I see it calls pnv_eeh_phb_reset() which is in
eeh_powernv.c so I guess that's pretty similar to what you're
suggesting.

> That sounds like a good cleanup. I'm not concerned about conflicts
> within arch/powerpc, I can fix them up.
>
> >> > +             list_for_each_entry(phb, &hose_list, list_node) {
> >> > +                     config_addr = pseries_get_pdn_addr(phb);
> >> > +                     if (config_addr == -1)
> >> > +                             continue;
> >> > +
> >> > +                     ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> >> > +                             config_addr, BUID_HI(phb->buid),
> >> > +                             BUID_LO(phb->buid), EEH_RESET_FUNDAMENTAL);
> >> > +
> >> > +                     /* If fundamental-reset not supported, try hot-reset */
> >> > +                     if (ret == -8)
> >>
> >> Where does -8 come from?
> >
> > There's a comment right there.
>
> Yeah I guess. I was expecting it would map to some RTAS_ERROR_FOO value,
> but it's just literally -8 in PAPR.

Yeah, as far as I can tell the meaning of the return codes are
specific to each RTAS call, it's a bit bad.

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_spdif: Add pm runtime function
From: Nicolin Chen @ 2020-06-19  5:49 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, tiwai, perex, broonie,
	festevam, linux-kernel
In-Reply-To: <1592481334-3680-1-git-send-email-shengjiu.wang@nxp.com>

On Thu, Jun 18, 2020 at 07:55:34PM +0800, Shengjiu Wang wrote:
> Add pm runtime support and move clock handling there.
> Close the clocks at suspend to reduce the power consumption.
> 
> fsl_spdif_suspend is replaced by pm_runtime_force_suspend.
> fsl_spdif_resume is replaced by pm_runtime_force_resume.
> 
> Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>

LGTM, yet some nits, please add my ack after fixing:

Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>

> @@ -495,25 +496,10 @@ static int fsl_spdif_startup(struct snd_pcm_substream *substream,

>  
> -disable_txclk:
> -	for (i--; i >= 0; i--)
> -		clk_disable_unprepare(spdif_priv->txclk[i]);
>  err:
> -	if (!IS_ERR(spdif_priv->spbaclk))
> -		clk_disable_unprepare(spdif_priv->spbaclk);
> -err_spbaclk:
> -	clk_disable_unprepare(spdif_priv->coreclk);
> -
>  	return ret;

Only "return ret;" remains now. We could clean the goto away.

> -static int fsl_spdif_resume(struct device *dev)
> +static int fsl_spdif_runtime_resume(struct device *dev)

> +disable_rx_clk:
> +	clk_disable_unprepare(spdif_priv->rxclk);
> +disable_tx_clk:
> +disable_spba_clk:

Why have two duplicated ones? Could probably drop the 2nd one.

^ permalink raw reply

* [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW
From: Leonardo Bras @ 2020-06-19  5:06 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Leonardo Bras, Thiago Jung Bauermann,
	Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-1-leobras.c@gmail.com>

On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
default DMA window for the device, before attempting to configure a DDW,
in order to make the maximum resources available for the next DDW to be
created.

This is a requirement for some devices to use DDW, given they only
allow one DMA window.

If setting up a new DDW fails anywhere after the removal of this
default DMA window, restore it using reset_dma_window.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index de633f6ae093..68d1ea957ac7 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	u64 dma_addr, max_addr;
 	struct device_node *dn;
 	u32 ddw_avail[3];
+
 	struct direct_window *window;
-	struct property *win64;
+	struct property *win64, *dfl_win;
 	struct dynamic_dma_window_prop *ddwprop;
 	struct failed_ddw_pdn *fpdn;
 
@@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	if (ret)
 		goto out_failed;
 
-       /*
-	 * Query if there is a second window of size to map the
+	/*
+	 * First step of setting up DDW is removing the default DMA window,
+	 * if it's present. It will make all the resources available to the
+	 * new DDW window.
+	 * If anything fails after this, we need to restore it.
+	 */
+
+	dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
+	if (dfl_win)
+		remove_dma_window(pdn, ddw_avail, dfl_win);
+
+	/*
+	 * Query if there is a window of size to map the
 	 * whole partition.  Query returns number of windows, largest
 	 * block assigned to PE (partition endpoint), and two bitmasks
 	 * of page sizes: supported and supported for migrate-dma.
@@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	kfree(win64);
 
 out_failed:
+	if (dfl_win)
+		reset_dma_window(dev, pdn);
 
 	fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
 	if (!fpdn)
-- 
2.25.4


^ permalink raw reply related

* [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window
From: Leonardo Bras @ 2020-06-19  5:06 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Leonardo Bras, Thiago Jung Bauermann,
	Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-1-leobras.c@gmail.com>

Move the window-removing part of remove_ddw into a new function
(remove_dma_window), so it can be used to remove other DMA windows.

It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
property, like the default DMA window from the device, which uses
"ibm,dma-window".

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 5e1fbc176a37..de633f6ae093 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
 
 early_param("disable_ddw", disable_ddw_setup);
 
-static void remove_ddw(struct device_node *np, bool remove_prop)
+static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
+			      struct property *win)
 {
 	struct dynamic_dma_window_prop *dwp;
-	struct property *win64;
-	u32 ddw_avail[3];
 	u64 liobn;
-	int ret = 0;
-
-	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
-					 &ddw_avail[0], 3);
-
-	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
-	if (!win64)
-		return;
-
-	if (ret || win64->length < sizeof(*dwp))
-		goto delprop;
+	int ret;
 
-	dwp = win64->value;
+	dwp = win->value;
 	liobn = (u64)be32_to_cpu(dwp->liobn);
 
 	/* clear the whole window, note the arg is in kernel pages */
@@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
 		1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
 	if (ret)
 		pr_warn("%pOF failed to clear tces in window.\n",
-			np);
+			pdn);
 	else
 		pr_debug("%pOF successfully cleared tces in window.\n",
-			 np);
+			 pdn);
 
 	ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
 	if (ret)
 		pr_warn("%pOF: failed to remove direct window: rtas returned "
 			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
-			np, ret, ddw_avail[2], liobn);
+			pdn, ret, ddw_avail[2], liobn);
 	else
 		pr_debug("%pOF: successfully removed direct window: rtas returned "
 			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
-			np, ret, ddw_avail[2], liobn);
+			pdn, ret, ddw_avail[2], liobn);
+}
+
+static void remove_ddw(struct device_node *np, bool remove_prop)
+{
+	struct property *win;
+	u32 ddw_avail[3];
+	int ret = 0;
+
+	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
+					 &ddw_avail[0], 3);
+	if (ret)
+		return;
+
+	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
+	if (!win)
+		return;
+
+	if (win->length >= sizeof(struct dynamic_dma_window_prop))
+		remove_dma_window(np, ddw_avail, win);
+
+	if (!remove_prop)
+		return;
 
-delprop:
-	if (remove_prop)
-		ret = of_remove_property(np, win64);
+	ret = of_remove_property(np, win);
 	if (ret)
 		pr_warn("%pOF: failed to remove direct window property: %d\n",
 			np, ret);
-- 
2.25.4


^ permalink raw reply related

* [PATCH 2/4] powerpc/pseries/iommu: Implement ibm, reset-pe-dma-windows rtas call
From: Leonardo Bras @ 2020-06-19  5:06 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Leonardo Bras, Thiago Jung Bauermann,
	Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-1-leobras.c@gmail.com>

Platforms supporting the DDW option starting with LoPAR level 2.7 implement
ibm,ddw-extensions. The first extension available (index 2) carries the
token for ibm,reset-pe-dma-windows rtas call, which is used to restore
the default DMA window for a device, if it has been deleted.

It does so by resetting the TCE table allocation for the PE to it's
boot time value, available in "ibm,dma-window" device tree node.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e5a617738c8b..5e1fbc176a37 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
 	return max_addr;
 }
 
+/*
+ * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
+ * ibm,ddw-extensions, which carries the rtas token for
+ * ibm,reset-pe-dma-windows.
+ * That rtas-call can be used to restore the default DMA window for the device.
+ */
+static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
+{
+	int ret;
+	u32 cfg_addr, ddw_ext[3];
+	u64 buid;
+	struct device_node *dn;
+	struct pci_dn *pdn;
+
+	ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
+					 &ddw_ext[0], 3);
+	if (ret)
+		return;
+
+	dn = pci_device_to_OF_node(dev);
+	pdn = PCI_DN(dn);
+	buid = pdn->phb->buid;
+	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
+
+	ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
+			BUID_HI(buid), BUID_LO(buid));
+	if (ret)
+		dev_info(&dev->dev,
+			 "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
+			 ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
+			 ret);
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
-- 
2.25.4


^ permalink raw reply related

* [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm, query-pe-dma-windows
From: Leonardo Bras @ 2020-06-19  5:06 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Leonardo Bras, Thiago Jung Bauermann,
	Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-1-leobras.c@gmail.com>

From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
outputs from "ibm,query-pe-dma-windows" go from 5 to 6.

This change of output size is meant to expand the address size of
largest_available_block PE TCE from 32-bit to 64-bit, which ends up
shifting page_size and migration_capable.

This ends up requiring the update of
ddw_query_response->largest_available_block from u32 to u64, and manually
assigning the values from the buffer into this struct, according to
output size.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 57 +++++++++++++++++++++-----
 1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 6d47b4a3ce39..e5a617738c8b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -334,7 +334,7 @@ struct direct_window {
 /* Dynamic DMA Window support */
 struct ddw_query_response {
 	u32 windows_available;
-	u32 largest_available_block;
+	u64 largest_available_block;
 	u32 page_size;
 	u32 migration_capable;
 };
@@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
 }
 machine_arch_initcall(pseries, find_existing_ddw_windows);
 
+/*
+ * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many output
+ * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
+ */
+
+static int query_ddw_out_sz(struct device_node *par_dn)
+{
+	int ret;
+	u32 ddw_ext[3];
+
+	ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
+					 &ddw_ext[0], 3);
+	if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
+		return 5;
+	return 6;
+}
+
 static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
-			struct ddw_query_response *query)
+		     struct ddw_query_response *query,
+		     struct device_node *par_dn)
 {
 	struct device_node *dn;
 	struct pci_dn *pdn;
-	u32 cfg_addr;
+	u32 cfg_addr, query_out[5];
 	u64 buid;
-	int ret;
+	int ret, out_sz;
 
 	/*
 	 * Get the config address and phb buid of the PE window.
@@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
 	pdn = PCI_DN(dn);
 	buid = pdn->phb->buid;
 	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
+	out_sz = query_ddw_out_sz(par_dn);
+
+	ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
+			cfg_addr, BUID_HI(buid), BUID_LO(buid));
+	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
+		 ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
+
+	switch (out_sz) {
+	case 5:
+		query->windows_available = query_out[0];
+		query->largest_available_block = query_out[1];
+		query->page_size = query_out[2];
+		query->migration_capable = query_out[3];
+		break;
+	case 6:
+		query->windows_available = query_out[0];
+		query->largest_available_block = ((u64)query_out[1] << 32) |
+						 query_out[2];
+		query->page_size = query_out[3];
+		query->migration_capable = query_out[4];
+		break;
+	}
 
-	ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
-		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
-	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
-		" returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
-		BUID_LO(buid), ret);
 	return ret;
 }
 
@@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	 * of page sizes: supported and supported for migrate-dma.
 	 */
 	dn = pci_device_to_OF_node(dev);
-	ret = query_ddw(dev, ddw_avail, &query);
+	ret = query_ddw(dev, ddw_avail, &query, pdn);
 	if (ret != 0)
 		goto out_failed;
 
@@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	/* check largest block * page size > max memory hotplug addr */
 	max_addr = ddw_memory_hotplug_max();
 	if (query.largest_available_block < (max_addr >> page_shift)) {
-		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
 			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
 			  1ULL << page_shift);
 		goto out_failed;
-- 
2.25.4


^ permalink raw reply related

* [PATCH 0/4] Remove default DMA window before creating DDW
From: Leonardo Bras @ 2020-06-19  5:06 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Leonardo Bras, Thiago Jung Bauermann,
	Ram Pai
  Cc: linuxppc-dev, linux-kernel

There are some devices that only allow 1 DMA window to exist at a time,
and in those cases, a DDW is never created to them, since the default DMA
window keeps using this resource.

LoPAR recommends this procedure:
1. Remove the default DMA window,
2. Query for which configs the DDW can be created,
3. Create a DDW.

Patch #1:
- After LoPAR level 2.8, there is an extension that can make
  ibm,query-pe-dma-windows to have 6 outputs instead of 5. This changes the
  order of the outputs, and that can cause some trouble. 
- query_ddw() was updated to check how many outputs the 
  ibm,query-pe-dma-windows is supposed to have, update the rtas_call() and
  deal correctly with the outputs in both cases.
- This patch looks somehow unrelated to the series, but it can avoid future
  problems on DDW creation.

Patch #2 implements a new rtas call to recover the default DMA window,
in case anything fails after it was removed, and a DDW couldn't be created.

Patch #3 moves the window-removing code from remove_ddw() to
remove_dma_window(), creating a way to delete any DMA window, so it can be
used to delete the default DMA window.

Patch #4 makes use of the remove_dma_window() from patch #3 to remove the
default DMA window before query_ddw() and the rtas call from patch #2
to recover it if something goes wrong.

All patches were tested into an LPAR with an Ethernet VF:
4005:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4 Virtual Function]

Leonardo Bras (4):
  powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
  powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call
  powerpc/pseries/iommu: Move window-removing part of remove_ddw into
    remove_dma_window
  powerpc/pseries/iommu: Remove default DMA window before creating DDW

 arch/powerpc/platforms/pseries/iommu.c | 163 +++++++++++++++++++------
 1 file changed, 127 insertions(+), 36 deletions(-)

-- 
2.25.4


^ permalink raw reply

* Re: [PATCH 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: Gautham R Shenoy @ 2020-06-19  5:05 UTC (permalink / raw)
  To: Kajol Jain; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200618122713.9030-3-kjain@linux.ibm.com>

On Thu, Jun 18, 2020 at 05:57:13PM +0530, Kajol Jain wrote:
> Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation.
> 
> command:# cat /sys/devices/hv_24x7/cpumask
> 0
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  .../sysfs-bus-event_source-devices-hv_24x7    |  6 ++++
>  arch/powerpc/perf/hv-24x7.c                   | 31 ++++++++++++++++++-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> index e8698afcd952..281e7b367733 100644
> --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> @@ -43,6 +43,12 @@ Description:	read only
>  		This sysfs interface exposes the number of cores per chip
>  		present in the system.
> 
> +What:		/sys/devices/hv_24x7/cpumask
> +Date:		June 2020
> +Contact:	Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
> +Description:	read only
> +		This sysfs file exposes cpumask.

Could you please describe this in little more detail as to what the
cpumask is ?

> +
>  What:		/sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
>  Date:		February 2014
>  Contact:	Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index fdc4ae155d60..03d870a9fc36 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -448,6 +448,12 @@ static ssize_t device_show_string(struct device *dev,
>  	return sprintf(buf, "%s\n", (char *)d->var);
>  }
> 
> +static ssize_t cpumask_get_attr(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	return cpumap_print_to_pagebuf(true, buf, &hv_24x7_cpumask);
> +}
> +
>  static ssize_t sockets_show(struct device *dev,
>  			    struct device_attribute *attr, char *buf)
>  {
> @@ -1116,6 +1122,17 @@ static DEVICE_ATTR_RO(sockets);
>  static DEVICE_ATTR_RO(chipspersocket);
>  static DEVICE_ATTR_RO(coresperchip);
> 
> +static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_get_attr, NULL);
> +
> +static struct attribute *cpumask_attrs[] = {
> +	&dev_attr_cpumask.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group cpumask_attr_group = {
> +	.attrs = cpumask_attrs,
> +};
> +
>  static struct bin_attribute *if_bin_attrs[] = {
>  	&bin_attr_catalog,
>  	NULL,
> @@ -1143,6 +1160,11 @@ static const struct attribute_group *attr_groups[] = {
>  	&event_desc_group,
>  	&event_long_desc_group,
>  	&if_group,
> +	/*
> +	 * This NULL is a placeholder for the cpumask attr which will update
> +	 * onlyif cpuhotplug registration is successful
> +	 */
> +	NULL,
>  	NULL,
>  };
> 
> @@ -1727,8 +1749,15 @@ static int hv_24x7_init(void)
> 
>  	/* init cpuhotplug */
>  	r = hv_24x7_cpu_hotplug_init();
> -	if (r)
> +	if (r) {
>  		pr_err("hv_24x7: CPU hotplug init failed\n");
> +	} else {
> +		/*
> +		 * Cpu hotplug init is successful, add the
> +		 * cpumask file as part of pmu attr group
> +		 */
> +		attr_groups[5] = &cpumask_attr_group;

Since this is only a one-time initialization, wouldn't it be safer to
iterate through attr_groups[] and assin cpumask_attr_group to the
first NULL location ?

> +	}
> 
>  	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>  	if (r)
> -- 
> 2.18.2
> 

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
From: Gautham R Shenoy @ 2020-06-19  4:58 UTC (permalink / raw)
  To: Kajol Jain; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200618122713.9030-2-kjain@linux.ibm.com>

Hello Kajol,

On Thu, Jun 18, 2020 at 05:57:12PM +0530, Kajol Jain wrote:
> Patch here adds cpu hotplug functions to hv_24x7 pmu.
> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
> is added.
> 
> The online function update the cpumask only if its NULL.
> As the primary intention for adding hotplug support
> is to desiginate a CPU to make HCALL to collect the
> count data.
> 
> The offline function test and clear corresponding cpu in a cpumask
> and update cpumask to any other active cpu.
> 
> With this patchset, perf tool side does not need "-C <cpu>"
> to be added.
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  arch/powerpc/perf/hv-24x7.c | 45 +++++++++++++++++++++++++++++++++++++
>  include/linux/cpuhotplug.h  |  1 +
>  2 files changed, 46 insertions(+)
> 
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index db213eb7cb02..fdc4ae155d60 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -31,6 +31,8 @@ static int interface_version;
>  /* Whether we have to aggregate result data for some domains. */
>  static bool aggregate_result_elements;
> 
> +static cpumask_t hv_24x7_cpumask;
> +
>  static bool domain_is_valid(unsigned domain)
>  {
>  	switch (domain) {
> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>  	.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>  };
> 
> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
> +{
> +	/* Make this CPU the designated target for counter collection */
> +	if (cpumask_empty(&hv_24x7_cpumask))
> +		cpumask_set_cpu(cpu, &hv_24x7_cpumask);
> +
> +	return 0;
> +}
> +
> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
> +{
> +	int target = -1;
> +
> +	/* Check if exiting cpu is used for collecting 24x7 events */
> +	if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
> +		return 0;
> +
> +	/* Find a new cpu to collect 24x7 events */
> +	target = cpumask_any_but(cpu_active_mask, cpu);

cpumask_any_but() typically picks the first CPU in cpu_active_mask
that is not @cpu.


> +
> +	if (target < 0 || target >= nr_cpu_ids)
> +		return -1;
> +
> +	/* Migrate 24x7 events to the new target */
> +	cpumask_set_cpu(target, &hv_24x7_cpumask);
> +	perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);


On a system with N CPUs numbered [O..N-1], can you please verify if
the time required to sequentially offline CPUs [0..N-2] ,in that
order, increase with this patch ?

I am asking this because we have encountered this problem once before
at a customer site and the commit 9c9f8fb71fee ("powerpc/perf: Use
cpumask_last() to determine the designated cpu for nest/core units.")
was introduced to fix that problem.

> +
> +	return 0;
> +}
> +
> +static int hv_24x7_cpu_hotplug_init(void)
> +{
> +	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
> +			  "perf/powerpc/hv_24x7:online",
> +			  ppc_hv_24x7_cpu_online,
> +			  ppc_hv_24x7_cpu_offline);
> +}
> +
>  static int hv_24x7_init(void)
>  {
>  	int r;
> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>  	if (r)
>  		return r;
> 
> +	/* init cpuhotplug */
> +	r = hv_24x7_cpu_hotplug_init();
> +	if (r)
> +		pr_err("hv_24x7: CPU hotplug init failed\n");
> +
>  	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>  	if (r)
>  		return r;
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 8377afef8806..16ed8f6f8774 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -180,6 +180,7 @@ enum cpuhp_state {
>  	CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
>  	CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
>  	CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
> +	CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>  	CPUHP_AP_WATCHDOG_ONLINE,
>  	CPUHP_AP_WORKQUEUE_ONLINE,
>  	CPUHP_AP_RCUTREE_ONLINE,
> -- 
> 2.18.2
> 

^ permalink raw reply

* Re: [PATCHv2] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes
From: Jerry Snitselaar @ 2020-06-19  3:42 UTC (permalink / raw)
  To: David Gibson
  Cc: nayna, linux-kernel, jarkko.sakkinen, jgg, paulus, peterhuewe,
	linuxppc-dev, linux-integrity, stefanb
In-Reply-To: <20200619033040.121412-1-david@gibson.dropbear.id.au>

On Fri Jun 19 20, David Gibson wrote:
>The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
>which will need the use of the internal command/response buffer.  But,
>we're issuing this *before* we've waited to make sure that buffer is
>allocated.
>
>This can result in intermittent failures to probe if the hypervisor / TPM
>implementation doesn't respond quickly enough.  I find it fails almost
>every time with an 8 vcpu guest under KVM with software emulated TPM.
>
>To fix it, just move the tpm2_get_cc_attrs_tlb() call after the
>existing code to wait for initialization, which will ensure the buffer
>is allocated.
>
>Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
>Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>---

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>

>
>Changes from v1:
> * Fixed a formatting error in the commit message
> * Added some more detail to the commit message
>
>drivers/char/tpm/tpm_ibmvtpm.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
>index 09fe45246b8cc..994385bf37c0c 100644
>--- a/drivers/char/tpm/tpm_ibmvtpm.c
>+++ b/drivers/char/tpm/tpm_ibmvtpm.c
>@@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
> 	if (rc)
> 		goto init_irq_cleanup;
>
>-	if (!strcmp(id->compat, "IBM,vtpm20")) {
>-		chip->flags |= TPM_CHIP_FLAG_TPM2;
>-		rc = tpm2_get_cc_attrs_tbl(chip);
>-		if (rc)
>-			goto init_irq_cleanup;
>-	}
>-
> 	if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
> 				ibmvtpm->rtce_buf != NULL,
> 				HZ)) {
>@@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
> 		goto init_irq_cleanup;
> 	}
>
>+	if (!strcmp(id->compat, "IBM,vtpm20")) {
>+		chip->flags |= TPM_CHIP_FLAG_TPM2;
>+		rc = tpm2_get_cc_attrs_tbl(chip);
>+		if (rc)
>+			goto init_irq_cleanup;
>+	}
>+
> 	return tpm_chip_register(chip);
> init_irq_cleanup:
> 	do {
>-- 
>2.26.2
>


^ permalink raw reply

* [PATCHv2] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes
From: David Gibson @ 2020-06-19  3:30 UTC (permalink / raw)
  To: jarkko.sakkinen, stefanb
  Cc: nayna, linux-kernel, jgg, paulus, peterhuewe, linuxppc-dev,
	linux-integrity, David Gibson

The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
which will need the use of the internal command/response buffer.  But,
we're issuing this *before* we've waited to make sure that buffer is
allocated.

This can result in intermittent failures to probe if the hypervisor / TPM
implementation doesn't respond quickly enough.  I find it fails almost
every time with an 8 vcpu guest under KVM with software emulated TPM.

To fix it, just move the tpm2_get_cc_attrs_tlb() call after the
existing code to wait for initialization, which will ensure the buffer
is allocated.

Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---

Changes from v1:
 * Fixed a formatting error in the commit message
 * Added some more detail to the commit message
 
drivers/char/tpm/tpm_ibmvtpm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 09fe45246b8cc..994385bf37c0c 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 	if (rc)
 		goto init_irq_cleanup;
 
-	if (!strcmp(id->compat, "IBM,vtpm20")) {
-		chip->flags |= TPM_CHIP_FLAG_TPM2;
-		rc = tpm2_get_cc_attrs_tbl(chip);
-		if (rc)
-			goto init_irq_cleanup;
-	}
-
 	if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
 				ibmvtpm->rtce_buf != NULL,
 				HZ)) {
@@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 		goto init_irq_cleanup;
 	}
 
+	if (!strcmp(id->compat, "IBM,vtpm20")) {
+		chip->flags |= TPM_CHIP_FLAG_TPM2;
+		rc = tpm2_get_cc_attrs_tbl(chip);
+		if (rc)
+			goto init_irq_cleanup;
+	}
+
 	return tpm_chip_register(chip);
 init_irq_cleanup:
 	do {
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH V3 (RESEND) 0/3] arm64: Enable vmemmap mapping from device memory
From: Anshuman Khandual @ 2020-06-19  1:34 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, Michal Hocko, linux-ia64, David Hildenbrand,
	Peter Zijlstra, Dave Hansen, linux-mm, Paul Mackerras,
	linux-riscv, Will Deacon, Thomas Gleixner, x86,
	Matthew Wilcox (Oracle), Ingo Molnar, Catalin Marinas, Fenghua Yu,
	Pavel Tatashin, Andy Lutomirski, Paul Walmsley, Dan Williams,
	linux-arm-kernel, Tony Luck, linux-kernel, Palmer Dabbelt,
	Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <20200618085641.GE6493@linux.ibm.com>



On 06/18/2020 02:26 PM, Mike Rapoport wrote:
> On Thu, Jun 18, 2020 at 06:45:27AM +0530, Anshuman Khandual wrote:
>> This series enables vmemmap backing memory allocation from device memory
>> ranges on arm64. But before that, it enables vmemmap_populate_basepages()
>> and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
>> alocation requests.
>>
>> This series applies on 5.8-rc1.
>>
>> Pending Question:
>>
>> altmap_alloc_block_buf() does not have any other remaining users in
>> the tree after this change. Should it be converted into a static
>> function and it's declaration be dropped from the header
>> (include/linux/mm.h). Avoided doing so because I was not sure if there
>> are any off-tree users or not.
> 
> Well, off-tree users probably have an active fork anyway so they could
> switch to vmemmap_alloc_block_buf()...

Sure, will make the function a static and remove it's declaration
from the header.

> 
> Regardless, can you please update Documentation/vm/memory-model.rst to
> keep it in sync with the code?
Sure, will do.

^ permalink raw reply

* Re: [PATCH v2 2/4] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
From: Ram Pai @ 2020-06-19  1:16 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: cclaudio, kvm-ppc, bharata, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <7f5aea68-0cc5-6ae5-c30e-eee60eff5a92@linux.ibm.com>

On Thu, Jun 18, 2020 at 03:31:06PM +0200, Laurent Dufour wrote:
> Le 18/06/2020 à 11:19, Ram Pai a écrit :
> >

.snip..

> >************************************************************************
> >  1. States of a GFN
> >     ---------------
> >  The GFN can be in one of the following states.
> >diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c

...snip...

> >index 803940d..3448459 100644
> >--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> >@@ -1100,7 +1100,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
> >  	unsigned int shift;
> >  	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
> >-		kvmppc_uvmem_drop_pages(memslot, kvm, true);
> >+		kvmppc_uvmem_drop_pages(memslot, kvm, true, false);
> 
> When reviewing the v1 of this series, I asked you the question about
> the fact that the call here is made with purge_gfn = false. Your
> answer was:
> 
> >This function does not know, under what context it is called. Since
> >its job is to just flush the memslot, it cannot assume anything
> >about purging the pages in the memslot.
> 
> Indeed in the case of the memory hotplug operation, this function is
> called to wipe the page from the secure device in the case the pages
> are secured. In that case the purge is required. Indeed, I checked
> the other call to kvmppc_radix_flush_memslot() in
> kvmppc_core_flush_memslot_hv() and I cannot see why in that case too
> purge_gfn should be false, especially when the memslot is reused as
> detailed in __kvm_set_memory_region() around the call to
> kvm_arch_flush_shadow_memslot().
> 
> I'm sorry to not have ask this earlier, but could you please elaborate on this?

You are right. kvmppc_radix_flush_memslot() is getting called everytime with
the intention of disassociating the memslot from that VM. Which implies,
the memslot is intended to be deleted and possibly reused.

I should be calling kvmppc_uvmem_drop_pages() with purge_gfn=true, here
aswell.

I expect some form of problem showing up in memhot-plug/unplug path.

RP


^ permalink raw reply

* Re: [PATCH v5 01/13] powerpc: Remove Xilinx PPC405/PPC440 support
From: Nick Desaulniers @ 2020-06-18 23:33 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Robert Lippert, Arnd Bergmann, Michal Simek, LKML,
	clang-built-linux, Paul Mackerras, Greg Thelen, linuxppc-dev
In-Reply-To: <20200618031622.GA195@Ryzen-9-3900X.localdomain>

On Wed, Jun 17, 2020 at 8:16 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> Admittedly, we really do not have many PowerPC experts in our
> organization

Not with that attitude. :P /s
https://www.youtube.com/watch?v=Kck_upSlx60

> so we are supporting it on a "best effort" basis, which
> often involves using whatever knowledge is floating around or can be
> gained from interactions such as this :) so thank you for that!

That said, Google does have a significant amount of POWER machines in
the fleet used by our "prodkernel" teams.  Anecdotally, I've been
getting cc'ed on more and more internal bugs regarding ppc+clang+linux
kernels.  "prodkernel" is still in the process of moving a massive
fleet over to clang built kernels, but ppc is very important to
Google.  ppc32...I'm not so sure...let me ask some of our ppc folks
more about which uarch's and word sizes they really care about...

--
Thanks,
~Nick Desaulniers

^ permalink raw reply

* [PATCH net v2] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 21:55 UTC (permalink / raw)
  To: kuba; +Cc: netdev, Thomas Falcon, linuxppc-dev

The max MTU limit defined for ibmveth is not accounting for
virtual ethernet buffer overhead, which is twenty-two additional
bytes set aside for the ethernet header and eight additional bytes
of an opaque handle reserved for use by the hypervisor. Update the
max MTU to reflect this overhead.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")
---
v2: Include Fixes tags suggested by Jakub Kicisnki
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 96d36ae5049e..c5c732601e35 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1715,7 +1715,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	}
 
 	netdev->min_mtu = IBMVETH_MIN_MTU;
-	netdev->max_mtu = ETH_MAX_MTU;
+	netdev->max_mtu = ETH_MAX_MTU - IBMVETH_BUFF_OH;
 
 	memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH net] ibmvnic: continue to init in CRQ reset returns H_CLOSED
From: Dany Madden @ 2020-06-18 19:24 UTC (permalink / raw)
  To: davem; +Cc: Dany Madden, netdev, tlfalcon, linuxppc-dev

Continue the reset path when partner adapter is not ready or H_CLOSED is
returned from reset crq. This patch allows the CRQ init to proceed to
establish a valid CRQ for traffic to flow after reset.

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 2baf7b3ff4cb..4b7cb483c47f 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1971,13 +1971,18 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			release_sub_crqs(adapter, 1);
 		} else {
 			rc = ibmvnic_reset_crq(adapter);
-			if (!rc)
+			if (rc == H_CLOSED || rc == H_SUCCESS) {
 				rc = vio_enable_interrupts(adapter->vdev);
+				if (rc)
+					netdev_err(adapter->netdev,
+						   "Reset failed to enable interrupts. rc=%d\n",
+						   rc);
+			}
 		}
 
 		if (rc) {
 			netdev_err(adapter->netdev,
-				   "Couldn't initialize crq. rc=%d\n", rc);
+				   "Reset couldn't initialize crq. rc=%d\n", rc);
 			goto out;
 		}
 
-- 
2.18.2


^ permalink raw reply related

* [PATCH] pci: pcie: AER: Fix logging of Correctable errors
From: Matt Jolly @ 2020-06-18 15:55 UTC (permalink / raw)
  To: Russell Currey, Sam Bobroff, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev, linux-pci, linux-kernel
  Cc: Matt Jolly

The AER documentation indicates that correctable (severity=Corrected)
errors should be output as a warning so that users can filter these
errors if they choose to; This functionality does not appear to have been implemented.

This patch modifies the functions aer_print_error and __aer_print_error
to send correctable errors as a warning (pci_warn), rather than as an error (pci_err). It
partially addresses several bugs in relation to kernel message buffer
spam for misbehaving devices - the root cause (possibly device firmware?) isn't
addressed, but the dmesg output is less alarming for end users, and can
be filtered separately from uncorrectable errors. This should hopefully
reduce the need for users to disable AER to suppress corrected errors.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183

Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
---
 drivers/pci/pcie/aer.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3acf56683915..131ecc0df2cb 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -662,12 +662,18 @@ static void __aer_print_error(struct pci_dev *dev,
 			errmsg = i < ARRAY_SIZE(aer_uncorrectable_error_string) ?
 				aer_uncorrectable_error_string[i] : NULL;
 
-		if (errmsg)
-			pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
-				info->first_error == i ? " (First)" : "");
-		else
+		if (errmsg) {
+			if (info->severity == AER_CORRECTABLE) {
+				pci_warn(dev, "   [%2d] %-22s%s\n", i, errmsg,
+					info->first_error == i ? " (First)" : "");
+			} else {
+				pci_err(dev, "   [%2d] %-22s%s\n", i, errmsg,
+					info->first_error == i ? " (First)" : "");
+			}
+		} else {
 			pci_err(dev, "   [%2d] Unknown Error Bit%s\n",
 				i, info->first_error == i ? " (First)" : "");
+		}
 	}
 	pci_dev_aer_stats_incr(dev, info);
 }
@@ -686,13 +692,23 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
 	agent = AER_GET_AGENT(info->severity, info->status);
 
-	pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
-		aer_error_severity_string[info->severity],
-		aer_error_layer[layer], aer_agent_string[agent]);
+	if  (info->severity == AER_CORRECTABLE) {
+		pci_warn(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+			aer_error_severity_string[info->severity],
+			aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-		dev->vendor, dev->device,
-		info->status, info->mask);
+		pci_warn(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
+			dev->vendor, dev->device,
+			info->status, info->mask);
+	} else {
+		pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+			aer_error_severity_string[info->severity],
+			aer_error_layer[layer], aer_agent_string[agent]);
+
+		pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
+			dev->vendor, dev->device,
+			info->status, info->mask);
+	}
 
 	__aer_print_error(dev, info);
 
-- 
2.26.2


^ permalink raw reply related

* Re: rename probe_kernel_* and probe_user_*
From: Helge Deller @ 2020-06-18 20:51 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Russell King, Tony Luck,
	Michael Ellerman
  Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <CAHk-=wjpnu=882iD9ck9Ywt6R1LYX_Hv-oS7dBMsWZwDRGZ5jA@mail.gmail.com>

On 18.06.20 21:48, Linus Torvalds wrote:
> [ Explicitly added architecture lists and developers to the cc to make
> this more visible ]
>
> On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> Andrew and I decided to drop the patches implementing your suggested
>> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
>> were way to many conflicts.  After -rc1 might be a good time for this as
>> all the conflicts are resolved now.
>
> So I've merged this renaming now, together with my changes to make
> 'get_kernel_nofault()' look and act a lot more like 'get_user()'.
>
> It just felt wrong (and potentially dangerous) to me to have a
> 'get_kernel_nofault()' naming that implied semantics that we're all
> familiar with from 'get_user()', but acting very differently.
>
> But part of the fixups I made for the type checking are for
> architectures where I didn't even compile-test the end result. I
> looked at every case individually, and the patch looks sane, but I
> could have screwed something up.
>
> Basically, 'get_kernel_nofault()' doesn't do the same automagic type
> munging from the pointer to the target that 'get_user()' does, but at
> least now it checks that the types are superficially compatible.
> There should be build failures if they aren't, but I hopefully fixed
> everything up properly for all architectures.
>
> This email is partly to ask people to double-check, but partly just as
> a heads-up so that _if_ I screwed something up, you'll have the
> background and it won't take you by surprise.

Linus. thanks for the heads-up!
With your change it compiles cleanly on 32- and 64-bit parisc.

Helge

^ permalink raw reply

* Re: [PATCH net] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 20:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, linuxppc-dev
In-Reply-To: <20200618085722.110f3702@kicinski-fedora-PC1C0HJN>


On 6/18/20 10:57 AM, Jakub Kicinski wrote:
> On Thu, 18 Jun 2020 10:43:46 -0500 Thomas Falcon wrote:
>> The max MTU limit defined for ibmveth is not accounting for
>> virtual ethernet buffer overhead, which is twenty-two additional
>> bytes set aside for the ethernet header and eight additional bytes
>> of an opaque handle reserved for use by the hypervisor. Update the
>> max MTU to reflect this overhead.
>>
>> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> How about
>
> Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
> Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")
>
> ?

Thanks, do you need me to send a v2 with those tags?

Tom


^ permalink raw reply

* Re: [PATCH v2 0/4] Migrate non-migrated pages of a SVM.
From: Ram Pai @ 2020-06-18 20:37 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, cclaudio, bharata, sathnaga, aneesh.kumar, sukadev,
	bauerman, david
In-Reply-To: <1592471945-24786-1-git-send-email-linuxram@us.ibm.com>

I should have elaborated on the problem and the need for these patches.

Explaining it here. Will add it to the series in next version.

-------------------------------------------------------------

The time taken to switch a VM to Secure-VM, increases by the size of
the VM.  A 100GB VM takes about 7minutes. This is unacceptable.  This
linear increase is caused by a suboptimal behavior by the Ultravisor and
the Hypervisor.  The Ultravisor unnecessarily migrates all the GFN of
the VM from normal-memory to secure-memory. It has to just migrate the
necessary and sufficient GFNs.

However when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has a inbuilt assumption
that the Ultravisor will explicitly request to migrate, each and every
GFN of the VM. If only necessary and sufficient GFNs are requested for
migration, the Hypervisor continues to manage the rest of the GFNs are
normal GFNs. This leads of memory corruption, manifested consistently
when the SVM reboots.

The same is true, when a memory slot is hotplugged into a SVM. The
Hypervisor expects the ultravisor to request migration of all GFNs to
secure-GFN.  But at the same time the hypervisor is unable to handle any
H_SVM_PAGE_IN requests from the Ultravisor, done in the context of
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors in
the SVM, when a memory-slot is hotplugged.

This patch series automatically migrates the non-migrated pages of a SVM,
and thus solves the problem.

------------------------------------------------------------------



On Thu, Jun 18, 2020 at 02:19:01AM -0700, Ram Pai wrote:
> This patch series migrates the non-migrated pages of a SVM.
> This is required when the UV calls H_SVM_INIT_DONE, and
> when a memory-slot is hotplugged to a Secure VM.
> 
> Testing: Passed rigorous SVM reboot test using different
> 	sized SVMs.
> 
> Changelog:
> 	. fixed a bug observed by Bharata. Pages that
> 	where paged-in and later paged-out must also be
> 	skipped from migration during H_SVM_INIT_DONE.
> 
> Laurent Dufour (1):
>   KVM: PPC: Book3S HV: migrate hot plugged memory
> 
> Ram Pai (3):
>   KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
>   KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
>   KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
>     H_SVM_INIT_DONE
> 
>  Documentation/powerpc/ultravisor.rst        |   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   8 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c      |   2 +-
>  arch/powerpc/kvm/book3s_hv.c                |  12 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c          | 449 ++++++++++++++++++++++------
>  5 files changed, 368 insertions(+), 105 deletions(-)
> 
> -- 
> 1.8.3.1

-- 
Ram Pai

^ permalink raw reply

* Re: rename probe_kernel_* and probe_user_*
From: Linus Torvalds @ 2020-06-18 19:48 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King, Tony Luck, Helge Deller,
	Michael Ellerman
  Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <20200617073755.8068-1-hch@lst.de>

[ Explicitly added architecture lists and developers to the cc to make
this more visible ]

On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Andrew and I decided to drop the patches implementing your suggested
> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
> were way to many conflicts.  After -rc1 might be a good time for this as
> all the conflicts are resolved now.

So I've merged this renaming now, together with my changes to make
'get_kernel_nofault()' look and act a lot more like 'get_user()'.

It just felt wrong (and potentially dangerous) to me to have a
'get_kernel_nofault()' naming that implied semantics that we're all
familiar with from 'get_user()', but acting very differently.

But part of the fixups I made for the type checking are for
architectures where I didn't even compile-test the end result. I
looked at every case individually, and the patch looks sane, but I
could have screwed something up.

Basically, 'get_kernel_nofault()' doesn't do the same automagic type
munging from the pointer to the target that 'get_user()' does, but at
least now it checks that the types are superficially compatible.
There should be build failures if they aren't, but I hopefully fixed
everything up properly for all architectures.

This email is partly to ask people to double-check, but partly just as
a heads-up so that _if_ I screwed something up, you'll have the
background and it won't take you by surprise.

               Linus

^ permalink raw reply

* Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int
From: Frederic Barrat @ 2020-06-18 16:56 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, David Woodhouse, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, linuxppc-dev, Lu Baolu
In-Reply-To: <20200618153747.GE15763@romley-ivt3.sc.intel.com>



Le 18/06/2020 à 17:37, Fenghua Yu a écrit :
> The first 3 patches clean up pasid and flag defitions to prepare for
> following patches.
> 
> If you think this patch can be dropped, we will drop it.

Yes, I think that's the case.

Thanks,

  Fred

^ permalink raw reply

* [PATCH v1 5/5] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>

In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       |  4 +++-
 arch/powerpc/kvm/book3s_64_mmu_radix.c    | 27 ++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_nested.c       | 13 +++++++++--
 arch/powerpc/platforms/pseries/firmware.c |  1 +
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 6003c2e533a0..aa6a5ef5d483 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -52,6 +52,7 @@
 #define FW_FEATURE_PAPR_SCM 	ASM_CONST(0x0000002000000000)
 #define FW_FEATURE_ULTRAVISOR	ASM_CONST(0x0000004000000000)
 #define FW_FEATURE_STUFF_TCE	ASM_CONST(0x0000008000000000)
+#define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0000010000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -71,7 +72,8 @@ enum {
 		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
 		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
 		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
+		FW_FEATURE_RPT_INVALIDATE,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 84acb4769487..fcf8b031a32e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include <asm/pte-walk.h>
 #include <asm/ultravisor.h>
 #include <asm/kvm_book3s_uvmem.h>
+#include <asm/plpar_wrappers.h>
 
 /*
  * Supported radix tree geometry.
@@ -313,10 +314,17 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
 		return;
 	}
 
-	psi = shift_to_mmu_psize(pshift);
-	rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-				lpid, rb);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+		psi = shift_to_mmu_psize(pshift);
+		rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(0, 0, 1), lpid, rb);
+	} else {
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB, H_RPTI_PAGE_ALL,
+					    addr, addr + psize);
+	}
 	if (rc)
 		pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -330,8 +338,15 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(1, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+					    0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 75993f44519b..81f903284d34 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include <asm/pgalloc.h>
 #include <asm/pte-walk.h>
 #include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,16 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(2, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+					    H_RPTI_TYPE_PAT,
+					    H_RPTI_PAGE_ALL, 0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 3e49cc23a97a..4c7b7f5a2ebc 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
 	{FW_FEATURE_BLOCK_REMOVE,	"hcall-block-remove"},
 	{FW_FEATURE_PAPR_SCM,		"hcall-scm"},
+	{FW_FEATURE_RPT_INVALIDATE,	"hcall-rpt-invalidate"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.21.3


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox