LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check
From: Greg Kurz @ 2020-11-30 12:19 UTC (permalink / raw)
  To: Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Cédric Le Goater, kvm-ppc, linux-kernel

Commit 062cfab7069f ("KVM: PPC: Book3S HV: XIVE: Make VP block size
configurable") updated kvmppc_xive_vcpu_id_valid() in a way that
allows userspace to trigger an assertion in skiboot and crash the host:

[  696.186248988,3] XIVE[ IC 08  ] eq_blk != vp_blk (0 vs. 1) for target 0x4300008c/0
[  696.186314757,0] Assert fail: hw/xive.c:2370:0
[  696.186342458,0] Aborting!
xive-kvCPU 0043 Backtrace:
 S: 0000000031e2b8f0 R: 0000000030013840   .backtrace+0x48
 S: 0000000031e2b990 R: 000000003001b2d0   ._abort+0x4c
 S: 0000000031e2ba10 R: 000000003001b34c   .assert_fail+0x34
 S: 0000000031e2ba90 R: 0000000030058984   .xive_eq_for_target.part.20+0xb0
 S: 0000000031e2bb40 R: 0000000030059fdc   .xive_setup_silent_gather+0x2c
 S: 0000000031e2bc20 R: 000000003005a334   .opal_xive_set_vp_info+0x124
 S: 0000000031e2bd20 R: 00000000300051a4   opal_entry+0x134
 --- OPAL call token: 0x8a caller R1: 0xc000001f28563850 ---

XIVE maintains the interrupt context state of non-dispatched vCPUs in
an internal VP structure. We allocate a bunch of those on startup to
accommodate all possible vCPUs. Each VP has an id, that we derive from
the vCPU id for efficiency:

static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server)
{
	return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
}

The KVM XIVE device used to allocate KVM_MAX_VCPUS VPs. This was
limitting the number of concurrent VMs because the VP space is
limited on the HW. Since most of the time, VMs run with a lot less
vCPUs, commit 062cfab7069f ("KVM: PPC: Book3S HV: XIVE: Make VP
block size configurable") gave the possibility for userspace to
tune the size of the VP block through the KVM_DEV_XIVE_NR_SERVERS
attribute.

The check in kvmppc_pack_vcpu_id() was changed from

	cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode

to

	cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode

The previous check was based on the fact that the VP block had
KVM_MAX_VCPUS entries and that kvmppc_pack_vcpu_id() guarantees
that packed vCPU ids are below KVM_MAX_VCPUS. We've changed the
size of the VP block, but kvmppc_pack_vcpu_id() has nothing to
do with it and it certainly doesn't ensure that the packed vCPU
ids are below xive->nr_servers. kvmppc_xive_vcpu_id_valid() might
thus return true when the VM was configured with a non-standard
VSMT mode, even if the packed vCPU id is higher than what we
expect. We end up using an unallocated VP id, which confuses
OPAL. The assert in OPAL is probably abusive and should be
converted to a regular error that the kernel can handle, but
we shouldn't really use broken VP ids in the first place.

Fix kvmppc_xive_vcpu_id_valid() so that it checks the packed
vCPU id is below xive->nr_servers, which is explicitly what we
want.

Fixes: 062cfab7069f ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 85215e79db42..a0ebc29f30b2 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1214,12 +1214,9 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 static bool kvmppc_xive_vcpu_id_valid(struct kvmppc_xive *xive, u32 cpu)
 {
 	/* We have a block of xive->nr_servers VPs. We just need to check
-	 * raw vCPU ids are below the expected limit for this guest's
-	 * core stride ; kvmppc_pack_vcpu_id() will pack them down to an
-	 * index that can be safely used to compute a VP id that belongs
-	 * to the VP block.
+	 * packed vCPU ids are below that.
 	 */
-	return cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode;
+	return kvmppc_pack_vcpu_id(xive->kvm, cpu) < xive->nr_servers;
 }
 
 int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)



^ permalink raw reply related

* Re: [PATCH v6 1/5] PCI: Unify ECAM constants in native PCI Express drivers
From: Lorenzo Pieralisi @ 2020-11-30 11:08 UTC (permalink / raw)
  To: Krzysztof Wilczyński
  Cc: Heiko Stuebner, Shawn Lin, Paul Mackerras, Thomas Petazzoni,
	Jonathan Chocron, Toan Le, Will Deacon, Rob Herring,
	Florian Fainelli, Michal Simek, linux-rockchip,
	bcm-kernel-feedback-list, Jonathan Derrick, linux-pci, Ray Jui,
	linux-rpi-kernel, Jonathan Cameron, Bjorn Helgaas,
	linux-arm-kernel, Scott Branden, Zhou Wang, Robert Richter,
	linuxppc-dev, Nicolas Saenz Julienne
In-Reply-To: <20201129230743.3006978-2-kw@linux.com>

On Sun, Nov 29, 2020 at 11:07:39PM +0000, Krzysztof Wilczyński wrote:
> Add ECAM-related constants to provide a set of standard constants
> defining memory address shift values to the byte-level address that can
> be used to access the PCI Express Configuration Space, and then move
> native PCI Express controller drivers to use the newly introduced
> definitions retiring driver-specific ones.
> 
> Refactor pci_ecam_map_bus() function to use newly added constants so
> that limits to the bus, device function and offset (now limited to 4K as
> per the specification) are in place to prevent the defective or
> malicious caller from supplying incorrect configuration offset and thus
> targeting the wrong device when accessing extended configuration space.
> This refactor also allows for the ".bus_shit" initialisers to be dropped
                                          ^^^^

Nice typo, I'd fix it while applying it though if you don't mind ;-),
no need to resend it.

Jokes aside, nice piece of work, thanks for that.

> when the user is not using a custom value as a default value will be
> used as per the PCI Express Specification.
> 
> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Krzysztof Wilczyński <kw@linux.com>

I think Bjorn's reviewed-by still stands so I will apply it.

Thanks !
Lorenzo

> ---
>  drivers/pci/controller/dwc/pcie-al.c        | 12 ++-------
>  drivers/pci/controller/dwc/pcie-hisi.c      |  2 --
>  drivers/pci/controller/pci-aardvark.c       | 13 +++-------
>  drivers/pci/controller/pci-host-generic.c   |  1 -
>  drivers/pci/controller/pci-thunder-ecam.c   |  1 -
>  drivers/pci/controller/pcie-brcmstb.c       | 16 ++----------
>  drivers/pci/controller/pcie-rockchip-host.c | 27 ++++++++++-----------
>  drivers/pci/controller/pcie-rockchip.h      |  8 +-----
>  drivers/pci/controller/pcie-tango.c         |  1 -
>  drivers/pci/controller/pcie-xilinx-nwl.c    |  9 ++-----
>  drivers/pci/controller/pcie-xilinx.c        | 11 ++-------
>  drivers/pci/controller/vmd.c                | 11 ++++-----
>  drivers/pci/ecam.c                          | 23 ++++++++++++------
>  include/linux/pci-ecam.h                    | 27 +++++++++++++++++++++
>  14 files changed, 73 insertions(+), 89 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-al.c b/drivers/pci/controller/dwc/pcie-al.c
> index f973fbca90cf..af9e51ab1af8 100644
> --- a/drivers/pci/controller/dwc/pcie-al.c
> +++ b/drivers/pci/controller/dwc/pcie-al.c
> @@ -76,7 +76,6 @@ static int al_pcie_init(struct pci_config_window *cfg)
>  }
>  
>  const struct pci_ecam_ops al_pcie_ops = {
> -	.bus_shift    = 20,
>  	.init         =  al_pcie_init,
>  	.pci_ops      = {
>  		.map_bus    = al_pcie_map_bus,
> @@ -138,8 +137,6 @@ struct al_pcie {
>  	struct al_pcie_target_bus_cfg target_bus_cfg;
>  };
>  
> -#define PCIE_ECAM_DEVFN(x)		(((x) & 0xff) << 12)
> -
>  #define to_al_pcie(x)		dev_get_drvdata((x)->dev)
>  
>  static inline u32 al_pcie_controller_readl(struct al_pcie *pcie, u32 offset)
> @@ -226,11 +223,6 @@ static void __iomem *al_pcie_conf_addr_map_bus(struct pci_bus *bus,
>  	struct al_pcie_target_bus_cfg *target_bus_cfg = &pcie->target_bus_cfg;
>  	unsigned int busnr_ecam = busnr & target_bus_cfg->ecam_mask;
>  	unsigned int busnr_reg = busnr & target_bus_cfg->reg_mask;
> -	void __iomem *pci_base_addr;
> -
> -	pci_base_addr = (void __iomem *)((uintptr_t)pp->va_cfg0_base +
> -					 (busnr_ecam << 20) +
> -					 PCIE_ECAM_DEVFN(devfn));
>  
>  	if (busnr_reg != target_bus_cfg->reg_val) {
>  		dev_dbg(pcie->pci->dev, "Changing target bus busnum val from 0x%x to 0x%x\n",
> @@ -241,7 +233,7 @@ static void __iomem *al_pcie_conf_addr_map_bus(struct pci_bus *bus,
>  				       target_bus_cfg->reg_mask);
>  	}
>  
> -	return pci_base_addr + where;
> +	return pp->va_cfg0_base + PCIE_ECAM_OFFSET(busnr_ecam, devfn, where);
>  }
>  
>  static struct pci_ops al_child_pci_ops = {
> @@ -264,7 +256,7 @@ static void al_pcie_config_prepare(struct al_pcie *pcie)
>  
>  	target_bus_cfg = &pcie->target_bus_cfg;
>  
> -	ecam_bus_mask = (pcie->ecam_size >> 20) - 1;
> +	ecam_bus_mask = (pcie->ecam_size >> PCIE_ECAM_BUS_SHIFT) - 1;
>  	if (ecam_bus_mask > 255) {
>  		dev_warn(pcie->dev, "ECAM window size is larger than 256MB. Cutting off at 256\n");
>  		ecam_bus_mask = 255;
> diff --git a/drivers/pci/controller/dwc/pcie-hisi.c b/drivers/pci/controller/dwc/pcie-hisi.c
> index 5ca86796d43a..8fc5960faf28 100644
> --- a/drivers/pci/controller/dwc/pcie-hisi.c
> +++ b/drivers/pci/controller/dwc/pcie-hisi.c
> @@ -100,7 +100,6 @@ static int hisi_pcie_init(struct pci_config_window *cfg)
>  }
>  
>  const struct pci_ecam_ops hisi_pcie_ops = {
> -	.bus_shift    = 20,
>  	.init         =  hisi_pcie_init,
>  	.pci_ops      = {
>  		.map_bus    = hisi_pcie_map_bus,
> @@ -135,7 +134,6 @@ static int hisi_pcie_platform_init(struct pci_config_window *cfg)
>  }
>  
>  static const struct pci_ecam_ops hisi_pcie_platform_ops = {
> -	.bus_shift    = 20,
>  	.init         =  hisi_pcie_platform_init,
>  	.pci_ops      = {
>  		.map_bus    = hisi_pcie_map_bus,
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 0be485a25327..1043e54c73bd 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -16,6 +16,7 @@
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ecam.h>
>  #include <linux/init.h>
>  #include <linux/phy/phy.h>
>  #include <linux/platform_device.h>
> @@ -164,14 +165,6 @@
>  #define PCIE_CONFIG_WR_TYPE0			0xa
>  #define PCIE_CONFIG_WR_TYPE1			0xb
>  
> -#define PCIE_CONF_BUS(bus)			(((bus) & 0xff) << 20)
> -#define PCIE_CONF_DEV(dev)			(((dev) & 0x1f) << 15)
> -#define PCIE_CONF_FUNC(fun)			(((fun) & 0x7)	<< 12)
> -#define PCIE_CONF_REG(reg)			((reg) & 0xffc)
> -#define PCIE_CONF_ADDR(bus, devfn, where)	\
> -	(PCIE_CONF_BUS(bus) | PCIE_CONF_DEV(PCI_SLOT(devfn))	| \
> -	 PCIE_CONF_FUNC(PCI_FUNC(devfn)) | PCIE_CONF_REG(where))
> -
>  #define PIO_RETRY_CNT			500
>  #define PIO_RETRY_DELAY			2 /* 2 us*/
>  
> @@ -687,7 +680,7 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
>  	advk_writel(pcie, reg, PIO_CTRL);
>  
>  	/* Program the address registers */
> -	reg = PCIE_CONF_ADDR(bus->number, devfn, where);
> +	reg = ALIGN_DOWN(PCIE_ECAM_OFFSET(bus->number, devfn, where), 4);
>  	advk_writel(pcie, reg, PIO_ADDR_LS);
>  	advk_writel(pcie, 0, PIO_ADDR_MS);
>  
> @@ -748,7 +741,7 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
>  	advk_writel(pcie, reg, PIO_CTRL);
>  
>  	/* Program the address registers */
> -	reg = PCIE_CONF_ADDR(bus->number, devfn, where);
> +	reg = ALIGN_DOWN(PCIE_ECAM_OFFSET(bus->number, devfn, where), 4);
>  	advk_writel(pcie, reg, PIO_ADDR_LS);
>  	advk_writel(pcie, 0, PIO_ADDR_MS);
>  
> diff --git a/drivers/pci/controller/pci-host-generic.c b/drivers/pci/controller/pci-host-generic.c
> index b51977abfdf1..63865aeb636b 100644
> --- a/drivers/pci/controller/pci-host-generic.c
> +++ b/drivers/pci/controller/pci-host-generic.c
> @@ -49,7 +49,6 @@ static void __iomem *pci_dw_ecam_map_bus(struct pci_bus *bus,
>  }
>  
>  static const struct pci_ecam_ops pci_dw_ecam_bus_ops = {
> -	.bus_shift	= 20,
>  	.pci_ops	= {
>  		.map_bus	= pci_dw_ecam_map_bus,
>  		.read		= pci_generic_config_read,
> diff --git a/drivers/pci/controller/pci-thunder-ecam.c b/drivers/pci/controller/pci-thunder-ecam.c
> index 7e8835fee5f7..f964fd26f7e0 100644
> --- a/drivers/pci/controller/pci-thunder-ecam.c
> +++ b/drivers/pci/controller/pci-thunder-ecam.c
> @@ -346,7 +346,6 @@ static int thunder_ecam_config_write(struct pci_bus *bus, unsigned int devfn,
>  }
>  
>  const struct pci_ecam_ops pci_thunder_ecam_ops = {
> -	.bus_shift	= 20,
>  	.pci_ops	= {
>  		.map_bus        = pci_ecam_map_bus,
>  		.read           = thunder_ecam_config_read,
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index bea86899bd5d..7fc80fd6f13f 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -22,6 +22,7 @@
>  #include <linux/of_pci.h>
>  #include <linux/of_platform.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ecam.h>
>  #include <linux/printk.h>
>  #include <linux/reset.h>
>  #include <linux/sizes.h>
> @@ -127,11 +128,7 @@
>  #define  MSI_INT_MASK_CLR		0x14
>  
>  #define PCIE_EXT_CFG_DATA				0x8000
> -
>  #define PCIE_EXT_CFG_INDEX				0x9000
> -#define  PCIE_EXT_BUSNUM_SHIFT				20
> -#define  PCIE_EXT_SLOT_SHIFT				15
> -#define  PCIE_EXT_FUNC_SHIFT				12
>  
>  #define  PCIE_RGR1_SW_INIT_1_PERST_MASK			0x1
>  #define  PCIE_RGR1_SW_INIT_1_PERST_SHIFT		0x0
> @@ -695,15 +692,6 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>  	return dla && plu;
>  }
>  
> -/* Configuration space read/write support */
> -static inline int brcm_pcie_cfg_index(int busnr, int devfn, int reg)
> -{
> -	return ((PCI_SLOT(devfn) & 0x1f) << PCIE_EXT_SLOT_SHIFT)
> -		| ((PCI_FUNC(devfn) & 0x07) << PCIE_EXT_FUNC_SHIFT)
> -		| (busnr << PCIE_EXT_BUSNUM_SHIFT)
> -		| (reg & ~3);
> -}
> -
>  static void __iomem *brcm_pcie_map_conf(struct pci_bus *bus, unsigned int devfn,
>  					int where)
>  {
> @@ -716,7 +704,7 @@ static void __iomem *brcm_pcie_map_conf(struct pci_bus *bus, unsigned int devfn,
>  		return PCI_SLOT(devfn) ? NULL : base + where;
>  
>  	/* For devices, write to the config space index register */
> -	idx = brcm_pcie_cfg_index(bus->number, devfn, 0);
> +	idx = PCIE_ECAM_OFFSET(bus->number, devfn, 0);
>  	writel(idx, pcie->base + PCIE_EXT_CFG_INDEX);
>  	return base + PCIE_EXT_CFG_DATA + where;
>  }
> diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> index 9705059523a6..f1d08a1b1591 100644
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -157,12 +157,11 @@ static int rockchip_pcie_rd_other_conf(struct rockchip_pcie *rockchip,
>  				       struct pci_bus *bus, u32 devfn,
>  				       int where, int size, u32 *val)
>  {
> -	u32 busdev;
> +	void __iomem *addr;
>  
> -	busdev = PCIE_ECAM_ADDR(bus->number, PCI_SLOT(devfn),
> -				PCI_FUNC(devfn), where);
> +	addr = rockchip->reg_base + PCIE_ECAM_OFFSET(bus->number, devfn, where);
>  
> -	if (!IS_ALIGNED(busdev, size)) {
> +	if (!IS_ALIGNED((uintptr_t)addr, size)) {
>  		*val = 0;
>  		return PCIBIOS_BAD_REGISTER_NUMBER;
>  	}
> @@ -175,11 +174,11 @@ static int rockchip_pcie_rd_other_conf(struct rockchip_pcie *rockchip,
>  						AXI_WRAPPER_TYPE1_CFG);
>  
>  	if (size == 4) {
> -		*val = readl(rockchip->reg_base + busdev);
> +		*val = readl(addr);
>  	} else if (size == 2) {
> -		*val = readw(rockchip->reg_base + busdev);
> +		*val = readw(addr);
>  	} else if (size == 1) {
> -		*val = readb(rockchip->reg_base + busdev);
> +		*val = readb(addr);
>  	} else {
>  		*val = 0;
>  		return PCIBIOS_BAD_REGISTER_NUMBER;
> @@ -191,11 +190,11 @@ static int rockchip_pcie_wr_other_conf(struct rockchip_pcie *rockchip,
>  				       struct pci_bus *bus, u32 devfn,
>  				       int where, int size, u32 val)
>  {
> -	u32 busdev;
> +	void __iomem *addr;
>  
> -	busdev = PCIE_ECAM_ADDR(bus->number, PCI_SLOT(devfn),
> -				PCI_FUNC(devfn), where);
> -	if (!IS_ALIGNED(busdev, size))
> +	addr = rockchip->reg_base + PCIE_ECAM_OFFSET(bus->number, devfn, where);
> +
> +	if (!IS_ALIGNED((uintptr_t)addr, size))
>  		return PCIBIOS_BAD_REGISTER_NUMBER;
>  
>  	if (pci_is_root_bus(bus->parent))
> @@ -206,11 +205,11 @@ static int rockchip_pcie_wr_other_conf(struct rockchip_pcie *rockchip,
>  						AXI_WRAPPER_TYPE1_CFG);
>  
>  	if (size == 4)
> -		writel(val, rockchip->reg_base + busdev);
> +		writel(val, addr);
>  	else if (size == 2)
> -		writew(val, rockchip->reg_base + busdev);
> +		writew(val, addr);
>  	else if (size == 1)
> -		writeb(val, rockchip->reg_base + busdev);
> +		writeb(val, addr);
>  	else
>  		return PCIBIOS_BAD_REGISTER_NUMBER;
>  
> diff --git a/drivers/pci/controller/pcie-rockchip.h b/drivers/pci/controller/pcie-rockchip.h
> index c7d0178fc8c2..1650a5087450 100644
> --- a/drivers/pci/controller/pcie-rockchip.h
> +++ b/drivers/pci/controller/pcie-rockchip.h
> @@ -13,6 +13,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ecam.h>
>  
>  /*
>   * The upper 16 bits of PCIE_CLIENT_CONFIG are a write mask for the lower 16
> @@ -178,13 +179,6 @@
>  #define MIN_AXI_ADDR_BITS_PASSED		8
>  #define PCIE_RC_SEND_PME_OFF			0x11960
>  #define ROCKCHIP_VENDOR_ID			0x1d87
> -#define PCIE_ECAM_BUS(x)			(((x) & 0xff) << 20)
> -#define PCIE_ECAM_DEV(x)			(((x) & 0x1f) << 15)
> -#define PCIE_ECAM_FUNC(x)			(((x) & 0x7) << 12)
> -#define PCIE_ECAM_REG(x)			(((x) & 0xfff) << 0)
> -#define PCIE_ECAM_ADDR(bus, dev, func, reg) \
> -	  (PCIE_ECAM_BUS(bus) | PCIE_ECAM_DEV(dev) | \
> -	   PCIE_ECAM_FUNC(func) | PCIE_ECAM_REG(reg))
>  #define PCIE_LINK_IS_L2(x) \
>  	(((x) & PCIE_CLIENT_DEBUG_LTSSM_MASK) == PCIE_CLIENT_DEBUG_LTSSM_L2)
>  #define PCIE_LINK_UP(x) \
> diff --git a/drivers/pci/controller/pcie-tango.c b/drivers/pci/controller/pcie-tango.c
> index d093a8ce4bb1..62a061f1d62e 100644
> --- a/drivers/pci/controller/pcie-tango.c
> +++ b/drivers/pci/controller/pcie-tango.c
> @@ -208,7 +208,6 @@ static int smp8759_config_write(struct pci_bus *bus, unsigned int devfn,
>  }
>  
>  static const struct pci_ecam_ops smp8759_ecam_ops = {
> -	.bus_shift	= 20,
>  	.pci_ops	= {
>  		.map_bus	= pci_ecam_map_bus,
>  		.read		= smp8759_config_read,
> diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
> index f3cf7d61924f..7f29c2fdcd51 100644
> --- a/drivers/pci/controller/pcie-xilinx-nwl.c
> +++ b/drivers/pci/controller/pcie-xilinx-nwl.c
> @@ -18,6 +18,7 @@
>  #include <linux/of_platform.h>
>  #include <linux/of_irq.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ecam.h>
>  #include <linux/platform_device.h>
>  #include <linux/irqchip/chained_irq.h>
>  
> @@ -124,8 +125,6 @@
>  #define E_ECAM_CR_ENABLE		BIT(0)
>  #define E_ECAM_SIZE_LOC			GENMASK(20, 16)
>  #define E_ECAM_SIZE_SHIFT		16
> -#define ECAM_BUS_LOC_SHIFT		20
> -#define ECAM_DEV_LOC_SHIFT		12
>  #define NWL_ECAM_VALUE_DEFAULT		12
>  
>  #define CFG_DMA_REG_BAR			GENMASK(2, 0)
> @@ -240,15 +239,11 @@ static void __iomem *nwl_pcie_map_bus(struct pci_bus *bus, unsigned int devfn,
>  				      int where)
>  {
>  	struct nwl_pcie *pcie = bus->sysdata;
> -	int relbus;
>  
>  	if (!nwl_pcie_valid_device(bus, devfn))
>  		return NULL;
>  
> -	relbus = (bus->number << ECAM_BUS_LOC_SHIFT) |
> -			(devfn << ECAM_DEV_LOC_SHIFT);
> -
> -	return pcie->ecam_base + relbus + where;
> +	return pcie->ecam_base + PCIE_ECAM_OFFSET(bus->number, devfn, where);
>  }
>  
>  /* PCIe operations */
> diff --git a/drivers/pci/controller/pcie-xilinx.c b/drivers/pci/controller/pcie-xilinx.c
> index 8523be61bba5..fa5baeb82653 100644
> --- a/drivers/pci/controller/pcie-xilinx.c
> +++ b/drivers/pci/controller/pcie-xilinx.c
> @@ -21,6 +21,7 @@
>  #include <linux/of_platform.h>
>  #include <linux/of_irq.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ecam.h>
>  #include <linux/platform_device.h>
>  
>  #include "../pci.h"
> @@ -86,10 +87,6 @@
>  /* Phy Status/Control Register definitions */
>  #define XILINX_PCIE_REG_PSCR_LNKUP	BIT(11)
>  
> -/* ECAM definitions */
> -#define ECAM_BUS_NUM_SHIFT		20
> -#define ECAM_DEV_NUM_SHIFT		12
> -
>  /* Number of MSI IRQs */
>  #define XILINX_NUM_MSI_IRQS		128
>  
> @@ -183,15 +180,11 @@ static void __iomem *xilinx_pcie_map_bus(struct pci_bus *bus,
>  					 unsigned int devfn, int where)
>  {
>  	struct xilinx_pcie_port *port = bus->sysdata;
> -	int relbus;
>  
>  	if (!xilinx_pcie_valid_device(bus, devfn))
>  		return NULL;
>  
> -	relbus = (bus->number << ECAM_BUS_NUM_SHIFT) |
> -		 (devfn << ECAM_DEV_NUM_SHIFT);
> -
> -	return port->reg_base + relbus + where;
> +	return port->reg_base + PCIE_ECAM_OFFSET(bus->number, devfn, where);
>  }
>  
>  /* PCIe operations */
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index f375c21ceeb1..1361a79bd1e7 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -11,6 +11,7 @@
>  #include <linux/module.h>
>  #include <linux/msi.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ecam.h>
>  #include <linux/srcu.h>
>  #include <linux/rculist.h>
>  #include <linux/rcupdate.h>
> @@ -328,15 +329,13 @@ static void vmd_remove_irq_domain(struct vmd_dev *vmd)
>  static char __iomem *vmd_cfg_addr(struct vmd_dev *vmd, struct pci_bus *bus,
>  				  unsigned int devfn, int reg, int len)
>  {
> -	char __iomem *addr = vmd->cfgbar +
> -			     ((bus->number - vmd->busn_start) << 20) +
> -			     (devfn << 12) + reg;
> +	unsigned int busnr_ecam = bus->number - vmd->busn_start;
> +	u32 offset = PCIE_ECAM_OFFSET(busnr_ecam, devfn, reg);
>  
> -	if ((addr - vmd->cfgbar) + len >=
> -	    resource_size(&vmd->dev->resource[VMD_CFGBAR]))
> +	if (offset + len >= resource_size(&vmd->dev->resource[VMD_CFGBAR]))
>  		return NULL;
>  
> -	return addr;
> +	return vmd->cfgbar + offset;
>  }
>  
>  /*
> diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
> index b54d32a31669..59f91d434859 100644
> --- a/drivers/pci/ecam.c
> +++ b/drivers/pci/ecam.c
> @@ -131,25 +131,36 @@ void __iomem *pci_ecam_map_bus(struct pci_bus *bus, unsigned int devfn,
>  			       int where)
>  {
>  	struct pci_config_window *cfg = bus->sysdata;
> +	unsigned int bus_shift = cfg->ops->bus_shift;
>  	unsigned int devfn_shift = cfg->ops->bus_shift - 8;
>  	unsigned int busn = bus->number;
>  	void __iomem *base;
> +	u32 bus_offset, devfn_offset;
>  
>  	if (busn < cfg->busr.start || busn > cfg->busr.end)
>  		return NULL;
>  
>  	busn -= cfg->busr.start;
> -	if (per_bus_mapping)
> +	if (per_bus_mapping) {
>  		base = cfg->winp[busn];
> -	else
> -		base = cfg->win + (busn << cfg->ops->bus_shift);
> -	return base + (devfn << devfn_shift) + where;
> +		busn = 0;
> +	} else
> +		base = cfg->win;
> +
> +	if (cfg->ops->bus_shift) {
> +		bus_offset = (busn & PCIE_ECAM_BUS_MASK) << bus_shift;
> +		devfn_offset = (devfn & PCIE_ECAM_DEVFN_MASK) << devfn_shift;
> +		where &= PCIE_ECAM_REG_MASK;
> +
> +		return base + (bus_offset | devfn_offset | where);
> +	}
> +
> +	return base + PCIE_ECAM_OFFSET(busn, devfn, where);
>  }
>  EXPORT_SYMBOL_GPL(pci_ecam_map_bus);
>  
>  /* ECAM ops */
>  const struct pci_ecam_ops pci_generic_ecam_ops = {
> -	.bus_shift	= 20,
>  	.pci_ops	= {
>  		.map_bus	= pci_ecam_map_bus,
>  		.read		= pci_generic_config_read,
> @@ -161,7 +172,6 @@ EXPORT_SYMBOL_GPL(pci_generic_ecam_ops);
>  #if defined(CONFIG_ACPI) && defined(CONFIG_PCI_QUIRKS)
>  /* ECAM ops for 32-bit access only (non-compliant) */
>  const struct pci_ecam_ops pci_32b_ops = {
> -	.bus_shift	= 20,
>  	.pci_ops	= {
>  		.map_bus	= pci_ecam_map_bus,
>  		.read		= pci_generic_config_read32,
> @@ -171,7 +181,6 @@ const struct pci_ecam_ops pci_32b_ops = {
>  
>  /* ECAM ops for 32-bit read only (non-compliant) */
>  const struct pci_ecam_ops pci_32b_read_ops = {
> -	.bus_shift	= 20,
>  	.pci_ops	= {
>  		.map_bus	= pci_ecam_map_bus,
>  		.read		= pci_generic_config_read32,
> diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
> index 033ce74f02e8..65d3d83015c3 100644
> --- a/include/linux/pci-ecam.h
> +++ b/include/linux/pci-ecam.h
> @@ -9,6 +9,33 @@
>  #include <linux/kernel.h>
>  #include <linux/platform_device.h>
>  
> +/*
> + * Memory address shift values for the byte-level address that
> + * can be used when accessing the PCI Express Configuration Space.
> + */
> +
> +/*
> + * Enhanced Configuration Access Mechanism (ECAM)
> + *
> + * See PCI Express Base Specification, Revision 5.0, Version 1.0,
> + * Section 7.2.2, Table 7-1, p. 677.
> + */
> +#define PCIE_ECAM_BUS_SHIFT	20 /* Bus number */
> +#define PCIE_ECAM_DEVFN_SHIFT	12 /* Device and Function number */
> +
> +#define PCIE_ECAM_BUS_MASK	0xff
> +#define PCIE_ECAM_DEVFN_MASK	0xff
> +#define PCIE_ECAM_REG_MASK	0xfff /* Limit offset to a maximum of 4K */
> +
> +#define PCIE_ECAM_BUS(x)	(((x) & PCIE_ECAM_BUS_MASK) << PCIE_ECAM_BUS_SHIFT)
> +#define PCIE_ECAM_DEVFN(x)	(((x) & PCIE_ECAM_DEVFN_MASK) << PCIE_ECAM_DEVFN_SHIFT)
> +#define PCIE_ECAM_REG(x)	((x) & PCIE_ECAM_REG_MASK)
> +
> +#define PCIE_ECAM_OFFSET(bus, devfn, where) \
> +	(PCIE_ECAM_BUS(bus) | \
> +	 PCIE_ECAM_DEVFN(devfn) | \
> +	 PCIE_ECAM_REG(where))
> +
>  /*
>   * struct to hold pci ops and bus shift of the config window
>   * for a PCI controller.
> -- 
> 2.29.2
> 

^ permalink raw reply

* Re: [PATCH] powerpc: fix the allyesconfig build
From: Stephen Rothwell @ 2020-11-30 10:26 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Salil Mehta, Geert Uytterhoeven, Stephen Boyd, Michael Turquette,
	Linux Kernel Mailing List, Nicholas Piggin, linux-clk,
	Linux-Renesas, Yisen Zhuang, Joel Stanley, netdev, Jakub Kicinski,
	PowerPC, David S. Miller, Daniel Axtens
In-Reply-To: <CAMuHMdVJKarCRRRJq_hmvvv0NcSpREdqDbH8L5NitZmFUEbqmw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 482 bytes --]

Hi Geert,

On Mon, 30 Nov 2020 09:58:23 +0100 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Thanks for your patch!

No worries, it has been a small irritant to me for quite a while.

> I prefer to fix this in the driver instead.  The space saving by packing the
> structure is minimal.
> I've sent a patch
> https://lore.kernel.org/r/20201130085743.1656317-1-geert+renesas@glider.be
> (when lore is back)

Absolutely, thanks.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-11-30  9:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, X86 ML, LKML, Nicholas Piggin,
	Linux-MM, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <20201130093000.GM2414@hirez.programming.kicks-ass.net>

On Mon, Nov 30, 2020 at 10:30:00AM +0100, Peter Zijlstra wrote:
> On Sat, Nov 28, 2020 at 07:54:57PM -0800, Andy Lutomirski wrote:
> > This means that mm_cpumask operations won't need to be full barriers
> > forever, and we might not want to take the implied full barriers in
> > set_bit() and clear_bit() for granted.
> 
> There is no implied full barrier for those ops.

Neither ARM nor Power implies any ordering on those ops. But Power has
some of the worst atomic performance in the world despite of that.

IIRC they don't do their LL/SC in L1.

^ permalink raw reply

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-11-30  9:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, X86 ML, LKML, Nicholas Piggin,
	Linux-MM, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com>

On Sat, Nov 28, 2020 at 07:54:57PM -0800, Andy Lutomirski wrote:
> This means that mm_cpumask operations won't need to be full barriers
> forever, and we might not want to take the implied full barriers in
> set_bit() and clear_bit() for granted.

There is no implied full barrier for those ops.

^ permalink raw reply

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-11-30  9:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, X86 ML, LKML, Nicholas Piggin,
	Linux-MM, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com>

On Sat, Nov 28, 2020 at 07:54:57PM -0800, Andy Lutomirski wrote:
> Version (b) seems fairly straightforward to implement -- add RCU
> protection and a atomic_t special_ref_cleared (initially 0) to struct
> mm_struct itself.  After anyone clears a bit to mm_cpumask (which is
> already a barrier),

No it isn't. clear_bit() implies no barrier what so ever. That's x86
you're thinking about.

> they read mm_users.  If it's zero, then they scan
> mm_cpumask and see if it's empty.  If it is, they atomically swap
> special_ref_cleared to 1.  If it was zero before the swap, they do
> mmdrop().  I can imagine some tweaks that could make this a big
> faster, at least in the limit of a huge number of CPUs.

^ permalink raw reply

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-11-30  9:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, X86 ML, LKML, Nicholas Piggin,
	Linux-MM, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <CALCETrWBtCfD+jZ3S+O8FK-HFPODuhbDEbbfWvS=-iPATNFAOA@mail.gmail.com>

On Sun, Nov 29, 2020 at 12:16:26PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > >
> > > On big systems, the mm refcount can become highly contented when doing
> > > a lot of context switching with threaded applications (particularly
> > > switching between the idle thread and an application thread).
> > >
> > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > user->idle->user cases, so so instead implement a non-refcounted scheme
> > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > any remaining lazy ones.
> > >
> > > Shootdown IPIs are some concern, but they have not been observed to be
> > > a big problem with this scheme (the powerpc implementation generated
> > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > There are a number of strategies that could be employed to reduce IPIs
> > > if they turn out to be a problem for some workload.
> >
> > I'm still wondering whether we can do even better.
> >
> 
> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> the TLB.  On x86, this will shoot down all lazies as long as even a
> single pagetable was freed.  (Or at least it will if we don't have a
> serious bug, but the code seems okay.  We'll hit pmd_free_tlb, which
> sets tlb->freed_tables, which will trigger the IPI.)  So, on
> architectures like x86, the shootdown approach should be free.  The
> only way it ought to have any excess IPIs is if we have CPUs in
> mm_cpumask() that don't need IPI to free pagetables, which could
> happen on paravirt.
> 
> Can you try to figure out why you saw any increase in IPIs?  It would
> be nice if we can make the new code unconditional.

Power doesn't do IPI based TLBI.

^ permalink raw reply

* RE: [PATCH v6 4/5] PCI: vmd: Update type of the __iomem pointers
From: David Laight @ 2020-11-30  9:06 UTC (permalink / raw)
  To: 'Krzysztof Wilczyński', Bjorn Helgaas
  Cc: Heiko Stuebner, linux-pci@vger.kernel.org, Shawn Lin,
	Paul Mackerras, Thomas Petazzoni, Jonathan Chocron, Toan Le,
	Will Deacon, Rob Herring, Lorenzo Pieralisi, Michal Simek,
	linux-rockchip@lists.infradead.org,
	bcm-kernel-feedback-list@broadcom.com, Jonathan Derrick, Ray Jui,
	Florian Fainelli, linux-rpi-kernel@lists.infradead.org,
	Jonathan Cameron, linux-arm-kernel@lists.infradead.org,
	Scott Branden, Zhou Wang, Robert Richter,
	linuxppc-dev@lists.ozlabs.org, Nicolas Saenz Julienne
In-Reply-To: <20201129230743.3006978-5-kw@linux.com>

From: Krzysztof Wilczynski
> Sent: 29 November 2020 23:08
> 
> Use "void __iomem" instead "char __iomem" pointer type when working with
> the accessor functions (with names like readb() or writel(), etc.) to
> better match a given accessor function signature where commonly the
> address pointing to an I/O memory region would be a "void __iomem"
> pointer.

ISTM that is heading in the wrong direction.

I think (form the variable names etc) that these are pointers
to specific registers.

So what you ought to have is a type for that register block.
Typically this is actually a structure - to give some type
checking that the offsets are being used with the correct
base address.

If the code is using numeric offsets (hardware engineers like
numeric offsets) then you can get some type protection by using
a structure that only contains a single field (char in this case).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [PATCH 5/8] net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
From: Lee Jones @ 2020-11-30  9:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Thomas Falcon, John Allen, linux-kernel, Santiago Leon,
	Jakub Kicinski, netdev, Lijun Pan, Dany Madden, Paul Mackerras,
	Sukadev Bhattiprolu, linuxppc-dev, David S. Miller
In-Reply-To: <20201129184354.GL2234159@lunn.ch>

On Sun, 29 Nov 2020, Andrew Lunn wrote:

> Hi Lee
> 
> >  /**
> >   * build_hdr_data - creates L2/L3/L4 header data buffer
> > - * @hdr_field - bitfield determining needed headers
> > - * @skb - socket buffer
> > - * @hdr_len - array of header lengths
> > - * @tot_len - total length of data
> > + * @hdr_field: bitfield determining needed headers
> > + * @skb: socket buffer
> > + * @hdr_len: array of header lengths
> > + * @tot_len: total length of data
> >   *
> >   * Reads hdr_field to determine which headers are needed by firmware.
> >   * Builds a buffer containing these headers.  Saves individual header
> 
> The code is:
> 
> static int build_hdr_data(u8 hdr_field, struct sk_buff *skb,
>                           int *hdr_len, u8 *hdr_data)
> {
> 
> What about hdr_data? 
> 
> >  /**
> >   * create_hdr_descs - create header and header extension descriptors
> > - * @hdr_field - bitfield determining needed headers
> > - * @data - buffer containing header data
> > - * @len - length of data buffer
> > - * @hdr_len - array of individual header lengths
> > - * @scrq_arr - descriptor array
> > + * @hdr_field: bitfield determining needed headers
> > + * @data: buffer containing header data
> > + * @len: length of data buffer
> > + * @hdr_len: array of individual header lengths
> > + * @scrq_arr: descriptor array
> 
> static int create_hdr_descs(u8 hdr_field, u8 *hdr_data, int len, int *hdr_len,
>                             union sub_crq *scrq_arr)
> 
> There is no data parameter.
> 
> It looks like you just changes - to :, but did not validate the
> parameters are actually correct.

Right.  This is a 'quirk' of my current process.

I build once, then use a script to parse the output, fixing each
issue in the order the compiler presents them.  Then, either after
fixing a reasonable collection, or all issues, I re-run the compile
and fix the next batch (if any).

This patch is only fixing the formatting issue(s).  As you've seen,
there is a subsequent patch in the series which fixes the disparity.

I can squash them though.  No problem.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* Re: [PATCH 8/8] net: ethernet: ibm: ibmvnic: Fix some kernel-doc issues
From: Lee Jones @ 2020-11-30  8:59 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Thomas Falcon, John Allen, linux-kernel, Santiago Leon,
	Jakub Kicinski, netdev, Lijun Pan, Dany Madden, Paul Mackerras,
	Sukadev Bhattiprolu, linuxppc-dev, David S. Miller
In-Reply-To: <20201129191015.GO2234159@lunn.ch>

On Sun, 29 Nov 2020, Andrew Lunn wrote:

> On Thu, Nov 26, 2020 at 01:38:53PM +0000, Lee Jones wrote:
> > Fixes the following W=1 kernel build warning(s):
> > 
> >  from drivers/net/ethernet/ibm/ibmvnic.c:35:
> >  inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
> >  drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
> >  drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Excess function parameter 'tot_len' description in 'build_hdr_data'
> >  drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
> >  drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Excess function parameter 'data' description in 'create_hdr_descs'
> >  drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
> >  drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Excess function parameter 'skb' description in 'build_hdr_descs_arr'
> >  drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Excess function parameter 'subcrq' description in 'build_hdr_descs_arr'
> 
> Hi Lee
> 
> It looks like this should be squashed into the previous patch to this
> file.

It certainly looks that way.  Will fix.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* Re: [PATCH] powerpc: fix the allyesconfig build
From: Geert Uytterhoeven @ 2020-11-30  8:58 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Salil Mehta, Geert Uytterhoeven, Stephen Boyd, Michael Turquette,
	Linux Kernel Mailing List, Nicholas Piggin, linux-clk,
	Linux-Renesas, Yisen Zhuang, Joel Stanley, netdev, Jakub Kicinski,
	PowerPC, David S. Miller, Daniel Axtens
In-Reply-To: <20201128122819.32187696@canb.auug.org.au>

Hi Stephen,

On Sat, Nov 28, 2020 at 2:28 AM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> There are 2 drivers that have arrays of packed structures that contain
> pointers that end up at unaligned offsets.  These produce warnings in
> the PowerPC allyesconfig build like this:
>
> WARNING: 148 bad relocations
> c00000000e56510b R_PPC64_UADDR64   .rodata+0x0000000001c72378
> c00000000e565126 R_PPC64_UADDR64   .rodata+0x0000000001c723c0
>
> They are not drivers that are used on PowerPC (I assume), so mark them
> to not be built on PPC64 when CONFIG_RELOCATABLE is enabled.
>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Michael Turquette <mturquette@baylibre.com>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
> Cc: Salil Mehta <salil.mehta@huawei.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Nicholas Piggin  <npiggin@gmail.com>
> Cc: Daniel Axtens <dja@axtens.net>
> Cc: Joel Stanley <joel@jms.id.au>
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Thanks for your patch!

> --- a/drivers/clk/renesas/Kconfig
> +++ b/drivers/clk/renesas/Kconfig
> @@ -151,6 +151,10 @@ config CLK_R8A779A0
>         select CLK_RENESAS_CPG_MSSR
>
>  config CLK_R9A06G032
> +       # PPC64 RELOCATABLE kernels cannot handle relocations to
> +       # unaligned locations that are produced by the array of
> +       # packed structures in this driver.
> +       depends on !(PPC64 && RELOCATABLE)
>         bool "Renesas R9A06G032 clock driver"
>         help
>           This is a driver for R9A06G032 clocks

I prefer to fix this in the driver instead.  The space saving by packing the
structure is minimal.
I've sent a patch
https://lore.kernel.org/r/20201130085743.1656317-1-geert+renesas@glider.be
(when lore is back)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH v2] clk: renesas: r9a06g032: Drop __packed for portability
From: Geert Uytterhoeven @ 2020-11-30  8:57 UTC (permalink / raw)
  To: Stephen Rothwell, Michael Turquette, Stephen Boyd,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: Geert Uytterhoeven, linux-kernel, Gareth Williams,
	linux-renesas-soc, linuxppc-dev, linux-clk

The R9A06G032 clock driver uses an array of packed structures to reduce
kernel size.  However, this array contains pointers, which are no longer
aligned naturally, and cannot be relocated on PPC64.  Hence when
compile-testing this driver on PPC64 with CONFIG_RELOCATABLE=y (e.g.
PowerPC allyesconfig), the following warnings are produced:

    WARNING: 136 bad relocations
    c000000000616be3 R_PPC64_UADDR64   .rodata+0x00000000000cf338
    c000000000616bfe R_PPC64_UADDR64   .rodata+0x00000000000cf370
    ...

Fix this by dropping the __packed attribute from the r9a06g032_clkdesc
definition, trading a small size increase for portability.

This increases the 156-entry clock table by 1 byte per entry, but due to
the compiler generating more efficient code for unpacked accesses, the
net size increase is only 76 bytes (gcc 9.3.0 on arm32).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 4c3d88526eba2143 ("clk: renesas: Renesas R9A06G032 clock driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - Fix authorship.
---
 drivers/clk/renesas/r9a06g032-clocks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/renesas/r9a06g032-clocks.c b/drivers/clk/renesas/r9a06g032-clocks.c
index d900f6bf53d0b944..892e91b92f2c80f5 100644
--- a/drivers/clk/renesas/r9a06g032-clocks.c
+++ b/drivers/clk/renesas/r9a06g032-clocks.c
@@ -55,7 +55,7 @@ struct r9a06g032_clkdesc {
 			u16 sel, g1, r1, g2, r2;
 		} dual;
 	};
-} __packed;
+};
 
 #define I_GATE(_clk, _rst, _rdy, _midle, _scon, _mirack, _mistat) \
 	{ .gate = _clk, .reset = _rst, \
-- 
2.25.1


^ permalink raw reply related

* [PATCH] clk: renesas: r9a06g032: Drop __packed for portability
From: Geert Uytterhoeven @ 2020-11-30  8:53 UTC (permalink / raw)
  To: Stephen Rothwell, Michael Turquette, Stephen Boyd,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linux-kernel, Gareth Williams, linux-renesas-soc,
	Geert Uytterhoeven, linuxppc-dev, linux-clk

The R9A06G032 clock driver uses an array of packed structures to reduce
kernel size.  However, this array contains pointers, which are no longer
aligned naturally, and cannot be relocated on PPC64.  Hence when
compile-testing this driver on PPC64 with CONFIG_RELOCATABLE=y (e.g.
PowerPC allyesconfig), the following warnings are produced:

    WARNING: 136 bad relocations
    c000000000616be3 R_PPC64_UADDR64   .rodata+0x00000000000cf338
    c000000000616bfe R_PPC64_UADDR64   .rodata+0x00000000000cf370
    ...

Fix this by dropping the __packed attribute from the r9a06g032_clkdesc
definition, trading a small size increase for portability.

This increases the 156-entry clock table by 1 byte per entry, but due to
the compiler generating more efficient code for unpacked accesses, the
net size increase is only 76 bytes (gcc 9.3.0 on arm32).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 4c3d88526eba2143 ("clk: renesas: Renesas R9A06G032 clock driver")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
Please take directly (ppc or clk), as this is a build fix.
https://lore.kernel.org/linux-clk/20201128122819.32187696@canb.auug.org.au/

Compile-tested only due to lack of hardware.

 drivers/clk/renesas/r9a06g032-clocks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/renesas/r9a06g032-clocks.c b/drivers/clk/renesas/r9a06g032-clocks.c
index d900f6bf53d0b944..892e91b92f2c80f5 100644
--- a/drivers/clk/renesas/r9a06g032-clocks.c
+++ b/drivers/clk/renesas/r9a06g032-clocks.c
@@ -55,7 +55,7 @@ struct r9a06g032_clkdesc {
 			u16 sel, g1, r1, g2, r2;
 		} dual;
 	};
-} __packed;
+};
 
 #define I_GATE(_clk, _rst, _rdy, _midle, _scon, _mirack, _mistat) \
 	{ .gate = _clk, .reset = _rst, \
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5] lkdtm/powerpc: Add SLB multihit test
From: Ganesh Goudar @ 2020-11-30  8:30 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: msuchanek, Ganesh Goudar, keescook, npiggin, mahesh

To check machine check handling, add support to inject slb
multihit errors.

Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Suchánek <msuchanek@suse.de>
Co-developed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
v5:
- Insert entries at SLB_NUM_BOLTED and SLB_NUM_BOLTED +1, remove index
  allocating helper function.
- Move mk_esid_data and mk_vsid_data helpers to asm/book3s/64/mmu-hash.h.
- Use mmu_linear_psize and mmu_vmalloc_psize to get page size.
- Use !radix_enabled() to check if we are in HASH mode.
- And other minor improvements.

v1-v4:
- No major changes here for this patch, This patch was initially posted
  along with the other patch which got accepted.
https://git.kernel.org/powerpc/c/8d0e2101274358d9b6b1f27232b40253ca48bab5
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  28 +++-
 arch/powerpc/mm/book3s64/hash_utils.c         |   1 +
 arch/powerpc/mm/book3s64/slb.c                |  27 ----
 drivers/misc/lkdtm/Makefile                   |   1 +
 drivers/misc/lkdtm/core.c                     |   3 +
 drivers/misc/lkdtm/lkdtm.h                    |   3 +
 drivers/misc/lkdtm/powerpc.c                  | 120 ++++++++++++++++++
 tools/testing/selftests/lkdtm/tests.txt       |   1 +
 8 files changed, 156 insertions(+), 28 deletions(-)
 create mode 100644 drivers/misc/lkdtm/powerpc.c

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 683a9c7d1b03..8b9f07900395 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -842,6 +842,32 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize)
 
 unsigned htab_shift_for_mem_size(unsigned long mem_size);
 
-#endif /* __ASSEMBLY__ */
+enum slb_index {
+	LINEAR_INDEX	= 0, /* Kernel linear map  (0xc000000000000000) */
+	KSTACK_INDEX	= 1, /* Kernel stack map */
+};
 
+#define slb_esid_mask(ssize)	\
+	(((ssize) == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T)
+
+static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
+					 enum slb_index index)
+{
+	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | index;
+}
+
+static inline unsigned long __mk_vsid_data(unsigned long vsid, int ssize,
+					   unsigned long flags)
+{
+	return (vsid << slb_vsid_shift(ssize)) | flags |
+		((unsigned long)ssize << SLB_VSID_SSIZE_SHIFT);
+}
+
+static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
+					 unsigned long flags)
+{
+	return __mk_vsid_data(get_kernel_vsid(ea, ssize), ssize, flags);
+}
+
+#endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_ */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 24702c0a92e0..38076a998850 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -112,6 +112,7 @@ int mmu_linear_psize = MMU_PAGE_4K;
 EXPORT_SYMBOL_GPL(mmu_linear_psize);
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
+EXPORT_SYMBOL_GPL(mmu_vmalloc_psize);
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int mmu_vmemmap_psize = MMU_PAGE_4K;
 #endif
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index c30fcbfa0e32..985706acb0e5 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -28,35 +28,8 @@
 #include "internal.h"
 
 
-enum slb_index {
-	LINEAR_INDEX	= 0, /* Kernel linear map  (0xc000000000000000) */
-	KSTACK_INDEX	= 1, /* Kernel stack map */
-};
-
 static long slb_allocate_user(struct mm_struct *mm, unsigned long ea);
 
-#define slb_esid_mask(ssize)	\
-	(((ssize) == MMU_SEGSIZE_256M)? ESID_MASK: ESID_MASK_1T)
-
-static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
-					 enum slb_index index)
-{
-	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | index;
-}
-
-static inline unsigned long __mk_vsid_data(unsigned long vsid, int ssize,
-					 unsigned long flags)
-{
-	return (vsid << slb_vsid_shift(ssize)) | flags |
-		((unsigned long) ssize << SLB_VSID_SSIZE_SHIFT);
-}
-
-static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
-					 unsigned long flags)
-{
-	return __mk_vsid_data(get_kernel_vsid(ea, ssize), ssize, flags);
-}
-
 bool stress_slb_enabled __initdata;
 
 static int __init parse_stress_slb(char *p)
diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile
index c70b3822013f..f37ecfb0a707 100644
--- a/drivers/misc/lkdtm/Makefile
+++ b/drivers/misc/lkdtm/Makefile
@@ -10,6 +10,7 @@ lkdtm-$(CONFIG_LKDTM)		+= rodata_objcopy.o
 lkdtm-$(CONFIG_LKDTM)		+= usercopy.o
 lkdtm-$(CONFIG_LKDTM)		+= stackleak.o
 lkdtm-$(CONFIG_LKDTM)		+= cfi.o
+lkdtm-$(CONFIG_PPC64)		+= powerpc.o
 
 KASAN_SANITIZE_stackleak.o	:= n
 KCOV_INSTRUMENT_rodata.o	:= n
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 97803f213d9d..538098dc5aec 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -176,6 +176,9 @@ static const struct crashtype crashtypes[] = {
 #ifdef CONFIG_X86_32
 	CRASHTYPE(DOUBLE_FAULT),
 #endif
+#ifdef CONFIG_PPC64
+	CRASHTYPE(PPC_SLB_MULTIHIT),
+#endif
 };
 
 
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 6dec4c9b442f..79ec05c18dd1 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -102,4 +102,7 @@ void lkdtm_STACKLEAK_ERASING(void);
 /* cfi.c */
 void lkdtm_CFI_FORWARD_PROTO(void);
 
+/* powerpc.c */
+void lkdtm_PPC_SLB_MULTIHIT(void);
+
 #endif
diff --git a/drivers/misc/lkdtm/powerpc.c b/drivers/misc/lkdtm/powerpc.c
new file mode 100644
index 000000000000..077c9f9ed8d0
--- /dev/null
+++ b/drivers/misc/lkdtm/powerpc.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "lkdtm.h"
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <asm/mmu.h>
+
+/* Inserts new slb entries */
+static void insert_slb_entry(unsigned long p, int ssize, int page_size)
+{
+	unsigned long flags;
+
+	flags = SLB_VSID_KERNEL | mmu_psize_defs[page_size].sllp;
+	preempt_disable();
+
+	asm volatile("slbmte %0,%1" :
+		     : "r" (mk_vsid_data(p, ssize, flags)),
+		       "r" (mk_esid_data(p, ssize, SLB_NUM_BOLTED))
+		     : "memory");
+
+	asm volatile("slbmte %0,%1" :
+			: "r" (mk_vsid_data(p, ssize, flags)),
+			  "r" (mk_esid_data(p, ssize, SLB_NUM_BOLTED + 1))
+			: "memory");
+	preempt_enable();
+}
+
+/* Inject slb multihit on vmalloc-ed address i.e 0xD00... */
+static int inject_vmalloc_slb_multihit(void)
+{
+	char *p;
+
+	p = vmalloc(PAGE_SIZE);
+	if (!p)
+		return -ENOMEM;
+
+	insert_slb_entry((unsigned long)p, MMU_SEGSIZE_1T, mmu_vmalloc_psize);
+	/*
+	 * This triggers exception, If handled correctly we must recover
+	 * from this error.
+	 */
+	p[0] = '!';
+	vfree(p);
+	return 0;
+}
+
+/* Inject slb multihit on kmalloc-ed address i.e 0xC00... */
+static int inject_kmalloc_slb_multihit(void)
+{
+	char *p;
+
+	p = kmalloc(2048, GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	insert_slb_entry((unsigned long)p, MMU_SEGSIZE_1T, mmu_linear_psize);
+	/*
+	 * This triggers exception, If handled correctly we must recover
+	 * from this error.
+	 */
+	p[0] = '!';
+	kfree(p);
+	return 0;
+}
+
+/*
+ * Few initial SLB entries are bolted. Add a test to inject
+ * multihit in bolted entry 0.
+ */
+static void insert_dup_slb_entry_0(void)
+{
+	unsigned long test_address = PAGE_OFFSET, *test_ptr;
+	unsigned long esid, vsid;
+	unsigned long i = 0;
+
+	test_ptr = (unsigned long *)test_address;
+	preempt_disable();
+
+	asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
+	asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
+
+	/* for i !=0 we would need to mask out the old entry number */
+	asm volatile("slbmte %0,%1" :
+			: "r" (vsid),
+			  "r" (esid | SLB_NUM_BOLTED)
+			: "memory");
+
+	asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
+	asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
+
+	/* for i !=0 we would need to mask out the old entry number */
+	asm volatile("slbmte %0,%1" :
+			: "r" (vsid),
+			  "r" (esid | (SLB_NUM_BOLTED + 1))
+			: "memory");
+
+	pr_info("%s accessing test address 0x%lx: 0x%lx\n",
+		__func__, test_address, *test_ptr);
+
+	preempt_enable();
+}
+
+void lkdtm_PPC_SLB_MULTIHIT(void)
+{
+	if (!radix_enabled()) {
+		pr_info("Injecting SLB multihit errors\n");
+		/*
+		 * These need not be separate tests, And they do pretty
+		 * much same thing. In any case we must recover from the
+		 * errors introduced by these functions, machine would not
+		 * survive these tests in case of failure to handle.
+		 */
+		inject_vmalloc_slb_multihit();
+		inject_kmalloc_slb_multihit();
+		insert_dup_slb_entry_0();
+		pr_info("Recovered from SLB multihit errors\n");
+	} else {
+		pr_err("XFAIL: This test is for ppc64 and with hash mode MMU only\n");
+	}
+}
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 74a8d329a72c..18e4599863c0 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -68,3 +68,4 @@ USERCOPY_STACK_BEYOND
 USERCOPY_KERNEL
 STACKLEAK_ERASING OK: the rest of the thread stack is properly erased
 CFI_FORWARD_PROTO
+PPC_SLB_MULTIHIT Recovered
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v3 05/19] powerpc: interrupt handler wrapper functions
From: Aneesh Kumar K.V @ 2020-11-30  7:37 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201128144114.944000-6-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

.....
 +#endif
> +DECLARE_INTERRUPT_HANDLER(emulation_assist_interrupt);
> +DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault);

Can we add comments here explaining why some of these handlers need to remain RAW()?

> +DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
> +DECLARE_INTERRUPT_HANDLER_RET(do_hash_fault);
> +DECLARE_INTERRUPT_HANDLER_RET(do_page_fault);
> +DECLARE_INTERRUPT_HANDLER(do_bad_page_fault);
> +
> +DECLARE_INTERRUPT_HANDLER_ASYNC(timer_interrupt);
> +DECLARE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi);
> +DECLARE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async);
> +DECLARE_INTERRUPT_HANDLER_RAW(performance_monitor_exception);

Same for this.

> +DECLARE_INTERRUPT_HANDLER(WatchdogException);
> +DECLARE_INTERRUPT_HANDLER(unknown_exception);
> +DECLARE_INTERRUPT_HANDLER_ASYNC(unknown_async_exception);
> +
> +void replay_system_reset(void);
> +void replay_soft_interrupts(void);
> +
> +#endif /* _ASM_POWERPC_INTERRUPT_H */
> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
> index 2f566c1a754c..335d6fd589a7 100644
> --- a/arch/powerpc/include/asm/time.h
> +++ b/arch/powerpc/include/asm/time.h
> @@ -131,6 +131,8 @@ DECLARE_PER_CPU(u64, decrementers_next_tb);
>  /* Convert timebase ticks to nanoseconds */
>  unsigned long long tb_to_ns(unsigned long long tb_ticks);
>  
> +void timer_broadcast_interrupt(void);
> +
>  /* SPLPAR */
>  void accumulate_stolen_time(void);
>  
> 

-aneesh

^ permalink raw reply

* Re: [PATCH v3 03/19] powerpc: bad_page_fault, do_break get registers from regs
From: Aneesh Kumar K.V @ 2020-11-30  7:36 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201128144114.944000-4-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> Similar to the previous patch this makes interrupt handler function
> types more regular so they can be wrapped with the next patch.
>
> bad_page_fault and do_break are not performance critical.
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> [32s DABR code from Christophe Leroy <christophe.leroy@csgroup.eu>]
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/bug.h             |  2 +-
>  arch/powerpc/include/asm/debug.h           |  3 +--
>  arch/powerpc/kernel/entry_32.S             | 18 +-----------------
>  arch/powerpc/kernel/exceptions-64e.S       |  3 +--
>  arch/powerpc/kernel/exceptions-64s.S       |  3 +--
>  arch/powerpc/kernel/head_8xx.S             |  5 ++---
>  arch/powerpc/kernel/head_book3s_32.S       |  3 +++
>  arch/powerpc/kernel/process.c              |  7 +++----
>  arch/powerpc/kernel/traps.c                |  2 +-
>  arch/powerpc/mm/book3s64/hash_utils.c      |  4 ++--
>  arch/powerpc/mm/book3s64/slb.c             |  2 +-
>  arch/powerpc/mm/fault.c                    | 10 +++++-----
>  arch/powerpc/platforms/8xx/machine_check.c |  2 +-
>  13 files changed, 23 insertions(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 897bad6b6bbb..49162faba33f 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -113,7 +113,7 @@
>  struct pt_regs;
>  long do_page_fault(struct pt_regs *);
>  long hash__do_page_fault(struct pt_regs *);
> -extern void bad_page_fault(struct pt_regs *, unsigned long, int);
> +void bad_page_fault(struct pt_regs *, int);
>  extern void _exception(int, struct pt_regs *, int, unsigned long);
>  extern void _exception_pkey(struct pt_regs *, unsigned long, int);
>  extern void die(const char *, struct pt_regs *, long);
> diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> index ec57daf87f40..0550eceab3ca 100644
> --- a/arch/powerpc/include/asm/debug.h
> +++ b/arch/powerpc/include/asm/debug.h
> @@ -52,8 +52,7 @@ extern void do_send_trap(struct pt_regs *regs, unsigned long address,
>  			 unsigned long error_code, int brkpt);
>  #else
>  
> -extern void do_break(struct pt_regs *regs, unsigned long address,
> -		     unsigned long error_code);
> +void do_break(struct pt_regs *regs);
>  #endif
>  
>  #endif /* _ASM_POWERPC_DEBUG_H */
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 8cdc8bcde703..57b8e95ea2a0 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -657,10 +657,6 @@ ppc_swapcontext:
>  	.globl	handle_page_fault
>  handle_page_fault:
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
> -#ifdef CONFIG_PPC_BOOK3S_32
> -	andis.  r0,r5,DSISR_DABRMATCH@h
> -	bne-    handle_dabr_fault
> -#endif
>  	bl	do_page_fault
>  	cmpwi	r3,0
>  	beq+	ret_from_except
> @@ -668,23 +664,11 @@ handle_page_fault:
>  	lwz	r0,_TRAP(r1)
>  	clrrwi	r0,r0,1
>  	stw	r0,_TRAP(r1)
> -	mr	r5,r3
> +	mr	r4,r3		/* err arg for bad_page_fault */
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	lwz	r4,_DAR(r1)
>  	bl	bad_page_fault
>  	b	ret_from_except_full
>  
> -#ifdef CONFIG_PPC_BOOK3S_32
> -	/* We have a data breakpoint exception - handle it */
> -handle_dabr_fault:
> -	SAVE_NVGPRS(r1)
> -	lwz	r0,_TRAP(r1)
> -	clrrwi	r0,r0,1
> -	stw	r0,_TRAP(r1)
> -	bl      do_break
> -	b	ret_from_except_full
> -#endif
> -
>  /*
>   * This routine switches between two different tasks.  The process
>   * state of one is saved on its kernel stack.  Then the state
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 25fa7d5a643c..dc728bb1c89a 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1018,9 +1018,8 @@ storage_fault_common:
>  	bne-	1f
>  	b	ret_from_except_lite
>  1:	bl	save_nvgprs
> -	mr	r5,r3
> +	mr	r4,r3
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	ld	r4,_DAR(r1)
>  	bl	bad_page_fault
>  	b	ret_from_except
>  
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 690058043b17..77b730f515c4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -2136,8 +2136,7 @@ EXC_COMMON_BEGIN(h_data_storage_common)
>  	GEN_COMMON h_data_storage
>  	addi    r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
> -	ld	r4,_DAR(r1)
> -	li	r5,SIGSEGV
> +	li	r4,SIGSEGV
>  	bl      bad_page_fault
>  MMU_FTR_SECTION_ELSE
>  	bl      unknown_exception
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 8acd365a2be6..71ad7ce28469 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -376,10 +376,9 @@ do_databreakpoint:
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	mfspr	r4,SPRN_BAR
>  	stw	r4,_DAR(r11)
> -#ifdef CONFIG_VMAP_STACK
> -	lwz	r5,_DSISR(r11)
> -#else
> +#ifndef CONFIG_VMAP_STACK
>  	mfspr	r5,SPRN_DSISR
> +	stw	r5,_DSISR(r11)
>  #endif
>  	EXC_XFER_STD(0x1c00, do_break)
>  
> diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
> index 7addf67832f9..5875f8795d5b 100644
> --- a/arch/powerpc/kernel/head_book3s_32.S
> +++ b/arch/powerpc/kernel/head_book3s_32.S
> @@ -689,7 +689,10 @@ handle_page_fault_tramp_1:
>  #endif
>  	/* fall through */
>  handle_page_fault_tramp_2:
> +	andis.	r0, r5, DSISR_DABRMATCH@h
> +	bne-	1f
>  	EXC_XFER_LITE(0x300, handle_page_fault)
> +1:	EXC_XFER_STD(0x300, do_break)
>  
>  #ifdef CONFIG_VMAP_STACK
>  .macro save_regs_thread		thread
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index d421a2c7f822..0bdd3ed653df 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -660,11 +660,10 @@ static void do_break_handler(struct pt_regs *regs)
>  	}
>  }
>  
> -void do_break (struct pt_regs *regs, unsigned long address,
> -		    unsigned long error_code)
> +void do_break(struct pt_regs *regs)
>  {
>  	current->thread.trap_nr = TRAP_HWBKPT;
> -	if (notify_die(DIE_DABR_MATCH, "dabr_match", regs, error_code,
> +	if (notify_die(DIE_DABR_MATCH, "dabr_match", regs, regs->dsisr,
>  			11, SIGSEGV) == NOTIFY_STOP)
>  		return;
>  
> @@ -682,7 +681,7 @@ void do_break (struct pt_regs *regs, unsigned long address,
>  		do_break_handler(regs);
>  
>  	/* Deliver the signal to userspace */
> -	force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address);
> +	force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)regs->dar);
>  }
>  #endif	/* CONFIG_PPC_ADV_DEBUG_REGS */
>  
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 5006dcbe1d9f..902fcbd1a778 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1641,7 +1641,7 @@ void alignment_exception(struct pt_regs *regs)
>  	if (user_mode(regs))
>  		_exception(sig, regs, code, regs->dar);
>  	else
> -		bad_page_fault(regs, regs->dar, sig);
> +		bad_page_fault(regs, sig);
>  
>  bail:
>  	exception_exit(prev_state);
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 0f0bd4af4b2d..731518e7d56f 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1537,7 +1537,7 @@ long do_hash_fault(struct pt_regs *regs)
>  	 * the access, or panic if there isn't a handler.
>  	 */
>  	if (unlikely(in_nmi())) {
> -		bad_page_fault(regs, ea, SIGSEGV);
> +		bad_page_fault(regs, SIGSEGV);
>  		return 0;
>  	}
>  
> @@ -1576,7 +1576,7 @@ long do_hash_fault(struct pt_regs *regs)
>  			else
>  				_exception(SIGBUS, regs, BUS_ADRERR, ea);
>  		} else {
> -			bad_page_fault(regs, ea, SIGBUS);
> +			bad_page_fault(regs, SIGBUS);
>  		}
>  		err = 0;
>  
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index cc34d50874c1..ae89ad516247 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -898,7 +898,7 @@ void do_bad_slb_fault(struct pt_regs *regs)
>  		if (user_mode(regs))
>  			_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
>  		else
> -			bad_page_fault(regs, regs->dar, SIGSEGV);
> +			bad_page_fault(regs, SIGSEGV);
>  	} else if (err == -EINVAL) {
>  		unrecoverable_exception(regs);
>  	} else {
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 390a296b16a3..e11989be8f1c 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -562,14 +562,14 @@ long do_page_fault(struct pt_regs *regs)
>  	/* 32 and 64e handle errors in their asm code */
>  	if (unlikely(err)) {
>  		if (err > 0) {
> -			bad_page_fault(regs, address, err);
> +			bad_page_fault(regs, err);
>  			err = 0;
>  		} else {
>  			/*
>  			 * do_break() may change NV GPRS while handling the
>  			 * breakpoint. Return -ve to caller to do that.
>  			 */
> -			do_break(regs, address, error_code);
> +			do_break(regs);
>  		}
>  	}
>  #endif
> @@ -591,14 +591,14 @@ long hash__do_page_fault(struct pt_regs *regs)
>  	err = __do_page_fault(regs, address, error_code);
>  	if (unlikely(err)) {
>  		if (err > 0) {
> -			bad_page_fault(regs, address, err);
> +			bad_page_fault(regs, err);
>  			err = 0;
>  		} else {
>  			/*
>  			 * do_break() may change NV GPRS while handling the
>  			 * breakpoint. Return -ve to caller to do that.
>  			 */
> -			do_break(regs, address, error_code);
> +			do_break(regs);
>  		}
>  	}
>  
> @@ -612,7 +612,7 @@ NOKPROBE_SYMBOL(hash__do_page_fault);
>   * It is called from the DSI and ISI handlers in head.S and from some
>   * of the procedures in traps.c.
>   */
> -void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
> +void bad_page_fault(struct pt_regs *regs, int sig)
>  {
>  	const struct exception_table_entry *entry;
>  	int is_write = page_fault_is_write(regs->dsisr);
> diff --git a/arch/powerpc/platforms/8xx/machine_check.c b/arch/powerpc/platforms/8xx/machine_check.c
> index 88dedf38eccd..656365975895 100644
> --- a/arch/powerpc/platforms/8xx/machine_check.c
> +++ b/arch/powerpc/platforms/8xx/machine_check.c
> @@ -26,7 +26,7 @@ int machine_check_8xx(struct pt_regs *regs)
>  	 * to deal with that than having a wart in the mcheck handler.
>  	 * -- BenH
>  	 */
> -	bad_page_fault(regs, regs->dar, SIGBUS);
> +	bad_page_fault(regs, SIGBUS);
>  	return 1;
>  #else
>  	return 0;
> -- 
> 2.23.0

^ permalink raw reply

* Re: [PATCH v3 02/19] powerpc: remove arguments from fault handler functions
From: Aneesh Kumar K.V @ 2020-11-30  7:35 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201128144114.944000-3-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> Make mm fault handlers all just take the pt_regs * argument and load
> DAR/DSISR from that. Make those that return a value return long.
>
> This is done to make the function signatures match other handlers, which
> will help with a future patch to add wrappers. Explicit arguments could
> be added for performance but that would require more wrapper macro
> variants.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/asm-prototypes.h     |  4 ++--
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
>  arch/powerpc/include/asm/bug.h                |  4 ++--
>  arch/powerpc/kernel/exceptions-64e.S          |  2 --
>  arch/powerpc/kernel/exceptions-64s.S          | 14 ++------------
>  arch/powerpc/kernel/head_40x.S                | 10 +++++-----
>  arch/powerpc/kernel/head_8xx.S                |  6 +++---
>  arch/powerpc/kernel/head_book3s_32.S          |  6 ++----
>  arch/powerpc/kernel/head_booke.h              |  4 +---
>  arch/powerpc/mm/book3s64/hash_utils.c         |  8 +++++---
>  arch/powerpc/mm/book3s64/slb.c                | 11 +++++++----
>  arch/powerpc/mm/fault.c                       | 16 +++++++++-------
>  12 files changed, 39 insertions(+), 48 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
> index d0b832cbbec8..22c9d08fa3a4 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -82,8 +82,8 @@ void kernel_bad_stack(struct pt_regs *regs);
>  void system_reset_exception(struct pt_regs *regs);
>  void machine_check_exception(struct pt_regs *regs);
>  void emulation_assist_interrupt(struct pt_regs *regs);
> -long do_slb_fault(struct pt_regs *regs, unsigned long ea);
> -void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err);
> +long do_slb_fault(struct pt_regs *regs);
> +void do_bad_slb_fault(struct pt_regs *regs);
>  
>  /* signals, syscalls and interrupts */
>  long sys_swapcontext(struct ucontext __user *old_ctx,
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index bc8c91f2d26f..e843d0b193d3 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -453,7 +453,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>  #define HPTE_LOCAL_UPDATE	0x1
>  #define HPTE_NOHPTE_UPDATE	0x2
>  
> -int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
> +long do_hash_fault(struct pt_regs *regs);
>  extern int __hash_page_4K(unsigned long ea, unsigned long access,
>  			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>  			  unsigned long flags, int ssize, int subpage_prot);
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index c0e9b7a967a8..897bad6b6bbb 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -111,8 +111,8 @@
>  #ifndef __ASSEMBLY__
>  
>  struct pt_regs;
> -extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
> -int hash__do_page_fault(struct pt_regs *, unsigned long, unsigned long);
> +long do_page_fault(struct pt_regs *);
> +long hash__do_page_fault(struct pt_regs *);
>  extern void bad_page_fault(struct pt_regs *, unsigned long, int);
>  extern void _exception(int, struct pt_regs *, int, unsigned long);
>  extern void _exception_pkey(struct pt_regs *, unsigned long, int);
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index f579ce46eef2..25fa7d5a643c 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1011,8 +1011,6 @@ storage_fault_common:
>  	std	r14,_DAR(r1)
>  	std	r15,_DSISR(r1)
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	mr	r4,r14
> -	mr	r5,r15
>  	ld	r14,PACA_EXGEN+EX_R14(r13)
>  	ld	r15,PACA_EXGEN+EX_R15(r13)
>  	bl	do_page_fault
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 336fa1fa39d1..690058043b17 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1438,8 +1438,6 @@ EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
>  EXC_VIRT_END(data_access, 0x4300, 0x80)
>  EXC_COMMON_BEGIN(data_access_common)
>  	GEN_COMMON data_access
> -	ld	r4,_DAR(r1)
> -	ld	r5,_DSISR(r1)
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
>  	bl	do_hash_fault
> @@ -1492,10 +1490,9 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
>  EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
>  EXC_COMMON_BEGIN(data_access_slb_common)
>  	GEN_COMMON data_access_slb
> -	ld	r4,_DAR(r1)
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
>  	/* HPT case, do SLB fault */
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	do_slb_fault
>  	cmpdi	r3,0
>  	bne-	1f
> @@ -1507,8 +1504,6 @@ MMU_FTR_SECTION_ELSE
>  ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>  	std	r3,RESULT(r1)
>  	RECONCILE_IRQ_STATE(r10, r11)
> -	ld	r4,_DAR(r1)
> -	ld	r5,RESULT(r1)
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	do_bad_slb_fault
>  	b	interrupt_return
> @@ -1543,8 +1538,6 @@ EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80)
>  EXC_VIRT_END(instruction_access, 0x4400, 0x80)
>  EXC_COMMON_BEGIN(instruction_access_common)
>  	GEN_COMMON instruction_access
> -	ld	r4,_DAR(r1)
> -	ld	r5,_DSISR(r1)
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
>  	bl	do_hash_fault
> @@ -1588,10 +1581,9 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
>  EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80)
>  EXC_COMMON_BEGIN(instruction_access_slb_common)
>  	GEN_COMMON instruction_access_slb
> -	ld	r4,_DAR(r1)
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
>  	/* HPT case, do SLB fault */
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	do_slb_fault
>  	cmpdi	r3,0
>  	bne-	1f
> @@ -1603,8 +1595,6 @@ MMU_FTR_SECTION_ELSE
>  ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>  	std	r3,RESULT(r1)
>  	RECONCILE_IRQ_STATE(r10, r11)
> -	ld	r4,_DAR(r1)
> -	ld	r5,RESULT(r1)
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	do_bad_slb_fault
>  	b	interrupt_return
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index a1ae00689e0f..3c5577ac4dc8 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -179,9 +179,9 @@ _ENTRY(saved_ksp_limit)
>   */
>  	START_EXCEPTION(0x0300,	DataStorage)
>  	EXCEPTION_PROLOG
> -	mfspr	r5, SPRN_ESR		/* Grab the ESR, save it, pass arg3 */
> +	mfspr	r5, SPRN_ESR		/* Grab the ESR, save it */
>  	stw	r5, _ESR(r11)
> -	mfspr	r4, SPRN_DEAR		/* Grab the DEAR, save it, pass arg2 */
> +	mfspr	r4, SPRN_DEAR		/* Grab the DEAR, save it */
>  	stw	r4, _DEAR(r11)
>  	EXC_XFER_LITE(0x300, handle_page_fault)
>  
> @@ -191,9 +191,9 @@ _ENTRY(saved_ksp_limit)
>   */
>  	START_EXCEPTION(0x0400, InstructionAccess)
>  	EXCEPTION_PROLOG
> -	mr	r4,r12			/* Pass SRR0 as arg2 */
> -	stw	r4, _DEAR(r11)
> -	li	r5,0			/* Pass zero as arg3 */
> +	li	r5,0
> +	stw	r5, _ESR(r11)		/* Zero ESR */
> +	stw	r12, _DEAR(r11)		/* SRR0 as DEAR */
>  	EXC_XFER_LITE(0x400, handle_page_fault)
>  
>  /* 0x0500 - External Interrupt Exception */
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index ee0bfebc375f..8acd365a2be6 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -324,14 +324,14 @@ DataStoreTLBMiss:
>  	. = 0x1300
>  InstructionTLBError:
>  	EXCEPTION_PROLOG
> -	mr	r4,r12
>  	andis.	r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
>  	andis.	r10,r9,SRR1_ISI_NOPT@h
>  	beq+	.Litlbie
> -	tlbie	r4
> +	tlbie	r12
>  	/* 0x400 is InstructionAccess exception, needed by bad_page_fault() */
>  .Litlbie:
> -	stw	r4, _DAR(r11)
> +	stw	r12, _DAR(r11)
> +	stw	r5, _DSISR(r11)
>  	EXC_XFER_LITE(0x400, handle_page_fault)
>  
>  /* This is the data TLB error on the MPC8xx.  This could be due to
> diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
> index a0dda2a1f2df..7addf67832f9 100644
> --- a/arch/powerpc/kernel/head_book3s_32.S
> +++ b/arch/powerpc/kernel/head_book3s_32.S
> @@ -370,9 +370,9 @@ BEGIN_MMU_FTR_SECTION
>  	bl	hash_page
>  END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
>  #endif	/* CONFIG_VMAP_STACK */
> -1:	mr	r4,r12
>  	andis.	r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
> -	stw	r4, _DAR(r11)
> +	stw	r5, _DSISR(r11)
> +	stw	r12, _DAR(r11)
>  	EXC_XFER_LITE(0x400, handle_page_fault)
>  
>  /* External interrupt */
> @@ -687,8 +687,6 @@ handle_page_fault_tramp_1:
>  #ifdef CONFIG_VMAP_STACK
>  	EXCEPTION_PROLOG_2 handle_dar_dsisr=1
>  #endif
> -	lwz	r4, _DAR(r11)
> -	lwz	r5, _DSISR(r11)
>  	/* fall through */
>  handle_page_fault_tramp_2:
>  	EXC_XFER_LITE(0x300, handle_page_fault)
> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
> index 71c359d438b5..1da0c1d1b0a1 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -477,9 +477,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>  	NORMAL_EXCEPTION_PROLOG(INST_STORAGE);		      \
>  	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
>  	stw	r5,_ESR(r11);						      \
> -	mr      r4,r12;                 /* Pass SRR0 as arg2 */		      \
> -	stw	r4, _DEAR(r11);						      \
> -	li      r5,0;                   /* Pass zero as arg3 */		      \
> +	stw	r12, _DEAR(r11);	/* Pass SRR0 as arg2 */		      \
>  	EXC_XFER_LITE(0x0400, handle_page_fault)
>  
>  #define ALIGNMENT_EXCEPTION						      \
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index bfa1b1966218..0f0bd4af4b2d 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1510,13 +1510,15 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
>  }
>  EXPORT_SYMBOL_GPL(hash_page);
>  
> -int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr)
> +long do_hash_fault(struct pt_regs *regs)
>  {
> +	unsigned long ea = regs->dar;
> +	unsigned long dsisr = regs->dsisr;
>  	unsigned long access = _PAGE_PRESENT | _PAGE_READ;
>  	unsigned long flags = 0;
>  	struct mm_struct *mm;
>  	unsigned int region_id;
> -	int err;
> +	long err;
>  
>  	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
>  		goto _do_page_fault;
> @@ -1580,7 +1582,7 @@ int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr)
>  
>  	} else if (err) {
>  _do_page_fault:
> -		err = hash__do_page_fault(regs, ea, dsisr);
> +		err = hash__do_page_fault(regs);
>  	}
>  
>  	return err;
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index c30fcbfa0e32..cc34d50874c1 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -837,8 +837,9 @@ static long slb_allocate_user(struct mm_struct *mm, unsigned long ea)
>  	return slb_insert_entry(ea, context, flags, ssize, false);
>  }
>  
> -long do_slb_fault(struct pt_regs *regs, unsigned long ea)
> +long do_slb_fault(struct pt_regs *regs)
>  {
> +	unsigned long ea = regs->dar;
>  	unsigned long id = get_region_id(ea);
>  
>  	/* IRQs are not reconciled here, so can't check irqs_disabled */
> @@ -889,13 +890,15 @@ long do_slb_fault(struct pt_regs *regs, unsigned long ea)
>  	}
>  }
>  
> -void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err)
> +void do_bad_slb_fault(struct pt_regs *regs)
>  {
> +	int err = regs->result;
> +
>  	if (err == -EFAULT) {
>  		if (user_mode(regs))
> -			_exception(SIGSEGV, regs, SEGV_BNDERR, ea);
> +			_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
>  		else
> -			bad_page_fault(regs, ea, SIGSEGV);
> +			bad_page_fault(regs, regs->dar, SIGSEGV);
>  	} else if (err == -EINVAL) {
>  		unrecoverable_exception(regs);
>  	} else {
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index e65a49f246ef..390a296b16a3 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -549,11 +549,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  }
>  NOKPROBE_SYMBOL(__do_page_fault);
>  
> -int do_page_fault(struct pt_regs *regs, unsigned long address,
> -		  unsigned long error_code)
> +long do_page_fault(struct pt_regs *regs)
>  {
>  	enum ctx_state prev_state = exception_enter();
> -	int err;
> +	unsigned long address = regs->dar;
> +	unsigned long error_code = regs->dsisr;
> +	long err;
>  
>  	err = __do_page_fault(regs, address, error_code);
>  
> @@ -580,11 +581,12 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  NOKPROBE_SYMBOL(do_page_fault);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> -/* Same as do_page_fault but interrupt entry has already run in do_hash_fault */
> -int hash__do_page_fault(struct pt_regs *regs, unsigned long address,
> -		  unsigned long error_code)
> +/* Same as do_page_fault but no interrupt entry */
> +long hash__do_page_fault(struct pt_regs *regs)
>  {
> -	int err;
> +	unsigned long address = regs->dar;
> +	unsigned long error_code = regs->dsisr;
> +	long err;
>  
>  	err = __do_page_fault(regs, address, error_code);
>  	if (unlikely(err)) {
> -- 
> 2.23.0

^ permalink raw reply

* Re: [PATCH v3 01/19] powerpc/64s: move the last of the page fault handling logic to C
From: Aneesh Kumar K.V @ 2020-11-30  7:35 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201128144114.944000-2-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> The page fault handling still has some complex logic particularly around
> hash table handling, in asm. Implement this in C instead.
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>  arch/powerpc/include/asm/bug.h                |   1 +
>  arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>  arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>  arch/powerpc/mm/fault.c                       |  57 +++++++-
>  5 files changed, 127 insertions(+), 140 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 683a9c7d1b03..bc8c91f2d26f 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -453,6 +453,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>  #define HPTE_LOCAL_UPDATE	0x1
>  #define HPTE_NOHPTE_UPDATE	0x2
>  
> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>  extern int __hash_page_4K(unsigned long ea, unsigned long access,
>  			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>  			  unsigned long flags, int ssize, int subpage_prot);
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 338f36cd9934..c0e9b7a967a8 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -112,6 +112,7 @@
>  
>  struct pt_regs;
>  extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
> +int hash__do_page_fault(struct pt_regs *, unsigned long, unsigned long);
>  extern void bad_page_fault(struct pt_regs *, unsigned long, int);
>  extern void _exception(int, struct pt_regs *, int, unsigned long);
>  extern void _exception_pkey(struct pt_regs *, unsigned long, int);
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 4d01f09ecf80..336fa1fa39d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   *
>   * Handling:
>   * - Hash MMU
> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>   *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>   *   "non-bolted" regions, e.g., vmalloc space. However these should always be
> - *   backed by Linux page tables.
> + *   backed by Linux page table entries.
>   *
> - *   If none is found, do a Linux page fault. Linux page faults can happen in
> - *   kernel mode due to user copy operations of course.
> + *   If no entry is found the Linux page fault handler is invoked (by
> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
> + *   copy operations of course.
>   *
>   *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>   *   MMU context, which may cause a DSI in the host, which must go to the
> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>  	GEN_COMMON data_access
>  	ld	r4,_DAR(r1)
>  	ld	r5,_DSISR(r1)
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
> -	ld	r6,_MSR(r1)
> -	li	r3,0x300
> -	b	do_hash_page		/* Try to handle as hpte fault */
> +	bl	do_hash_fault
>  MMU_FTR_SECTION_ELSE
> -	b	handle_page_fault
> +	bl	do_page_fault
>  ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> +        cmpdi	r3,0
> +	beq+	interrupt_return
> +	/* We need to restore NVGPRS */
> +	REST_NVGPRS(r1)
> +	b       interrupt_return
>  
>  	GEN_KVM data_access
>  
> @@ -1540,13 +1545,17 @@ EXC_COMMON_BEGIN(instruction_access_common)
>  	GEN_COMMON instruction_access
>  	ld	r4,_DAR(r1)
>  	ld	r5,_DSISR(r1)
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>  BEGIN_MMU_FTR_SECTION
> -	ld      r6,_MSR(r1)
> -	li	r3,0x400
> -	b	do_hash_page		/* Try to handle as hpte fault */
> +	bl	do_hash_fault
>  MMU_FTR_SECTION_ELSE
> -	b	handle_page_fault
> +	bl	do_page_fault
>  ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> +        cmpdi	r3,0
> +	beq+	interrupt_return
> +	/* We need to restore NVGPRS */
> +	REST_NVGPRS(r1)
> +	b       interrupt_return
>  
>  	GEN_KVM instruction_access
>  
> @@ -3202,99 +3211,3 @@ disable_machine_check:
>  	RFI_TO_KERNEL
>  1:	mtlr	r0
>  	blr
> -
> -/*
> - * Hash table stuff
> - */
> -	.balign	IFETCH_ALIGN_BYTES
> -do_hash_page:
> -#ifdef CONFIG_PPC_BOOK3S_64
> -	lis	r0,(DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)@h
> -	ori	r0,r0,DSISR_BAD_FAULT_64S@l
> -	and.	r0,r5,r0		/* weird error? */
> -	bne-	handle_page_fault	/* if not, try to insert a HPTE */
> -
> -	/*
> -	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> -	 * don't call hash_page, just fail the fault. This is required to
> -	 * prevent re-entrancy problems in the hash code, namely perf
> -	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> -	 * hash fault. See the comment in hash_preload().
> -	 */
> -	ld	r11, PACA_THREAD_INFO(r13)
> -	lwz	r0,TI_PREEMPT(r11)
> -	andis.	r0,r0,NMI_MASK@h
> -	bne	77f
> -
> -	/*
> -	 * r3 contains the trap number
> -	 * r4 contains the faulting address
> -	 * r5 contains dsisr
> -	 * r6 msr
> -	 *
> -	 * at return r3 = 0 for success, 1 for page fault, negative for error
> -	 */
> -	bl	__hash_page		/* build HPTE if possible */
> -        cmpdi	r3,0			/* see if __hash_page succeeded */
> -
> -	/* Success */
> -	beq	interrupt_return	/* Return from exception on success */
> -
> -	/* Error */
> -	blt-	13f
> -
> -	/* Reload DAR/DSISR into r4/r5 for the DABR check below */
> -	ld	r4,_DAR(r1)
> -	ld      r5,_DSISR(r1)
> -#endif /* CONFIG_PPC_BOOK3S_64 */
> -
> -/* Here we have a page fault that hash_page can't handle. */
> -handle_page_fault:
> -11:	andis.  r0,r5,DSISR_DABRMATCH@h
> -	bne-    handle_dabr_fault
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	bl	do_page_fault
> -	cmpdi	r3,0
> -	beq+	interrupt_return
> -	mr	r5,r3
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	ld	r4,_DAR(r1)
> -	bl	bad_page_fault
> -	b	interrupt_return
> -
> -/* We have a data breakpoint exception - handle it */
> -handle_dabr_fault:
> -	ld      r4,_DAR(r1)
> -	ld      r5,_DSISR(r1)
> -	addi    r3,r1,STACK_FRAME_OVERHEAD
> -	bl      do_break
> -	/*
> -	 * do_break() may have changed the NV GPRS while handling a breakpoint.
> -	 * If so, we need to restore them with their updated values.
> -	 */
> -	REST_NVGPRS(r1)
> -	b       interrupt_return
> -
> -
> -#ifdef CONFIG_PPC_BOOK3S_64
> -/* We have a page fault that hash_page could handle but HV refused
> - * the PTE insertion
> - */
> -13:	mr	r5,r3
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	ld	r4,_DAR(r1)
> -	bl	low_hash_fault
> -	b	interrupt_return
> -#endif
> -
> -/*
> - * We come here as a result of a DSI at a point where we don't want
> - * to call hash_page, such as when we are accessing memory (possibly
> - * user memory) inside a PMU interrupt that occurred while interrupts
> - * were soft-disabled.  We want to invoke the exception handler for
> - * the access, or panic if there isn't a handler.
> - */
> -77:	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	li	r5,SIGSEGV
> -	bl	bad_page_fault
> -	b	interrupt_return
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 24702c0a92e0..bfa1b1966218 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1510,16 +1510,40 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
>  }
>  EXPORT_SYMBOL_GPL(hash_page);
>  
> -int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
> -		unsigned long msr)
> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr)
>  {
>  	unsigned long access = _PAGE_PRESENT | _PAGE_READ;
>  	unsigned long flags = 0;
> -	struct mm_struct *mm = current->mm;
> -	unsigned int region_id = get_region_id(ea);
> +	struct mm_struct *mm;
> +	unsigned int region_id;
> +	int err;
> +
> +	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
> +		goto _do_page_fault;
> +
> +	/*
> +	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> +	 * don't call hash_page, just fail the fault. This is required to
> +	 * prevent re-entrancy problems in the hash code, namely perf
> +	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> +	 * hash fault. See the comment in hash_preload().
> +	 *
> +	 * We come here as a result of a DSI at a point where we don't want
> +	 * to call hash_page, such as when we are accessing memory (possibly
> +	 * user memory) inside a PMU interrupt that occurred while interrupts
> +	 * were soft-disabled.  We want to invoke the exception handler for
> +	 * the access, or panic if there isn't a handler.
> +	 */
> +	if (unlikely(in_nmi())) {
> +		bad_page_fault(regs, ea, SIGSEGV);
> +		return 0;
> +	}
>  
> +	region_id = get_region_id(ea);
>  	if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
>  		mm = &init_mm;
> +	else
> +		mm = current->mm;
>  
>  	if (dsisr & DSISR_NOHPTE)
>  		flags |= HPTE_NOHPTE_UPDATE;
> @@ -1535,13 +1559,31 @@ int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
>  	 * 2) user space access kernel space.
>  	 */
>  	access |= _PAGE_PRIVILEGED;
> -	if ((msr & MSR_PR) || (region_id == USER_REGION_ID))
> +	if (user_mode(regs) || (region_id == USER_REGION_ID))
>  		access &= ~_PAGE_PRIVILEGED;
>  
> -	if (trap == 0x400)
> +	if (regs->trap == 0x400)
>  		access |= _PAGE_EXEC;
>  
> -	return hash_page_mm(mm, ea, access, trap, flags);
> +	err = hash_page_mm(mm, ea, access, regs->trap, flags);
> +	if (unlikely(err < 0)) {
> +		// failed to instert a hash PTE due to an hypervisor error
> +		if (user_mode(regs)) {
> +			if (IS_ENABLED(CONFIG_PPC_SUBPAGE_PROT) && err == -2)
> +				_exception(SIGSEGV, regs, SEGV_ACCERR, ea);
> +			else
> +				_exception(SIGBUS, regs, BUS_ADRERR, ea);
> +		} else {
> +			bad_page_fault(regs, ea, SIGBUS);
> +		}
> +		err = 0;
> +
> +	} else if (err) {
> +_do_page_fault:
> +		err = hash__do_page_fault(regs, ea, dsisr);
> +	}
> +
> +	return err;
>  }
>  
>  #ifdef CONFIG_PPC_MM_SLICES
> @@ -1841,27 +1883,6 @@ void flush_hash_range(unsigned long number, int local)
>  	}
>  }
>  
> -/*
> - * low_hash_fault is called when we the low level hash code failed
> - * to instert a PTE due to an hypervisor error
> - */
> -void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
> -{
> -	enum ctx_state prev_state = exception_enter();
> -
> -	if (user_mode(regs)) {
> -#ifdef CONFIG_PPC_SUBPAGE_PROT
> -		if (rc == -2)
> -			_exception(SIGSEGV, regs, SEGV_ACCERR, address);
> -		else
> -#endif
> -			_exception(SIGBUS, regs, BUS_ADRERR, address);
> -	} else
> -		bad_page_fault(regs, address, SIGBUS);
> -
> -	exception_exit(prev_state);
> -}
> -
>  long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
>  			   unsigned long pa, unsigned long rflags,
>  			   unsigned long vflags, int psize, int ssize)
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 0add963a849b..e65a49f246ef 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -370,7 +370,9 @@ static void sanity_check_fault(bool is_write, bool is_user,
>  #define page_fault_is_write(__err)	((__err) & DSISR_ISSTORE)
>  #if defined(CONFIG_PPC_8xx)
>  #define page_fault_is_bad(__err)	((__err) & DSISR_NOEXEC_OR_G)
> -#elif defined(CONFIG_PPC64)
> +#elif defined(CONFIG_PPC_BOOK3S_64)
> +#define page_fault_is_bad(__err)	((__err) & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH))
> +#elif defined(CONFIG_PPC_BOOK3E_64)
>  #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_64S)
>  #else
>  #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
> @@ -406,6 +408,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  		return 0;
>  
>  	if (unlikely(page_fault_is_bad(error_code))) {
> +		if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && (error_code & DSISR_DABRMATCH))
> +			return -1;
> +
>  		if (is_user) {
>  			_exception(SIGBUS, regs, BUS_OBJERR, address);
>  			return 0;
> @@ -548,12 +553,58 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  		  unsigned long error_code)
>  {
>  	enum ctx_state prev_state = exception_enter();
> -	int rc = __do_page_fault(regs, address, error_code);
> +	int err;
> +
> +	err = __do_page_fault(regs, address, error_code);
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	/* 32 and 64e handle errors in their asm code */
> +	if (unlikely(err)) {
> +		if (err > 0) {
> +			bad_page_fault(regs, address, err);
> +			err = 0;
> +		} else {
> +			/*
> +			 * do_break() may change NV GPRS while handling the
> +			 * breakpoint. Return -ve to caller to do that.
> +			 */
> +			do_break(regs, address, error_code);
> +		}
> +	}
> +#endif
> +
>  	exception_exit(prev_state);
> -	return rc;
> +
> +	return err;
>  }
>  NOKPROBE_SYMBOL(do_page_fault);
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +/* Same as do_page_fault but interrupt entry has already run in do_hash_fault */
> +int hash__do_page_fault(struct pt_regs *regs, unsigned long address,
> +		  unsigned long error_code)
> +{
> +	int err;
> +
> +	err = __do_page_fault(regs, address, error_code);
> +	if (unlikely(err)) {
> +		if (err > 0) {
> +			bad_page_fault(regs, address, err);
> +			err = 0;
> +		} else {
> +			/*
> +			 * do_break() may change NV GPRS while handling the
> +			 * breakpoint. Return -ve to caller to do that.
> +			 */
> +			do_break(regs, address, error_code);
> +		}
> +	}
> +
> +	return err;
> +}
> +NOKPROBE_SYMBOL(hash__do_page_fault);
> +#endif
> +
>  /*
>   * bad_page_fault is called when we have a bad access from the kernel.
>   * It is called from the DSI and ISI handlers in head.S and from some
> -- 
> 2.23.0

^ permalink raw reply

* Re: [PATCH 2/3] powerpc/pseries/hotplug-cpu: fix memleak in dlpar_cpu_add_by_count
From: Qinglang Miao @ 2020-11-30  7:28 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <871rgby5lb.fsf@mpe.ellerman.id.au>



在 2020/11/30 9:51, Michael Ellerman 写道:
> Qinglang Miao <miaoqinglang@huawei.com> writes:
>> kfree(cpu_drcs) should be called when it fails to perform
>> of_find_node_by_path("/cpus") in dlpar_cpu_add_by_count,
>> otherwise there would be a memleak.
>>
>> In fact, the patch a0ff72f9f5a7 ought to remove kfree in
>> find_dlpar_cpus_to_add rather than dlpar_cpu_add_by_count.
>> I guess there might be a mistake when apply that one.
>>
>> Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
>> ---
>>   arch/powerpc/platforms/pseries/hotplug-cpu.c | 1 +
>>   1 file changed, 1 insertion(+)
> 
> This is already fixed in my next by:
> 
>    a40fdaf1420d ("Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"")
> 
> cheers'Revert' sounds resonable to this one, glad to know that.
> 
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index f2837e33b..4bb1c9f2b 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -743,6 +743,7 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
>>   	parent = of_find_node_by_path("/cpus");
>>   	if (!parent) {
>>   		pr_warn("Could not find CPU root node in device tree\n");
>> +		kfree(cpu_drcs);
>>   		return -1;
>>   	}
>>   
>> -- 
>> 2.23.0
> .
> 

^ permalink raw reply

* Re: [PATCH] powerpc: Allow relative pointers in bug table entries
From: Christophe Leroy @ 2020-11-30  6:27 UTC (permalink / raw)
  To: Jordan Niethe, Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <CACzsE9r6GoFANSGw_6SK0R7SZGbU+U0_UvDLH9Pzj_LRBsHJQw@mail.gmail.com>



Le 30/11/2020 à 02:50, Jordan Niethe a écrit :
> On Mon, Nov 30, 2020 at 12:42 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> Le 27/11/2020 à 04:02, Jordan Niethe a écrit :
>>>> This enables GENERIC_BUG_RELATIVE_POINTERS on Power so that 32-bit
>>>> offsets are stored in the bug entries rather than 64-bit pointers.
>>>>
>>>> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
>>>> ---
>>>>    arch/powerpc/Kconfig           |  4 ++++
>>>>    arch/powerpc/include/asm/bug.h | 37 ++++++++++++++++++++++++++++++++--
>>>>    arch/powerpc/xmon/xmon.c       | 17 ++++++++++++++--
>>>>    3 files changed, 54 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>>> index e9f13fe08492..294108e0e5c6 100644
>>>> --- a/arch/powerpc/Kconfig
>>>> +++ b/arch/powerpc/Kconfig
>>>> @@ -311,6 +311,10 @@ config GENERIC_BUG
>>>>       default y
>>>>       depends on BUG
>>>>
>>>> +config GENERIC_BUG_RELATIVE_POINTERS
>>>> +    def_bool y
>>>> +    depends on GENERIC_BUG
>>>> +
>>>>    config SYS_SUPPORTS_APM_EMULATION
>>>>       default y if PMAC_APM_EMU
>>>>       bool
>>>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>>>> index 338f36cd9934..d03d834042a1 100644
>>>> --- a/arch/powerpc/include/asm/bug.h
>>>> +++ b/arch/powerpc/include/asm/bug.h
>>>> @@ -12,7 +12,11 @@
>>>>    #ifdef CONFIG_DEBUG_BUGVERBOSE
>>>>    .macro EMIT_BUG_ENTRY addr,file,line,flags
>>>>        .section __bug_table,"aw"
>>>> +#ifndef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
>>>
>>> As far as I understand, as soon as CONFIG_BUG is selected, GENERIC_BUG is automatically selected so
>>> GENERIC_BUG_RELATIVE_POINTERS is selected as well. Therefore this #ifndef is never possible.
>>
>> Yeah.
>>
>> There is one place in the generic code that has an ifndef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
>> but that's because it has to support arches that don't select it.
>>
>> In the arch code we know that it's enabled, so there should be no need
>> for any ifdefs.
> For 32bit, pointers are 4 bytes anyway so it would be pointless to
> store a displacement, so won't we need some ifdefs for that?

I'd say it the other way round:

For 32bit, pointers are 4 bytes anyway so it would be pointless to
make it different from 64bit.

We are definitely not on a performance critical path when dealing with bug entries, so I think it is 
better to keep a common code for PPC32 and PPC64.

Christophe

^ permalink raw reply

* [PATCH 1/2] ASoC: fsl-asoc-card: Add support for si476x codec
From: Shengjiu Wang @ 2020-11-30  3:57 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
	perex, tiwai, alsa-devel, linuxppc-dev, linux-kernel, robh+dt,
	devicetree

The si476x codec is used for FM radio function on i.MX6
auto board, it only supports recording function.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 sound/soc/fsl/fsl-asoc-card.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index a2dd3b6b7fec..f62f81ceab0d 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -131,6 +131,13 @@ static const struct snd_soc_dapm_route audio_map_tx[] = {
 	{"CPU-Playback",  NULL, "ASRC-Playback"},
 };
 
+static const struct snd_soc_dapm_route audio_map_rx[] = {
+	/* 1st half -- Normal DAPM routes */
+	{"CPU-Capture",  NULL, "Capture"},
+	/* 2nd half -- ASRC DAPM routes */
+	{"ASRC-Capture",  NULL, "CPU-Capture"},
+};
+
 /* Add all possible widgets into here without being redundant */
 static const struct snd_soc_dapm_widget fsl_asoc_card_dapm_widgets[] = {
 	SND_SOC_DAPM_LINE("Line Out Jack", NULL),
@@ -653,6 +660,11 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 		priv->cpu_priv.slot_width = 32;
 		priv->card.dapm_routes = audio_map_tx;
 		priv->card.num_dapm_routes = ARRAY_SIZE(audio_map_tx);
+	} else if (of_device_is_compatible(np, "fsl,imx-audio-si476x")) {
+		codec_dai_name = "si476x-codec";
+		priv->dai_fmt |= SND_SOC_DAIFMT_CBS_CFS;
+		priv->card.dapm_routes = audio_map_rx;
+		priv->card.num_dapm_routes = ARRAY_SIZE(audio_map_rx);
 	} else {
 		dev_err(&pdev->dev, "unknown Device Tree compatible\n");
 		ret = -EINVAL;
@@ -869,6 +881,7 @@ static const struct of_device_id fsl_asoc_card_dt_ids[] = {
 	{ .compatible = "fsl,imx-audio-wm8960", },
 	{ .compatible = "fsl,imx-audio-mqs", },
 	{ .compatible = "fsl,imx-audio-wm8524", },
+	{ .compatible = "fsl,imx-audio-si476x", },
 	{}
 };
 MODULE_DEVICE_TABLE(of, fsl_asoc_card_dt_ids);
-- 
2.27.0


^ permalink raw reply related

* [PATCH 2/2] ASoC: bindings: fsl-asoc-card: add compatible string for si476x codec
From: Shengjiu Wang @ 2020-11-30  3:57 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
	perex, tiwai, alsa-devel, linuxppc-dev, linux-kernel, robh+dt,
	devicetree
In-Reply-To: <1606708668-28786-1-git-send-email-shengjiu.wang@nxp.com>

The si476x codec is used for FM radio function on i.MX6
auto board.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 Documentation/devicetree/bindings/sound/fsl-asoc-card.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
index f339be62e7e4..90d9e9d81624 100644
--- a/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
+++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
@@ -40,6 +40,8 @@ The compatible list for this generic sound card currently:
 
  "fsl,imx-audio-tlv320aic32x4"
 
+ "fsl,imx-audio-si476x"
+
 Required properties:
 
   - compatible		: Contains one of entries in the compatible list.
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH 1/8] powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCE
From: Mahesh J Salgaonkar @ 2020-11-30  3:55 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20201128070728.825934-2-npiggin@gmail.com>

On 2020-11-28 17:07:21 Sat, Nicholas Piggin wrote:
> This can be hit by an HPT guest running on an HPT host and bring down
> the host, so it's quite important to fix.
> 
> Fixes: 7290f3b3d3e66 ("powerpc/64s/powernv: machine check dump SLB contents")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/setup.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 46115231a3b2..4426a109ec2f 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -211,11 +211,16 @@ static void __init pnv_init(void)
>  		add_preferred_console("hvc", 0, NULL);
>  
>  	if (!radix_enabled()) {
> +		size_t size = sizeof(struct slb_entry) * mmu_slb_size;

Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>

Thanks,
-Mahesh.


>  		int i;
>  
>  		/* Allocate per cpu area to save old slb contents during MCE */
> -		for_each_possible_cpu(i)
> -			paca_ptrs[i]->mce_faulty_slbs = memblock_alloc_node(mmu_slb_size, __alignof__(*paca_ptrs[i]->mce_faulty_slbs), cpu_to_node(i));
> +		for_each_possible_cpu(i) {
> +			paca_ptrs[i]->mce_faulty_slbs =
> +					memblock_alloc_node(size,
> +						__alignof__(struct slb_entry),
> +						cpu_to_node(i));
> +		}
>  	}
>  }
>  
> -- 
> 2.23.0
> 

-- 
Mahesh J Salgaonkar

^ permalink raw reply

* [PATCH] powerpc/xmon: Fix build failure for 8xx
From: Ravi Bangoria @ 2020-11-30  3:44 UTC (permalink / raw)
  To: mpe; +Cc: christophe.leroy, ravi.bangoria, mikey, linuxppc-dev

With CONFIG_PPC_8xx and CONFIG_XMON set, kernel build fails with

  arch/powerpc/xmon/xmon.c:1379:12: error: 'find_free_data_bpt' defined
  but not used [-Werror=unused-function]

Fix it by enclosing find_free_data_bpt() inside #ifndef CONFIG_PPC_8xx.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 30df74d67d48 ("powerpc/watchpoint/xmon: Support 2nd DAWR")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/xmon/xmon.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 55c43a6c9111..5559edf36756 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1383,6 +1383,7 @@ static long check_bp_loc(unsigned long addr)
 	return 1;
 }
 
+#ifndef CONFIG_PPC_8xx
 static int find_free_data_bpt(void)
 {
 	int i;
@@ -1394,6 +1395,7 @@ static int find_free_data_bpt(void)
 	printf("Couldn't find free breakpoint register\n");
 	return -1;
 }
+#endif
 
 static void print_data_bpts(void)
 {
-- 
2.17.1


^ permalink raw reply related

* Re: [RFC PATCH] KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
From: Michael Ellerman @ 2020-11-30  2:40 UTC (permalink / raw)
  To: Nicholas Piggin, kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20201128095232.837260-1-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:
> This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT
> guests on POWER9 radix hosts"), which was required to run HPT guests on
> RPT hosts on early POWER9 CPUs without support for "mixed mode", which
> meant the host could not run with MMU on while guests were running.

Would be worth mentioning which CPU versions. 

Looking at the code it seems like it's P9N < 2.2 and P9C < 1.1.

> This code has some corner case bugs, e.g., when the guest hits a machine
> check or HMI the primary locks up waiting for secondaries to switch LPCR
> to host, which they never do. This could all be fixed in software, but
> most CPUs in production have mixed mode support, and those that don't
> are believed to be all in installations that don't use this capability.
> So simplify things and remove support.

Key detail being, AFAICS, you retain enough code to detect that we're in
that configuration and cleanly return an error, rather than crashing or
anything horrible.

Otherwise looks good to me.

cheers

> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index 078f4648ea27..b6d31bff5209 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -74,16 +74,6 @@ struct kvm_split_mode {
>  	u8		do_nap;
>  	u8		napped[MAX_SMT_THREADS];
>  	struct kvmppc_vcore *vc[MAX_SUBCORES];
> -	/* Bits for changing lpcr on P9 */
> -	unsigned long	lpcr_req;
> -	unsigned long	lpidr_req;
> -	unsigned long	host_lpcr;
> -	u32		do_set;
> -	u32		do_restore;
> -	union {
> -		u32	allphases;
> -		u8	phase[4];
> -	} lpcr_sync;
>  };
>  
>  /*
> @@ -110,7 +100,6 @@ struct kvmppc_host_state {
>  	u8 hwthread_state;
>  	u8 host_ipi;
>  	u8 ptid;		/* thread number within subcore when split */
> -	u8 tid;			/* thread number within whole core */
>  	u8 fake_suspend;
>  	struct kvm_vcpu *kvm_vcpu;
>  	struct kvmppc_vcore *kvm_vcore;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index c2722ff36e98..21496ea09bf1 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -690,7 +690,6 @@ int main(void)
>  	HSTATE_FIELD(HSTATE_SAVED_XIRR, saved_xirr);
>  	HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
>  	HSTATE_FIELD(HSTATE_PTID, ptid);
> -	HSTATE_FIELD(HSTATE_TID, tid);
>  	HSTATE_FIELD(HSTATE_FAKE_SUSPEND, fake_suspend);
>  	HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
>  	HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
> @@ -720,8 +719,6 @@ int main(void)
>  	OFFSET(KVM_SPLIT_LDBAR, kvm_split_mode, ldbar);
>  	OFFSET(KVM_SPLIT_DO_NAP, kvm_split_mode, do_nap);
>  	OFFSET(KVM_SPLIT_NAPPED, kvm_split_mode, napped);
> -	OFFSET(KVM_SPLIT_DO_SET, kvm_split_mode, do_set);
> -	OFFSET(KVM_SPLIT_DO_RESTORE, kvm_split_mode, do_restore);
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index c94f9595133d..86b78f8e3dde 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -134,7 +134,7 @@ static inline bool nesting_enabled(struct kvm *kvm)
>  }
>  
>  /* If set, the threads on each CPU core have to be in the same MMU mode */
> -static bool no_mixing_hpt_and_radix;
> +static bool no_mixing_hpt_and_radix __read_mostly;
>  
>  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
>  
> @@ -2855,11 +2855,6 @@ static bool can_dynamic_split(struct kvmppc_vcore *vc, struct core_info *cip)
>  	if (one_vm_per_core && vc->kvm != cip->vc[0]->kvm)
>  		return false;
>  
> -	/* Some POWER9 chips require all threads to be in the same MMU mode */
> -	if (no_mixing_hpt_and_radix &&
> -	    kvm_is_radix(vc->kvm) != kvm_is_radix(cip->vc[0]->kvm))
> -		return false;
> -
>  	if (n_threads < cip->max_subcore_threads)
>  		n_threads = cip->max_subcore_threads;
>  	if (!subcore_config_ok(cip->n_subcores + 1, n_threads))
> @@ -2898,6 +2893,9 @@ static void prepare_threads(struct kvmppc_vcore *vc)
>  	for_each_runnable_thread(i, vcpu, vc) {
>  		if (signal_pending(vcpu->arch.run_task))
>  			vcpu->arch.ret = -EINTR;
> +		else if (no_mixing_hpt_and_radix &&
> +			 kvm_is_radix(vc->kvm) != radix_enabled())
> +			vcpu->arch.ret = -EINVAL;
>  		else if (vcpu->arch.vpa.update_pending ||
>  			 vcpu->arch.slb_shadow.update_pending ||
>  			 vcpu->arch.dtl.update_pending)
> @@ -3103,7 +3101,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	int controlled_threads;
>  	int trap;
>  	bool is_power8;
> -	bool hpt_on_radix;
>  
>  	/*
>  	 * Remove from the list any threads that have a signal pending
> @@ -3136,11 +3133,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	 * this is a HPT guest on a radix host machine where the
>  	 * CPU threads may not be in different MMU modes.
>  	 */
> -	hpt_on_radix = no_mixing_hpt_and_radix && radix_enabled() &&
> -		!kvm_is_radix(vc->kvm);
> -	if (((controlled_threads > 1) &&
> -	     ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) ||
> -	    (hpt_on_radix && vc->kvm->arch.threads_indep)) {
> +	if ((controlled_threads > 1) &&
> +	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
>  		for_each_runnable_thread(i, vcpu, vc) {
>  			vcpu->arch.ret = -EBUSY;
>  			kvmppc_remove_runnable(vc, vcpu);
> @@ -3208,7 +3202,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	is_power8 = cpu_has_feature(CPU_FTR_ARCH_207S)
>  		&& !cpu_has_feature(CPU_FTR_ARCH_300);
>  
> -	if (split > 1 || hpt_on_radix) {
> +	if (split > 1) {
>  		sip = &split_info;
>  		memset(&split_info, 0, sizeof(split_info));
>  		for (sub = 0; sub < core_info.n_subcores; ++sub)
> @@ -3230,13 +3224,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  			split_info.subcore_size = subcore_size;
>  		} else {
>  			split_info.subcore_size = 1;
> -			if (hpt_on_radix) {
> -				/* Use the split_info for LPCR/LPIDR changes */
> -				split_info.lpcr_req = vc->lpcr;
> -				split_info.lpidr_req = vc->kvm->arch.lpid;
> -				split_info.host_lpcr = vc->kvm->arch.host_lpcr;
> -				split_info.do_set = 1;
> -			}
>  		}
>  
>  		/* order writes to split_info before kvm_split_mode pointer */
> @@ -3246,7 +3233,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	for (thr = 0; thr < controlled_threads; ++thr) {
>  		struct paca_struct *paca = paca_ptrs[pcpu + thr];
>  
> -		paca->kvm_hstate.tid = thr;
>  		paca->kvm_hstate.napping = 0;
>  		paca->kvm_hstate.kvm_split_mode = sip;
>  	}
> @@ -3320,10 +3306,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	 * When doing micro-threading, poke the inactive threads as well.
>  	 * This gets them to the nap instruction after kvm_do_nap,
>  	 * which reduces the time taken to unsplit later.
> -	 * For POWER9 HPT guest on radix host, we need all the secondary
> -	 * threads woken up so they can do the LPCR/LPIDR change.
>  	 */
> -	if (cmd_bit || hpt_on_radix) {
> +	if (cmd_bit) {
>  		split_info.do_nap = 1;	/* ask secondaries to nap when done */
>  		for (thr = 1; thr < threads_per_subcore; ++thr)
>  			if (!(active & (1 << thr)))
> @@ -3384,19 +3368,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  			cpu_relax();
>  			++loops;
>  		}
> -	} else if (hpt_on_radix) {
> -		/* Wait for all threads to have seen final sync */
> -		for (thr = 1; thr < controlled_threads; ++thr) {
> -			struct paca_struct *paca = paca_ptrs[pcpu + thr];
> -
> -			while (paca->kvm_hstate.kvm_split_mode) {
> -				HMT_low();
> -				barrier();
> -			}
> -			HMT_medium();
> -		}
> +		split_info.do_nap = 0;
>  	}
> -	split_info.do_nap = 0;
>  
>  	kvmppc_set_host_core(pcpu);
>  
> @@ -4166,7 +4139,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>  
>  	kvmppc_clear_host_core(pcpu);
>  
> -	local_paca->kvm_hstate.tid = 0;
>  	local_paca->kvm_hstate.napping = 0;
>  	local_paca->kvm_hstate.kvm_split_mode = NULL;
>  	kvmppc_start_thread(vcpu, vc);
> @@ -4351,15 +4323,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>  
>  	do {
>  		/*
> -		 * The early POWER9 chips that can't mix radix and HPT threads
> -		 * on the same core also need the workaround for the problem
> -		 * where the TLB would prefetch entries in the guest exit path
> -		 * for radix guests using the guest PIDR value and LPID 0.
> -		 * The workaround is in the old path (kvmppc_run_vcpu())
> -		 * but not the new path (kvmhv_run_single_vcpu()).
> +		 * The TLB prefetch bug fixup is only in the kvmppc_run_vcpu
> +		 * path, which also handles hash and dependent threads mode.
>  		 */
>  		if (kvm->arch.threads_indep && kvm_is_radix(kvm) &&
> -		    !no_mixing_hpt_and_radix)
> +		    !cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG))
>  			r = kvmhv_run_single_vcpu(vcpu, ~(u64)0,
>  						  vcpu->arch.vcore->lpcr);
>  		else
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 8f58dd20b362..f3d3183249fe 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -277,8 +277,7 @@ void kvmhv_commence_exit(int trap)
>  	struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
>  	int ptid = local_paca->kvm_hstate.ptid;
>  	struct kvm_split_mode *sip = local_paca->kvm_hstate.kvm_split_mode;
> -	int me, ee, i, t;
> -	int cpu0;
> +	int me, ee, i;
>  
>  	/* Set our bit in the threads-exiting-guest map in the 0xff00
>  	   bits of vcore->entry_exit_map */
> @@ -320,22 +319,6 @@ void kvmhv_commence_exit(int trap)
>  		if ((ee >> 8) == 0)
>  			kvmhv_interrupt_vcore(vc, ee);
>  	}
> -
> -	/*
> -	 * On POWER9 when running a HPT guest on a radix host (sip != NULL),
> -	 * we have to interrupt inactive CPU threads to get them to
> -	 * restore the host LPCR value.
> -	 */
> -	if (sip->lpcr_req) {
> -		if (cmpxchg(&sip->do_restore, 0, 1) == 0) {
> -			vc = local_paca->kvm_hstate.kvm_vcore;
> -			cpu0 = vc->pcpu + ptid - local_paca->kvm_hstate.tid;
> -			for (t = 1; t < threads_per_core; ++t) {
> -				if (sip->napped[t])
> -					kvmhv_rm_send_ipi(cpu0 + t);
> -			}
> -		}
> -	}
>  }
>  
>  struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;
> @@ -667,86 +650,6 @@ void kvmppc_bad_interrupt(struct pt_regs *regs)
>  	panic("Bad KVM trap");
>  }
>  
> -/*
> - * Functions used to switch LPCR HR and UPRT bits on all threads
> - * when entering and exiting HPT guests on a radix host.
> - */
> -
> -#define PHASE_REALMODE		1	/* in real mode */
> -#define PHASE_SET_LPCR		2	/* have set LPCR */
> -#define PHASE_OUT_OF_GUEST	4	/* have finished executing in guest */
> -#define PHASE_RESET_LPCR	8	/* have reset LPCR to host value */
> -
> -#define ALL(p)		(((p) << 24) | ((p) << 16) | ((p) << 8) | (p))
> -
> -static void wait_for_sync(struct kvm_split_mode *sip, int phase)
> -{
> -	int thr = local_paca->kvm_hstate.tid;
> -
> -	sip->lpcr_sync.phase[thr] |= phase;
> -	phase = ALL(phase);
> -	while ((sip->lpcr_sync.allphases & phase) != phase) {
> -		HMT_low();
> -		barrier();
> -	}
> -	HMT_medium();
> -}
> -
> -void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
> -{
> -	unsigned long rb, set;
> -
> -	/* wait for every other thread to get to real mode */
> -	wait_for_sync(sip, PHASE_REALMODE);
> -
> -	/* Set LPCR and LPIDR */
> -	mtspr(SPRN_LPCR, sip->lpcr_req);
> -	mtspr(SPRN_LPID, sip->lpidr_req);
> -	isync();
> -
> -	/* Invalidate the TLB on thread 0 */
> -	if (local_paca->kvm_hstate.tid == 0) {
> -		sip->do_set = 0;
> -		asm volatile("ptesync" : : : "memory");
> -		for (set = 0; set < POWER9_TLB_SETS_RADIX; ++set) {
> -			rb = TLBIEL_INVAL_SET_LPID +
> -				(set << TLBIEL_INVAL_SET_SHIFT);
> -			asm volatile(PPC_TLBIEL(%0, %1, 0, 0, 0) : :
> -				     "r" (rb), "r" (0));
> -		}
> -		asm volatile("ptesync" : : : "memory");
> -	}
> -
> -	/* indicate that we have done so and wait for others */
> -	wait_for_sync(sip, PHASE_SET_LPCR);
> -	/* order read of sip->lpcr_sync.allphases vs. sip->do_set */
> -	smp_rmb();
> -}
> -
> -/*
> - * Called when a thread that has been in the guest needs
> - * to reload the host LPCR value - but only on POWER9 when
> - * running a HPT guest on a radix host.
> - */
> -void kvmhv_p9_restore_lpcr(struct kvm_split_mode *sip)
> -{
> -	/* we're out of the guest... */
> -	wait_for_sync(sip, PHASE_OUT_OF_GUEST);
> -
> -	mtspr(SPRN_LPID, 0);
> -	mtspr(SPRN_LPCR, sip->host_lpcr);
> -	isync();
> -
> -	if (local_paca->kvm_hstate.tid == 0) {
> -		sip->do_restore = 0;
> -		smp_wmb();	/* order store of do_restore vs. phase */
> -	}
> -
> -	wait_for_sync(sip, PHASE_RESET_LPCR);
> -	smp_mb();
> -	local_paca->kvm_hstate.kvm_split_mode = NULL;
> -}
> -
>  static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.ceded = 0;
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index cd9995ee8441..d5a9b57ec129 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -85,19 +85,6 @@ _GLOBAL_TOC(kvmppc_hv_entry_trampoline)
>  	RFI_TO_KERNEL
>  
>  kvmppc_call_hv_entry:
> -BEGIN_FTR_SECTION
> -	/* On P9, do LPCR setting, if necessary */
> -	ld	r3, HSTATE_SPLIT_MODE(r13)
> -	cmpdi	r3, 0
> -	beq	46f
> -	lwz	r4, KVM_SPLIT_DO_SET(r3)
> -	cmpwi	r4, 0
> -	beq	46f
> -	bl	kvmhv_p9_set_lpcr
> -	nop
> -46:
> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> -
>  	ld	r4, HSTATE_KVM_VCPU(r13)
>  	bl	kvmppc_hv_entry
>  
> @@ -361,11 +348,11 @@ kvm_secondary_got_guest:
>  	LOAD_REG_ADDR(r6, decrementer_max)
>  	ld	r6, 0(r6)
>  	mtspr	SPRN_HDEC, r6
> +BEGIN_FTR_SECTION
>  	/* and set per-LPAR registers, if doing dynamic micro-threading */
>  	ld	r6, HSTATE_SPLIT_MODE(r13)
>  	cmpdi	r6, 0
>  	beq	63f
> -BEGIN_FTR_SECTION
>  	ld	r0, KVM_SPLIT_RPR(r6)
>  	mtspr	SPRN_RPR, r0
>  	ld	r0, KVM_SPLIT_PMMAR(r6)
> @@ -373,16 +360,7 @@ BEGIN_FTR_SECTION
>  	ld	r0, KVM_SPLIT_LDBAR(r6)
>  	mtspr	SPRN_LDBAR, r0
>  	isync
> -FTR_SECTION_ELSE
> -	/* On P9 we use the split_info for coordinating LPCR changes */
> -	lwz	r4, KVM_SPLIT_DO_SET(r6)
> -	cmpwi	r4, 0
> -	beq	1f
> -	mr	r3, r6
> -	bl	kvmhv_p9_set_lpcr
> -	nop
> -1:
> -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  63:
>  	/* Order load of vcpu after load of vcore */
>  	lwsync
> @@ -452,19 +430,15 @@ kvm_no_guest:
>  	mtcr	r5
>  	blr
>  
> -53:	HMT_LOW
> +53:
> +BEGIN_FTR_SECTION
> +	HMT_LOW
>  	ld	r5, HSTATE_KVM_VCORE(r13)
>  	cmpdi	r5, 0
>  	bne	60f
>  	ld	r3, HSTATE_SPLIT_MODE(r13)
>  	cmpdi	r3, 0
>  	beq	kvm_no_guest
> -	lwz	r0, KVM_SPLIT_DO_SET(r3)
> -	cmpwi	r0, 0
> -	bne	kvmhv_do_set
> -	lwz	r0, KVM_SPLIT_DO_RESTORE(r3)
> -	cmpwi	r0, 0
> -	bne	kvmhv_do_restore
>  	lbz	r0, KVM_SPLIT_DO_NAP(r3)
>  	cmpwi	r0, 0
>  	beq	kvm_no_guest
> @@ -472,24 +446,19 @@ kvm_no_guest:
>  	b	kvm_unsplit_nap
>  60:	HMT_MEDIUM
>  	b	kvm_secondary_got_guest
> +FTR_SECTION_ELSE
> +	HMT_LOW
> +	ld	r5, HSTATE_KVM_VCORE(r13)
> +	cmpdi	r5, 0
> +	beq	kvm_no_guest
> +	HMT_MEDIUM
> +	b	kvm_secondary_got_guest
> +ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
>  
>  54:	li	r0, KVM_HWTHREAD_IN_KVM
>  	stb	r0, HSTATE_HWTHREAD_STATE(r13)
>  	b	kvm_no_guest
>  
> -kvmhv_do_set:
> -	/* Set LPCR, LPIDR etc. on P9 */
> -	HMT_MEDIUM
> -	bl	kvmhv_p9_set_lpcr
> -	nop
> -	b	kvm_no_guest
> -
> -kvmhv_do_restore:
> -	HMT_MEDIUM
> -	bl	kvmhv_p9_restore_lpcr
> -	nop
> -	b	kvm_no_guest
> -
>  /*
>   * Here the primary thread is trying to return the core to
>   * whole-core mode, so we need to nap.
> @@ -527,7 +496,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	/* Set kvm_split_mode.napped[tid] = 1 */
>  	ld	r3, HSTATE_SPLIT_MODE(r13)
>  	li	r0, 1
> -	lbz	r4, HSTATE_TID(r13)
> +	lhz	r4, PACAPACAINDEX(r13)
> +	clrldi	r4, r4, 61	/* micro-threading => P8 => 8 threads/core */
>  	addi	r4, r4, KVM_SPLIT_NAPPED
>  	stbx	r0, r3, r4
>  	/* Check the do_nap flag again after setting napped[] */
> @@ -1938,24 +1908,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  19:	lis	r8,0x7fff		/* MAX_INT@h */
>  	mtspr	SPRN_HDEC,r8
>  
> -16:
> -BEGIN_FTR_SECTION
> -	/* On POWER9 with HPT-on-radix we need to wait for all other threads */
> -	ld	r3, HSTATE_SPLIT_MODE(r13)
> -	cmpdi	r3, 0
> -	beq	47f
> -	lwz	r8, KVM_SPLIT_DO_RESTORE(r3)
> -	cmpwi	r8, 0
> -	beq	47f
> -	bl	kvmhv_p9_restore_lpcr
> -	nop
> -	b	48f
> -47:
> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> -	ld	r8,KVM_HOST_LPCR(r4)
> +16:	ld	r8,KVM_HOST_LPCR(r4)
>  	mtspr	SPRN_LPCR,r8
>  	isync
> -48:
> +
>  #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
>  	/* Finish timing, if we have a vcpu */
>  	ld	r4, HSTATE_KVM_VCPU(r13)
> @@ -2779,8 +2735,10 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>  	beq	kvm_end_cede
>  	cmpwi	r0, NAPPING_NOVCPU
>  	beq	kvm_novcpu_wakeup
> +BEGIN_FTR_SECTION
>  	cmpwi	r0, NAPPING_UNSPLIT
>  	beq	kvm_unsplit_wakeup
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  	twi	31,0,0 /* Nap state must not be zero */
>  
>  33:	mr	r4, r3
> -- 
> 2.23.0

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox