LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH -next] tty: hvc: fix link error with CONFIG_SERIAL_CORE_CONSOLE=n
From: Yang Yingliang @ 2020-09-18  9:20 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: gregkh

Fix the link error by selecting SERIAL_CORE_CONSOLE.

aarch64-linux-gnu-ld: drivers/tty/hvc/hvc_dcc.o: in function `dcc_early_write':
hvc_dcc.c:(.text+0x164): undefined reference to `uart_console_write'

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/tty/hvc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
index d1b27b0522a3..8d60e0ff67b4 100644
--- a/drivers/tty/hvc/Kconfig
+++ b/drivers/tty/hvc/Kconfig
@@ -81,6 +81,7 @@ config HVC_DCC
 	bool "ARM JTAG DCC console"
 	depends on ARM || ARM64
 	select HVC_DRIVER
+	select SERIAL_CORE_CONSOLE
 	help
 	  This console uses the JTAG DCC on ARM to create a console under the HVC
 	  driver. This console is used through a JTAG only on ARM. If you don't have
-- 
2.25.1


^ permalink raw reply related

* [PATCH -next] powerpc/papr_scm: Fix warnings about undeclared variable
From: Wang Wensheng @ 2020-09-18  8:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: santosh, linux-kernel, paulus, aneesh.kumar, vaibhav,
	dan.j.williams, ira.weiny

Build the kernel with 'make C=2':
arch/powerpc/platforms/pseries/papr_scm.c:825:1: warning: symbol
'dev_attr_perf_stats' was not declared. Should it be static?

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
---
 arch/powerpc/platforms/pseries/papr_scm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 5493bc847bd0..a95aa425e7d4 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -823,7 +823,7 @@ static ssize_t perf_stats_show(struct device *dev,
 	kfree(stats);
 	return rc ? rc : (ssize_t)seq_buf_used(&s);
 }
-DEVICE_ATTR_ADMIN_RO(perf_stats);
+static DEVICE_ATTR_ADMIN_RO(perf_stats);
 
 static ssize_t flags_show(struct device *dev,
 			  struct device_attribute *attr, char *buf)
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag
From: Ravi Bangoria @ 2020-09-18  8:31 UTC (permalink / raw)
  To: Rogerio Alves
  Cc: christophe.leroy, Ravi Bangoria, mikey, pedromfc, linux-kernel,
	paulus, jniethe5, rogealve, naveen.n.rao, linuxppc-dev
In-Reply-To: <6927523d-de63-910a-e789-5fab424c7eb9@linux.ibm.com>



On 9/17/20 6:54 PM, Rogerio Alves wrote:
> On 9/2/20 1:29 AM, Ravi Bangoria wrote:
>> Patch #1 fixes issue for quardword instruction on p10 predecessors.
>> Patch #2 fixes issue for vector instructions.
>> Patch #3 fixes a bug about watchpoint not firing when created with
>>           ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N.
>>           The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I
>>           guess, should be fine because we don't leak any kernel
>>           addresses and PRIV_ALL will also help to cover scenarios when
>>           kernel accesses user memory.
>> Patch #4,#5 fixes infinite exception bug, again the bug happens only
>>           with CONFIG_HAVE_HW_BREAKPOINT=N.
>> Patch #6 fixes two places where we are missing to set hw_len.
>> Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
>>           which will be set when running on ISA 3.1 compliant machine.
>> Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3
>>           and also moves MODE_EXACT tests outside of BP_RANGE condition.
>>
[...]

> 
> Tested this patch set for:
> - SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N = OK
> - Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N = OK
> - Check for PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 = OK
> - Fix quarword instruction handling on p10 predecessors = OK
> - Fix handling of vector instructions = OK
> 
> Also tested for:
> - Set second watchpoint (P10 Mambo) = OK
> - Infinity loop on sc instruction = OK

Thanks Rogerio!

Ravi

^ permalink raw reply

* RE: [PATCHv7 04/12] PCI: designware-ep: Modify MSI and MSIX CAP way of finding
From: Z.q. Hou @ 2020-09-18  8:15 UTC (permalink / raw)
  To: Rob Herring
  Cc: devicetree@vger.kernel.org, lorenzo.pieralisi@arm.com,
	Xiaowei Bao, Roy Zang, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Leo Li, M.h. Lian,
	jingoohan1@gmail.com, andrew.murray@arm.com, Mingkai Hu,
	gustavo.pimentel@synopsys.com, bhelgaas@google.com,
	shawnguo@kernel.org, kishon@ti.com, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200910181009.GB592152@bogus>

Hi Rob,

> -----Original Message-----
> From: Rob Herring <robh@kernel.org>
> Sent: 2020年9月11日 2:10
> To: Z.q. Hou <zhiqiang.hou@nxp.com>
> Cc: linux-pci@vger.kernel.org; devicetree@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; bhelgaas@google.com;
> lorenzo.pieralisi@arm.com; shawnguo@kernel.org; Leo Li
> <leoyang.li@nxp.com>; kishon@ti.com; gustavo.pimentel@synopsys.com;
> Roy Zang <roy.zang@nxp.com>; jingoohan1@gmail.com;
> andrew.murray@arm.com; Mingkai Hu <mingkai.hu@nxp.com>; M.h. Lian
> <minghuan.lian@nxp.com>; Xiaowei Bao <xiaowei.bao@nxp.com>
> Subject: Re: [PATCHv7 04/12] PCI: designware-ep: Modify MSI and MSIX CAP
> way of finding
> 
> On Tue, Aug 11, 2020 at 05:54:33PM +0800, Zhiqiang Hou wrote:
> > From: Xiaowei Bao <xiaowei.bao@nxp.com>
> >
> > Each PF of EP device should have its own MSI or MSIX capabitily
> > struct, so create a dw_pcie_ep_func struct and move the msi_cap and
> > msix_cap to this struct from dw_pcie_ep, and manage the PFs via a
> > list.
> >
> > Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
> > Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> > ---
> > V7:
> >  - Rebase the patch without functionality change.
> >
> >  .../pci/controller/dwc/pcie-designware-ep.c   | 139
> +++++++++++++++---
> >  drivers/pci/controller/dwc/pcie-designware.h  |  18 ++-
> >  2 files changed, 136 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > index 56bd1cd71f16..4680a51c49c0 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > @@ -28,6 +28,19 @@ void dw_pcie_ep_init_notify(struct dw_pcie_ep
> *ep)
> > }  EXPORT_SYMBOL_GPL(dw_pcie_ep_init_notify);
> >
> > +struct dw_pcie_ep_func *
> > +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no) {
> > +	struct dw_pcie_ep_func *ep_func;
> > +
> > +	list_for_each_entry(ep_func, &ep->func_list, list) {
> > +		if (ep_func->func_no == func_no)
> > +			return ep_func;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> >  static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8
> > func_no)  {
> >  	unsigned int func_offset = 0;
> > @@ -68,6 +81,47 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci,
> enum pci_barno bar)
> >  		__dw_pcie_ep_reset_bar(pci, func_no, bar, 0);  }
> >
> > +static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie_ep *ep, u8
> func_no,
> > +		u8 cap_ptr, u8 cap)
> > +{
> > +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +	unsigned int func_offset = 0;
> > +	u8 cap_id, next_cap_ptr;
> > +	u16 reg;
> > +
> > +	if (!cap_ptr)
> > +		return 0;
> > +
> > +	func_offset = dw_pcie_ep_func_select(ep, func_no);
> > +
> > +	reg = dw_pcie_readw_dbi(pci, func_offset + cap_ptr);
> > +	cap_id = (reg & 0x00ff);
> > +
> > +	if (cap_id > PCI_CAP_ID_MAX)
> > +		return 0;
> > +
> > +	if (cap_id == cap)
> > +		return cap_ptr;
> > +
> > +	next_cap_ptr = (reg & 0xff00) >> 8;
> > +	return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); }
> > +
> > +static u8 dw_pcie_ep_find_capability(struct dw_pcie_ep *ep, u8
> > +func_no, u8 cap) {
> > +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +	unsigned int func_offset = 0;
> > +	u8 next_cap_ptr;
> > +	u16 reg;
> > +
> > +	func_offset = dw_pcie_ep_func_select(ep, func_no);
> > +
> > +	reg = dw_pcie_readw_dbi(pci, func_offset + PCI_CAPABILITY_LIST);
> > +	next_cap_ptr = (reg & 0x00ff);
> > +
> > +	return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); }
> 
> These are almost the same as __dw_pcie_find_next_cap and
> dw_pcie_find_capability. Please modify them to take a function offset and
> work for both host and endpoints.
> 

I sent out v8 patches but without the rework of the functions of finding capability, I will submit a separate patch to do this.

> > +
> >  static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
> >  				   struct pci_epf_header *hdr)
> >  {
> > @@ -257,13 +311,18 @@ static int dw_pcie_ep_get_msi(struct pci_epc
> *epc, u8 func_no)
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >  	u32 val, reg;
> >  	unsigned int func_offset = 0;
> > +	struct dw_pcie_ep_func *ep_func;
> >
> > -	if (!ep->msi_cap)
> > +	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> > +	if (!ep_func)
> > +		return -EINVAL;
> > +
> > +	if (!ep_func->msi_cap)
> >  		return -EINVAL;
> >
> >  	func_offset = dw_pcie_ep_func_select(ep, func_no);
> >
> > -	reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> > +	reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
> >  	val = dw_pcie_readw_dbi(pci, reg);
> >  	if (!(val & PCI_MSI_FLAGS_ENABLE))
> >  		return -EINVAL;
> > @@ -279,13 +338,18 @@ static int dw_pcie_ep_set_msi(struct pci_epc
> *epc, u8 func_no, u8 interrupts)
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >  	u32 val, reg;
> >  	unsigned int func_offset = 0;
> > +	struct dw_pcie_ep_func *ep_func;
> > +
> > +	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> > +	if (!ep_func)
> > +		return -EINVAL;
> >
> > -	if (!ep->msi_cap)
> > +	if (!ep_func->msi_cap)
> >  		return -EINVAL;
> >
> >  	func_offset = dw_pcie_ep_func_select(ep, func_no);
> >
> > -	reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> > +	reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
> 
> If msi_cap is now per function, then shouldn't it already include
> 'func_offset'?

There are many calls of the function to get func_offset, I also will
Submit a separate patch to recode them.

Thanks,
Zhiqiang

> 
> >  	val = dw_pcie_readw_dbi(pci, reg);
> >  	val &= ~PCI_MSI_FLAGS_QMASK;
> >  	val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK; @@ -302,13 +366,18
> > @@ static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no)
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >  	u32 val, reg;
> >  	unsigned int func_offset = 0;
> > +	struct dw_pcie_ep_func *ep_func;
> > +
> > +	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> > +	if (!ep_func)
> > +		return -EINVAL;
> >
> > -	if (!ep->msix_cap)
> > +	if (!ep_func->msix_cap)
> >  		return -EINVAL;
> >
> >  	func_offset = dw_pcie_ep_func_select(ep, func_no);
> >
> > -	reg = ep->msix_cap + func_offset + PCI_MSIX_FLAGS;
> > +	reg = ep_func->msix_cap + func_offset + PCI_MSIX_FLAGS;
> >  	val = dw_pcie_readw_dbi(pci, reg);
> >  	if (!(val & PCI_MSIX_FLAGS_ENABLE))
> >  		return -EINVAL;
> > @@ -325,25 +394,30 @@ static int dw_pcie_ep_set_msix(struct pci_epc
> *epc, u8 func_no, u16 interrupts,
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >  	u32 val, reg;
> >  	unsigned int func_offset = 0;
> > +	struct dw_pcie_ep_func *ep_func;
> >
> > -	if (!ep->msix_cap)
> > +	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> > +	if (!ep_func)
> > +		return -EINVAL;
> > +
> > +	if (!ep_func->msix_cap)
> >  		return -EINVAL;
> >
> >  	dw_pcie_dbi_ro_wr_en(pci);
> >
> >  	func_offset = dw_pcie_ep_func_select(ep, func_no);
> >
> > -	reg = ep->msix_cap + func_offset + PCI_MSIX_FLAGS;
> > +	reg = ep_func->msix_cap + func_offset + PCI_MSIX_FLAGS;
> >  	val = dw_pcie_readw_dbi(pci, reg);
> >  	val &= ~PCI_MSIX_FLAGS_QSIZE;
> >  	val |= interrupts;
> >  	dw_pcie_writew_dbi(pci, reg, val);
> >
> > -	reg = ep->msix_cap + func_offset + PCI_MSIX_TABLE;
> > +	reg = ep_func->msix_cap + func_offset + PCI_MSIX_TABLE;
> >  	val = offset | bir;
> >  	dw_pcie_writel_dbi(pci, reg, val);
> >
> > -	reg = ep->msix_cap + func_offset + PCI_MSIX_PBA;
> > +	reg = ep_func->msix_cap + func_offset + PCI_MSIX_PBA;
> >  	val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
> >  	dw_pcie_writel_dbi(pci, reg, val);
> >
> > @@ -426,6 +500,7 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep
> *ep, u8 func_no,
> >  			     u8 interrupt_num)
> >  {
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +	struct dw_pcie_ep_func *ep_func;
> >  	struct pci_epc *epc = ep->epc;
> >  	unsigned int aligned_offset;
> >  	unsigned int func_offset = 0;
> > @@ -435,25 +510,29 @@ int dw_pcie_ep_raise_msi_irq(struct
> dw_pcie_ep *ep, u8 func_no,
> >  	bool has_upper;
> >  	int ret;
> >
> > -	if (!ep->msi_cap)
> > +	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> > +	if (!ep_func)
> > +		return -EINVAL;
> > +
> > +	if (!ep_func->msi_cap)
> >  		return -EINVAL;
> >
> >  	func_offset = dw_pcie_ep_func_select(ep, func_no);
> >
> >  	/* Raise MSI per the PCI Local Bus Specification Revision 3.0, 6.8.1. */
> > -	reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> > +	reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
> >  	msg_ctrl = dw_pcie_readw_dbi(pci, reg);
> >  	has_upper = !!(msg_ctrl & PCI_MSI_FLAGS_64BIT);
> > -	reg = ep->msi_cap + func_offset + PCI_MSI_ADDRESS_LO;
> > +	reg = ep_func->msi_cap + func_offset + PCI_MSI_ADDRESS_LO;
> >  	msg_addr_lower = dw_pcie_readl_dbi(pci, reg);
> >  	if (has_upper) {
> > -		reg = ep->msi_cap + func_offset + PCI_MSI_ADDRESS_HI;
> > +		reg = ep_func->msi_cap + func_offset + PCI_MSI_ADDRESS_HI;
> >  		msg_addr_upper = dw_pcie_readl_dbi(pci, reg);
> > -		reg = ep->msi_cap + func_offset + PCI_MSI_DATA_64;
> > +		reg = ep_func->msi_cap + func_offset + PCI_MSI_DATA_64;
> >  		msg_data = dw_pcie_readw_dbi(pci, reg);
> >  	} else {
> >  		msg_addr_upper = 0;
> > -		reg = ep->msi_cap + func_offset + PCI_MSI_DATA_32;
> > +		reg = ep_func->msi_cap + func_offset + PCI_MSI_DATA_32;
> >  		msg_data = dw_pcie_readw_dbi(pci, reg);
> >  	}
> >  	aligned_offset = msg_addr_lower & (epc->mem->window.page_size -
> 1);
> > @@ -489,6 +568,7 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep
> *ep, u8 func_no,
> >  			      u16 interrupt_num)
> >  {
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +	struct dw_pcie_ep_func *ep_func;
> >  	struct pci_epf_msix_tbl *msix_tbl;
> >  	struct pci_epc *epc = ep->epc;
> >  	unsigned int func_offset = 0;
> > @@ -499,9 +579,16 @@ int dw_pcie_ep_raise_msix_irq(struct
> dw_pcie_ep *ep, u8 func_no,
> >  	int ret;
> >  	u8 bir;
> >
> > +	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> > +	if (!ep_func)
> > +		return -EINVAL;
> > +
> > +	if (!ep_func->msix_cap)
> > +		return -EINVAL;
> > +
> >  	func_offset = dw_pcie_ep_func_select(ep, func_no);
> >
> > -	reg = ep->msix_cap + func_offset + PCI_MSIX_TABLE;
> > +	reg = ep_func->msix_cap + func_offset + PCI_MSIX_TABLE;
> >  	tbl_offset = dw_pcie_readl_dbi(pci, reg);
> >  	bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
> >  	tbl_offset &= PCI_MSIX_TABLE_OFFSET; @@ -596,11 +683,15 @@ int
> > dw_pcie_ep_init(struct dw_pcie_ep *ep)  {
> >  	int ret;
> >  	void *addr;
> > +	u8 func_no;
> >  	struct pci_epc *epc;
> >  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >  	struct device *dev = pci->dev;
> >  	struct device_node *np = dev->of_node;
> >  	const struct pci_epc_features *epc_features;
> > +	struct dw_pcie_ep_func *ep_func;
> > +
> > +	INIT_LIST_HEAD(&ep->func_list);
> >
> >  	if (!pci->dbi_base || !pci->dbi_base2) {
> >  		dev_err(dev, "dbi_base/dbi_base2 is not populated\n"); @@ -660,9
> > +751,19 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> >  	if (ret < 0)
> >  		epc->max_functions = 1;
> >
> > -	ep->msi_cap = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > +	for (func_no = 0; func_no < epc->max_functions; func_no++) {
> > +		ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> > +		if (!ep_func)
> > +			return -ENOMEM;
> >
> > -	ep->msix_cap = dw_pcie_find_capability(pci, PCI_CAP_ID_MSIX);
> > +		ep_func->func_no = func_no;
> > +		ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> > +							      PCI_CAP_ID_MSI);
> > +		ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> > +							       PCI_CAP_ID_MSIX);
> > +
> > +		list_add_tail(&ep_func->list, &ep->func_list);
> > +	}
> >
> >  	if (ep->ops->ep_init)
> >  		ep->ops->ep_init(ep);
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h
> > b/drivers/pci/controller/dwc/pcie-designware.h
> > index 745b4938225a..19c4ba486239 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -230,8 +230,16 @@ struct dw_pcie_ep_ops {
> >  	unsigned int (*func_conf_select)(struct dw_pcie_ep *ep, u8 func_no);
> > };
> >
> > +struct dw_pcie_ep_func {
> > +	struct list_head	list;
> > +	u8			func_no;
> > +	u8			msi_cap;	/* MSI capability offset */
> > +	u8			msix_cap;	/* MSI-X capability offset */
> > +};
> > +
> >  struct dw_pcie_ep {
> >  	struct pci_epc		*epc;
> > +	struct list_head	func_list;
> >  	const struct dw_pcie_ep_ops *ops;
> >  	phys_addr_t		phys_base;
> >  	size_t			addr_size;
> > @@ -244,8 +252,6 @@ struct dw_pcie_ep {
> >  	u32			num_ob_windows;
> >  	void __iomem		*msi_mem;
> >  	phys_addr_t		msi_mem_phys;
> > -	u8			msi_cap;	/* MSI capability offset */
> > -	u8			msix_cap;	/* MSI-X capability offset */
> >  	struct pci_epf_bar	*epf_bar[PCI_STD_NUM_BARS];
> >  };
> >
> > @@ -440,6 +446,8 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep
> > *ep, u8 func_no,  int dw_pcie_ep_raise_msix_irq_doorbell(struct
> dw_pcie_ep *ep, u8 func_no,
> >  				       u16 interrupt_num);
> >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
> > +struct dw_pcie_ep_func *
> > +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no);
> >  #else
> >  static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)  { @@
> > -490,5 +498,11 @@ static inline int
> > dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep,  static
> > inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno
> > bar)  {  }
> > +
> > +static inline struct dw_pcie_ep_func *
> > +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no) {
> > +	return NULL;
> > +}
> >  #endif
> >  #endif /* _PCIE_DESIGNWARE_H */
> > --
> > 2.17.1
> >

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Michael Ellerman @ 2020-09-18  7:20 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev; +Cc: Jordan Niethe, oohall, npiggin
In-Reply-To: <20200917091716.4631-1-jniethe5@gmail.com>

Hi Jordan,

Jordan Niethe <jniethe5@gmail.com> writes:
> Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
> is called before a stack has been set up in r1. This was previously fine
> as the cpu_restore() functions were implemented in assembly and did not
> use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
> device tree binding for discovering CPU features") used
> __restore_cpu_cpufeatures() as the cpu_restore() function for a
> device-tree features based cputable entry. This is a C function and
> hence uses a stack in r1.
>
> generic_secondary_smp_init() is entered on the secondary cpus via the
> primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
> thread has its own stack. The OPAL call is ran in the primary's hardware
> thread. During the call, a job is scheduled on a secondary cpu that will
> start executing at the address of generic_secondary_smp_init().  Hence
> the value that will be left in r1 when the secondary cpu enters the
> kernel is part of that secondary cpu's individual OPAL stack. This means
> that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
> not horribly bad as each hardware thread has its own stack and the call
> that enters the kernel from OPAL never returns, but it is still wrong
> and should be corrected.
>
> Create the temp kernel stack before calling cpu_restore().
>
> Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v2: Add more detail to the commit message
> ---
>  arch/powerpc/kernel/head_64.S | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Unfortunately this breaks booting via kexec.

In that case the secondaries come in to 0x60 and spin until they're
released by smp_release_cpus(), which is before emergency_stack_init()
has run. That means they pick up a bad r1 value and crash/get stuck.

I'm not sure what the best solution is.

I've thought in the past that it would be nicer if the CPU setup didn't
run until the secondary is told to start (via PACAPROCSTART), ie. more
the CPU setup call below there.

But that opens the possibility that we run threads with different
settings of some SPRs until SMP bringup, and if the user has said not to
start secondaries then possibly for ever. And I haven't though hard
enough about whether that's actually problematic (running with different
SPR values).

cheers


> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 0e05a9a47a4b..4b7f4c6c2600 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -420,6 +420,10 @@ generic_secondary_common_init:
>  	/* From now on, r24 is expected to be logical cpuid */
>  	mr	r24,r5
>  
> +	/* Create a temp kernel stack for use before relocation is on.	*/
> +	ld	r1,PACAEMERGSP(r13)
> +	subi	r1,r1,STACK_FRAME_OVERHEAD
> +
>  	/* See if we need to call a cpu state restore handler */
>  	LOAD_REG_ADDR(r23, cur_cpu_spec)
>  	ld	r23,0(r23)
> @@ -448,10 +452,6 @@ generic_secondary_common_init:
>  	sync				/* order paca.run and cur_cpu_spec */
>  	isync				/* In case code patching happened */
>  
> -	/* Create a temp kernel stack for use before relocation is on.	*/
> -	ld	r1,PACAEMERGSP(r13)
> -	subi	r1,r1,STACK_FRAME_OVERHEAD
> -
>  	b	__secondary_start
>  #endif /* SMP */
>  
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH 2/3] powerpc/mce: Add debugfs interface to inject MCE
From: Michael Ellerman @ 2020-09-18  6:40 UTC (permalink / raw)
  To: Ganesh Goudar, linuxppc-dev
  Cc: mahesh, msuchanek, Ganesh Goudar, Kees Cook, npiggin
In-Reply-To: <20200916172228.83271-3-ganeshgr@linux.ibm.com>

Hi Ganesh,

Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
> To test machine check handling, add debugfs interface to inject
> slb multihit errors.
>
> To inject slb multihit:
>  #echo 1 > /sys/kernel/debug/powerpc/mce_error_inject/inject_slb_multihit

Rather than creating a new ad-hoc way to trigger this, can you please
integrate it into drivers/misc/lkdtm.

There's enough code here that I think you should create
drivers/misc/lkdtm/powerpc.c and put the code in there. Then add an
LKDTM entry point for this, maybe called PPC_SLB_MULTIHIT.

Please Cc Kees when you repost.

cheers


>  arch/powerpc/Kconfig.debug             |   9 ++
>  arch/powerpc/sysdev/Makefile           |   2 +
>  arch/powerpc/sysdev/mce_error_inject.c | 148 +++++++++++++++++++++++++
>  3 files changed, 159 insertions(+)
>  create mode 100644 arch/powerpc/sysdev/mce_error_inject.c
>
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index b88900f4832f..61db133f2f0d 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -398,3 +398,12 @@ config KASAN_SHADOW_OFFSET
>  	hex
>  	depends on KASAN
>  	default 0xe0000000
> +
> +config MCE_ERROR_INJECT
> +	bool "Enable MCE error injection through debugfs"
> +	depends on DEBUG_FS
> +	default y
> +	help
> +	  This option creates an mce_error_inject directory in the
> +	  powerpc debugfs directory that allows limited injection of
> +	  Machine Check Errors (MCEs).
> diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
> index 026b3f01a991..7fc102222b77 100644
> --- a/arch/powerpc/sysdev/Makefile
> +++ b/arch/powerpc/sysdev/Makefile
> @@ -52,3 +52,5 @@ obj-$(CONFIG_PPC_XICS)		+= xics/
>  obj-$(CONFIG_PPC_XIVE)		+= xive/
>  
>  obj-$(CONFIG_GE_FPGA)		+= ge/
> +
> +obj-$(CONFIG_MCE_ERROR_INJECT)	+= mce_error_inject.o
> diff --git a/arch/powerpc/sysdev/mce_error_inject.c b/arch/powerpc/sysdev/mce_error_inject.c
> new file mode 100644
> index 000000000000..ca4726bfa2d9
> --- /dev/null
> +++ b/arch/powerpc/sysdev/mce_error_inject.c
> @@ -0,0 +1,148 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Machine Check Exception injection code
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include <linux/fs.h>
> +#include <linux/debugfs.h>
> +#include <asm/debugfs.h>
> +
> +static inline unsigned long get_slb_index(void)
> +{
> +	unsigned long index;
> +
> +	index = get_paca()->stab_rr;
> +
> +	/*
> +	 * simple round-robin replacement of slb starting at SLB_NUM_BOLTED.
> +	 */
> +	if (index < (mmu_slb_size - 1))
> +		index++;
> +	else
> +		index = SLB_NUM_BOLTED;
> +	get_paca()->stab_rr = index;
> +	return index;
> +}
> +
> +#define slb_esid_mask(ssize)	\
> +	(((ssize) == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T)
> +
> +static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
> +					 unsigned long slot)
> +{
> +	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
> +}
> +
> +#define slb_vsid_shift(ssize)	\
> +	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
> +
> +static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
> +					 unsigned long flags)
> +{
> +	return (get_kernel_vsid(ea, ssize) << slb_vsid_shift(ssize)) | flags |
> +		((unsigned long)ssize << SLB_VSID_SSIZE_SHIFT);
> +}
> +
> +static void insert_slb_entry(char *p, int ssize)
> +{
> +	unsigned long flags, entry;
> +	struct paca_struct *paca;
> +
> +	flags = SLB_VSID_KERNEL | mmu_psize_defs[MMU_PAGE_64K].sllp;
> +
> +	preempt_disable();
> +
> +	paca = get_paca();
> +
> +	entry = get_slb_index();
> +	asm volatile("slbmte %0,%1" :
> +			: "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> +			  "r" (mk_esid_data((unsigned long)p, ssize, entry))
> +			: "memory");
> +
> +	entry = get_slb_index();
> +	asm volatile("slbmte %0,%1" :
> +			: "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> +			  "r" (mk_esid_data((unsigned long)p, ssize, entry))
> +			: "memory");
> +	preempt_enable();
> +	p[0] = '!';
> +}
> +
> +static void inject_vmalloc_slb_multihit(void)
> +{
> +	char *p;
> +
> +	p = vmalloc(2048);
> +	if (!p)
> +		return;
> +
> +	insert_slb_entry(p, MMU_SEGSIZE_1T);
> +	vfree(p);
> +}
> +
> +static void inject_kmalloc_slb_multihit(void)
> +{
> +	char *p;
> +
> +	p = kmalloc(2048, GFP_KERNEL);
> +	if (!p)
> +		return;
> +
> +	insert_slb_entry(p, MMU_SEGSIZE_1T);
> +	kfree(p);
> +}
> +
> +static ssize_t inject_slb_multihit(const char __user *u_buf, size_t count)
> +{
> +	char buf[32];
> +	size_t buf_size;
> +
> +	buf_size = min(count, (sizeof(buf) - 1));
> +	if (copy_from_user(buf, u_buf, buf_size))
> +		return -EFAULT;
> +	buf[buf_size] = '\0';
> +
> +	if (buf[0] != '1')
> +		return -EINVAL;
> +
> +	inject_vmalloc_slb_multihit();
> +	inject_kmalloc_slb_multihit();
> +	return count;
> +}
> +
> +static ssize_t inject_write(struct file *file, const char __user *buf,
> +			    size_t count, loff_t *ppos)
> +{
> +	static ssize_t (*func)(const char __user *, size_t);
> +
> +	func = file->f_inode->i_private;
> +	return func(buf, count);
> +}
> +
> +static const struct file_operations inject_fops = {
> +	.write		= inject_write,
> +	.llseek		= default_llseek,
> +};
> +
> +static int mce_error_inject_setup(void)
> +{
> +	struct dentry *mce_error_inject_dir;
> +
> +	mce_error_inject_dir = debugfs_create_dir("mce_error_inject",
> +						  powerpc_debugfs_root);
> +
> +	if (mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
> +		(void)debugfs_create_file("inject_slb_multihit", 0200,
> +					  mce_error_inject_dir,
> +					  &inject_slb_multihit,
> +					  &inject_fops);
> +	}
> +
> +	return 0;
> +}
> +
> +device_initcall(mce_error_inject_setup);
> -- 
> 2.26.2

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
From: Frederic Barrat @ 2020-09-18  6:35 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable; +Cc: linuxppc-dev, Andrew Donnellan
In-Reply-To: <20200918020110.2063155-101-sashal@kernel.org>



Le 18/09/2020 à 03:57, Sasha Levin a écrit :
> From: Frederic Barrat <fbarrat@linux.ibm.com>
> 
> [ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
>

This patch is not desirable for stable, for 5.4 and 4.19 (it was already 
flagged by autosel back in April. Not sure why it's showing again now)

   Fred


> The pci_dn structure used to store a pointer to the struct pci_dev, so
> taking a reference on the device was required. However, the pci_dev
> pointer was later removed from the pci_dn structure, but the reference
> was kept for the npu device.
> See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
> pcidev from pci_dn").
> 
> We don't need to take a reference on the device when assigning the PE
> as the struct pnv_ioda_pe is cleaned up at the same time as
> the (physical) device is released. Doing so prevents the device from
> being released, which is a problem for opencapi devices, since we want
> to be able to remove them through PCI hotplug.
> 
> Now the ugly part: nvlink npu devices are not meant to be
> released. Because of the above, we've always leaked a reference and
> simply removing it now is dangerous and would likely require more
> work. There's currently no release device callback for nvlink devices
> for example. So to be safe, this patch leaks a reference on the npu
> device, but only for nvlink and not opencapi.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
>   1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 058223233088e..e9cda7e316a50 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1062,14 +1062,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   		return NULL;
>   	}
>   
> -	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
> -	 * pointer in the PE data structure, both should be destroyed at the
> -	 * same time. However, this needs to be looked at more closely again
> -	 * once we actually start removing things (Hotplug, SR-IOV, ...)
> +	/* NOTE: We don't get a reference for the pointer in the PE
> +	 * data structure, both the device and PE structures should be
> +	 * destroyed at the same time. However, removing nvlink
> +	 * devices will need some work.
>   	 *
>   	 * At some point we want to remove the PDN completely anyways
>   	 */
> -	pci_dev_get(dev);
>   	pdn->pe_number = pe->pe_number;
>   	pe->flags = PNV_IODA_PE_DEV;
>   	pe->pdev = dev;
> @@ -1084,7 +1083,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>   		pnv_ioda_free_pe(pe);
>   		pdn->pe_number = IODA_INVALID_PE;
>   		pe->pdev = NULL;
> -		pci_dev_put(dev);
>   		return NULL;
>   	}
>   
> @@ -1205,6 +1203,14 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>   	struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
>   	struct pnv_phb *phb = hose->private_data;
>   
> +	/*
> +	 * Intentionally leak a reference on the npu device (for
> +	 * nvlink only; this is not an opencapi path) to make sure it
> +	 * never goes away, as it's been the case all along and some
> +	 * work is needed otherwise.
> +	 */
> +	pci_dev_get(npu_pdev);
> +
>   	/*
>   	 * Due to a hardware errata PE#0 on the NPU is reserved for
>   	 * error handling. This means we only have three PEs remaining
> @@ -1228,7 +1234,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>   			 */
>   			dev_info(&npu_pdev->dev,
>   				"Associating to existing PE %x\n", pe_num);
> -			pci_dev_get(npu_pdev);
>   			npu_pdn = pci_get_pdn(npu_pdev);
>   			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
>   			npu_pdn->pe_number = pe_num;
> 

^ permalink raw reply

* Re: [PATCH -next] ide: Fix symbol undeclared warnings
From: Michael Ellerman @ 2020-09-18  6:26 UTC (permalink / raw)
  To: David Miller; +Cc: wangwensheng4, linux-kernel, linux-ide, paulus, linuxppc-dev
In-Reply-To: <20200917.124445.1786301672047605176.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:
> From: Michael Ellerman <mpe@ellerman.id.au>
> Date: Thu, 17 Sep 2020 22:01:00 +1000
>
>> Wang Wensheng <wangwensheng4@huawei.com> writes:
>>> Build the object file with `C=2` and get the following warnings:
>>> make allmodconfig ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu-
>>> make C=2 drivers/ide/pmac.o ARCH=powerpc64
>>> CROSS_COMPILE=powerpc64-linux-gnu-
>>>
>>> drivers/ide/pmac.c:228:23: warning: symbol 'mdma_timings_33' was not
>>> declared. Should it be static?
>>> drivers/ide/pmac.c:241:23: warning: symbol 'mdma_timings_33k' was not
>>> declared. Should it be static?
>>> drivers/ide/pmac.c:254:23: warning: symbol 'mdma_timings_66' was not
>>> declared. Should it be static?
>>> drivers/ide/pmac.c:272:3: warning: symbol 'kl66_udma_timings' was not
>>> declared. Should it be static?
>>> drivers/ide/pmac.c:1418:12: warning: symbol 'pmac_ide_probe' was not
>>> declared. Should it be static?
>>>
>>> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
>>> ---
>>>  drivers/ide/pmac.c | 10 +++++-----
>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> TIL davem maintains IDE?
>> 
>> But I suspect he isn't that interested in this powerpc only driver, so
>> I'll grab this.
>
> I did have it in my queue, but if you want to take it that's fine too :)

That's OK, if you've got it already you take it. Thanks.

cheers

^ permalink raw reply

* Re: [PATCH] selftests/seccomp: fix ptrace tests on powerpc
From: Michael Ellerman @ 2020-09-18  6:22 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, Kees Cook
  Cc: linuxppc-dev, Shuah Khan, Oleg Nesterov, linux-kselftest
In-Reply-To: <20200917225100.GG314880@mussarela>

Thadeu Lima de Souza Cascardo <cascardo@canonical.com> writes:
> On Thu, Sep 17, 2020 at 03:37:16PM -0700, Kees Cook wrote:
>> On Sun, Sep 13, 2020 at 10:34:23PM +1000, Michael Ellerman wrote:
>> > Thadeu Lima de Souza Cascardo <cascardo@canonical.com> writes:
>> > > On Tue, Sep 08, 2020 at 04:18:17PM -0700, Kees Cook wrote:
>> > >> On Tue, Jun 30, 2020 at 01:47:39PM -0300, Thadeu Lima de Souza Cascardo wrote:
>> > ...
>> > >> > @@ -1809,10 +1818,15 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
>> > >> >  	EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
>> > >> >  			: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
>> > >> >  
>> > >> > -	if (!entry)
>> > >> > +	if (!entry && !syscall_nr)
>> > >> >  		return;
>> > >> >  
>> > >> > -	nr = get_syscall(_metadata, tracee);
>> > >> > +	if (entry)
>> > >> > +		nr = get_syscall(_metadata, tracee);
>> > >> > +	else
>> > >> > +		nr = *syscall_nr;
>> > >> 
>> > >> This is weird? Shouldn't get_syscall() be modified to do the right thing
>> > >> here instead of depending on the extra arg?
>> > >> 
>> > >
>> > > R0 might be clobered. Same documentation mentions it as volatile. So, during
>> > > syscall exit, we can't tell for sure that R0 will have the original syscall
>> > > number. So, we need to grab it during syscall enter, save it somewhere and
>> > > reuse it. I used the test context/args for that.
>> > 
>> > The user r0 (in regs->gpr[0]) shouldn't be clobbered.
>> > 
>> > But it is modified if the tracer skips the syscall, by setting the
>> > syscall number to -1. Or if the tracer changes the syscall number.
>> > 
>> > So if you need the original syscall number in the exit path then I think
>> > you do need to save it at entry.
>> 
>> ... the selftest code wants to test the updated syscall (-1 or
>> whatever), so this sounds correct. Was this test actually failing on
>> powerpc? (I'd really rather not split entry/exit if I don't have to.)
>
> Yes, it started failing when the return code started being changed as well.
> Though ptrace can change the syscall at entry (IIRC), it can't change the
> return code. And that is part of the ABI. If ppc is changed so it allows
> changing the return code during ptrace entry, some strace uses will break. So
> that is not an option.

Yep.

I don't know that it would break anything to change that part of the
ptrace ABI, but it definitely could break things and it would be subtle,
so it's not really an option.

So for powerpc we do need the return code change done in the exit hook,
sorry.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/tm: Save and restore AMR on treclaim and trechkpt
From: Aneesh Kumar K.V @ 2020-09-18  4:20 UTC (permalink / raw)
  To: Gustavo Romero, linuxppc-dev; +Cc: mikey
In-Reply-To: <20200918040536.9046-1-gromero@linux.ibm.com>

On 9/18/20 9:35 AM, Gustavo Romero wrote:
> Althought AMR is stashed on the checkpoint area, currently we don't save
> it to the per thread checkpoint struct after a treclaim and so we don't
> restore it either from that struct when we trechkpt. As a consequence when
> the transaction is later rolled back kernel space AMR value when the
> trechkpt was done appears in userspace.
> 
> That commit saves and restores AMR accordingly on treclaim and trechkpt.
> Since AMR value is also used in kernel space in other functions, it also
> takes care of stashing kernel live AMR into PACA before treclaim and before
> trechkpt, restoring it later, just before returning from tm_reclaim and
> __tm_recheckpoint.
> 
> Is also fixes two nonrelated comments about CR and MSR.
> 
> Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/paca.h      |  1 +
>   arch/powerpc/include/asm/processor.h |  1 +
>   arch/powerpc/kernel/asm-offsets.c    |  2 ++
>   arch/powerpc/kernel/tm.S             | 31 +++++++++++++++++++++++-----
>   4 files changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 9454d29ff4b4..44c605181529 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -179,6 +179,7 @@ struct paca_struct {
>   	u64 sprg_vdso;			/* Saved user-visible sprg */
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   	u64 tm_scratch;                 /* TM scratch area for reclaim */
> +	u64 tm_amr;			/* Saved Kernel AMR for treclaim/trechkpt */
>   #endif
>   
>   #ifdef CONFIG_PPC_POWERNV
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index ed0d633ab5aa..9f4f6cc033ac 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -220,6 +220,7 @@ struct thread_struct {
>   	unsigned long	tm_tar;
>   	unsigned long	tm_ppr;
>   	unsigned long	tm_dscr;
> +	unsigned long   tm_amr;
>   
>   	/*
>   	 * Checkpointed FP and VSX 0-31 register set.
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 8711c2164b45..cf1a6d68a91f 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -170,12 +170,14 @@ int main(void)
>   
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   	OFFSET(PACATMSCRATCH, paca_struct, tm_scratch);
> +	OFFSET(PACATMAMR, paca_struct, tm_amr);
>   	OFFSET(THREAD_TM_TFHAR, thread_struct, tm_tfhar);
>   	OFFSET(THREAD_TM_TEXASR, thread_struct, tm_texasr);
>   	OFFSET(THREAD_TM_TFIAR, thread_struct, tm_tfiar);
>   	OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
>   	OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
>   	OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
> +	OFFSET(THREAD_TM_AMR, thread_struct, tm_amr);
>   	OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
>   	OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
>   	OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
> diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
> index 6ba0fdd1e7f8..e178ddb43619 100644
> --- a/arch/powerpc/kernel/tm.S
> +++ b/arch/powerpc/kernel/tm.S
> @@ -152,6 +152,10 @@ _GLOBAL(tm_reclaim)
>   	li	r5, 0
>   	mtmsrd	r5, 1
>   
> +        /* Save AMR since it's used elsewhere in kernel space */
> +	mfspr	r8, SPRN_AMR
> +	std	r8, PACATMAMR(r13)


Can we save this in stack instead of PACA?


> +
>   	/*
>   	 * BE CAREFUL HERE:
>   	 * At this point we can't take an SLB miss since we have MSR_RI
> @@ -245,7 +249,7 @@ _GLOBAL(tm_reclaim)
>   	 * but is used in signal return to 'wind back' to the abort handler.
>   	 */
>   
> -	/* ******************** CR,LR,CCR,MSR ********** */
> +	/* ***************** CTR, LR, CR, XER ********** */
>   	mfctr	r3
>   	mflr	r4
>   	mfcr	r5
> @@ -256,7 +260,6 @@ _GLOBAL(tm_reclaim)
>   	std	r5, _CCR(r7)
>   	std	r6, _XER(r7)
>   
> -
>   	/* ******************** TAR, DSCR ********** */
>   	mfspr	r3, SPRN_TAR
>   	mfspr	r4, SPRN_DSCR
> @@ -264,6 +267,10 @@ _GLOBAL(tm_reclaim)
>   	std	r3, THREAD_TM_TAR(r12)
>   	std	r4, THREAD_TM_DSCR(r12)
>   
> +        /* ******************** AMR **************** */
> +        mfspr	r3, SPRN_AMR
> +        std	r3, THREAD_TM_AMR(r12)
> +
>   	/*
>   	 * MSR and flags: We don't change CRs, and we don't need to alter MSR.
>   	 */
> @@ -308,8 +315,6 @@ _GLOBAL(tm_reclaim)
>   	std	r3, THREAD_TM_TFHAR(r12)
>   	std	r4, THREAD_TM_TFIAR(r12)
>   
> -	/* AMR is checkpointed too, but is unsupported by Linux. */
> -
>   	/* Restore original MSR/IRQ state & clear TM mode */
>   	ld	r14, TM_FRAME_L0(r1)		/* Orig MSR */
>   
> @@ -330,6 +335,10 @@ _GLOBAL(tm_reclaim)
>   	ld	r0, PACA_DSCR_DEFAULT(r13)
>   	mtspr	SPRN_DSCR, r0
>   
> +        /* Restore kernel saved AMR */
> +	ld	r4, PACATMAMR(r13)
> +	mtspr	SPRN_AMR, r4
> +
>   	blr
>   
>   
> @@ -355,6 +364,10 @@ _GLOBAL(__tm_recheckpoint)
>   	 */
>   	SAVE_NVGPRS(r1)
>   
> +	/* Save kernel AMR since it's used elsewhare in kernel space */
> +	mfspr	r8, SPRN_AMR
> +	std	r8, PACATMAMR(r13)
> +
>   	/* Load complete register state from ts_ckpt* registers */
>   
>   	addi	r7, r3, PT_CKPT_REGS		/* Thread's ckpt_regs */
> @@ -404,7 +417,7 @@ _GLOBAL(__tm_recheckpoint)
>   
>   restore_gprs:
>   
> -	/* ******************** CR,LR,CCR,MSR ********** */
> +	/* ****************** CTR, LR, XER ************* */
>   	ld	r4, _CTR(r7)
>   	ld	r5, _LINK(r7)
>   	ld	r8, _XER(r7)
> @@ -417,6 +430,10 @@ restore_gprs:
>   	ld	r4, THREAD_TM_TAR(r3)
>   	mtspr	SPRN_TAR,	r4
>   
> +	/* ******************** AMR ******************** */
> +	ld	r4, THREAD_TM_AMR(r3)
> +	mtspr	SPRN_AMR, r4
> +
>   	/* Load up the PPR and DSCR in GPRs only at this stage */
>   	ld	r5, THREAD_TM_DSCR(r3)
>   	ld	r6, THREAD_TM_PPR(r3)
> @@ -522,6 +539,10 @@ restore_gprs:
>   	ld	r0, PACA_DSCR_DEFAULT(r13)
>   	mtspr	SPRN_DSCR, r0
>   
> +	/* Restore kernel saved AMR */
> +	ld	r4, PACATMAMR(r13)
> +	mtspr	SPRN_AMR, r4
> +
>   	blr
>   
>   	/* ****************************************************************** */
> 


^ permalink raw reply

* [PATCH] powerpc/tm: Save and restore AMR on treclaim and trechkpt
From: Gustavo Romero @ 2020-09-18  4:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mikey, gromero, aneesh.kumar

Althought AMR is stashed on the checkpoint area, currently we don't save
it to the per thread checkpoint struct after a treclaim and so we don't
restore it either from that struct when we trechkpt. As a consequence when
the transaction is later rolled back kernel space AMR value when the
trechkpt was done appears in userspace.

That commit saves and restores AMR accordingly on treclaim and trechkpt.
Since AMR value is also used in kernel space in other functions, it also
takes care of stashing kernel live AMR into PACA before treclaim and before
trechkpt, restoring it later, just before returning from tm_reclaim and
__tm_recheckpoint.

Is also fixes two nonrelated comments about CR and MSR.

Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
---
 arch/powerpc/include/asm/paca.h      |  1 +
 arch/powerpc/include/asm/processor.h |  1 +
 arch/powerpc/kernel/asm-offsets.c    |  2 ++
 arch/powerpc/kernel/tm.S             | 31 +++++++++++++++++++++++-----
 4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..44c605181529 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -179,6 +179,7 @@ struct paca_struct {
 	u64 sprg_vdso;			/* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	u64 tm_scratch;                 /* TM scratch area for reclaim */
+	u64 tm_amr;			/* Saved Kernel AMR for treclaim/trechkpt */
 #endif
 
 #ifdef CONFIG_PPC_POWERNV
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..9f4f6cc033ac 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -220,6 +220,7 @@ struct thread_struct {
 	unsigned long	tm_tar;
 	unsigned long	tm_ppr;
 	unsigned long	tm_dscr;
+	unsigned long   tm_amr;
 
 	/*
 	 * Checkpointed FP and VSX 0-31 register set.
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..cf1a6d68a91f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -170,12 +170,14 @@ int main(void)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	OFFSET(PACATMSCRATCH, paca_struct, tm_scratch);
+	OFFSET(PACATMAMR, paca_struct, tm_amr);
 	OFFSET(THREAD_TM_TFHAR, thread_struct, tm_tfhar);
 	OFFSET(THREAD_TM_TEXASR, thread_struct, tm_texasr);
 	OFFSET(THREAD_TM_TFIAR, thread_struct, tm_tfiar);
 	OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
 	OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
 	OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
+	OFFSET(THREAD_TM_AMR, thread_struct, tm_amr);
 	OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
 	OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
 	OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index 6ba0fdd1e7f8..e178ddb43619 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -152,6 +152,10 @@ _GLOBAL(tm_reclaim)
 	li	r5, 0
 	mtmsrd	r5, 1
 
+        /* Save AMR since it's used elsewhere in kernel space */
+	mfspr	r8, SPRN_AMR
+	std	r8, PACATMAMR(r13)
+
 	/*
 	 * BE CAREFUL HERE:
 	 * At this point we can't take an SLB miss since we have MSR_RI
@@ -245,7 +249,7 @@ _GLOBAL(tm_reclaim)
 	 * but is used in signal return to 'wind back' to the abort handler.
 	 */
 
-	/* ******************** CR,LR,CCR,MSR ********** */
+	/* ***************** CTR, LR, CR, XER ********** */
 	mfctr	r3
 	mflr	r4
 	mfcr	r5
@@ -256,7 +260,6 @@ _GLOBAL(tm_reclaim)
 	std	r5, _CCR(r7)
 	std	r6, _XER(r7)
 
-
 	/* ******************** TAR, DSCR ********** */
 	mfspr	r3, SPRN_TAR
 	mfspr	r4, SPRN_DSCR
@@ -264,6 +267,10 @@ _GLOBAL(tm_reclaim)
 	std	r3, THREAD_TM_TAR(r12)
 	std	r4, THREAD_TM_DSCR(r12)
 
+        /* ******************** AMR **************** */
+        mfspr	r3, SPRN_AMR
+        std	r3, THREAD_TM_AMR(r12)
+
 	/*
 	 * MSR and flags: We don't change CRs, and we don't need to alter MSR.
 	 */
@@ -308,8 +315,6 @@ _GLOBAL(tm_reclaim)
 	std	r3, THREAD_TM_TFHAR(r12)
 	std	r4, THREAD_TM_TFIAR(r12)
 
-	/* AMR is checkpointed too, but is unsupported by Linux. */
-
 	/* Restore original MSR/IRQ state & clear TM mode */
 	ld	r14, TM_FRAME_L0(r1)		/* Orig MSR */
 
@@ -330,6 +335,10 @@ _GLOBAL(tm_reclaim)
 	ld	r0, PACA_DSCR_DEFAULT(r13)
 	mtspr	SPRN_DSCR, r0
 
+        /* Restore kernel saved AMR */
+	ld	r4, PACATMAMR(r13)
+	mtspr	SPRN_AMR, r4
+
 	blr
 
 
@@ -355,6 +364,10 @@ _GLOBAL(__tm_recheckpoint)
 	 */
 	SAVE_NVGPRS(r1)
 
+	/* Save kernel AMR since it's used elsewhare in kernel space */
+	mfspr	r8, SPRN_AMR
+	std	r8, PACATMAMR(r13)
+
 	/* Load complete register state from ts_ckpt* registers */
 
 	addi	r7, r3, PT_CKPT_REGS		/* Thread's ckpt_regs */
@@ -404,7 +417,7 @@ _GLOBAL(__tm_recheckpoint)
 
 restore_gprs:
 
-	/* ******************** CR,LR,CCR,MSR ********** */
+	/* ****************** CTR, LR, XER ************* */
 	ld	r4, _CTR(r7)
 	ld	r5, _LINK(r7)
 	ld	r8, _XER(r7)
@@ -417,6 +430,10 @@ restore_gprs:
 	ld	r4, THREAD_TM_TAR(r3)
 	mtspr	SPRN_TAR,	r4
 
+	/* ******************** AMR ******************** */
+	ld	r4, THREAD_TM_AMR(r3)
+	mtspr	SPRN_AMR, r4
+
 	/* Load up the PPR and DSCR in GPRs only at this stage */
 	ld	r5, THREAD_TM_DSCR(r3)
 	ld	r6, THREAD_TM_PPR(r3)
@@ -522,6 +539,10 @@ restore_gprs:
 	ld	r0, PACA_DSCR_DEFAULT(r13)
 	mtspr	SPRN_DSCR, r0
 
+	/* Restore kernel saved AMR */
+	ld	r4, PACATMAMR(r13)
+	mtspr	SPRN_AMR, r4
+
 	blr
 
 	/* ****************************************************************** */
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCHv7 00/12]PCI: dwc: Add the multiple PF support for DWC and Layerscape
From: Z.q. Hou @ 2020-09-18  2:55 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: devicetree@vger.kernel.org, andrew.murray@arm.com, Roy Zang,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Leo Li,
	M.h. Lian, jingoohan1@gmail.com, robh+dt@kernel.org, Mingkai Hu,
	gustavo.pimentel@synopsys.com, bhelgaas@google.com,
	shawnguo@kernel.org, kishon@ti.com, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200917162017.GA6830@e121166-lin.cambridge.arm.com>

Hi Lorenzo,

Thanks a lot for your comments!

> -----Original Message-----
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: 2020年9月18日 0:20
> To: Z.q. Hou <zhiqiang.hou@nxp.com>
> Cc: linux-pci@vger.kernel.org; devicetree@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; robh+dt@kernel.org; bhelgaas@google.com;
> shawnguo@kernel.org; Leo Li <leoyang.li@nxp.com>; kishon@ti.com;
> gustavo.pimentel@synopsys.com; Roy Zang <roy.zang@nxp.com>;
> jingoohan1@gmail.com; andrew.murray@arm.com; Mingkai Hu
> <mingkai.hu@nxp.com>; M.h. Lian <minghuan.lian@nxp.com>
> Subject: Re: [PATCHv7 00/12]PCI: dwc: Add the multiple PF support for DWC
> and Layerscape
> 
> On Tue, Aug 11, 2020 at 05:54:29PM +0800, Zhiqiang Hou wrote:
> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> >
> > Add the PCIe EP multiple PF support for DWC and Layerscape, and use a
> > list to manage the PFs of each PCIe controller; add the doorbell MSIX
> > function for DWC; and refactor the Layerscape EP driver due to some
> > difference in Layercape platforms PCIe integration.
> >
> > Hou Zhiqiang (1):
> >   misc: pci_endpoint_test: Add driver data for Layerscape PCIe
> >     controllers
> >
> > Xiaowei Bao (11):
> >   PCI: designware-ep: Add multiple PFs support for DWC
> >   PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
> >   PCI: designware-ep: Move the function of getting MSI capability
> >     forward
> >   PCI: designware-ep: Modify MSI and MSIX CAP way of finding
> >   dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a
> >     and ls2088a
> >   PCI: layerscape: Fix some format issue of the code
> >   PCI: layerscape: Modify the way of getting capability with different
> >     PEX
> >   PCI: layerscape: Modify the MSIX to the doorbell mode
> >   PCI: layerscape: Add EP mode support for ls1088a and ls2088a
> >   arm64: dts: layerscape: Add PCIe EP node for ls1088a
> >   misc: pci_endpoint_test: Add LS1088a in pci_device_id table
> >
> >  .../bindings/pci/layerscape-pci.txt           |   2 +
> >  .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi |  31 +++
> >  drivers/misc/pci_endpoint_test.c              |   8 +-
> >  .../pci/controller/dwc/pci-layerscape-ep.c    | 100 +++++--
> >  .../pci/controller/dwc/pcie-designware-ep.c   | 258
> ++++++++++++++----
> >  drivers/pci/controller/dwc/pcie-designware.c  |  59 ++--
> > drivers/pci/controller/dwc/pcie-designware.h  |  48 +++-
> >  7 files changed, 410 insertions(+), 96 deletions(-)
> 
> Side note: I will change it for you but please keep Signed-off-by:
> tags together in the log instead of mixing them with other tags randomly.
> 
> Can you rebase this series against my pci/dwc branch please and send a v8 ?
> 
> I will apply it then.

I'll rebase this series and put the Signed-off-by tags together today.

Regards,
Zhiqiang

> 
> Thanks,
> Lorenzo

^ permalink raw reply

* [PATCH AUTOSEL 4.14 102/127] powerpc/traps: Make unrecoverable NMIs die instead of panic
From: Sasha Levin @ 2020-09-18  2:11 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christophe Leroy, linuxppc-dev, Nicholas Piggin, Sasha Levin
In-Reply-To: <20200918021220.2066485-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ]

System Reset and Machine Check interrupts that are not recoverable due
to being nested or interrupting when RI=0 currently panic. This is not
necessary, and can often just kill the current context and recover.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3c9457420aee8..0f1a888c04a84 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -357,11 +357,11 @@ out:
 #ifdef CONFIG_PPC_BOOK3S_64
 	BUG_ON(get_paca()->in_nmi == 0);
 	if (get_paca()->in_nmi > 1)
-		nmi_panic(regs, "Unrecoverable nested System Reset");
+		die("Unrecoverable nested System Reset", regs, SIGABRT);
 #endif
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable System Reset");
+		die("Unrecoverable System Reset", regs, SIGABRT);
 
 	if (!nested)
 		nmi_exit();
@@ -701,7 +701,7 @@ void machine_check_exception(struct pt_regs *regs)
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable Machine check");
+		die("Unrecoverable Machine check", regs, SIGBUS);
 
 	return;
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 031/127] powerpc/eeh: Only dump stack once if an MMIO loop is detected
From: Sasha Levin @ 2020-09-18  2:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sam Bobroff, Oliver O'Halloran, linuxppc-dev, Sasha Levin
In-Reply-To: <20200918021220.2066485-1-sashal@kernel.org>

From: Oliver O'Halloran <oohall@gmail.com>

[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]

Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d2ba7936d0d33..7b46576962bfd 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -506,7 +506,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
+		if (pe->check_count == EEH_MAX_FAILS) {
 			dn = pci_device_to_OF_node(dev);
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",
-- 
2.25.1

^ permalink raw reply related

* [PATCH AUTOSEL 4.19 165/206] powerpc/traps: Make unrecoverable NMIs die instead of panic
From: Sasha Levin @ 2020-09-18  2:07 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christophe Leroy, linuxppc-dev, Nicholas Piggin, Sasha Levin
In-Reply-To: <20200918020802.2065198-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ]

System Reset and Machine Check interrupts that are not recoverable due
to being nested or interrupting when RI=0 currently panic. This is not
necessary, and can often just kill the current context and recover.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d5f351f02c153..7781f0168ce8c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -430,11 +430,11 @@ out:
 #ifdef CONFIG_PPC_BOOK3S_64
 	BUG_ON(get_paca()->in_nmi == 0);
 	if (get_paca()->in_nmi > 1)
-		nmi_panic(regs, "Unrecoverable nested System Reset");
+		die("Unrecoverable nested System Reset", regs, SIGABRT);
 #endif
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable System Reset");
+		die("Unrecoverable System Reset", regs, SIGABRT);
 
 	if (!nested)
 		nmi_exit();
@@ -775,7 +775,7 @@ void machine_check_exception(struct pt_regs *regs)
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable Machine check");
+		die("Unrecoverable Machine check", regs, SIGBUS);
 
 	return;
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 114/206] KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones
From: Sasha Levin @ 2020-09-18  2:06 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Michael Neuling, kvm, Gustavo Romero, kvm-ppc,
	Leonardo Bras, linuxppc-dev
In-Reply-To: <20200918020802.2065198-1-sashal@kernel.org>

From: Gustavo Romero <gromero@linux.ibm.com>

[ Upstream commit 1dff3064c764b5a51c367b949b341d2e38972bec ]

On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by
KVM. This is handled at first by the hardware raising a softpatch interrupt
when certain TM instructions that need KVM assistance are executed in the
guest. Althought some TM instructions per Power ISA are invalid forms they
can raise a softpatch interrupt too. For instance, 'tresume.' instruction
as defined in the ISA must have bit 31 set (1), but an instruction that
matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like
0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.'
and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and
0x7c0007dc, respectively. Hence, if a code like the following is executed
in the guest it will raise a softpatch interrupt just like a 'tresume.'
when the TM facility is enabled ('tabort. 0' in the example is used only
to enable the TM facility):

int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); }

Currently in such a case KVM throws a complete trace like:

[345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler
ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3
crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv]
[345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G        W   E     5.5.0+ #1
[345523.706031] NIP:  c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850
[345523.706034] REGS: c000000399467680 TRAP: 0700   Tainted: G        W   E      (5.5.0+)
[345523.706034] MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24022428  XER: 00000000
[345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0
                GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720
                GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000
                GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530
                GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000
                GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001
                GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000
                GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500
                GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720
[345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv]
[345523.706066] Call Trace:
[345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable)
[345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980
[345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv]
[345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv]
[345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm]
[345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170
[345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80
[345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68
[345523.706112] Instruction dump:
[345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd
[345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0

and then treats the executed instruction as a 'nop'.

However the POWER9 User's Manual, in section "4.6.10 Book II Invalid
Forms", informs that for TM instructions bit 31 is in fact ignored, thus
for the TM-related invalid forms ignoring bit 31 and handling them like the
valid forms is an acceptable way to handle them. POWER8 behaves the same
way too.

This commit changes the handling of the cases here described by treating
the TM-related invalid forms that can generate a softpatch interrupt
just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by
gently reporting any other unrecognized case to the host and treating it as
illegal instruction instead of throwing a trace and treating it as a 'nop'.

Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/kvm_asm.h      |  3 +++
 arch/powerpc/kvm/book3s_hv_tm.c         | 28 ++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_tm_builtin.c | 16 ++++++++++++--
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index a790d5cf6ea37..684e8ae00d160 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -163,4 +163,7 @@
 
 #define KVM_INST_FETCH_FAILED	-1
 
+/* Extract PO and XOP opcode fields */
+#define PO_XOP_OPCODE_MASK 0xfc0007fe
+
 #endif /* __POWERPC_KVM_ASM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
index 31cd0f327c8a2..e7fd60cf97804 100644
--- a/arch/powerpc/kvm/book3s_hv_tm.c
+++ b/arch/powerpc/kvm/book3s_hv_tm.c
@@ -6,6 +6,8 @@
  * published by the Free Software Foundation.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_ppc.h>
@@ -47,7 +49,18 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 	u64 newmsr, bescr;
 	int ra, rs;
 
-	switch (instr & 0xfc0007ff) {
+	/*
+	 * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
+	 * in these instructions, so masking bit 31 out doesn't change these
+	 * instructions. For treclaim., tsr., and trechkpt. instructions if bit
+	 * 31 = 0 then they are per ISA invalid forms, however P9 UM, in section
+	 * 4.6.10 Book II Invalid Forms, informs specifically that ignoring bit
+	 * 31 is an acceptable way to handle these invalid forms that have
+	 * bit 31 = 0. Moreover, for emulation purposes both forms (w/ and wo/
+	 * bit 31 set) can generate a softpatch interrupt. Hence both forms
+	 * are handled below for these instructions so they behave the same way.
+	 */
+	switch (instr & PO_XOP_OPCODE_MASK) {
 	case PPC_INST_RFID:
 		/* XXX do we need to check for PR=0 here? */
 		newmsr = vcpu->arch.shregs.srr1;
@@ -108,7 +121,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = newmsr;
 		return RESUME_GUEST;
 
-	case PPC_INST_TSR:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TSR & PO_XOP_OPCODE_MASK):
 		/* check for PR=1 and arch 2.06 bit set in PCR */
 		if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
 			/* generate an illegal instruction interrupt */
@@ -143,7 +157,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = msr;
 		return RESUME_GUEST;
 
-	case PPC_INST_TRECLAIM:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TRECLAIM & PO_XOP_OPCODE_MASK):
 		/* check for TM disabled in the HFSCR or MSR */
 		if (!(vcpu->arch.hfscr & HFSCR_TM)) {
 			/* generate an illegal instruction interrupt */
@@ -179,7 +194,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr &= ~MSR_TS_MASK;
 		return RESUME_GUEST;
 
-	case PPC_INST_TRECHKPT:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TRECHKPT & PO_XOP_OPCODE_MASK):
 		/* XXX do we need to check for PR=0 here? */
 		/* check for TM disabled in the HFSCR or MSR */
 		if (!(vcpu->arch.hfscr & HFSCR_TM)) {
@@ -211,6 +227,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 	}
 
 	/* What should we do here? We didn't recognize the instruction */
-	WARN_ON_ONCE(1);
+	kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+	pr_warn_ratelimited("Unrecognized TM-related instruction %#x for emulation", instr);
+
 	return RESUME_GUEST;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
index 3cf5863bc06e8..3c7ca2fa19597 100644
--- a/arch/powerpc/kvm/book3s_hv_tm_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
@@ -26,7 +26,18 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 	u64 newmsr, msr, bescr;
 	int rs;
 
-	switch (instr & 0xfc0007ff) {
+	/*
+	 * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
+	 * in these instructions, so masking bit 31 out doesn't change these
+	 * instructions. For the tsr. instruction if bit 31 = 0 then it is per
+	 * ISA an invalid form, however P9 UM, in section 4.6.10 Book II Invalid
+	 * Forms, informs specifically that ignoring bit 31 is an acceptable way
+	 * to handle TM-related invalid forms that have bit 31 = 0. Moreover,
+	 * for emulation purposes both forms (w/ and wo/ bit 31 set) can
+	 * generate a softpatch interrupt. Hence both forms are handled below
+	 * for tsr. to make them behave the same way.
+	 */
+	switch (instr & PO_XOP_OPCODE_MASK) {
 	case PPC_INST_RFID:
 		/* XXX do we need to check for PR=0 here? */
 		newmsr = vcpu->arch.shregs.srr1;
@@ -76,7 +87,8 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = newmsr;
 		return 1;
 
-	case PPC_INST_TSR:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TSR & PO_XOP_OPCODE_MASK):
 		/* we know the MSR has the TS field = S (0b01) here */
 		msr = vcpu->arch.shregs.msr;
 		/* check for PR=1 and arch 2.06 bit set in PCR */
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 056/206] powerpc/eeh: Only dump stack once if an MMIO loop is detected
From: Sasha Levin @ 2020-09-18  2:05 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sam Bobroff, Oliver O'Halloran, linuxppc-dev, Sasha Levin
In-Reply-To: <20200918020802.2065198-1-sashal@kernel.org>

From: Oliver O'Halloran <oohall@gmail.com>

[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]

Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index fe3c6f3bd3b62..d123cba0992d0 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -502,7 +502,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
+		if (pe->check_count == EEH_MAX_FAILS) {
 			dn = pci_device_to_OF_node(dev);
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",
-- 
2.25.1

^ permalink raw reply related

* [PATCH AUTOSEL 4.19 055/206] powerpc/powernv/ioda: Fix ref count for devices with their own PE
From: Sasha Levin @ 2020-09-18  2:05 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Frederic Barrat, linuxppc-dev, Andrew Donnellan, Sasha Levin
In-Reply-To: <20200918020802.2065198-1-sashal@kernel.org>

From: Frederic Barrat <fbarrat@linux.ibm.com>

[ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]

The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").

We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.

Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference and
simply removing it now is dangerous and would likely require more
work. There's currently no release device callback for nvlink devices
for example. So to be safe, this patch leaks a reference on the npu
device, but only for nvlink and not opencapi.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ecd211c5f24a5..19cd6affdd5fb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1071,14 +1071,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		return NULL;
 	}
 
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
+	/* NOTE: We don't get a reference for the pointer in the PE
+	 * data structure, both the device and PE structures should be
+	 * destroyed at the same time. However, removing nvlink
+	 * devices will need some work.
 	 *
 	 * At some point we want to remove the PDN completely anyways
 	 */
-	pci_dev_get(dev);
 	pdn->pe_number = pe->pe_number;
 	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
@@ -1093,7 +1092,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		pnv_ioda_free_pe(pe);
 		pdn->pe_number = IODA_INVALID_PE;
 		pe->pdev = NULL;
-		pci_dev_put(dev);
 		return NULL;
 	}
 
@@ -1213,6 +1211,14 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 	struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
 
+	/*
+	 * Intentionally leak a reference on the npu device (for
+	 * nvlink only; this is not an opencapi path) to make sure it
+	 * never goes away, as it's been the case all along and some
+	 * work is needed otherwise.
+	 */
+	pci_dev_get(npu_pdev);
+
 	/*
 	 * Due to a hardware errata PE#0 on the NPU is reserved for
 	 * error handling. This means we only have three PEs remaining
@@ -1236,7 +1242,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 			 */
 			dev_info(&npu_pdev->dev,
 				"Associating to existing PE %x\n", pe_num);
-			pci_dev_get(npu_pdev);
 			npu_pdn = pci_get_pdn(npu_pdev);
 			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
 			npu_pdn->pe_number = pe_num;
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 290/330] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes
From: Sasha Levin @ 2020-09-18  2:00 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Paul Mackerras <paulus@ozlabs.org>

[ Upstream commit 11362b1befeadaae4d159a8cddcdaf6b8afe08f9 ]

There is a potential race condition between hypervisor page faults
and flushing a memslot.  It is possible for a page fault to read the
memslot before a memslot is updated and then write a PTE to the
partition-scoped page tables after kvmppc_radix_flush_memslot has
completed.  (Note that this race has never been explicitly observed.)

To close this race, it is sufficient to increment the MMU sequence
number while the kvm->mmu_lock is held.  That will cause
mmu_notifier_retry() to return true, and the page fault will then
return to the guest without inserting a PTE.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index da8375437d161..9d73448354698 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1104,6 +1104,11 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
 					 kvm->arch.lpid);
 		gpa += PAGE_SIZE;
 	}
+	/*
+	 * Increase the mmu notifier sequence number to prevent any page
+	 * fault that read the memslot earlier from writing a PTE.
+	 */
+	kvm->mmu_notifier_seq++;
 	spin_unlock(&kvm->mmu_lock);
 }

-- 
2.25.1

^ permalink raw reply related

* [PATCH AUTOSEL 5.4 270/330] powerpc/traps: Make unrecoverable NMIs die instead of panic
From: Sasha Levin @ 2020-09-18  2:00 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christophe Leroy, linuxppc-dev, Nicholas Piggin, Sasha Levin
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ]

System Reset and Machine Check interrupts that are not recoverable due
to being nested or interrupting when RI=0 currently panic. This is not
necessary, and can often just kill the current context and recover.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 014ff0701f245..9432fc6af28a5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -510,11 +510,11 @@ out:
 #ifdef CONFIG_PPC_BOOK3S_64
 	BUG_ON(get_paca()->in_nmi == 0);
 	if (get_paca()->in_nmi > 1)
-		nmi_panic(regs, "Unrecoverable nested System Reset");
+		die("Unrecoverable nested System Reset", regs, SIGABRT);
 #endif
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable System Reset");
+		die("Unrecoverable System Reset", regs, SIGABRT);
 
 	if (saved_hsrrs) {
 		mtspr(SPRN_HSRR0, hsrr0);
@@ -858,7 +858,7 @@ void machine_check_exception(struct pt_regs *regs)
 
 	/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
-		nmi_panic(regs, "Unrecoverable Machine check");
+		die("Unrecoverable Machine check", regs, SIGBUS);
 
 	return;
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 224/330] powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
From: Sasha Levin @ 2020-09-18  1:59 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, Anju T Sudhakar, linuxppc-dev
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Anju T Sudhakar <anju@linux.vnet.ibm.com>

[ Upstream commit a36e8ba60b991d563677227f172db69e030797e6 ]

IMC(In-memory Collection Counters) does performance monitoring in
two different modes, i.e accumulation mode(core-imc and thread-imc events),
and trace mode(trace-imc events). A cpu thread can either be in
accumulation-mode or trace-mode at a time and this is done via the LDBAR
register in POWER architecture. The current design does not address the
races between thread-imc and trace-imc events.

Patch implements a global id and lock to avoid the races between
core, trace and thread imc events. With this global id-lock
implementation, the system can either run core, thread or trace imc
events at a time. i.e. to run any core-imc events, thread/trace imc events
should not be enabled/monitored.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/perf/imc-pmu.c | 173 +++++++++++++++++++++++++++++++-----
 1 file changed, 149 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d7..eb82dda884e51 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -44,6 +44,16 @@ static DEFINE_PER_CPU(u64 *, trace_imc_mem);
 static struct imc_pmu_ref *trace_imc_refc;
 static int trace_imc_mem_size;
 
+/*
+ * Global data structure used to avoid races between thread,
+ * core and trace-imc
+ */
+static struct imc_pmu_ref imc_global_refc = {
+	.lock = __MUTEX_INITIALIZER(imc_global_refc.lock),
+	.id = 0,
+	.refc = 0,
+};
+
 static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
 	return container_of(event->pmu, struct imc_pmu, pmu);
@@ -698,6 +708,16 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
 			return -EINVAL;
 
 		ref->refc = 0;
+		/*
+		 * Reduce the global reference count, if this is the
+		 * last cpu in this core and core-imc event running
+		 * in this cpu.
+		 */
+		mutex_lock(&imc_global_refc.lock);
+		if (imc_global_refc.id == IMC_DOMAIN_CORE)
+			imc_global_refc.refc--;
+
+		mutex_unlock(&imc_global_refc.lock);
 	}
 	return 0;
 }
@@ -710,6 +730,23 @@ static int core_imc_pmu_cpumask_init(void)
 				 ppc_core_imc_cpu_offline);
 }
 
+static void reset_global_refc(struct perf_event *event)
+{
+		mutex_lock(&imc_global_refc.lock);
+		imc_global_refc.refc--;
+
+		/*
+		 * If no other thread is running any
+		 * event for this domain(thread/core/trace),
+		 * set the global id to zero.
+		 */
+		if (imc_global_refc.refc <= 0) {
+			imc_global_refc.refc = 0;
+			imc_global_refc.id = 0;
+		}
+		mutex_unlock(&imc_global_refc.lock);
+}
+
 static void core_imc_counters_release(struct perf_event *event)
 {
 	int rc, core_id;
@@ -759,6 +796,8 @@ static void core_imc_counters_release(struct perf_event *event)
 		ref->refc = 0;
 	}
 	mutex_unlock(&ref->lock);
+
+	reset_global_refc(event);
 }
 
 static int core_imc_event_init(struct perf_event *event)
@@ -819,6 +858,29 @@ static int core_imc_event_init(struct perf_event *event)
 	++ref->refc;
 	mutex_unlock(&ref->lock);
 
+	/*
+	 * Since the system can run either in accumulation or trace-mode
+	 * of IMC at a time, core-imc events are allowed only if no other
+	 * trace/thread imc events are enabled/monitored.
+	 *
+	 * Take the global lock, and check the refc.id
+	 * to know whether any other trace/thread imc
+	 * events are running.
+	 */
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == 0 || imc_global_refc.id == IMC_DOMAIN_CORE) {
+		/*
+		 * No other trace/thread imc events are running in
+		 * the system, so set the refc.id to core-imc.
+		 */
+		imc_global_refc.id = IMC_DOMAIN_CORE;
+		imc_global_refc.refc++;
+	} else {
+		mutex_unlock(&imc_global_refc.lock);
+		return -EBUSY;
+	}
+	mutex_unlock(&imc_global_refc.lock);
+
 	event->hw.event_base = (u64)pcmi->vbase + (config & IMC_EVENT_OFFSET_MASK);
 	event->destroy = core_imc_counters_release;
 	return 0;
@@ -877,7 +939,23 @@ static int ppc_thread_imc_cpu_online(unsigned int cpu)
 
 static int ppc_thread_imc_cpu_offline(unsigned int cpu)
 {
-	mtspr(SPRN_LDBAR, 0);
+	/*
+	 * Set the bit 0 of LDBAR to zero.
+	 *
+	 * If bit 0 of LDBAR is unset, it will stop posting
+	 * the counter data to memory.
+	 * For thread-imc, bit 0 of LDBAR will be set to 1 in the
+	 * event_add function. So reset this bit here, to stop the updates
+	 * to memory in the cpu_offline path.
+	 */
+	mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) & (~(1UL << 63))));
+
+	/* Reduce the refc if thread-imc event running on this cpu */
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == IMC_DOMAIN_THREAD)
+		imc_global_refc.refc--;
+	mutex_unlock(&imc_global_refc.lock);
+
 	return 0;
 }
 
@@ -916,7 +994,22 @@ static int thread_imc_event_init(struct perf_event *event)
 	if (!target)
 		return -EINVAL;
 
+	mutex_lock(&imc_global_refc.lock);
+	/*
+	 * Check if any other trace/core imc events are running in the
+	 * system, if not set the global id to thread-imc.
+	 */
+	if (imc_global_refc.id == 0 || imc_global_refc.id == IMC_DOMAIN_THREAD) {
+		imc_global_refc.id = IMC_DOMAIN_THREAD;
+		imc_global_refc.refc++;
+	} else {
+		mutex_unlock(&imc_global_refc.lock);
+		return -EBUSY;
+	}
+	mutex_unlock(&imc_global_refc.lock);
+
 	event->pmu->task_ctx_nr = perf_sw_context;
+	event->destroy = reset_global_refc;
 	return 0;
 }
 
@@ -1063,10 +1156,12 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
 	int core_id;
 	struct imc_pmu_ref *ref;
 
-	mtspr(SPRN_LDBAR, 0);
-
 	core_id = smp_processor_id() / threads_per_core;
 	ref = &core_imc_refc[core_id];
+	if (!ref) {
+		pr_debug("imc: Failed to get event reference count\n");
+		return;
+	}
 
 	mutex_lock(&ref->lock);
 	ref->refc--;
@@ -1082,6 +1177,10 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
 		ref->refc = 0;
 	}
 	mutex_unlock(&ref->lock);
+
+	/* Set bit 0 of LDBAR to zero, to stop posting updates to memory */
+	mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) & (~(1UL << 63))));
+
 	/*
 	 * Take a snapshot and calculate the delta and update
 	 * the event counter values.
@@ -1133,7 +1232,18 @@ static int ppc_trace_imc_cpu_online(unsigned int cpu)
 
 static int ppc_trace_imc_cpu_offline(unsigned int cpu)
 {
-	mtspr(SPRN_LDBAR, 0);
+	/*
+	 * No need to set bit 0 of LDBAR to zero, as
+	 * it is set to zero for imc trace-mode
+	 *
+	 * Reduce the refc if any trace-imc event running
+	 * on this cpu.
+	 */
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == IMC_DOMAIN_TRACE)
+		imc_global_refc.refc--;
+	mutex_unlock(&imc_global_refc.lock);
+
 	return 0;
 }
 
@@ -1226,15 +1336,14 @@ static int trace_imc_event_add(struct perf_event *event, int flags)
 	local_mem = get_trace_imc_event_base_addr();
 	ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | TRACE_IMC_ENABLE;
 
-	if (core_imc_refc)
-		ref = &core_imc_refc[core_id];
+	/* trace-imc reference count */
+	if (trace_imc_refc)
+		ref = &trace_imc_refc[core_id];
 	if (!ref) {
-		/* If core-imc is not enabled, use trace-imc reference count */
-		if (trace_imc_refc)
-			ref = &trace_imc_refc[core_id];
-		if (!ref)
-			return -EINVAL;
+		pr_debug("imc: Failed to get the event reference count\n");
+		return -EINVAL;
 	}
+
 	mtspr(SPRN_LDBAR, ldbar_value);
 	mutex_lock(&ref->lock);
 	if (ref->refc == 0) {
@@ -1242,13 +1351,11 @@ static int trace_imc_event_add(struct perf_event *event, int flags)
 				get_hard_smp_processor_id(smp_processor_id()))) {
 			mutex_unlock(&ref->lock);
 			pr_err("trace-imc: Unable to start the counters for core %d\n", core_id);
-			mtspr(SPRN_LDBAR, 0);
 			return -EINVAL;
 		}
 	}
 	++ref->refc;
 	mutex_unlock(&ref->lock);
-
 	return 0;
 }
 
@@ -1274,16 +1381,13 @@ static void trace_imc_event_del(struct perf_event *event, int flags)
 	int core_id = smp_processor_id() / threads_per_core;
 	struct imc_pmu_ref *ref = NULL;
 
-	if (core_imc_refc)
-		ref = &core_imc_refc[core_id];
+	if (trace_imc_refc)
+		ref = &trace_imc_refc[core_id];
 	if (!ref) {
-		/* If core-imc is not enabled, use trace-imc reference count */
-		if (trace_imc_refc)
-			ref = &trace_imc_refc[core_id];
-		if (!ref)
-			return;
+		pr_debug("imc: Failed to get event reference count\n");
+		return;
 	}
-	mtspr(SPRN_LDBAR, 0);
+
 	mutex_lock(&ref->lock);
 	ref->refc--;
 	if (ref->refc == 0) {
@@ -1297,6 +1401,7 @@ static void trace_imc_event_del(struct perf_event *event, int flags)
 		ref->refc = 0;
 	}
 	mutex_unlock(&ref->lock);
+
 	trace_imc_event_stop(event, flags);
 }
 
@@ -1314,10 +1419,30 @@ static int trace_imc_event_init(struct perf_event *event)
 	if (event->attr.sample_period == 0)
 		return -ENOENT;
 
+	/*
+	 * Take the global lock, and make sure
+	 * no other thread is running any core/thread imc
+	 * events
+	 */
+	mutex_lock(&imc_global_refc.lock);
+	if (imc_global_refc.id == 0 || imc_global_refc.id == IMC_DOMAIN_TRACE) {
+		/*
+		 * No core/thread imc events are running in the
+		 * system, so set the refc.id to trace-imc.
+		 */
+		imc_global_refc.id = IMC_DOMAIN_TRACE;
+		imc_global_refc.refc++;
+	} else {
+		mutex_unlock(&imc_global_refc.lock);
+		return -EBUSY;
+	}
+	mutex_unlock(&imc_global_refc.lock);
+
 	event->hw.idx = -1;
 	target = event->hw.target;
 
 	event->pmu->task_ctx_nr = perf_hw_context;
+	event->destroy = reset_global_refc;
 	return 0;
 }
 
@@ -1429,10 +1554,10 @@ static void cleanup_all_core_imc_memory(void)
 static void thread_imc_ldbar_disable(void *dummy)
 {
 	/*
-	 * By Zeroing LDBAR, we disable thread-imc
-	 * updates.
+	 * By setting 0th bit of LDBAR to zero, we disable thread-imc
+	 * updates to memory.
 	 */
-	mtspr(SPRN_LDBAR, 0);
+	mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) & (~(1UL << 63))));
 }
 
 void thread_imc_disable(void)
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 180/330] KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones
From: Sasha Levin @ 2020-09-18  1:58 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Michael Neuling, kvm, Gustavo Romero, kvm-ppc,
	Leonardo Bras, linuxppc-dev
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Gustavo Romero <gromero@linux.ibm.com>

[ Upstream commit 1dff3064c764b5a51c367b949b341d2e38972bec ]

On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by
KVM. This is handled at first by the hardware raising a softpatch interrupt
when certain TM instructions that need KVM assistance are executed in the
guest. Althought some TM instructions per Power ISA are invalid forms they
can raise a softpatch interrupt too. For instance, 'tresume.' instruction
as defined in the ISA must have bit 31 set (1), but an instruction that
matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like
0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.'
and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and
0x7c0007dc, respectively. Hence, if a code like the following is executed
in the guest it will raise a softpatch interrupt just like a 'tresume.'
when the TM facility is enabled ('tabort. 0' in the example is used only
to enable the TM facility):

int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); }

Currently in such a case KVM throws a complete trace like:

[345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler
ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3
crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv]
[345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G        W   E     5.5.0+ #1
[345523.706031] NIP:  c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850
[345523.706034] REGS: c000000399467680 TRAP: 0700   Tainted: G        W   E      (5.5.0+)
[345523.706034] MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24022428  XER: 00000000
[345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0
                GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720
                GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000
                GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530
                GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000
                GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001
                GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000
                GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500
                GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720
[345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv]
[345523.706066] Call Trace:
[345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable)
[345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980
[345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv]
[345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv]
[345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm]
[345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170
[345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80
[345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68
[345523.706112] Instruction dump:
[345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd
[345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0

and then treats the executed instruction as a 'nop'.

However the POWER9 User's Manual, in section "4.6.10 Book II Invalid
Forms", informs that for TM instructions bit 31 is in fact ignored, thus
for the TM-related invalid forms ignoring bit 31 and handling them like the
valid forms is an acceptable way to handle them. POWER8 behaves the same
way too.

This commit changes the handling of the cases here described by treating
the TM-related invalid forms that can generate a softpatch interrupt
just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by
gently reporting any other unrecognized case to the host and treating it as
illegal instruction instead of throwing a trace and treating it as a 'nop'.

Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/kvm_asm.h      |  3 +++
 arch/powerpc/kvm/book3s_hv_tm.c         | 28 ++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_tm_builtin.c | 16 ++++++++++++--
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 635fb154b33f9..a3633560493be 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -150,4 +150,7 @@
 
 #define KVM_INST_FETCH_FAILED	-1
 
+/* Extract PO and XOP opcode fields */
+#define PO_XOP_OPCODE_MASK 0xfc0007fe
+
 #endif /* __POWERPC_KVM_ASM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
index 0db9374971697..cc90b8b823291 100644
--- a/arch/powerpc/kvm/book3s_hv_tm.c
+++ b/arch/powerpc/kvm/book3s_hv_tm.c
@@ -3,6 +3,8 @@
  * Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_ppc.h>
@@ -44,7 +46,18 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 	u64 newmsr, bescr;
 	int ra, rs;
 
-	switch (instr & 0xfc0007ff) {
+	/*
+	 * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
+	 * in these instructions, so masking bit 31 out doesn't change these
+	 * instructions. For treclaim., tsr., and trechkpt. instructions if bit
+	 * 31 = 0 then they are per ISA invalid forms, however P9 UM, in section
+	 * 4.6.10 Book II Invalid Forms, informs specifically that ignoring bit
+	 * 31 is an acceptable way to handle these invalid forms that have
+	 * bit 31 = 0. Moreover, for emulation purposes both forms (w/ and wo/
+	 * bit 31 set) can generate a softpatch interrupt. Hence both forms
+	 * are handled below for these instructions so they behave the same way.
+	 */
+	switch (instr & PO_XOP_OPCODE_MASK) {
 	case PPC_INST_RFID:
 		/* XXX do we need to check for PR=0 here? */
 		newmsr = vcpu->arch.shregs.srr1;
@@ -105,7 +118,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = newmsr;
 		return RESUME_GUEST;
 
-	case PPC_INST_TSR:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TSR & PO_XOP_OPCODE_MASK):
 		/* check for PR=1 and arch 2.06 bit set in PCR */
 		if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
 			/* generate an illegal instruction interrupt */
@@ -140,7 +154,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = msr;
 		return RESUME_GUEST;
 
-	case PPC_INST_TRECLAIM:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TRECLAIM & PO_XOP_OPCODE_MASK):
 		/* check for TM disabled in the HFSCR or MSR */
 		if (!(vcpu->arch.hfscr & HFSCR_TM)) {
 			/* generate an illegal instruction interrupt */
@@ -176,7 +191,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr &= ~MSR_TS_MASK;
 		return RESUME_GUEST;
 
-	case PPC_INST_TRECHKPT:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TRECHKPT & PO_XOP_OPCODE_MASK):
 		/* XXX do we need to check for PR=0 here? */
 		/* check for TM disabled in the HFSCR or MSR */
 		if (!(vcpu->arch.hfscr & HFSCR_TM)) {
@@ -208,6 +224,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 	}
 
 	/* What should we do here? We didn't recognize the instruction */
-	WARN_ON_ONCE(1);
+	kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+	pr_warn_ratelimited("Unrecognized TM-related instruction %#x for emulation", instr);
+
 	return RESUME_GUEST;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
index 217246279dfae..fad931f224efd 100644
--- a/arch/powerpc/kvm/book3s_hv_tm_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
@@ -23,7 +23,18 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 	u64 newmsr, msr, bescr;
 	int rs;
 
-	switch (instr & 0xfc0007ff) {
+	/*
+	 * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
+	 * in these instructions, so masking bit 31 out doesn't change these
+	 * instructions. For the tsr. instruction if bit 31 = 0 then it is per
+	 * ISA an invalid form, however P9 UM, in section 4.6.10 Book II Invalid
+	 * Forms, informs specifically that ignoring bit 31 is an acceptable way
+	 * to handle TM-related invalid forms that have bit 31 = 0. Moreover,
+	 * for emulation purposes both forms (w/ and wo/ bit 31 set) can
+	 * generate a softpatch interrupt. Hence both forms are handled below
+	 * for tsr. to make them behave the same way.
+	 */
+	switch (instr & PO_XOP_OPCODE_MASK) {
 	case PPC_INST_RFID:
 		/* XXX do we need to check for PR=0 here? */
 		newmsr = vcpu->arch.shregs.srr1;
@@ -73,7 +84,8 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = newmsr;
 		return 1;
 
-	case PPC_INST_TSR:
+	/* ignore bit 31, see comment above */
+	case (PPC_INST_TSR & PO_XOP_OPCODE_MASK):
 		/* we know the MSR has the TS field = S (0b01) here */
 		msr = vcpu->arch.shregs.msr;
 		/* check for PR=1 and arch 2.06 bit set in PCR */
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 153/330] powerpc/book3s64: Fix error handling in mm_iommu_do_alloc()
From: Sasha Levin @ 2020-09-18  1:58 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alexey Kardashevskiy, Jan Kara, linuxppc-dev, Sasha Levin
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Alexey Kardashevskiy <aik@ozlabs.ru>

[ Upstream commit c4b78169e3667413184c9a20e11b5832288a109f ]

The last jump to free_exit in mm_iommu_do_alloc() happens after page
pointers in struct mm_iommu_table_group_mem_t were already converted to
physical addresses. Thus calling put_page() on these physical addresses
will likely crash.

This moves the loop which calculates the pageshift and converts page
struct pointers to physical addresses later after the point when
we cannot fail; thus eliminating the need to convert pointers back.

Fixes: eb9d7a62c386 ("powerpc/mm_iommu: Fix potential deadlock")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191223060351.26359-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/mm/book3s64/iommu_api.c | 39 +++++++++++++++-------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 56cc845205779..ef164851738b8 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -121,24 +121,6 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 		goto free_exit;
 	}
 
-	pageshift = PAGE_SHIFT;
-	for (i = 0; i < entries; ++i) {
-		struct page *page = mem->hpages[i];
-
-		/*
-		 * Allow to use larger than 64k IOMMU pages. Only do that
-		 * if we are backed by hugetlb.
-		 */
-		if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page))
-			pageshift = page_shift(compound_head(page));
-		mem->pageshift = min(mem->pageshift, pageshift);
-		/*
-		 * We don't need struct page reference any more, switch
-		 * to physical address.
-		 */
-		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
-	}
-
 good_exit:
 	atomic64_set(&mem->mapped, 1);
 	mem->used = 1;
@@ -158,6 +140,27 @@ good_exit:
 		}
 	}
 
+	if (mem->dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
+		/*
+		 * Allow to use larger than 64k IOMMU pages. Only do that
+		 * if we are backed by hugetlb. Skip device memory as it is not
+		 * backed with page structs.
+		 */
+		pageshift = PAGE_SHIFT;
+		for (i = 0; i < entries; ++i) {
+			struct page *page = mem->hpages[i];
+
+			if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page))
+				pageshift = page_shift(compound_head(page));
+			mem->pageshift = min(mem->pageshift, pageshift);
+			/*
+			 * We don't need struct page reference any more, switch
+			 * to physical address.
+			 */
+			mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
+		}
+	}
+
 	list_add_rcu(&mem->next, &mm->context.iommu_group_mem_list);
 
 	mutex_unlock(&mem_list_mutex);
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
From: Sasha Levin @ 2020-09-18  1:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Frederic Barrat, linuxppc-dev, Andrew Donnellan, Sasha Levin
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Frederic Barrat <fbarrat@linux.ibm.com>

[ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]

The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").

We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.

Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference and
simply removing it now is dangerous and would likely require more
work. There's currently no release device callback for nvlink devices
for example. So to be safe, this patch leaks a reference on the npu
device, but only for nvlink and not opencapi.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 058223233088e..e9cda7e316a50 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1062,14 +1062,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		return NULL;
 	}
 
-	/* NOTE: We get only one ref to the pci_dev for the pdn, not for the
-	 * pointer in the PE data structure, both should be destroyed at the
-	 * same time. However, this needs to be looked at more closely again
-	 * once we actually start removing things (Hotplug, SR-IOV, ...)
+	/* NOTE: We don't get a reference for the pointer in the PE
+	 * data structure, both the device and PE structures should be
+	 * destroyed at the same time. However, removing nvlink
+	 * devices will need some work.
 	 *
 	 * At some point we want to remove the PDN completely anyways
 	 */
-	pci_dev_get(dev);
 	pdn->pe_number = pe->pe_number;
 	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
@@ -1084,7 +1083,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 		pnv_ioda_free_pe(pe);
 		pdn->pe_number = IODA_INVALID_PE;
 		pe->pdev = NULL;
-		pci_dev_put(dev);
 		return NULL;
 	}
 
@@ -1205,6 +1203,14 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 	struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
 
+	/*
+	 * Intentionally leak a reference on the npu device (for
+	 * nvlink only; this is not an opencapi path) to make sure it
+	 * never goes away, as it's been the case all along and some
+	 * work is needed otherwise.
+	 */
+	pci_dev_get(npu_pdev);
+
 	/*
 	 * Due to a hardware errata PE#0 on the NPU is reserved for
 	 * error handling. This means we only have three PEs remaining
@@ -1228,7 +1234,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
 			 */
 			dev_info(&npu_pdev->dev,
 				"Associating to existing PE %x\n", pe_num);
-			pci_dev_get(npu_pdev);
 			npu_pdn = pci_get_pdn(npu_pdev);
 			rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
 			npu_pdn->pe_number = pe_num;
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 102/330] powerpc/eeh: Only dump stack once if an MMIO loop is detected
From: Sasha Levin @ 2020-09-18  1:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sam Bobroff, Oliver O'Halloran, linuxppc-dev, Sasha Levin
In-Reply-To: <20200918020110.2063155-1-sashal@kernel.org>

From: Oliver O'Halloran <oohall@gmail.com>

[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]

Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index bc8a551013be9..c35069294ecfb 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -503,7 +503,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
+		if (pe->check_count == EEH_MAX_FAILS) {
 			dn = pci_device_to_OF_node(dev);
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",
-- 
2.25.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox