* [PATCH AUTOSEL 5.4 030/330] powerpc/64s: Always disable branch profiling for prom_init.o
[not found] <20200918020110.2063155-1-sashal@kernel.org>
@ 2020-09-18 1:56 ` Sasha Levin
2020-09-18 1:57 ` [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE Sasha Levin
` (6 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 1:56 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev
From: Michael Ellerman <mpe@ellerman.id.au>
[ Upstream commit 6266a4dadb1d0976490fdf5af4f7941e36f64e80 ]
Otherwise the build fails because prom_init is calling symbols it's
not allowed to, eg:
Error: External symbol 'ftrace_likely_update' referenced from prom_init.c
make[3]: *** [arch/powerpc/kernel/Makefile:197: arch/powerpc/kernel/prom_init_check] Error 1
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191106051129.7626-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kernel/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index dc0780f930d5b..59260eb962916 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -19,6 +19,7 @@ CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom_init.o += $(call cc-option, -fno-stack-protector)
+CFLAGS_prom_init.o += -DDISABLE_BRANCH_PROFILING
ifdef CONFIG_FUNCTION_TRACER
# Do not trace early boot code
@@ -36,7 +37,6 @@ KASAN_SANITIZE_btext.o := n
ifdef CONFIG_KASAN
CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
CFLAGS_cputable.o += -DDISABLE_BRANCH_PROFILING
-CFLAGS_prom_init.o += -DDISABLE_BRANCH_PROFILING
CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
endif
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
[not found] <20200918020110.2063155-1-sashal@kernel.org>
2020-09-18 1:56 ` [PATCH AUTOSEL 5.4 030/330] powerpc/64s: Always disable branch profiling for prom_init.o Sasha Levin
@ 2020-09-18 1:57 ` Sasha Levin
2020-09-18 6:35 ` Frederic Barrat
2020-09-18 1:57 ` [PATCH AUTOSEL 5.4 102/330] powerpc/eeh: Only dump stack once if an MMIO loop is detected Sasha Levin
` (5 subsequent siblings)
7 siblings, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 1:57 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Frederic Barrat, linuxppc-dev, Andrew Donnellan, Sasha Levin
From: Frederic Barrat <fbarrat@linux.ibm.com>
[ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").
We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.
Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference and
simply removing it now is dangerous and would likely require more
work. There's currently no release device callback for nvlink devices
for example. So to be safe, this patch leaks a reference on the npu
device, but only for nvlink and not opencapi.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 058223233088e..e9cda7e316a50 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1062,14 +1062,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
return NULL;
}
- /* NOTE: We get only one ref to the pci_dev for the pdn, not for the
- * pointer in the PE data structure, both should be destroyed at the
- * same time. However, this needs to be looked at more closely again
- * once we actually start removing things (Hotplug, SR-IOV, ...)
+ /* NOTE: We don't get a reference for the pointer in the PE
+ * data structure, both the device and PE structures should be
+ * destroyed at the same time. However, removing nvlink
+ * devices will need some work.
*
* At some point we want to remove the PDN completely anyways
*/
- pci_dev_get(dev);
pdn->pe_number = pe->pe_number;
pe->flags = PNV_IODA_PE_DEV;
pe->pdev = dev;
@@ -1084,7 +1083,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
pnv_ioda_free_pe(pe);
pdn->pe_number = IODA_INVALID_PE;
pe->pdev = NULL;
- pci_dev_put(dev);
return NULL;
}
@@ -1205,6 +1203,14 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
struct pnv_phb *phb = hose->private_data;
+ /*
+ * Intentionally leak a reference on the npu device (for
+ * nvlink only; this is not an opencapi path) to make sure it
+ * never goes away, as it's been the case all along and some
+ * work is needed otherwise.
+ */
+ pci_dev_get(npu_pdev);
+
/*
* Due to a hardware errata PE#0 on the NPU is reserved for
* error handling. This means we only have three PEs remaining
@@ -1228,7 +1234,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
*/
dev_info(&npu_pdev->dev,
"Associating to existing PE %x\n", pe_num);
- pci_dev_get(npu_pdev);
npu_pdn = pci_get_pdn(npu_pdev);
rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
npu_pdn->pe_number = pe_num;
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
2020-09-18 1:57 ` [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE Sasha Levin
@ 2020-09-18 6:35 ` Frederic Barrat
2020-09-19 18:10 ` Sasha Levin
0 siblings, 1 reply; 11+ messages in thread
From: Frederic Barrat @ 2020-09-18 6:35 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable; +Cc: linuxppc-dev, Andrew Donnellan
Le 18/09/2020 à 03:57, Sasha Levin a écrit :
> From: Frederic Barrat <fbarrat@linux.ibm.com>
>
> [ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
>
This patch is not desirable for stable, for 5.4 and 4.19 (it was already
flagged by autosel back in April. Not sure why it's showing again now)
Fred
> The pci_dn structure used to store a pointer to the struct pci_dev, so
> taking a reference on the device was required. However, the pci_dev
> pointer was later removed from the pci_dn structure, but the reference
> was kept for the npu device.
> See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
> pcidev from pci_dn").
>
> We don't need to take a reference on the device when assigning the PE
> as the struct pnv_ioda_pe is cleaned up at the same time as
> the (physical) device is released. Doing so prevents the device from
> being released, which is a problem for opencapi devices, since we want
> to be able to remove them through PCI hotplug.
>
> Now the ugly part: nvlink npu devices are not meant to be
> released. Because of the above, we've always leaked a reference and
> simply removing it now is dangerous and would likely require more
> work. There's currently no release device callback for nvlink devices
> for example. So to be safe, this patch leaks a reference on the npu
> device, but only for nvlink and not opencapi.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 058223233088e..e9cda7e316a50 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1062,14 +1062,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> return NULL;
> }
>
> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the
> - * pointer in the PE data structure, both should be destroyed at the
> - * same time. However, this needs to be looked at more closely again
> - * once we actually start removing things (Hotplug, SR-IOV, ...)
> + /* NOTE: We don't get a reference for the pointer in the PE
> + * data structure, both the device and PE structures should be
> + * destroyed at the same time. However, removing nvlink
> + * devices will need some work.
> *
> * At some point we want to remove the PDN completely anyways
> */
> - pci_dev_get(dev);
> pdn->pe_number = pe->pe_number;
> pe->flags = PNV_IODA_PE_DEV;
> pe->pdev = dev;
> @@ -1084,7 +1083,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> pnv_ioda_free_pe(pe);
> pdn->pe_number = IODA_INVALID_PE;
> pe->pdev = NULL;
> - pci_dev_put(dev);
> return NULL;
> }
>
> @@ -1205,6 +1203,14 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
> struct pnv_phb *phb = hose->private_data;
>
> + /*
> + * Intentionally leak a reference on the npu device (for
> + * nvlink only; this is not an opencapi path) to make sure it
> + * never goes away, as it's been the case all along and some
> + * work is needed otherwise.
> + */
> + pci_dev_get(npu_pdev);
> +
> /*
> * Due to a hardware errata PE#0 on the NPU is reserved for
> * error handling. This means we only have three PEs remaining
> @@ -1228,7 +1234,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> */
> dev_info(&npu_pdev->dev,
> "Associating to existing PE %x\n", pe_num);
> - pci_dev_get(npu_pdev);
> npu_pdn = pci_get_pdn(npu_pdev);
> rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
> npu_pdn->pe_number = pe_num;
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
2020-09-18 6:35 ` Frederic Barrat
@ 2020-09-19 18:10 ` Sasha Levin
2020-09-21 7:58 ` Frederic Barrat
0 siblings, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2020-09-19 18:10 UTC (permalink / raw)
To: Frederic Barrat; +Cc: linuxppc-dev, linux-kernel, stable, Andrew Donnellan
On Fri, Sep 18, 2020 at 08:35:06AM +0200, Frederic Barrat wrote:
>
>
>Le 18/09/2020 à 03:57, Sasha Levin a écrit :
>>From: Frederic Barrat <fbarrat@linux.ibm.com>
>>
>>[ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
>>
>
>This patch is not desirable for stable, for 5.4 and 4.19 (it was
>already flagged by autosel back in April. Not sure why it's showing
>again now)
Hey Fred,
This was a bit of a "lie", it wasn't a run of AUTOSEL, but rather an
audit of patches that went into distro/vendor trees but not into the
upstream stable trees.
I can see that this patch was pulled into Ubuntu's 5.4 tree, is it not
needed in the upstream stable tree?
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE
2020-09-19 18:10 ` Sasha Levin
@ 2020-09-21 7:58 ` Frederic Barrat
0 siblings, 0 replies; 11+ messages in thread
From: Frederic Barrat @ 2020-09-21 7:58 UTC (permalink / raw)
To: Sasha Levin; +Cc: linuxppc-dev, linux-kernel, stable, Andrew Donnellan
Le 19/09/2020 à 20:10, Sasha Levin a écrit :
> On Fri, Sep 18, 2020 at 08:35:06AM +0200, Frederic Barrat wrote:
>>
>>
>> Le 18/09/2020 à 03:57, Sasha Levin a écrit :
>>> From: Frederic Barrat <fbarrat@linux.ibm.com>
>>>
>>> [ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
>>>
>>
>> This patch is not desirable for stable, for 5.4 and 4.19 (it was
>> already flagged by autosel back in April. Not sure why it's showing
>> again now)
>
> Hey Fred,
>
> This was a bit of a "lie", it wasn't a run of AUTOSEL, but rather an
> audit of patches that went into distro/vendor trees but not into the
> upstream stable trees.
>
> I can see that this patch was pulled into Ubuntu's 5.4 tree, is it not
> needed in the upstream stable tree?
That patch in itself is useless (it replaces a ref counter leak by
another one). It was part of a longer series that we backported to
Ubuntu's 5.4 tree.
So it's really not needed on the stable trees. It likely wouldn't hurt
or break anything, but there's really no point.
Fred
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH AUTOSEL 5.4 102/330] powerpc/eeh: Only dump stack once if an MMIO loop is detected
[not found] <20200918020110.2063155-1-sashal@kernel.org>
2020-09-18 1:56 ` [PATCH AUTOSEL 5.4 030/330] powerpc/64s: Always disable branch profiling for prom_init.o Sasha Levin
2020-09-18 1:57 ` [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE Sasha Levin
@ 2020-09-18 1:57 ` Sasha Levin
2020-09-18 1:58 ` [PATCH AUTOSEL 5.4 153/330] powerpc/book3s64: Fix error handling in mm_iommu_do_alloc() Sasha Levin
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 1:57 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sam Bobroff, Oliver O'Halloran, linuxppc-dev, Sasha Levin
From: Oliver O'Halloran <oohall@gmail.com>
[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]
Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.
Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.
Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kernel/eeh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index bc8a551013be9..c35069294ecfb 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -503,7 +503,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
rc = 1;
if (pe->state & EEH_PE_ISOLATED) {
pe->check_count++;
- if (pe->check_count % EEH_MAX_FAILS == 0) {
+ if (pe->check_count == EEH_MAX_FAILS) {
dn = pci_device_to_OF_node(dev);
if (dn)
location = of_get_property(dn, "ibm,loc-code",
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH AUTOSEL 5.4 153/330] powerpc/book3s64: Fix error handling in mm_iommu_do_alloc()
[not found] <20200918020110.2063155-1-sashal@kernel.org>
` (2 preceding siblings ...)
2020-09-18 1:57 ` [PATCH AUTOSEL 5.4 102/330] powerpc/eeh: Only dump stack once if an MMIO loop is detected Sasha Levin
@ 2020-09-18 1:58 ` Sasha Levin
2020-09-18 1:58 ` [PATCH AUTOSEL 5.4 180/330] KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones Sasha Levin
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 1:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexey Kardashevskiy, Jan Kara, linuxppc-dev, Sasha Levin
From: Alexey Kardashevskiy <aik@ozlabs.ru>
[ Upstream commit c4b78169e3667413184c9a20e11b5832288a109f ]
The last jump to free_exit in mm_iommu_do_alloc() happens after page
pointers in struct mm_iommu_table_group_mem_t were already converted to
physical addresses. Thus calling put_page() on these physical addresses
will likely crash.
This moves the loop which calculates the pageshift and converts page
struct pointers to physical addresses later after the point when
we cannot fail; thus eliminating the need to convert pointers back.
Fixes: eb9d7a62c386 ("powerpc/mm_iommu: Fix potential deadlock")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191223060351.26359-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/mm/book3s64/iommu_api.c | 39 +++++++++++++++-------------
1 file changed, 21 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 56cc845205779..ef164851738b8 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -121,24 +121,6 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
goto free_exit;
}
- pageshift = PAGE_SHIFT;
- for (i = 0; i < entries; ++i) {
- struct page *page = mem->hpages[i];
-
- /*
- * Allow to use larger than 64k IOMMU pages. Only do that
- * if we are backed by hugetlb.
- */
- if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page))
- pageshift = page_shift(compound_head(page));
- mem->pageshift = min(mem->pageshift, pageshift);
- /*
- * We don't need struct page reference any more, switch
- * to physical address.
- */
- mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
- }
-
good_exit:
atomic64_set(&mem->mapped, 1);
mem->used = 1;
@@ -158,6 +140,27 @@ good_exit:
}
}
+ if (mem->dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
+ /*
+ * Allow to use larger than 64k IOMMU pages. Only do that
+ * if we are backed by hugetlb. Skip device memory as it is not
+ * backed with page structs.
+ */
+ pageshift = PAGE_SHIFT;
+ for (i = 0; i < entries; ++i) {
+ struct page *page = mem->hpages[i];
+
+ if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page))
+ pageshift = page_shift(compound_head(page));
+ mem->pageshift = min(mem->pageshift, pageshift);
+ /*
+ * We don't need struct page reference any more, switch
+ * to physical address.
+ */
+ mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
+ }
+ }
+
list_add_rcu(&mem->next, &mm->context.iommu_group_mem_list);
mutex_unlock(&mem_list_mutex);
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH AUTOSEL 5.4 180/330] KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones
[not found] <20200918020110.2063155-1-sashal@kernel.org>
` (3 preceding siblings ...)
2020-09-18 1:58 ` [PATCH AUTOSEL 5.4 153/330] powerpc/book3s64: Fix error handling in mm_iommu_do_alloc() Sasha Levin
@ 2020-09-18 1:58 ` Sasha Levin
2020-09-18 1:59 ` [PATCH AUTOSEL 5.4 224/330] powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events Sasha Levin
` (2 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 1:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, Michael Neuling, kvm, Gustavo Romero, kvm-ppc,
Leonardo Bras, linuxppc-dev
From: Gustavo Romero <gromero@linux.ibm.com>
[ Upstream commit 1dff3064c764b5a51c367b949b341d2e38972bec ]
On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by
KVM. This is handled at first by the hardware raising a softpatch interrupt
when certain TM instructions that need KVM assistance are executed in the
guest. Althought some TM instructions per Power ISA are invalid forms they
can raise a softpatch interrupt too. For instance, 'tresume.' instruction
as defined in the ISA must have bit 31 set (1), but an instruction that
matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like
0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.'
and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and
0x7c0007dc, respectively. Hence, if a code like the following is executed
in the guest it will raise a softpatch interrupt just like a 'tresume.'
when the TM facility is enabled ('tabort. 0' in the example is used only
to enable the TM facility):
int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); }
Currently in such a case KVM throws a complete trace like:
[345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler
ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3
crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv]
[345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G W E 5.5.0+ #1
[345523.706031] NIP: c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850
[345523.706034] REGS: c000000399467680 TRAP: 0700 Tainted: G W E (5.5.0+)
[345523.706034] MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 24022428 XER: 00000000
[345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0
GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720
GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000
GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530
GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000
GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001
GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000
GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500
GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720
[345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv]
[345523.706066] Call Trace:
[345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable)
[345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980
[345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv]
[345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv]
[345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm]
[345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170
[345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80
[345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68
[345523.706112] Instruction dump:
[345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd
[345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0
and then treats the executed instruction as a 'nop'.
However the POWER9 User's Manual, in section "4.6.10 Book II Invalid
Forms", informs that for TM instructions bit 31 is in fact ignored, thus
for the TM-related invalid forms ignoring bit 31 and handling them like the
valid forms is an acceptable way to handle them. POWER8 behaves the same
way too.
This commit changes the handling of the cases here described by treating
the TM-related invalid forms that can generate a softpatch interrupt
just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by
gently reporting any other unrecognized case to the host and treating it as
illegal instruction instead of throwing a trace and treating it as a 'nop'.
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/kvm_asm.h | 3 +++
arch/powerpc/kvm/book3s_hv_tm.c | 28 ++++++++++++++++++++-----
arch/powerpc/kvm/book3s_hv_tm_builtin.c | 16 ++++++++++++--
3 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 635fb154b33f9..a3633560493be 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -150,4 +150,7 @@
#define KVM_INST_FETCH_FAILED -1
+/* Extract PO and XOP opcode fields */
+#define PO_XOP_OPCODE_MASK 0xfc0007fe
+
#endif /* __POWERPC_KVM_ASM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
index 0db9374971697..cc90b8b823291 100644
--- a/arch/powerpc/kvm/book3s_hv_tm.c
+++ b/arch/powerpc/kvm/book3s_hv_tm.c
@@ -3,6 +3,8 @@
* Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/kvm_host.h>
#include <asm/kvm_ppc.h>
@@ -44,7 +46,18 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
u64 newmsr, bescr;
int ra, rs;
- switch (instr & 0xfc0007ff) {
+ /*
+ * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
+ * in these instructions, so masking bit 31 out doesn't change these
+ * instructions. For treclaim., tsr., and trechkpt. instructions if bit
+ * 31 = 0 then they are per ISA invalid forms, however P9 UM, in section
+ * 4.6.10 Book II Invalid Forms, informs specifically that ignoring bit
+ * 31 is an acceptable way to handle these invalid forms that have
+ * bit 31 = 0. Moreover, for emulation purposes both forms (w/ and wo/
+ * bit 31 set) can generate a softpatch interrupt. Hence both forms
+ * are handled below for these instructions so they behave the same way.
+ */
+ switch (instr & PO_XOP_OPCODE_MASK) {
case PPC_INST_RFID:
/* XXX do we need to check for PR=0 here? */
newmsr = vcpu->arch.shregs.srr1;
@@ -105,7 +118,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
vcpu->arch.shregs.msr = newmsr;
return RESUME_GUEST;
- case PPC_INST_TSR:
+ /* ignore bit 31, see comment above */
+ case (PPC_INST_TSR & PO_XOP_OPCODE_MASK):
/* check for PR=1 and arch 2.06 bit set in PCR */
if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
/* generate an illegal instruction interrupt */
@@ -140,7 +154,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
vcpu->arch.shregs.msr = msr;
return RESUME_GUEST;
- case PPC_INST_TRECLAIM:
+ /* ignore bit 31, see comment above */
+ case (PPC_INST_TRECLAIM & PO_XOP_OPCODE_MASK):
/* check for TM disabled in the HFSCR or MSR */
if (!(vcpu->arch.hfscr & HFSCR_TM)) {
/* generate an illegal instruction interrupt */
@@ -176,7 +191,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
vcpu->arch.shregs.msr &= ~MSR_TS_MASK;
return RESUME_GUEST;
- case PPC_INST_TRECHKPT:
+ /* ignore bit 31, see comment above */
+ case (PPC_INST_TRECHKPT & PO_XOP_OPCODE_MASK):
/* XXX do we need to check for PR=0 here? */
/* check for TM disabled in the HFSCR or MSR */
if (!(vcpu->arch.hfscr & HFSCR_TM)) {
@@ -208,6 +224,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
}
/* What should we do here? We didn't recognize the instruction */
- WARN_ON_ONCE(1);
+ kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+ pr_warn_ratelimited("Unrecognized TM-related instruction %#x for emulation", instr);
+
return RESUME_GUEST;
}
diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
index 217246279dfae..fad931f224efd 100644
--- a/arch/powerpc/kvm/book3s_hv_tm_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
@@ -23,7 +23,18 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
u64 newmsr, msr, bescr;
int rs;
- switch (instr & 0xfc0007ff) {
+ /*
+ * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
+ * in these instructions, so masking bit 31 out doesn't change these
+ * instructions. For the tsr. instruction if bit 31 = 0 then it is per
+ * ISA an invalid form, however P9 UM, in section 4.6.10 Book II Invalid
+ * Forms, informs specifically that ignoring bit 31 is an acceptable way
+ * to handle TM-related invalid forms that have bit 31 = 0. Moreover,
+ * for emulation purposes both forms (w/ and wo/ bit 31 set) can
+ * generate a softpatch interrupt. Hence both forms are handled below
+ * for tsr. to make them behave the same way.
+ */
+ switch (instr & PO_XOP_OPCODE_MASK) {
case PPC_INST_RFID:
/* XXX do we need to check for PR=0 here? */
newmsr = vcpu->arch.shregs.srr1;
@@ -73,7 +84,8 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
vcpu->arch.shregs.msr = newmsr;
return 1;
- case PPC_INST_TSR:
+ /* ignore bit 31, see comment above */
+ case (PPC_INST_TSR & PO_XOP_OPCODE_MASK):
/* we know the MSR has the TS field = S (0b01) here */
msr = vcpu->arch.shregs.msr;
/* check for PR=1 and arch 2.06 bit set in PCR */
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH AUTOSEL 5.4 224/330] powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
[not found] <20200918020110.2063155-1-sashal@kernel.org>
` (4 preceding siblings ...)
2020-09-18 1:58 ` [PATCH AUTOSEL 5.4 180/330] KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones Sasha Levin
@ 2020-09-18 1:59 ` Sasha Levin
2020-09-18 2:00 ` [PATCH AUTOSEL 5.4 270/330] powerpc/traps: Make unrecoverable NMIs die instead of panic Sasha Levin
2020-09-18 2:00 ` [PATCH AUTOSEL 5.4 290/330] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes Sasha Levin
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 1:59 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin, Anju T Sudhakar, linuxppc-dev
From: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[ Upstream commit a36e8ba60b991d563677227f172db69e030797e6 ]
IMC(In-memory Collection Counters) does performance monitoring in
two different modes, i.e accumulation mode(core-imc and thread-imc events),
and trace mode(trace-imc events). A cpu thread can either be in
accumulation-mode or trace-mode at a time and this is done via the LDBAR
register in POWER architecture. The current design does not address the
races between thread-imc and trace-imc events.
Patch implements a global id and lock to avoid the races between
core, trace and thread imc events. With this global id-lock
implementation, the system can either run core, thread or trace imc
events at a time. i.e. to run any core-imc events, thread/trace imc events
should not be enabled/monitored.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/perf/imc-pmu.c | 173 +++++++++++++++++++++++++++++++-----
1 file changed, 149 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d7..eb82dda884e51 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -44,6 +44,16 @@ static DEFINE_PER_CPU(u64 *, trace_imc_mem);
static struct imc_pmu_ref *trace_imc_refc;
static int trace_imc_mem_size;
+/*
+ * Global data structure used to avoid races between thread,
+ * core and trace-imc
+ */
+static struct imc_pmu_ref imc_global_refc = {
+ .lock = __MUTEX_INITIALIZER(imc_global_refc.lock),
+ .id = 0,
+ .refc = 0,
+};
+
static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
{
return container_of(event->pmu, struct imc_pmu, pmu);
@@ -698,6 +708,16 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
return -EINVAL;
ref->refc = 0;
+ /*
+ * Reduce the global reference count, if this is the
+ * last cpu in this core and core-imc event running
+ * in this cpu.
+ */
+ mutex_lock(&imc_global_refc.lock);
+ if (imc_global_refc.id == IMC_DOMAIN_CORE)
+ imc_global_refc.refc--;
+
+ mutex_unlock(&imc_global_refc.lock);
}
return 0;
}
@@ -710,6 +730,23 @@ static int core_imc_pmu_cpumask_init(void)
ppc_core_imc_cpu_offline);
}
+static void reset_global_refc(struct perf_event *event)
+{
+ mutex_lock(&imc_global_refc.lock);
+ imc_global_refc.refc--;
+
+ /*
+ * If no other thread is running any
+ * event for this domain(thread/core/trace),
+ * set the global id to zero.
+ */
+ if (imc_global_refc.refc <= 0) {
+ imc_global_refc.refc = 0;
+ imc_global_refc.id = 0;
+ }
+ mutex_unlock(&imc_global_refc.lock);
+}
+
static void core_imc_counters_release(struct perf_event *event)
{
int rc, core_id;
@@ -759,6 +796,8 @@ static void core_imc_counters_release(struct perf_event *event)
ref->refc = 0;
}
mutex_unlock(&ref->lock);
+
+ reset_global_refc(event);
}
static int core_imc_event_init(struct perf_event *event)
@@ -819,6 +858,29 @@ static int core_imc_event_init(struct perf_event *event)
++ref->refc;
mutex_unlock(&ref->lock);
+ /*
+ * Since the system can run either in accumulation or trace-mode
+ * of IMC at a time, core-imc events are allowed only if no other
+ * trace/thread imc events are enabled/monitored.
+ *
+ * Take the global lock, and check the refc.id
+ * to know whether any other trace/thread imc
+ * events are running.
+ */
+ mutex_lock(&imc_global_refc.lock);
+ if (imc_global_refc.id == 0 || imc_global_refc.id == IMC_DOMAIN_CORE) {
+ /*
+ * No other trace/thread imc events are running in
+ * the system, so set the refc.id to core-imc.
+ */
+ imc_global_refc.id = IMC_DOMAIN_CORE;
+ imc_global_refc.refc++;
+ } else {
+ mutex_unlock(&imc_global_refc.lock);
+ return -EBUSY;
+ }
+ mutex_unlock(&imc_global_refc.lock);
+
event->hw.event_base = (u64)pcmi->vbase + (config & IMC_EVENT_OFFSET_MASK);
event->destroy = core_imc_counters_release;
return 0;
@@ -877,7 +939,23 @@ static int ppc_thread_imc_cpu_online(unsigned int cpu)
static int ppc_thread_imc_cpu_offline(unsigned int cpu)
{
- mtspr(SPRN_LDBAR, 0);
+ /*
+ * Set the bit 0 of LDBAR to zero.
+ *
+ * If bit 0 of LDBAR is unset, it will stop posting
+ * the counter data to memory.
+ * For thread-imc, bit 0 of LDBAR will be set to 1 in the
+ * event_add function. So reset this bit here, to stop the updates
+ * to memory in the cpu_offline path.
+ */
+ mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) & (~(1UL << 63))));
+
+ /* Reduce the refc if thread-imc event running on this cpu */
+ mutex_lock(&imc_global_refc.lock);
+ if (imc_global_refc.id == IMC_DOMAIN_THREAD)
+ imc_global_refc.refc--;
+ mutex_unlock(&imc_global_refc.lock);
+
return 0;
}
@@ -916,7 +994,22 @@ static int thread_imc_event_init(struct perf_event *event)
if (!target)
return -EINVAL;
+ mutex_lock(&imc_global_refc.lock);
+ /*
+ * Check if any other trace/core imc events are running in the
+ * system, if not set the global id to thread-imc.
+ */
+ if (imc_global_refc.id == 0 || imc_global_refc.id == IMC_DOMAIN_THREAD) {
+ imc_global_refc.id = IMC_DOMAIN_THREAD;
+ imc_global_refc.refc++;
+ } else {
+ mutex_unlock(&imc_global_refc.lock);
+ return -EBUSY;
+ }
+ mutex_unlock(&imc_global_refc.lock);
+
event->pmu->task_ctx_nr = perf_sw_context;
+ event->destroy = reset_global_refc;
return 0;
}
@@ -1063,10 +1156,12 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
int core_id;
struct imc_pmu_ref *ref;
- mtspr(SPRN_LDBAR, 0);
-
core_id = smp_processor_id() / threads_per_core;
ref = &core_imc_refc[core_id];
+ if (!ref) {
+ pr_debug("imc: Failed to get event reference count\n");
+ return;
+ }
mutex_lock(&ref->lock);
ref->refc--;
@@ -1082,6 +1177,10 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
ref->refc = 0;
}
mutex_unlock(&ref->lock);
+
+ /* Set bit 0 of LDBAR to zero, to stop posting updates to memory */
+ mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) & (~(1UL << 63))));
+
/*
* Take a snapshot and calculate the delta and update
* the event counter values.
@@ -1133,7 +1232,18 @@ static int ppc_trace_imc_cpu_online(unsigned int cpu)
static int ppc_trace_imc_cpu_offline(unsigned int cpu)
{
- mtspr(SPRN_LDBAR, 0);
+ /*
+ * No need to set bit 0 of LDBAR to zero, as
+ * it is set to zero for imc trace-mode
+ *
+ * Reduce the refc if any trace-imc event running
+ * on this cpu.
+ */
+ mutex_lock(&imc_global_refc.lock);
+ if (imc_global_refc.id == IMC_DOMAIN_TRACE)
+ imc_global_refc.refc--;
+ mutex_unlock(&imc_global_refc.lock);
+
return 0;
}
@@ -1226,15 +1336,14 @@ static int trace_imc_event_add(struct perf_event *event, int flags)
local_mem = get_trace_imc_event_base_addr();
ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | TRACE_IMC_ENABLE;
- if (core_imc_refc)
- ref = &core_imc_refc[core_id];
+ /* trace-imc reference count */
+ if (trace_imc_refc)
+ ref = &trace_imc_refc[core_id];
if (!ref) {
- /* If core-imc is not enabled, use trace-imc reference count */
- if (trace_imc_refc)
- ref = &trace_imc_refc[core_id];
- if (!ref)
- return -EINVAL;
+ pr_debug("imc: Failed to get the event reference count\n");
+ return -EINVAL;
}
+
mtspr(SPRN_LDBAR, ldbar_value);
mutex_lock(&ref->lock);
if (ref->refc == 0) {
@@ -1242,13 +1351,11 @@ static int trace_imc_event_add(struct perf_event *event, int flags)
get_hard_smp_processor_id(smp_processor_id()))) {
mutex_unlock(&ref->lock);
pr_err("trace-imc: Unable to start the counters for core %d\n", core_id);
- mtspr(SPRN_LDBAR, 0);
return -EINVAL;
}
}
++ref->refc;
mutex_unlock(&ref->lock);
-
return 0;
}
@@ -1274,16 +1381,13 @@ static void trace_imc_event_del(struct perf_event *event, int flags)
int core_id = smp_processor_id() / threads_per_core;
struct imc_pmu_ref *ref = NULL;
- if (core_imc_refc)
- ref = &core_imc_refc[core_id];
+ if (trace_imc_refc)
+ ref = &trace_imc_refc[core_id];
if (!ref) {
- /* If core-imc is not enabled, use trace-imc reference count */
- if (trace_imc_refc)
- ref = &trace_imc_refc[core_id];
- if (!ref)
- return;
+ pr_debug("imc: Failed to get event reference count\n");
+ return;
}
- mtspr(SPRN_LDBAR, 0);
+
mutex_lock(&ref->lock);
ref->refc--;
if (ref->refc == 0) {
@@ -1297,6 +1401,7 @@ static void trace_imc_event_del(struct perf_event *event, int flags)
ref->refc = 0;
}
mutex_unlock(&ref->lock);
+
trace_imc_event_stop(event, flags);
}
@@ -1314,10 +1419,30 @@ static int trace_imc_event_init(struct perf_event *event)
if (event->attr.sample_period == 0)
return -ENOENT;
+ /*
+ * Take the global lock, and make sure
+ * no other thread is running any core/thread imc
+ * events
+ */
+ mutex_lock(&imc_global_refc.lock);
+ if (imc_global_refc.id == 0 || imc_global_refc.id == IMC_DOMAIN_TRACE) {
+ /*
+ * No core/thread imc events are running in the
+ * system, so set the refc.id to trace-imc.
+ */
+ imc_global_refc.id = IMC_DOMAIN_TRACE;
+ imc_global_refc.refc++;
+ } else {
+ mutex_unlock(&imc_global_refc.lock);
+ return -EBUSY;
+ }
+ mutex_unlock(&imc_global_refc.lock);
+
event->hw.idx = -1;
target = event->hw.target;
event->pmu->task_ctx_nr = perf_hw_context;
+ event->destroy = reset_global_refc;
return 0;
}
@@ -1429,10 +1554,10 @@ static void cleanup_all_core_imc_memory(void)
static void thread_imc_ldbar_disable(void *dummy)
{
/*
- * By Zeroing LDBAR, we disable thread-imc
- * updates.
+ * By setting 0th bit of LDBAR to zero, we disable thread-imc
+ * updates to memory.
*/
- mtspr(SPRN_LDBAR, 0);
+ mtspr(SPRN_LDBAR, (mfspr(SPRN_LDBAR) & (~(1UL << 63))));
}
void thread_imc_disable(void)
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH AUTOSEL 5.4 270/330] powerpc/traps: Make unrecoverable NMIs die instead of panic
[not found] <20200918020110.2063155-1-sashal@kernel.org>
` (5 preceding siblings ...)
2020-09-18 1:59 ` [PATCH AUTOSEL 5.4 224/330] powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events Sasha Levin
@ 2020-09-18 2:00 ` Sasha Levin
2020-09-18 2:00 ` [PATCH AUTOSEL 5.4 290/330] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes Sasha Levin
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 2:00 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Christophe Leroy, linuxppc-dev, Nicholas Piggin, Sasha Levin
From: Nicholas Piggin <npiggin@gmail.com>
[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ]
System Reset and Machine Check interrupts that are not recoverable due
to being nested or interrupting when RI=0 currently panic. This is not
necessary, and can often just kill the current context and recover.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kernel/traps.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 014ff0701f245..9432fc6af28a5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -510,11 +510,11 @@ out:
#ifdef CONFIG_PPC_BOOK3S_64
BUG_ON(get_paca()->in_nmi == 0);
if (get_paca()->in_nmi > 1)
- nmi_panic(regs, "Unrecoverable nested System Reset");
+ die("Unrecoverable nested System Reset", regs, SIGABRT);
#endif
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
- nmi_panic(regs, "Unrecoverable System Reset");
+ die("Unrecoverable System Reset", regs, SIGABRT);
if (saved_hsrrs) {
mtspr(SPRN_HSRR0, hsrr0);
@@ -858,7 +858,7 @@ void machine_check_exception(struct pt_regs *regs)
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
- nmi_panic(regs, "Unrecoverable Machine check");
+ die("Unrecoverable Machine check", regs, SIGBUS);
return;
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH AUTOSEL 5.4 290/330] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes
[not found] <20200918020110.2063155-1-sashal@kernel.org>
` (6 preceding siblings ...)
2020-09-18 2:00 ` [PATCH AUTOSEL 5.4 270/330] powerpc/traps: Make unrecoverable NMIs die instead of panic Sasha Levin
@ 2020-09-18 2:00 ` Sasha Levin
7 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2020-09-18 2:00 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, kvm, kvm-ppc
From: Paul Mackerras <paulus@ozlabs.org>
[ Upstream commit 11362b1befeadaae4d159a8cddcdaf6b8afe08f9 ]
There is a potential race condition between hypervisor page faults
and flushing a memslot. It is possible for a page fault to read the
memslot before a memslot is updated and then write a PTE to the
partition-scoped page tables after kvmppc_radix_flush_memslot has
completed. (Note that this race has never been explicitly observed.)
To close this race, it is sufficient to increment the MMU sequence
number while the kvm->mmu_lock is held. That will cause
mmu_notifier_retry() to return true, and the page fault will then
return to the guest without inserting a PTE.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index da8375437d161..9d73448354698 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1104,6 +1104,11 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
kvm->arch.lpid);
gpa += PAGE_SIZE;
}
+ /*
+ * Increase the mmu notifier sequence number to prevent any page
+ * fault that read the memslot earlier from writing a PTE.
+ */
+ kvm->mmu_notifier_seq++;
spin_unlock(&kvm->mmu_lock);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread