* [PATCH AUTOSEL 4.19 052/192] powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
[not found] <20190327181025.13507-1-sashal@kernel.org>
@ 2019-03-27 18:08 ` Sasha Levin
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 055/192] powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc Sasha Levin
` (5 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:08 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Alexey Kardashevskiy, linuxppc-dev, Sasha Levin
From: Alexey Kardashevskiy <aik@ozlabs.ru>
[ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]
We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to decrement the locked_vm counter when we release the memory
used by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.
However the allocator does not add the userspace table size to
.it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are
releasing.
To make things worse, since we enabled on-demand allocation of
indirect levels, it_allocated_size contains only the amount of memory
actually allocated at the table creation time which can just be a
fraction. It is not a problem with incrementing locked_vm (as
get_table_size() value is used) but it is with decrementing.
As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.
This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
have a single place which calculates the maximum memory a table can
occupy. The original meaning of it_allocated_size is somewhat lost now
though.
We do not ditch it_allocated_size whatsoever here and we do not call
get_table_size() from vfio_iommu_spapr_tce.c when decrementing
locked_vm as we may have multiple IOMMU groups per container and even
though they all are supposed to have the same get_table_size()
implementation, there is a small chance for failure or confusion.
Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/platforms/powernv/pci-ioda-tce.c | 1 -
arch/powerpc/platforms/powernv/pci-ioda.c | 7 ++++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 7639b2168755..f5adb6b756f7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -313,7 +313,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
page_shift);
tbl->it_level_size = 1ULL << (level_shift - 3);
tbl->it_indirect_levels = levels - 1;
- tbl->it_allocated_size = total_allocated;
tbl->it_userspace = uas;
tbl->it_nid = nid;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cde710297a4e..326ca6288bb1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2603,8 +2603,13 @@ static long pnv_pci_ioda2_create_table_userspace(
int num, __u32 page_shift, __u64 window_size, __u32 levels,
struct iommu_table **ptbl)
{
- return pnv_pci_ioda2_create_table(table_group,
+ long ret = pnv_pci_ioda2_create_table(table_group,
num, page_shift, window_size, levels, true, ptbl);
+
+ if (!ret)
+ (*ptbl)->it_allocated_size = pnv_pci_ioda2_get_table_size(
+ page_shift, window_size, levels);
+ return ret;
}
static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 4.19 055/192] powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
[not found] <20190327181025.13507-1-sashal@kernel.org>
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 052/192] powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables Sasha Levin
@ 2019-03-27 18:08 ` Sasha Levin
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 060/192] powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback Sasha Levin
` (4 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:08 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin, Nathan Chancellor, linuxppc-dev
From: Nathan Chancellor <natechancellor@gmail.com>
[ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ]
When building with -Wsometimes-uninitialized, Clang warns:
arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if (cpu_has_feature(CPU_FTRS_POWER9))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
if (opcode == NULL)
^~~~~~
arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
condition is always true
if (cpu_has_feature(CPU_FTRS_POWER9))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
'opcode' to silence this warning
const struct powerpc_opcode *opcode;
^
= NULL
1 warning generated.
This warning seems to make no sense on the surface because opcode is set
to NULL right below this statement. However, there is a comma instead of
semicolon to end the dialect assignment, meaning that the opcode
assignment only happens in the if statement. Properly terminate that
line so that Clang no longer warns.
Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/xmon/ppc-dis.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/xmon/ppc-dis.c b/arch/powerpc/xmon/ppc-dis.c
index 9deea5ee13f6..27f1e6415036 100644
--- a/arch/powerpc/xmon/ppc-dis.c
+++ b/arch/powerpc/xmon/ppc-dis.c
@@ -158,7 +158,7 @@ int print_insn_powerpc (unsigned long insn, unsigned long memaddr)
dialect |= (PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6 | PPC_OPCODE_POWER7
| PPC_OPCODE_POWER8 | PPC_OPCODE_POWER9 | PPC_OPCODE_HTM
| PPC_OPCODE_ALTIVEC | PPC_OPCODE_ALTIVEC2
- | PPC_OPCODE_VSX | PPC_OPCODE_VSX3),
+ | PPC_OPCODE_VSX | PPC_OPCODE_VSX3);
/* Get the major opcode of the insn. */
opcode = NULL;
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 4.19 060/192] powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
[not found] <20190327181025.13507-1-sashal@kernel.org>
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 052/192] powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables Sasha Levin
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 055/192] powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc Sasha Levin
@ 2019-03-27 18:08 ` Sasha Levin
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 083/192] SoC: imx-sgtl5000: add missing put_device() Sasha Levin
` (3 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:08 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin, Aneesh Kumar K.V, linuxppc-dev
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
[ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ]
After we ALIGN up the address we need to make sure we didn't overflow
and resulted in zero address. In that case, we need to make sure that
the returned address is greater than mmap_min_addr.
This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
run as non root user for
mmap(-1, MAP_HUGETLB)
The bug is that a non-root user requesting address -1 will be given address 0
which will then fail, whereas they should have been given something else that
would have succeeded.
We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address
with this change. So we think this is not a security issue, because it only affects
whether we choose an address below mmap_min_addr, not whether we
actually allow that address to be mapped. ie. there are existing capability
checks to prevent a user mapping below mmap_min_addr and those will still be
honoured even without this fix.
Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb")
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/mm/hugetlbpage-radix.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c
index 2486bee0f93e..97c7a39ebc00 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/mm.h>
#include <linux/hugetlb.h>
+#include <linux/security.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/cacheflush.h>
@@ -73,7 +74,7 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
if (addr) {
addr = ALIGN(addr, huge_page_size(h));
vma = find_vma(mm, addr);
- if (high_limit - len >= addr &&
+ if (high_limit - len >= addr && addr >= mmap_min_addr &&
(!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
@@ -83,7 +84,7 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
*/
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
- info.low_limit = PAGE_SIZE;
+ info.low_limit = max(PAGE_SIZE, mmap_min_addr);
info.high_limit = mm->mmap_base + (high_limit - DEFAULT_MAP_WINDOW);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
info.align_offset = 0;
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 4.19 083/192] SoC: imx-sgtl5000: add missing put_device()
[not found] <20190327181025.13507-1-sashal@kernel.org>
` (2 preceding siblings ...)
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 060/192] powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback Sasha Levin
@ 2019-03-27 18:08 ` Sasha Levin
2019-03-27 18:09 ` [PATCH AUTOSEL 4.19 133/192] ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe Sasha Levin
` (2 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:08 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, alsa-devel, linuxppc-dev, Timur Tabi, Xiubo Li,
Wen Yang, Sascha Hauer, Takashi Iwai, Liam Girdwood,
Jaroslav Kysela, Nicolin Chen, Mark Brown, NXP Linux Team,
Pengutronix Kernel Team, Shawn Guo, Fabio Estevam,
linux-arm-kernel
From: Wen Yang <yellowriver2010@hotmail.com>
[ Upstream commit 8fa857da9744f513036df1c43ab57f338941ae7d ]
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.
Detected by coccinelle with the following warnings:
./sound/soc/fsl/imx-sgtl5000.c:169:1-7: ERROR: missing put_device;
call of_find_device_by_node on line 105, but without a corresponding
object release within this function.
./sound/soc/fsl/imx-sgtl5000.c:177:1-7: ERROR: missing put_device;
call of_find_device_by_node on line 105, but without a corresponding
object release within this function.
Signed-off-by: Wen Yang <yellowriver2010@hotmail.com>
Cc: Timur Tabi <timur@kernel.org>
Cc: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Xiubo Li <Xiubo.Lee@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: alsa-devel@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
sound/soc/fsl/imx-sgtl5000.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/soc/fsl/imx-sgtl5000.c b/sound/soc/fsl/imx-sgtl5000.c
index c29200cf755a..9b9a7ec52905 100644
--- a/sound/soc/fsl/imx-sgtl5000.c
+++ b/sound/soc/fsl/imx-sgtl5000.c
@@ -108,6 +108,7 @@ static int imx_sgtl5000_probe(struct platform_device *pdev)
ret = -EPROBE_DEFER;
goto fail;
}
+ put_device(&ssi_pdev->dev);
codec_dev = of_find_i2c_device_by_node(codec_np);
if (!codec_dev) {
dev_err(&pdev->dev, "failed to find codec platform device\n");
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 4.19 133/192] ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
[not found] <20190327181025.13507-1-sashal@kernel.org>
` (3 preceding siblings ...)
2019-03-27 18:08 ` [PATCH AUTOSEL 4.19 083/192] SoC: imx-sgtl5000: add missing put_device() Sasha Levin
@ 2019-03-27 18:09 ` Sasha Levin
2019-03-27 18:09 ` [PATCH AUTOSEL 4.19 148/192] powerpc/64s: Clear on-stack exception marker upon exception return Sasha Levin
2019-03-27 18:09 ` [PATCH AUTOSEL 4.19 152/192] powerpc/pseries: Perform full re-add of CPU for topology update post-migration Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:09 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, alsa-devel, linuxppc-dev, Timur Tabi, Xiubo Li,
wen yang, Takashi Iwai, Liam Girdwood, Jaroslav Kysela,
Nicolin Chen, Mark Brown, Wen Yang, Fabio Estevam
From: wen yang <yellowriver2010@hotmail.com>
[ Upstream commit 11907e9d3533648615db08140e3045b829d2c141 ]
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.
Signed-off-by: Wen Yang <yellowriver2010@hotmil.com>
Cc: Timur Tabi <timur@kernel.org>
Cc: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Xiubo Li <Xiubo.Lee@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: alsa-devel@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
sound/soc/fsl/fsl-asoc-card.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 44433b20435c..600d9be9706e 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -689,6 +689,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
asrc_fail:
of_node_put(asrc_np);
of_node_put(codec_np);
+ put_device(&cpu_pdev->dev);
fail:
of_node_put(cpu_np);
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 4.19 148/192] powerpc/64s: Clear on-stack exception marker upon exception return
[not found] <20190327181025.13507-1-sashal@kernel.org>
` (4 preceding siblings ...)
2019-03-27 18:09 ` [PATCH AUTOSEL 4.19 133/192] ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe Sasha Levin
@ 2019-03-27 18:09 ` Sasha Levin
2019-03-27 18:09 ` [PATCH AUTOSEL 4.19 152/192] powerpc/pseries: Perform full re-add of CPU for topology update post-migration Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:09 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, Joe Lawrence, linuxppc-dev, Nicolai Stange
From: Nicolai Stange <nstange@suse.de>
[ Upstream commit eddd0b332304d554ad6243942f87c2fcea98c56b ]
The ppc64 specific implementation of the reliable stacktracer,
save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
trace" whenever it finds an exception frame on the stack. Stack frames
are classified as exception frames if the STACK_FRAME_REGS_MARKER
magic, as written by exception prologues, is found at a particular
location.
However, as observed by Joe Lawrence, it is possible in practice that
non-exception stack frames can alias with prior exception frames and
thus, that the reliable stacktracer can find a stale
STACK_FRAME_REGS_MARKER on the stack. It in turn falsely reports an
unreliable stacktrace and blocks any live patching transition to
finish. Said condition lasts until the stack frame is
overwritten/initialized by function call or other means.
In principle, we could mitigate this by making the exception frame
classification condition in save_stack_trace_tsk_reliable() stronger:
in addition to testing for STACK_FRAME_REGS_MARKER, we could also take
into account that for all exceptions executing on the kernel stack
- their stack frames's backlink pointers always match what is saved
in their pt_regs instance's ->gpr[1] slot and that
- their exception frame size equals STACK_INT_FRAME_SIZE, a value
uncommonly large for non-exception frames.
However, while these are currently true, relying on them would make
the reliable stacktrace implementation more sensitive towards future
changes in the exception entry code. Note that false negatives, i.e.
not detecting exception frames, would silently break the live patching
consistency model.
Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
rely on STACK_FRAME_REGS_MARKER as well.
Make the exception exit code clear the on-stack
STACK_FRAME_REGS_MARKER for those exceptions running on the "normal"
kernel stack and returning to kernelspace: because the topmost frame
is ignored by the reliable stack tracer anyway, returns to userspace
don't need to take care of clearing the marker.
Furthermore, as I don't have the ability to test this on Book 3E or 32
bits, limit the change to Book 3S and 64 bits.
Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kernel/entry_64.S | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2206912ea4f0..1f39ce40ae11 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -989,6 +989,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld r2,_NIP(r1)
mtspr SPRN_SRR0,r2
+ /*
+ * Leaving a stale exception_marker on the stack can confuse
+ * the reliable stack unwinder later on. Clear it.
+ */
+ li r2,0
+ std r2,STACK_FRAME_OVERHEAD-16(r1)
+
ld r0,GPR0(r1)
ld r2,GPR2(r1)
ld r3,GPR3(r1)
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 4.19 152/192] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
[not found] <20190327181025.13507-1-sashal@kernel.org>
` (5 preceding siblings ...)
2019-03-27 18:09 ` [PATCH AUTOSEL 4.19 148/192] powerpc/64s: Clear on-stack exception marker upon exception return Sasha Levin
@ 2019-03-27 18:09 ` Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-03-27 18:09 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, Nathan Fontenot, Michael W . Bringmann, linuxppc-dev
From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
[ Upstream commit 81b61324922c67f73813d8a9c175f3c153f6a1c6 ]
On pseries systems, performing a partition migration can result in
altering the nodes a CPU is assigned to on the destination system. For
exampl, pre-migration on the source system CPUs are in node 1 and 3,
post-migration on the destination system CPUs are in nodes 2 and 3.
Handling the node change for a CPU can cause corruption in the slab
cache if we hit a timing where a CPUs node is changed while cache_reap()
is invoked. The corruption occurs because the slab cache code appears
to rely on the CPU and slab cache pages being on the same node.
The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c
does not prevent us from hitting this scenario.
Changing the device tree property update notification handler that
recognizes an affinity change for a CPU to do a full DLPAR remove and
add of the CPU instead of dynamically changing its node resolves this
issue.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/topology.h | 2 ++
arch/powerpc/mm/numa.c | 9 +--------
arch/powerpc/platforms/pseries/hotplug-cpu.c | 19 +++++++++++++++++++
3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index a4a718dbfec6..f85e2b01c3df 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -132,6 +132,8 @@ static inline void shared_proc_topology_init(void) {}
#define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
#define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
#define topology_core_id(cpu) (cpu_to_core_id(cpu))
+
+int dlpar_cpu_readd(int cpu);
#endif
#endif
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 5500e4edabc6..10fb43efef50 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1461,13 +1461,6 @@ static void reset_topology_timer(void)
#ifdef CONFIG_SMP
-static void stage_topology_update(int core_id)
-{
- cpumask_or(&cpu_associativity_changes_mask,
- &cpu_associativity_changes_mask, cpu_sibling_mask(core_id));
- reset_topology_timer();
-}
-
static int dt_update_callback(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -1480,7 +1473,7 @@ static int dt_update_callback(struct notifier_block *nb,
!of_prop_cmp(update->prop->name, "ibm,associativity")) {
u32 core_id;
of_property_read_u32(update->dn, "reg", &core_id);
- stage_topology_update(core_id);
+ rc = dlpar_cpu_readd(core_id);
rc = NOTIFY_OK;
}
break;
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 6ef77caf7bcf..1d3f9313c02f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,6 +802,25 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
return rc;
}
+int dlpar_cpu_readd(int cpu)
+{
+ struct device_node *dn;
+ struct device *dev;
+ u32 drc_index;
+ int rc;
+
+ dev = get_cpu_device(cpu);
+ dn = dev->of_node;
+
+ rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
+
+ rc = dlpar_cpu_remove_by_index(drc_index);
+ if (!rc)
+ rc = dlpar_cpu_add(drc_index);
+
+ return rc;
+}
+
int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
{
u32 count, drc_index;
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread