* [Bug 203837] New: Booting kernel under KVM immediately freezes host
From: bugzilla-daemon @ 2019-06-06 22:59 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=203837
Bug ID: 203837
Summary: Booting kernel under KVM immediately freezes host
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: v5.2-rc2
Hardware: PPC-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: blocking
Priority: P1
Component: PPC-64
Assignee: platform_ppc-64@kernel-bugs.osdl.org
Reporter: shawn@anastas.io
Regression: No
Created attachment 283133
--> https://bugzilla.kernel.org/attachment.cgi?id=283133&action=edit
Guest kernel config
When booting kernel v5.2-rc2 (and confirmed up to 156c05917) in a VM on a
POWER9 host running kernel 5.1.7, the host immediately locks up and
becomes unresponsive to the point of requiring a hard reset.
The last guest kernel message printed to the screen before the
host locks up is:
[ 0.013940] smp: Bringing up secondary CPUs ...
Due to the nature of the bug, it is very difficult to bisect, since a manual
host reset is required each time the bug is encountered. Also, my only
POWER machine is my primary workstation.
The bug has also been confirmed on other host kernel versions (down to 5.0.x).
When downgrading the guest kernel to 5.1.0, the issue is not present.
The guest kernel .config is attached.
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [RFC/RFT PATCH] Revert "ASoC: fsl_esai: ETDR and TX0~5 registers are non volatile"
From: Nicolin Chen @ 2019-06-06 23:01 UTC (permalink / raw)
To: shengjiu.wang, broonie
Cc: alsa-devel, timur, lgirdwood, linuxppc-dev, Xiubo.Lee, tiwai,
perex, festevam, linux-kernel
This reverts commit 8973112aa41b8ad956a5b47f2fe17bc2a5cf2645.
ETDR and TX0~5 are TX data registers. There are a couple of reasons
to revert the change:
1) Although ETDR and TX0~5 are write-only rather than volatile
registers, they should not be cached either. According to the
definition of "volatile_reg", a register should be put in the
volatile list if it cannot be cached.
2) When doing regcache_sync(), the operation may accidentally write
some "dirty" data into these registers if the cached data happens
to differ from the defaults. It may also result in a channel
shift/swap situation, since the number of write-via-sync operations
at ETDR is unlikely to match the channel number.
Note: this is not a complete revert. The original commit also changed
other entries in the default value list, and those registers are kept
in the list here. There is also no strong need to Cc the stable tree,
since there has always been a FIFO reset operation around the
regcache_sync() call, even prior to the reverted commit.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Cc: Shengjiu Wang <shengjiu.wang@nxp.com>
---
Hi Mark,
In case there's no objection against the patch, I'd still like to
wait for a Tested-by from NXP folks before submitting it. Thanks!
sound/soc/fsl/fsl_esai.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index 10d2210c91ef..8f0a86335f73 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -652,16 +652,9 @@ static const struct snd_soc_component_driver fsl_esai_component = {
};
static const struct reg_default fsl_esai_reg_defaults[] = {
- {REG_ESAI_ETDR, 0x00000000},
{REG_ESAI_ECR, 0x00000000},
{REG_ESAI_TFCR, 0x00000000},
{REG_ESAI_RFCR, 0x00000000},
- {REG_ESAI_TX0, 0x00000000},
- {REG_ESAI_TX1, 0x00000000},
- {REG_ESAI_TX2, 0x00000000},
- {REG_ESAI_TX3, 0x00000000},
- {REG_ESAI_TX4, 0x00000000},
- {REG_ESAI_TX5, 0x00000000},
{REG_ESAI_TSR, 0x00000000},
{REG_ESAI_SAICR, 0x00000000},
{REG_ESAI_TCR, 0x00000000},
@@ -711,10 +704,17 @@ static bool fsl_esai_readable_reg(struct device *dev, unsigned int reg)
static bool fsl_esai_volatile_reg(struct device *dev, unsigned int reg)
{
switch (reg) {
+ case REG_ESAI_ETDR:
case REG_ESAI_ERDR:
case REG_ESAI_ESR:
case REG_ESAI_TFSR:
case REG_ESAI_RFSR:
+ case REG_ESAI_TX0:
+ case REG_ESAI_TX1:
+ case REG_ESAI_TX2:
+ case REG_ESAI_TX3:
+ case REG_ESAI_TX4:
+ case REG_ESAI_TX5:
case REG_ESAI_RX0:
case REG_ESAI_RX1:
case REG_ESAI_RX2:
--
2.17.1
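The caching argument in the commit message above can be sketched with a toy
register cache. This is a userspace model under stated assumptions, not the
regmap implementation: every toy_* name and the three-register layout are
hypothetical. It only shows why a write-only data port marked volatile is
never replayed by a cache sync, while an ordinary control register is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices for a toy device, not the real ESAI map. */
enum { TOY_REG_ETDR, TOY_REG_ECR, TOY_REG_TFCR, TOY_REG_COUNT };

struct toy_regmap {
	uint32_t cache[TOY_REG_COUNT];         /* cached values, non-volatile only */
	uint32_t defaults[TOY_REG_COUNT];      /* reset values */
	uint32_t hw[TOY_REG_COUNT];            /* stand-in for the hardware */
	unsigned int hw_writes[TOY_REG_COUNT]; /* bus writes seen per register */
};

/* The TX data port is treated as volatile, so it is never cached. */
bool toy_volatile_reg(unsigned int reg)
{
	return reg == TOY_REG_ETDR;
}

/* Write through to the "hardware"; update the cache only if cacheable. */
void toy_write(struct toy_regmap *map, unsigned int reg, uint32_t val)
{
	assert(reg < TOY_REG_COUNT);
	map->hw[reg] = val;
	map->hw_writes[reg]++;
	if (!toy_volatile_reg(reg))
		map->cache[reg] = val;
}

/*
 * Replay cached values that differ from the defaults, skipping volatile
 * registers. Returns the number of registers written back.
 */
unsigned int toy_cache_sync(struct toy_regmap *map)
{
	unsigned int reg, replayed = 0;

	for (reg = 0; reg < TOY_REG_COUNT; reg++) {
		if (toy_volatile_reg(reg) || map->cache[reg] == map->defaults[reg])
			continue;
		map->hw[reg] = map->cache[reg];
		map->hw_writes[reg]++;
		replayed++;
	}
	return replayed;
}
```

In this model a sample written to TOY_REG_ETDR reaches the device exactly
once, so a later sync cannot push a stale sample into the FIFO and shift the
channels.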
* [Bug 203839] New: Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
From: bugzilla-daemon @ 2019-06-07 0:00 UTC (permalink / raw)
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=203839
Bug ID: 203839
Summary: Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6:
systemd[1]: Failed to bump fs.file-max, ignoring:
invalid argument
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: 5.2-rc3
Hardware: PPC-32
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: PPC-32
Assignee: platform_ppc-32@kernel-bugs.osdl.org
Reporter: erhard_f@mailbox.org
Regression: No
Created attachment 283135
--> https://bugzilla.kernel.org/attachment.cgi?id=283135&action=edit
failed boot, screenshot 5.2-rc3
The system boots fine with kernel 5.1.7. Starting with 5.2-rc1, the G4 fails
to finish booting correctly. With 5.2-rc3 the basic boot process seems to
complete, but the system crashes when handing control over to systemd:
systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
systemd[1]: segfault (11) at 0 nip 0 ir 0 code 1
systemd[1]: Bad NIP, not dumping instructions
[...]
For more details see the screenshot. Kernel 5.2-rc1 errors out even earlier
(see screenshot) with a different error. This problem may also be 32-bit
specific: 5.2-rc3 boots without problems on a PowerMac G5.
root is ext4, boot is ext2, systemd is v241.
* [Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
From: bugzilla-daemon @ 2019-06-07 0:01 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <bug-203839-206035@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=203839
--- Comment #1 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 283137
--> https://bugzilla.kernel.org/attachment.cgi?id=283137&action=edit
failed boot, screenshot 5.2-rc1
* [Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument
From: bugzilla-daemon @ 2019-06-07 0:03 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <bug-203839-206035@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=203839
--- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 283139
--> https://bugzilla.kernel.org/attachment.cgi?id=283139&action=edit
kernel .config (5.2-rc3, G4 MDD)
* Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception
From: Michael Neuling @ 2019-06-07 0:50 UTC (permalink / raw)
To: Ravi Bangoria, mpe; +Cc: linux-kernel, npiggin, paulus, mahesh, linuxppc-dev
In-Reply-To: <20190606072951.32116-1-ravi.bangoria@linux.ibm.com>
On Thu, 2019-06-06 at 12:59 +0530, Ravi Bangoria wrote:
> Powerpc hw triggers watchpoint before executing the instruction.
> To make trigger-after-execute behavior, kernel emulates the
> instruction. If the instruction is 'load something into non-
> volatile register', exception handler should restore emulated
> register state while returning back, otherwise there will be
> register state corruption. Ex, Adding a watchpoint on a list
> can corrput the list:
>
> # cat /proc/kallsyms | grep kthread_create_list
> c00000000121c8b8 d kthread_create_list
>
> Add watchpoint on kthread_create_list->next:
>
> # perf record -e mem:0xc00000000121c8c0
>
> Run some workload such that new kthread gets invoked. Ex, I
> just logged out from console:
>
> list_add corruption. next->prev should be prev (c000000001214e00), \
> but was c00000000121c8b8. (next=c00000000121c8b8).
> WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
> CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
> ...
> NIP __list_add_valid+0xb4/0xc0
> LR __list_add_valid+0xb0/0xc0
> Call Trace:
> __list_add_valid+0xb0/0xc0 (unreliable)
> __kthread_create_on_node+0xe0/0x260
> kthread_create_on_node+0x34/0x50
> create_worker+0xe8/0x260
> worker_thread+0x444/0x560
> kthread+0x160/0x1a0
> ret_from_kernel_thread+0x5c/0x70
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
How long has this been around? Should we be CCing stable?
Mikey
> ---
> arch/powerpc/kernel/exceptions-64s.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S
> b/arch/powerpc/kernel/exceptions-64s.S
> index 9481a11..96de0d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1753,7 +1753,7 @@ handle_dabr_fault:
> ld r5,_DSISR(r1)
> addi r3,r1,STACK_FRAME_OVERHEAD
> bl do_break
> -12: b ret_from_except_lite
> +12: b ret_from_except
>
>
> #ifdef CONFIG_PPC_BOOK3S_64
* Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
From: Alistair Popple @ 2019-06-07 1:41 UTC (permalink / raw)
To: Oliver
Cc: Alexey Kardashevskiy, Shawn Anastasio, David Gibson, linuxppc-dev,
Sam Bobroff
In-Reply-To: <CAOSf1CEKwFHLHLC+CAiEiH=9v+hfRgTSuNUH3hXR4eDyQM1G9g@mail.gmail.com>
On Thursday, 6 June 2019 10:07:54 PM AEST Oliver wrote:
> On Thu, Jun 6, 2019 at 5:17 PM Alistair Popple <alistair@popple.id.au> wrote:
> > I have been hitting EEH address errors testing this with some network
> > cards which map/unmap DMA addresses more frequently. For example:
> >
> > PHB4 PHB#5 Diag-data (Version: 1)
> > brdgCtl: 00000002
> > RootSts: 00060020 00402000 a0220008 00100107 00000800
> > PhbSts: 0000001c00000000 0000001c00000000
> > Lem: 0000000100000080 0000000000000000 0000000000000080
> > PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000
> > RxeTceErr: 2000000000000000 2000000000000000 c000000000000000 0000000000000000
> > PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000
> > RegbErr: 0000004000000000 0000004000000000 61000c4800000000 0000000000000000
> > PE[000] A/B: 8300b03800000000 8000000000000000
> >
> > Interestingly the PE[000] A/B data is the same across different cards
> > and drivers.
>
> TCE page fault due to permissions so odds are the DMA address was unmapped.
>
> What cards did you get this with? I tried with one of the common
> BCM5719 NICs and generated network traffic by using rsync to copy a
> linux git tree to the system and it worked fine.
Personally I've seen it with the BCM5719 with the driver modified to set a DMA
mask of 48 bits instead of 64 and using scp to copy a random 1GB file to the
system repeatedly until it crashes.
I have also had reports of someone hitting the same error using a Mellanox
CX-5 adaptor with a similar driver modification.
- Alistair
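As a small worked example of what a 48-bit DMA mask means, the sketch below
reproduces the arithmetic of the kernel's DMA_BIT_MASK(n) macro in userspace.
The toy_* function names are hypothetical. The thread does not say which call
in the driver was changed; presumably the mask is what gets passed to
dma_set_mask_and_coherent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n); bits must be 1..64. */
uint64_t toy_dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* True if a bus address is reachable by a device limited to `bits` bits. */
bool toy_dma_addr_fits(uint64_t addr, unsigned int bits)
{
	return (addr & ~toy_dma_bit_mask(bits)) == 0;
}
```

With a 48-bit mask, any bus address with bit 48 or above set is out of reach
for the device, so the platform has to hand out DMA addresses below that
limit.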
* Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception
From: Ravi Bangoria @ 2019-06-07 3:17 UTC (permalink / raw)
To: Michael Neuling, mpe
Cc: Ravi Bangoria, linux-kernel, npiggin, paulus, mahesh,
linuxppc-dev
In-Reply-To: <80cfc8d7327d3bb744ea1f7e2843943a998d48de.camel@neuling.org>
On 6/7/19 6:20 AM, Michael Neuling wrote:
> On Thu, 2019-06-06 at 12:59 +0530, Ravi Bangoria wrote:
>> Powerpc hw triggers watchpoint before executing the instruction.
>> To make trigger-after-execute behavior, kernel emulates the
>> instruction. If the instruction is 'load something into non-
>> volatile register', exception handler should restore emulated
>> register state while returning back, otherwise there will be
>> register state corruption. Ex, Adding a watchpoint on a list
>> can corrput the list:
>>
>> # cat /proc/kallsyms | grep kthread_create_list
>> c00000000121c8b8 d kthread_create_list
>>
>> Add watchpoint on kthread_create_list->next:
>>
>> # perf record -e mem:0xc00000000121c8c0
>>
>> Run some workload such that new kthread gets invoked. Ex, I
>> just logged out from console:
>>
>> list_add corruption. next->prev should be prev (c000000001214e00), \
>> but was c00000000121c8b8. (next=c00000000121c8b8).
>> WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>> CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
>> ...
>> NIP __list_add_valid+0xb4/0xc0
>> LR __list_add_valid+0xb0/0xc0
>> Call Trace:
>> __list_add_valid+0xb0/0xc0 (unreliable)
>> __kthread_create_on_node+0xe0/0x260
>> kthread_create_on_node+0x34/0x50
>> create_worker+0xe8/0x260
>> worker_thread+0x444/0x560
>> kthread+0x160/0x1a0
>> ret_from_kernel_thread+0x5c/0x70
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>
> How long has this been around? Should we be CCing stable?
"bl .save_nvgprs" was added in the commit 5aae8a5370802 ("powerpc, hw_breakpoints:
Implement hw_breakpoints for 64-bit server processors"), which was merged in
v2.6.36.
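The register corruption described in the patch can be illustrated with a toy
model. This is a deliberately simplified userspace sketch, not the exception
code: the toy_* names are hypothetical, and the "lite" return is modelled as
reloading only r0..r13 from the saved frame.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NGPRS       32
#define TOY_FIRST_NVGPR 14 /* r14..r31 are non-volatile in the 64-bit ABI */

struct toy_regs {
	uint64_t gpr[TOY_NGPRS];
};

/* Exception entry after save_nvgprs: the frame holds a full copy. */
void toy_exception_entry(const struct toy_regs *cpu, struct toy_regs *frame)
{
	*frame = *cpu;
}

/* Emulating "load into rt" updates only the saved frame, not the live CPU. */
void toy_emulate_load(struct toy_regs *frame, unsigned int rt, uint64_t val)
{
	assert(rt < TOY_NGPRS);
	frame->gpr[rt] = val;
}

/* "Lite" return: only the volatile registers are reloaded from the frame. */
void toy_ret_from_except_lite(struct toy_regs *cpu, const struct toy_regs *frame)
{
	unsigned int i;

	for (i = 0; i < TOY_FIRST_NVGPR; i++)
		cpu->gpr[i] = frame->gpr[i];
}

/* Full return: every register, including r14..r31, comes from the frame. */
void toy_ret_from_except(struct toy_regs *cpu, const struct toy_regs *frame)
{
	*cpu = *frame;
}
```

If the emulated instruction loaded into r20, the lite return leaves the
pre-exception value of r20 in place and the result of the load is lost. The
full return picks up the emulated value.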
* [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
From: Nicholas Piggin @ 2019-06-07 3:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin
Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
in pte helpers") changed the actual bitwise tests in pte_access_permitted
by using pte_write() and pte_present() helpers rather than raw bitwise
testing _PAGE_WRITE and _PAGE_PRESENT bits.
The pte_present change now returns true for ptes which are !_PAGE_PRESENT
and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
synchronize access from lock-free lookups. pte_access_permitted is used by
pmd_access_permitted, so allowing GUP lock free access to proceed with
such PTEs breaks this synchronisation.
This bug has been observed on HPT host, with random crashes and corruption
in guests, usually together with bad PMD messages in the host.
Fix this by adding an explicit check in pmd_access_permitted, and
documenting the condition explicitly.
The pte_write() change should be okay, and would prevent GUP from falling
back to the slow path when encountering savedwrite ptes, which matches
what x86 (that does not implement savedwrite) does.
Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
I accounted for Aneesh's and Christophe's feedback, except I couldn't
find a good way to replace the ifdef with IS_ENABLED because of
_PAGE_INVALID etc., but at least cleaned that up a bit nicer.
Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
They should probably both be merged in stable kernels after upstream.
arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
arch/powerpc/mm/book3s64/pgtable.c | 3 ++
2 files changed, 33 insertions(+)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..ccf00a8b98c6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
return false;
}
+static inline int pmd_is_serializing(pmd_t pmd)
+{
+ /*
+ * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
+ * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
+ *
+ * This condition may also occur when flushing a pmd while flushing
+ * it (see ptep_modify_prot_start), so callers must ensure this
+ * case is fine as well.
+ */
+ if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
+ cpu_to_be64(_PAGE_INVALID))
+ return true;
+
+ return false;
+}
+
static inline int pmd_bad(pmd_t pmd)
{
if (radix_enabled())
@@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
#define pmd_access_permitted pmd_access_permitted
static inline bool pmd_access_permitted(pmd_t pmd, bool write)
{
+ /*
+ * pmdp_invalidate sets this combination (which is not caught by
+ * !pte_present() check in pte_access_permitted), to prevent
+ * lock-free lookups, as part of the serialize_against_pte_lookup()
+ * synchronisation.
+ *
+ * This also catches the case where the PTE's hardware PRESENT bit is
+ * cleared while TLB is flushed, which is suboptimal but should not
+ * be frequent.
+ */
+ if (pmd_is_serializing(pmd))
+ return false;
+
return pte_access_permitted(pmd_pte(pmd), write);
}
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..ff98b663c83e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
/*
* This ensures that generic code that rely on IRQ disabling
* to prevent a parallel THP split work as expected.
+ *
+ * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
+ * a special case check in pmd_access_permitted.
*/
serialize_against_pte_lookup(vma->vm_mm);
return __pmd(old_pmd);
--
2.20.1
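The bit test in pmd_is_serializing() and the gap it closes can be sketched in
userspace. The bit positions below are illustrative only and the endian
conversion is left out; the toy_* helpers are hypothetical stand-ins that
follow the behaviour described in the commit message, with the access check
reduced to the present and write bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values live in pgtable.h. */
#define TOY_PAGE_PRESENT UINT64_C(0x8000000000000000)
#define TOY_PAGE_INVALID UINT64_C(0x2000000000000000)
#define TOY_PAGE_WRITE   UINT64_C(0x0000000000000002)

static_assert((TOY_PAGE_PRESENT & TOY_PAGE_INVALID) == 0,
	      "marker bits must be distinct");

/* Present if either bit is set, as the commit message describes. */
bool toy_pte_present(uint64_t pte)
{
	return (pte & (TOY_PAGE_PRESENT | TOY_PAGE_INVALID)) != 0;
}

/* PRESENT clear and INVALID set: the entry is being split or flushed. */
bool toy_pmd_is_serializing(uint64_t pmd)
{
	return (pmd & (TOY_PAGE_PRESENT | TOY_PAGE_INVALID)) == TOY_PAGE_INVALID;
}

/* Simplified access check: present, and writable if a write is requested. */
bool toy_pte_access_permitted(uint64_t pte, bool write)
{
	if (!toy_pte_present(pte))
		return false;
	if (write && !(pte & TOY_PAGE_WRITE))
		return false;
	return true;
}

/* The fix: refuse lock-free access while the pmd carries the marker. */
bool toy_pmd_access_permitted(uint64_t pmd, bool write)
{
	if (toy_pmd_is_serializing(pmd))
		return false;
	return toy_pte_access_permitted(pmd, write);
}
```

In this model an entry with only the invalid marker set still passes the
pte-level check, which is the hole the explicit pmd-level test closes.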
* [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
From: Nicholas Piggin @ 2019-06-07 3:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin
In-Reply-To: <20190607035636.5446-1-npiggin@gmail.com>
The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
the synchronisation against lock-free lookups: __find_linux_pte's
pmd_none check no longer returns true for such cases.
Fix this by adding a check for this condition as well.
Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..533fc6fa6726 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
pdshift = PMD_SHIFT;
pmdp = pmd_offset(&pud, ea);
pmd = READ_ONCE(*pmdp);
+
/*
- * A hugepage collapse is captured by pmd_none, because
- * it mark the pmd none and do a hpte invalidate.
+ * A hugepage collapse is captured by this condition, see
+ * pmdp_collapse_flush.
*/
if (pmd_none(pmd))
return NULL;
+#ifdef CONFIG_PPC_BOOK3S_64
+ /*
+ * A hugepage split is captured by this condition, see
+ * pmdp_invalidate.
+ *
+ * Huge page modification can be caught here too.
+ */
+ if (pmd_is_serializing(pmd))
+ return NULL;
+#endif
+
if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
if (is_thp)
*is_thp = true;
--
2.20.1
* Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation
From: Nicholas Piggin @ 2019-06-07 4:07 UTC (permalink / raw)
To: Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <8736kmhh62.fsf@linux.ibm.com>
Aneesh Kumar K.V's on June 7, 2019 1:23 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
>> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
>> in pte helpers") changed the actual bitwise tests in pte_access_permitted
>> by using pte_write() and pte_present() helpers rather than raw bitwise
>> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>>
>> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
>> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
>> synchronize access from lock-free lookups. pte_access_permitted is used by
>> pmd_access_permitted, so allowing GUP lock free access to proceed with
>> such PTEs breaks this synchronisation.
>>
>> This bug has been observed on HPT host, with random crashes and corruption
>> in guests, usually together with bad PMD messages in the host.
>>
>> Fix this by adding an explicit check in pmd_access_permitted, and
>> documenting the condition explicitly.
>>
>> The pte_write() change should be okay, and would prevent GUP from falling
>> back to the slow path when encountering savedwrite ptes, which matches
>> what x86 (that does not implement savedwrite) does.
>>
>
> I guess we are doing the find_linux_pte change in another patch.
>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Sorry, I just got delayed with re-testing. Thanks for the feedback on it;
I have sent new patches.
Two patches, yes, because they fix issues introduced in different
commits, which should make backports easier.
Thanks,
Nick
* Re: [PATCH v3 7/9] KVM: PPC: Ultravisor: Restrict LDBAR access
From: Madhavan Srinivasan @ 2019-06-07 4:48 UTC (permalink / raw)
To: Claudio Carvalho, linuxppc-dev
Cc: Thiago Bauermann, Michael Anderson, Ram Pai, kvm-ppc,
Bharata B Rao, Sukadev Bhattiprolu, Anshuman Khandual
In-Reply-To: <20190606173614.32090-8-cclaudio@linux.ibm.com>
On 06/06/19 11:06 PM, Claudio Carvalho wrote:
> When the ultravisor firmware is available, it takes control over the
> LDBAR register. In this case, thread-imc updates and save/restore
> operations on the LDBAR register are handled by ultravisor.
>
> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++
> arch/powerpc/platforms/powernv/idle.c | 6 ++++--
> arch/powerpc/platforms/powernv/opal-imc.c | 7 +++++++
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index f9b2620fbecd..cffb365d9d02 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -375,8 +375,10 @@ BEGIN_FTR_SECTION
> mtspr SPRN_RPR, r0
> ld r0, KVM_SPLIT_PMMAR(r6)
> mtspr SPRN_PMMAR, r0
> +BEGIN_FW_FTR_SECTION_NESTED(70)
> ld r0, KVM_SPLIT_LDBAR(r6)
> mtspr SPRN_LDBAR, r0
> +END_FW_FTR_SECTION_NESTED(FW_FEATURE_ULTRAVISOR, 0, 70)
> isync
> FTR_SECTION_ELSE
> /* On P9 we use the split_info for coordinating LPCR changes */
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index c9133f7908ca..fd62435e3267 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -679,7 +679,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> sprs.ptcr = mfspr(SPRN_PTCR);
> sprs.rpr = mfspr(SPRN_RPR);
> sprs.tscr = mfspr(SPRN_TSCR);
> - sprs.ldbar = mfspr(SPRN_LDBAR);
> + if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
> + sprs.ldbar = mfspr(SPRN_LDBAR);
>
> sprs_saved = true;
>
> @@ -762,7 +763,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> mtspr(SPRN_PTCR, sprs.ptcr);
> mtspr(SPRN_RPR, sprs.rpr);
> mtspr(SPRN_TSCR, sprs.tscr);
> - mtspr(SPRN_LDBAR, sprs.ldbar);
> + if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
> + mtspr(SPRN_LDBAR, sprs.ldbar);
>
> if (pls >= pnv_first_tb_loss_level) {
> /* TB loss */
> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
> index 1b6932890a73..e9b641d313fb 100644
> --- a/arch/powerpc/platforms/powernv/opal-imc.c
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -254,6 +254,13 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
> bool core_imc_reg = false, thread_imc_reg = false;
> u32 type;
>
> + /*
> + * When the Ultravisor is enabled, it is responsible for thread-imc
> + * updates
> + */
I would prefer the comment to be "Disable IMC devices, when Ultravisor is
enabled".
Rest looks good.
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> + if (firmware_has_feature(FW_FEATURE_ULTRAVISOR))
> + return -EACCES;
> +
> /*
> * Check whether this is kdump kernel. If yes, force the engines to
> * stop and return.
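The guard pattern used throughout the quoted patch can be sketched as a toy
model. This is a userspace illustration only: the toy_* names are
hypothetical, and a plain boolean stands in for
firmware_has_feature(FW_FEATURE_ULTRAVISOR).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Two registers are enough for the sketch: one ordinary, one restricted. */
struct toy_sprs {
	uint64_t tscr;
	uint64_t ldbar;
};

/* Save path: leave LDBAR alone when the ultravisor owns it. */
void toy_save_sprs(const struct toy_sprs *hw, struct toy_sprs *saved,
		   bool ultravisor)
{
	assert(hw && saved);
	saved->tscr = hw->tscr;
	if (!ultravisor)
		saved->ldbar = hw->ldbar;
}

/* Restore path: the same guard, so LDBAR is never written back. */
void toy_restore_sprs(struct toy_sprs *hw, const struct toy_sprs *saved,
		      bool ultravisor)
{
	assert(hw && saved);
	hw->tscr = saved->tscr;
	if (!ultravisor)
		hw->ldbar = saved->ldbar;
}

/* Probe path: refuse to register the devices at all under the ultravisor. */
int toy_imc_probe(bool ultravisor)
{
	return ultravisor ? -EACCES : 0;
}
```

The save and restore guards have to match. Skipping only the save would
write an uninitialised value back on wakeup.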
* [PATCH] powerpc/pseries: fix oops in hotplug memory notifier
From: Nathan Lynch @ 2019-06-07 5:04 UTC (permalink / raw)
To: linuxppc-dev
During post-migration device tree updates, we can oops in
pseries_update_drconf_memory if the source device tree has an
ibm,dynamic-memory-v2 property and the destination has an
ibm,dynamic-memory (v1) property. The notifier processes an "update"
for the ibm,dynamic-memory property, but it is really an add in this
scenario. So make sure the old property object is there before
dereferencing it.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/platforms/pseries/hotplug-memory.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 47087832f8b2..e6bd172bcf30 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -980,6 +980,9 @@ static int pseries_update_drconf_memory(struct of_reconfig_data *pr)
if (!memblock_size)
return -EINVAL;
+ if (!pr->old_prop)
+ return 0;
+
p = (__be32 *) pr->old_prop->value;
if (!p)
return -EINVAL;
--
2.20.1
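The ordering of the checks in the patch can be sketched in userspace as
below. The toy_* types are hypothetical stand-ins for the reconfig data and
property structures, and the `compared` flag exists only so the sketch can
report whether it reached the point where the old and new values would be
compared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_property {
	const uint32_t *value;
};

struct toy_reconfig_data {
	const struct toy_property *prop;     /* new property */
	const struct toy_property *old_prop; /* NULL when this is really an add */
};

int toy_update_drconf_memory(const struct toy_reconfig_data *pr, bool *compared)
{
	const uint32_t *p;

	assert(pr != NULL && compared != NULL);
	*compared = false;

	/* An "update" with no old property is an add: nothing to compare. */
	if (!pr->old_prop)
		return 0;

	/* Only now is it safe to dereference old_prop. */
	p = pr->old_prop->value;
	if (!p)
		return -EINVAL;

	*compared = true; /* the real code would walk the old and new cells here */
	return 0;
}
```

Returning 0 for the add case treats it as "nothing to do" rather than an
error, which matches the patch.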
* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
From: Christophe Leroy @ 2019-06-07 5:34 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V
In-Reply-To: <20190607035636.5446-1-npiggin@gmail.com>
Le 07/06/2019 à 05:56, Nicholas Piggin a écrit :
> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
>
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
>
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
>
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
>
> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>
> I accounted for Aneesh's and Christophe's feedback, except I couldn't
> find a good way to replace the ifdef with IS_ENABLED because of
> _PAGE_INVALID etc., but at least cleaned that up a bit nicer.
I guess the standard way is to add a pmd_is_serializing() which always
returns false in book3s/32/pgtable.h and in nohash/pgtable.h.
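A minimal sketch of that stub pattern, in userspace and under stated
assumptions: TOY_BOOK3S_64 is a hypothetical stand-in for the platform
config symbol, the bit values are illustrative, and toy_find_pmd() is a
made-up caller that shows the lookup needing no #ifdef of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only. */
#define TOY_PAGE_PRESENT UINT64_C(0x8000000000000000)
#define TOY_PAGE_INVALID UINT64_C(0x2000000000000000)

#ifdef TOY_BOOK3S_64
/* Platform that uses the marker: do the real test. */
bool toy_pmd_is_serializing(uint64_t pmd)
{
	return (pmd & (TOY_PAGE_PRESENT | TOY_PAGE_INVALID)) == TOY_PAGE_INVALID;
}
#else
/* Platforms that never set the marker: a stub that is always false. */
bool toy_pmd_is_serializing(uint64_t pmd)
{
	(void)pmd;
	return false;
}
#endif

/* The caller is identical on every platform; no #ifdef is needed here. */
const uint64_t *toy_find_pmd(const uint64_t *pmdp)
{
	uint64_t pmd;

	assert(pmdp != NULL);
	pmd = *pmdp;
	if (pmd == 0)
		return NULL; /* empty entry */
	if (toy_pmd_is_serializing(pmd))
		return NULL; /* entry is being split or flushed */
	return pmdp;
}
```

Since the stub is a constant, a compiler can drop the second check entirely
on platforms that do not define the marker.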
>
> Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
> Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
> They should probably both be merged in stable kernels after upstream.
>
> arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
> arch/powerpc/mm/book3s64/pgtable.c | 3 ++
> 2 files changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 7dede2e34b70..ccf00a8b98c6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
> return false;
> }
>
> +static inline int pmd_is_serializing(pmd_t pmd)
Should this be static inline bool instead of int?
Christophe
> +{
> + /*
> + * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
> + * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
> + *
> + * This condition may also occur when flushing a pmd while flushing
> + * it (see ptep_modify_prot_start), so callers must ensure this
> + * case is fine as well.
> + */
> + if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
> + cpu_to_be64(_PAGE_INVALID))
> + return true;
> +
> + return false;
> +}
> +
> static inline int pmd_bad(pmd_t pmd)
> {
> if (radix_enabled())
> @@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
> #define pmd_access_permitted pmd_access_permitted
> static inline bool pmd_access_permitted(pmd_t pmd, bool write)
> {
> + /*
> + * pmdp_invalidate sets this combination (which is not caught by
> + * !pte_present() check in pte_access_permitted), to prevent
> + * lock-free lookups, as part of the serialize_against_pte_lookup()
> + * synchronisation.
> + *
> + * This also catches the case where the PTE's hardware PRESENT bit is
> + * cleared while TLB is flushed, which is suboptimal but should not
> + * be frequent.
> + */
> + if (pmd_is_serializing(pmd))
> + return false;
> +
> return pte_access_permitted(pmd_pte(pmd), write);
> }
>
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 16bda049187a..ff98b663c83e 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> /*
> * This ensures that generic code that rely on IRQ disabling
> * to prevent a parallel THP split work as expected.
> + *
> + * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
> + * a special case check in pmd_access_permitted.
> */
> serialize_against_pte_lookup(vma->vm_mm);
> return __pmd(old_pmd);
>
* Re: [PATCH 1/2] powerpc/64s: Fix THP PMD collapse serialisation
From: Aneesh Kumar K.V @ 2019-06-07 5:35 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20190607035636.5446-1-npiggin@gmail.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
> in pte helpers") changed the actual bitwise tests in pte_access_permitted
> by using pte_write() and pte_present() helpers rather than raw bitwise
> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>
> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
> synchronize access from lock-free lookups. pte_access_permitted is used by
> pmd_access_permitted, so allowing GUP lock free access to proceed with
> such PTEs breaks this synchronisation.
>
> This bug has been observed on HPT host, with random crashes and corruption
> in guests, usually together with bad PMD messages in the host.
>
> Fix this by adding an explicit check in pmd_access_permitted, and
> documenting the condition explicitly.
>
> The pte_write() change should be okay, and would prevent GUP from falling
> back to the slow path when encountering savedwrite ptes, which matches
> what x86 (that does not implement savedwrite) does.
>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>
> I accounted for Aneesh's and Christophe's feedback, except I couldn't
> find a good way to replace the ifdef with IS_ENABLED because of
> _PAGE_INVALID etc., but at least cleaned that up a bit nicer.
>
> Patch 1 solves a problem I can hit quite reliably running HPT/HPT KVM.
> Patch 2 was noticed by Aneesh when inspecting code for similar bugs.
> They should probably both be merged in stable kernels after upstream.
>
> arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++++++++
> arch/powerpc/mm/book3s64/pgtable.c | 3 ++
> 2 files changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 7dede2e34b70..ccf00a8b98c6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -876,6 +876,23 @@ static inline int pmd_present(pmd_t pmd)
> return false;
> }
>
> +static inline int pmd_is_serializing(pmd_t pmd)
> +{
> + /*
> + * If the pmd is undergoing a split, the _PAGE_PRESENT bit is clear
> + * and _PAGE_INVALID is set (see pmd_present, pmdp_invalidate).
> + *
> + * This condition may also occur when flushing a pmd while flushing
> + * it (see ptep_modify_prot_start), so callers must ensure this
> + * case is fine as well.
> + */
> + if ((pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID)) ==
> + cpu_to_be64(_PAGE_INVALID))
> + return true;
> +
> + return false;
> +}
> +
> static inline int pmd_bad(pmd_t pmd)
> {
> if (radix_enabled())
> @@ -1092,6 +1109,19 @@ static inline int pmd_protnone(pmd_t pmd)
> #define pmd_access_permitted pmd_access_permitted
> static inline bool pmd_access_permitted(pmd_t pmd, bool write)
> {
> + /*
> + * pmdp_invalidate sets this combination (which is not caught by
> + * !pte_present() check in pte_access_permitted), to prevent
> + * lock-free lookups, as part of the serialize_against_pte_lookup()
> + * synchronisation.
> + *
> + * This also catches the case where the PTE's hardware PRESENT bit is
> + * cleared while TLB is flushed, which is suboptimal but should not
> + * be frequent.
> + */
> + if (pmd_is_serializing(pmd))
> + return false;
> +
> return pte_access_permitted(pmd_pte(pmd), write);
> }
>
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 16bda049187a..ff98b663c83e 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> /*
> * This ensures that generic code that rely on IRQ disabling
> * to prevent a parallel THP split work as expected.
> + *
> + * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
> + * a special case check in pmd_access_permitted.
> */
> serialize_against_pte_lookup(vma->vm_mm);
> return __pmd(old_pmd);
> --
> 2.20.1
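For readers following the bit test in pmd_is_serializing(), here is a minimal userspace sketch of the same mask-and-compare idea. It is not the kernel code: the sketch_* names and the bit values are assumptions chosen for illustration, the endian conversion (cpu_to_be64) is omitted, and the "present-like" helper only models the behaviour the commit message describes (returning true when either bit is set).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative bit values only (assumption). The authoritative
 * definitions are in arch/powerpc/include/asm/book3s/64/pgtable.h.
 */
#define SKETCH_PAGE_PRESENT 0x8000000000000000ULL
#define SKETCH_PAGE_INVALID 0x2000000000000000ULL

/* True only for the combination "INVALID set, PRESENT clear". */
static bool sketch_pmd_is_serializing(uint64_t pmd_val)
{
	uint64_t mask = SKETCH_PAGE_PRESENT | SKETCH_PAGE_INVALID;

	return (pmd_val & mask) == SKETCH_PAGE_INVALID;
}

/*
 * Models the helper behaviour described in the commit message:
 * an entry with only INVALID set still counts as "present".
 */
static bool sketch_pte_present_like(uint64_t val)
{
	return (val & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_INVALID)) != 0;
}

/*
 * Same ordering as the patched pmd_access_permitted(): reject the
 * serializing combination first, then fall through to the generic
 * check. Write and user permission tests are left out of this sketch.
 */
static bool sketch_pmd_access_permitted(uint64_t pmd_val)
{
	if (sketch_pmd_is_serializing(pmd_val))
		return false;

	return sketch_pte_present_like(pmd_val);
}
```

Without the early return, an entry carrying only the INVALID bit would pass the present-like test, which is the window the patch closes for lock-free GUP.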
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
From: Christophe Leroy @ 2019-06-07 5:35 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V
In-Reply-To: <20190607035636.5446-2-npiggin@gmail.com>
Le 07/06/2019 à 05:56, Nicholas Piggin a écrit :
> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock free lookups, __find_linux_pte's
> pmd_none check no longer returns true for such cases.
>
> Fix this by adding a check for this condition as well.
>
> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index db4a6253df92..533fc6fa6726 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
> pdshift = PMD_SHIFT;
> pmdp = pmd_offset(&pud, ea);
> pmd = READ_ONCE(*pmdp);
> +
> /*
> - * A hugepage collapse is captured by pmd_none, because
> - * it mark the pmd none and do a hpte invalidate.
> + * A hugepage collapse is captured by this condition, see
> + * pmdp_collapse_flush.
> */
> if (pmd_none(pmd))
> return NULL;
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> + /*
> + * A hugepage split is captured by this condition, see
> + * pmdp_invalidate.
> + *
> + * Huge page modification can be caught here too.
> + */
> + if (pmd_is_serializing(pmd))
> + return NULL;
> +#endif
> +
Could get rid of that #ifdef by adding the following in book3s32 and
nohash pgtable.h:
static inline bool pmd_is_serializing(pmd_t pmd) { return false; }
Christophe
> if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
> if (is_thp)
> *is_thp = true;
>
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
From: Aneesh Kumar K.V @ 2019-06-07 5:35 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20190607035636.5446-2-npiggin@gmail.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
> the synchronisation against lock free lookups, __find_linux_pte's
> pmd_none check no longer returns true for such cases.
>
> Fix this by adding a check for this condition as well.
>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index db4a6253df92..533fc6fa6726 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
> pdshift = PMD_SHIFT;
> pmdp = pmd_offset(&pud, ea);
> pmd = READ_ONCE(*pmdp);
> +
> /*
> - * A hugepage collapse is captured by pmd_none, because
> - * it mark the pmd none and do a hpte invalidate.
> + * A hugepage collapse is captured by this condition, see
> + * pmdp_collapse_flush.
> */
> if (pmd_none(pmd))
> return NULL;
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> + /*
> + * A hugepage split is captured by this condition, see
> + * pmdp_invalidate.
> + *
> + * Huge page modification can be caught here too.
> + */
> + if (pmd_is_serializing(pmd))
> + return NULL;
> +#endif
> +
> if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
> if (is_thp)
> *is_thp = true;
> --
> 2.20.1
^ permalink raw reply
* [Bug 203837] Booting kernel under KVM immediately freezes host
From: bugzilla-daemon @ 2019-06-07 5:42 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <bug-203837-206035@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=203837
Paul Mackerras (paulus@ozlabs.org) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |paulus@ozlabs.org
--- Comment #1 from Paul Mackerras (paulus@ozlabs.org) ---
I have tried but not succeeded in replicating this problem.
I have tried 5.2-rc3 in the host with the config I usually use, plus 5.2-rc3 in
the guest with that same config. That boots just fine.
With 5.2-rc3 in the host and my usual config, and 5.2-rc3 in the guest compiled
with the config attached to this bug, the guest gets a kernel panic due to
being unable to mount root. It looks like it never manages to load virtio-blk
for some reason.
With the config attached to this bug, I did once see the guest stop outputting
messages after the message about bringing up CPUs. The host was still running
just fine, and top in the host showed the qemu-system-ppc64 process using 100%
of a CPU, consistent with the guest being in an infinite loop.
I think we need more details about the machine where the crash is occurring -
host kernel config, details of VM config (qemu command line or libvirt xml),
etc.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply
* Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception
From: Michael Ellerman @ 2019-06-07 5:50 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ravi Bangoria, mikey, linux-kernel, npiggin, paulus, mahesh,
linuxppc-dev
In-Reply-To: <20190606072951.32116-1-ravi.bangoria@linux.ibm.com>
Ravi Bangoria <ravi.bangoria@linux.ibm.com> writes:
> Powerpc hw triggers watchpoint before executing the instruction.
> To make trigger-after-execute behavior, kernel emulates the
> instruction. If the instruction is 'load something into non-
> volatile register', exception handler should restore emulated
> register state while returning back, otherwise there will be
> register state corruption. For example, adding a watchpoint on a list
> can corrupt the list:
>
> # cat /proc/kallsyms | grep kthread_create_list
> c00000000121c8b8 d kthread_create_list
>
> Add watchpoint on kthread_create_list->next:
>
> # perf record -e mem:0xc00000000121c8c0
>
> Run some workload such that new kthread gets invoked. Ex, I
> just logged out from console:
>
> list_add corruption. next->prev should be prev (c000000001214e00), \
> but was c00000000121c8b8. (next=c00000000121c8b8).
> WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
> CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
> ...
> NIP __list_add_valid+0xb4/0xc0
> LR __list_add_valid+0xb0/0xc0
> Call Trace:
> __list_add_valid+0xb0/0xc0 (unreliable)
> __kthread_create_on_node+0xe0/0x260
> kthread_create_on_node+0x34/0x50
> create_worker+0xe8/0x260
> worker_thread+0x444/0x560
> kthread+0x160/0x1a0
> ret_from_kernel_thread+0x5c/0x70
This all depends on what code the compiler generates for the list
access. Can you include a disassembly of the relevant code in your
kernel so we have an example of the bad case?
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 9481a11..96de0d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1753,7 +1753,7 @@ handle_dabr_fault:
> ld r5,_DSISR(r1)
> addi r3,r1,STACK_FRAME_OVERHEAD
> bl do_break
> -12: b ret_from_except_lite
> +12: b ret_from_except
This probably warrants a comment explaining why we can't use the (badly
named) "lite" version.
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
From: Mathieu Malaterre @ 2019-06-07 6:16 UTC (permalink / raw)
To: Christophe Leroy; +Cc: linuxppc-dev, Paul Mackerras, LKML
In-Reply-To: <CA+7wUswvw3JJ2dLCn877tNbTd==O5c9LxHGezOm+y5otQZnS2w@mail.gmail.com>
On Wed, Jun 5, 2019 at 1:32 PM Mathieu Malaterre <malat@debian.org> wrote:
>
> On Mon, Jun 3, 2019 at 3:00 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >
> > When booting through OF, setup_disp_bat() does nothing because
> > disp_BAT are not set. By chance, it used to work because BOOTX
> > buffer is mapped 1:1 at address 0x81000000 by the bootloader, and
> > btext_setup_display() sets virt addr same as phys addr.
> >
> > But since commit 215b823707ce ("powerpc/32s: set up an early static
> > hash table for KASAN."), a temporary page table overrides the
> > bootloader mapping.
> >
> > This 0x81000000 is also problematic with the newly implemented
> > Kernel Userspace Access Protection (KUAP) because it is within user
> > address space.
> >
> > This patch fixes those issues by properly setting disp_BAT through
> > a call to btext_prepare_BAT(), allowing setup_disp_bat() to
> > properly setup BAT3 for early bootx screen buffer access.
> >
> > Reported-by: Mathieu Malaterre <malat@debian.org>
> > Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN.")
> > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>
> The patch below does fix the symptoms I reported. Tested with CONFIG_KASAN=n :
>
> Tested-by: Mathieu Malaterre <malat@debian.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203699
>
> Thanks !
>
> > ---
> > arch/powerpc/include/asm/btext.h | 4 ++++
> > arch/powerpc/kernel/prom_init.c | 1 +
> > arch/powerpc/kernel/prom_init_check.sh | 2 +-
> > 3 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/btext.h b/arch/powerpc/include/asm/btext.h
> > index 3ffad030393c..461b0f193864 100644
> > --- a/arch/powerpc/include/asm/btext.h
> > +++ b/arch/powerpc/include/asm/btext.h
> > @@ -13,7 +13,11 @@ extern void btext_update_display(unsigned long phys, int width, int height,
> > int depth, int pitch);
> > extern void btext_setup_display(int width, int height, int depth, int pitch,
> > unsigned long address);
> > +#ifdef CONFIG_PPC32
> > extern void btext_prepare_BAT(void);
> > +#else
> > +static inline void btext_prepare_BAT(void) { }
> > +#endif
> > extern void btext_map(void);
> > extern void btext_unmap(void);
> >
> > diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> > index 3555cad7bdde..ed446b7ea164 100644
> > --- a/arch/powerpc/kernel/prom_init.c
> > +++ b/arch/powerpc/kernel/prom_init.c
> > @@ -2336,6 +2336,7 @@ static void __init prom_check_displays(void)
> > prom_printf("W=%d H=%d LB=%d addr=0x%x\n",
> > width, height, pitch, addr);
> > btext_setup_display(width, height, 8, pitch, addr);
> > + btext_prepare_BAT();
> > }
> > #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */
> > }
> > diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh
> > index 518d416971c1..160bef0d553d 100644
> > --- a/arch/powerpc/kernel/prom_init_check.sh
> > +++ b/arch/powerpc/kernel/prom_init_check.sh
> > @@ -24,7 +24,7 @@ fi
> > WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
> > _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold
> > __secondary_hold_acknowledge __secondary_hold_spinloop __start
> > -logo_linux_clut224
> > +logo_linux_clut224 btext_prepare_BAT
> > reloc_got2 kernstart_addr memstart_addr linux_banner _stext
> > __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
> >
> > --
> > 2.13.3
> >
^ permalink raw reply
* [PATCH 1/2] powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP
From: Nicholas Piggin @ 2019-06-07 6:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin
This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
page table functions.
This will not enable huge iomaps, because powerpc/64 ioremap does not
call ioremap_page_range. That is done in a later change.
HAVE_ARCH_HUGE_VMAP facilities will be used to enable huge pages for
vmalloc memory in a set of generic kernel changes. Combined, this
improves cached `git diff` performance by about 5% on a 2-node POWER9
with 32MB dentry cache hash, by allowing the dentry/inode hashes to
be mapped with 2MB pages:
Profiling git diff dTLB misses with a vanilla kernel:
81.75% git [kernel.vmlinux] [k] __d_lookup_rcu
7.21% git [kernel.vmlinux] [k] strncpy_from_user
1.77% git [kernel.vmlinux] [k] find_get_entry
1.59% git [kernel.vmlinux] [k] kmem_cache_free
40,168 dTLB-miss
0.100342754 seconds time elapsed
With powerpc huge vmap and generic huge vmap vmalloc:
2,987 dTLB-miss
0.095933138 seconds time elapsed
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/mm/book3s64/radix_pgtable.c | 93 ++++++++++++++++++++++++
2 files changed, 94 insertions(+)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..f0e5b38d52e8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -167,6 +167,7 @@ config PPC
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
+ select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN if PPC32
select HAVE_ARCH_KGDB
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c9bcf428dd2b..3bc9ade56277 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1122,3 +1122,96 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
}
+
+int __init arch_ioremap_pud_supported(void)
+{
+ return radix_enabled();
+}
+
+int __init arch_ioremap_pmd_supported(void)
+{
+ return radix_enabled();
+}
+
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+ return 0;
+}
+
+int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
+{
+ pte_t *ptep = (pte_t *)pud;
+ pte_t new_pud = pfn_pte(__phys_to_pfn(addr), prot);
+
+ set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud);
+
+ return 1;
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+ if (pud_huge(*pud)) {
+ pud_clear(pud);
+ return 1;
+ }
+
+ return 0;
+}
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+ pmd_t *pmd;
+ int i;
+
+ pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pud_clear(pud);
+
+ flush_tlb_kernel_range(addr, addr + PUD_SIZE);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd[i])) {
+ pte_t *pte;
+ pte = (pte_t *)pmd_page_vaddr(pmd[i]);
+
+ pte_free_kernel(&init_mm, pte);
+ }
+ }
+
+ pmd_free(&init_mm, pmd);
+
+ return 1;
+}
+
+int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
+{
+ pte_t *ptep = (pte_t *)pmd;
+ pte_t new_pmd = pfn_pte(__phys_to_pfn(addr), prot);
+
+ set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pmd);
+
+ return 1;
+}
+
+int pmd_clear_huge(pmd_t *pmd)
+{
+ if (pmd_huge(*pmd)) {
+ pmd_clear(pmd);
+ return 1;
+ }
+
+ return 0;
+}
+
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+ pte_t *pte;
+
+ pte = (pte_t *)pmd_page_vaddr(*pmd);
+ pmd_clear(pmd);
+
+ flush_tlb_kernel_range(addr, addr + PMD_SIZE);
+
+ pte_free_kernel(&init_mm, pte);
+
+ return 1;
+}
--
2.20.1
^ permalink raw reply related
* [PATCH 2/2] powerpc/64s/radix: ioremap use huge page mappings
From: Nicholas Piggin @ 2019-06-07 6:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20190607061922.20542-1-npiggin@gmail.com>
powerpc/64s does not use ioremap_page_range, so it does not get huge
vmap iomap mappings automatically. The radix kernel mapping function
already allows larger page mappings that work with huge vmap, so wire
that up to allow huge pages to be used for ioremap mappings.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 8 +++
arch/powerpc/mm/pgtable_64.c | 58 ++++++++++++++++++--
include/linux/io.h | 1 +
lib/ioremap.c | 2 +-
4 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ccf00a8b98c6..d7a4f2d80598 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -274,6 +274,14 @@ extern unsigned long __vmalloc_end;
#define VMALLOC_START __vmalloc_start
#define VMALLOC_END __vmalloc_end
+static inline unsigned int ioremap_max_order(void)
+{
+ if (radix_enabled())
+ return PUD_SHIFT;
+ return 7 + PAGE_SHIFT; /* default from linux/vmalloc.h */
+}
+#define IOREMAP_MAX_ORDER ({ ioremap_max_order();})
+
extern unsigned long __kernel_virt_start;
extern unsigned long __kernel_virt_size;
extern unsigned long __kernel_io_start;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index d2d976ff8a0e..cf02b67eee55 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -112,7 +112,7 @@ unsigned long ioremap_bot = IOREMAP_BASE;
* __ioremap_at - Low level function to establish the page tables
* for an IO mapping
*/
-void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
+static void __iomem * hash__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
{
unsigned long i;
@@ -120,6 +120,54 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_
if (pgprot_val(prot) & H_PAGE_4K_PFN)
return NULL;
+ for (i = 0; i < size; i += PAGE_SIZE)
+ if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
+ return NULL;
+
+ return (void __iomem *)ea;
+}
+
+static int radix__ioremap_page_range(unsigned long addr, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot)
+{
+ while (addr != end) {
+ if (unlikely(ioremap_huge_disabled))
+ goto use_small_page;
+
+ if (!(addr & ~PUD_MASK) && !(phys_addr & ~PUD_MASK) &&
+ end - addr >= PUD_SIZE) {
+ if (radix__map_kernel_page(addr, phys_addr, prot, PUD_SIZE))
+ return -ENOMEM;
+ addr += PUD_SIZE;
+ phys_addr += PUD_SIZE;
+
+ } else if (!(addr & ~PMD_MASK) && !(phys_addr & ~PMD_MASK) &&
+ end - addr >= PMD_SIZE) {
+ if (radix__map_kernel_page(addr, phys_addr, prot, PMD_SIZE))
+ return -ENOMEM;
+ addr += PMD_SIZE;
+ phys_addr += PMD_SIZE;
+
+ } else {
+use_small_page:
+ if (radix__map_kernel_page(addr, phys_addr, prot, PAGE_SIZE))
+ return -ENOMEM;
+ addr += PAGE_SIZE;
+ phys_addr += PAGE_SIZE;
+ }
+ }
+ return 0;
+}
+
+static void __iomem * radix__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
+{
+ if (radix__ioremap_page_range((unsigned long)ea, (unsigned long)ea + size, pa, prot))
+ return NULL;
+ return ea;
+}
+
+void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
+{
if ((ea + size) >= (void *)IOREMAP_END) {
pr_warn("Outside the supported range\n");
return NULL;
@@ -129,11 +177,9 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_
WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
WARN_ON(size & ~PAGE_MASK);
- for (i = 0; i < size; i += PAGE_SIZE)
- if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
- return NULL;
-
- return (void __iomem *)ea;
+ if (radix_enabled())
+ return radix__ioremap_at(pa, ea, size, prot);
+ return hash__ioremap_at(pa, ea, size, prot);
}
/**
diff --git a/include/linux/io.h b/include/linux/io.h
index 32e30e8fb9db..423c4294aaa3 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -44,6 +44,7 @@ static inline int ioremap_page_range(unsigned long addr, unsigned long end,
#endif
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+extern int ioremap_huge_disabled;
void __init ioremap_huge_init(void);
int arch_ioremap_pud_supported(void);
int arch_ioremap_pmd_supported(void);
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 063213685563..386ff956755f 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -18,7 +18,7 @@
static int __read_mostly ioremap_p4d_capable;
static int __read_mostly ioremap_pud_capable;
static int __read_mostly ioremap_pmd_capable;
-static int __read_mostly ioremap_huge_disabled;
+int __read_mostly ioremap_huge_disabled;
static int __init set_nohugeiomap(char *str)
{
--
2.20.1
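The size selection in radix__ioremap_page_range() can be sketched in userspace as follows. This is only an illustration under stated assumptions: the sketch_* names are hypothetical, the sizes assume a 64K base page with 2M PMD and 1G PUD mappings, and the actual mapping call is replaced by a counter. As in the patch, the caller is expected to pass page-aligned addr and end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for illustration: 64K base page, 2M PMD, 1G PUD. */
#define SKETCH_PAGE_SIZE (1ULL << 16)
#define SKETCH_PMD_SIZE  (1ULL << 21)
#define SKETCH_PUD_SIZE  (1ULL << 30)

/*
 * Pick the largest mapping size usable at this position: both the
 * virtual and the physical address must be aligned to that size, and
 * at least that much of the range must remain.
 */
static uint64_t sketch_pick_size(uint64_t addr, uint64_t end,
				 uint64_t phys, bool huge_disabled)
{
	uint64_t both = addr | phys;

	if (!huge_disabled) {
		if (!(both & (SKETCH_PUD_SIZE - 1)) &&
		    end - addr >= SKETCH_PUD_SIZE)
			return SKETCH_PUD_SIZE;
		if (!(both & (SKETCH_PMD_SIZE - 1)) &&
		    end - addr >= SKETCH_PMD_SIZE)
			return SKETCH_PMD_SIZE;
	}
	return SKETCH_PAGE_SIZE;
}

struct sketch_counts {
	unsigned long pud, pmd, page;
};

/*
 * Walk [addr, end) and count how many mappings of each size would be
 * created. addr and end must be multiples of SKETCH_PAGE_SIZE,
 * otherwise the loop never reaches end.
 */
static struct sketch_counts sketch_walk(uint64_t addr, uint64_t end,
					uint64_t phys, bool huge_disabled)
{
	struct sketch_counts c = { 0, 0, 0 };

	while (addr != end) {
		uint64_t sz = sketch_pick_size(addr, end, phys, huge_disabled);

		if (sz == SKETCH_PUD_SIZE)
			c.pud++;
		else if (sz == SKETCH_PMD_SIZE)
			c.pmd++;
		else
			c.page++;

		addr += sz;
		phys += sz;
	}
	return c;
}
```

A range that starts 2M below a 1G boundary and ends 64K past the next one would, under these assumptions, be covered by one PMD mapping, one PUD mapping and one base page, rather than thousands of base pages.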
^ permalink raw reply related
* Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception
From: Ravi Bangoria @ 2019-06-07 6:26 UTC (permalink / raw)
To: Michael Ellerman
Cc: Ravi Bangoria, mikey, linux-kernel, npiggin, paulus, mahesh,
linuxppc-dev
In-Reply-To: <87ftom0wrm.fsf@concordia.ellerman.id.au>
On 6/7/19 11:20 AM, Michael Ellerman wrote:
> Ravi Bangoria <ravi.bangoria@linux.ibm.com> writes:
>
>> Powerpc hw triggers watchpoint before executing the instruction.
>> To make trigger-after-execute behavior, kernel emulates the
>> instruction. If the instruction is 'load something into non-
>> volatile register', exception handler should restore emulated
>> register state while returning back, otherwise there will be
>> register state corruption. For example, adding a watchpoint on a list
>> can corrupt the list:
>>
>> # cat /proc/kallsyms | grep kthread_create_list
>> c00000000121c8b8 d kthread_create_list
>>
>> Add watchpoint on kthread_create_list->next:
>>
>> # perf record -e mem:0xc00000000121c8c0
>>
>> Run some workload such that new kthread gets invoked. Ex, I
>> just logged out from console:
>>
>> list_add corruption. next->prev should be prev (c000000001214e00), \
>> but was c00000000121c8b8. (next=c00000000121c8b8).
>> WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
>> CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
>> ...
>> NIP __list_add_valid+0xb4/0xc0
>> LR __list_add_valid+0xb0/0xc0
>> Call Trace:
>> __list_add_valid+0xb0/0xc0 (unreliable)
>> __kthread_create_on_node+0xe0/0x260
>> kthread_create_on_node+0x34/0x50
>> create_worker+0xe8/0x260
>> worker_thread+0x444/0x560
>> kthread+0x160/0x1a0
>> ret_from_kernel_thread+0x5c/0x70
>
> This all depends on what code the compiler generates for the list
> access.
True, list corruption is just an example. Any load instruction that loads
into a non-volatile register and hits a watchpoint will result in register
state corruption.
> Can you include a disassembly of the relevant code in your
> kernel so we have an example of the bad case.
Register state from WARN_ON():
GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0
Snippet from __kthread_create_on_node:
c000000000136be8: ed ff a2 3f addis r29,r2,-19
c000000000136bec: c0 7a bd eb ld r29,31424(r29)
if (!__list_add_valid(new, prev, next))
c000000000136bf0: 78 f3 c3 7f mr r3,r30
c000000000136bf4: 78 e3 85 7f mr r5,r28
c000000000136bf8: 78 eb a4 7f mr r4,r29
c000000000136bfc: fd 36 46 48 bl c00000000059a2f8 <__list_add_valid+0x8>
Watchpoint hit at 0xc000000000136bec.
addis r29,r2,-19
=> r29 = 0xc000000001344e00 + (-19 << 16)
=> r29 = 0xc000000001214e00
ld r29,31424(r29)
=> r29 = *(0xc000000001214e00 + 31424)
=> r29 = *(0xc00000000121c8c0)
0xc00000000121c8c0 is where we placed the watchpoint, so this instruction was
emulated by emulate_step. But because handle_dabr_fault did not restore the
emulated register state, r29 still contains the stale value (the result of the
addis) in the register state above.
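The address arithmetic above can be double-checked with a small userspace sketch. The helper names are hypothetical and the helpers only model the effective-address computation of addis and ld (sign-extended immediate, shifted by 16 bits for addis), not the instructions' full semantics.

```c
#include <stdint.h>

/*
 * addis RT,RA,SI: RT = RA + (sign-extended SI << 16).
 * Computed as a multiply to avoid shifting a negative value.
 */
static uint64_t sketch_addis(uint64_t ra, int16_t si)
{
	return ra + (uint64_t)((int64_t)si * 65536);
}

/* ld RT,disp(RA): effective address = RA + sign-extended displacement. */
static uint64_t sketch_ld_ea(uint64_t ra, int16_t disp)
{
	return ra + (uint64_t)(int64_t)disp;
}
```

With r2 = 0xc000000001344e00 from the register dump, sketch_addis(r2, -19) gives the value seen in GPR29, and sketch_ld_ea() of that value with displacement 31424 gives the watched address.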
^ permalink raw reply
* [Bug 203837] Booting kernel under KVM immediately freezes host
From: bugzilla-daemon @ 2019-06-07 6:29 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <bug-203837-206035@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=203837
--- Comment #2 from Paul Mackerras (paulus@ozlabs.org) ---
Just tried 5.1.7 in the host and got the guest locking up during boot. In xmon
I see one cpu in pmdp_invalidate and another in handle_mm_fault. It seems very
possible this is the bug that Nick Piggin's recent patch series fixes
("powerpc/64s: Fix THP PMD collapse serialisation"):
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=112348
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
From: Nicholas Piggin @ 2019-06-07 6:31 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev; +Cc: Aneesh Kumar K . V
In-Reply-To: <46295970-4740-5648-efb4-513ab6a5c1c0@c-s.fr>
Christophe Leroy's on June 7, 2019 3:35 pm:
>
>
> Le 07/06/2019 à 05:56, Nicholas Piggin a écrit :
>> The change to pmdp_invalidate to mark the pmd with _PAGE_INVALID broke
>> the synchronisation against lock free lookups, __find_linux_pte's
>> pmd_none check no longer returns true for such cases.
>>
>> Fix this by adding a check for this condition as well.
>>
>> Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
>> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> arch/powerpc/mm/pgtable.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index db4a6253df92..533fc6fa6726 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -372,13 +372,25 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
>> pdshift = PMD_SHIFT;
>> pmdp = pmd_offset(&pud, ea);
>> pmd = READ_ONCE(*pmdp);
>> +
>> /*
>> - * A hugepage collapse is captured by pmd_none, because
>> - * it mark the pmd none and do a hpte invalidate.
>> + * A hugepage collapse is captured by this condition, see
>> + * pmdp_collapse_flush.
>> */
>> if (pmd_none(pmd))
>> return NULL;
>>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> + /*
>> + * A hugepage split is captured by this condition, see
>> + * pmdp_invalidate.
>> + *
>> + * Huge page modification can be caught here too.
>> + */
>> + if (pmd_is_serializing(pmd))
>> + return NULL;
>> +#endif
>> +
>
> Could get rid of that #ifdef by adding the following in book3s32 and
> nohash pgtable.h:
>
> static inline bool pmd_is_serializing(pmd_t pmd) { return false; }
I don't mind either way. If it's an isolated case like this, sometimes
I'm against polluting the sub arch code with it.
It's up to you; I can change that if you prefer.
Thanks,
Nick
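For reference, the two options being weighed here look roughly like this in a standalone sketch. All names are hypothetical (SKETCH_BOOK3S_64 stands in for the real config symbol, sketch_pmd_t for pmd_t), and the bit values are illustrative assumptions. The point is only that a per-platform stub lets the common lookup path call the helper unconditionally, at the cost of one extra definition in each of the other headers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t val; } sketch_pmd_t;

#ifdef SKETCH_BOOK3S_64
/* Platform that has the serializing state: do the real bit test. */
#define SKETCH_PAGE_PRESENT 0x8000000000000000ULL
#define SKETCH_PAGE_INVALID 0x2000000000000000ULL

static inline bool sketch_pmd_is_serializing(sketch_pmd_t pmd)
{
	uint64_t mask = SKETCH_PAGE_PRESENT | SKETCH_PAGE_INVALID;

	return (pmd.val & mask) == SKETCH_PAGE_INVALID;
}
#else
/* Other platforms: a stub the compiler can fold away. */
static inline bool sketch_pmd_is_serializing(sketch_pmd_t pmd)
{
	(void)pmd;
	return false;
}
#endif

/*
 * Common code: no #ifdef at the call site. Returns true when the
 * lock-free lookup may proceed with this entry.
 */
static bool sketch_lookup_may_proceed(sketch_pmd_t pmd)
{
	if (pmd.val == 0)		/* stands in for pmd_none() */
		return false;

	if (sketch_pmd_is_serializing(pmd))
		return false;

	return true;
}
```

Built without SKETCH_BOOK3S_64, the second check always evaluates to false and the caller behaves as it did before the patch.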
^ permalink raw reply