From: James Morse <james.morse@arm.com>
To: Qian Cai <cai@lca.pw>
Cc: ard.biesheuvel@linaro.org, marc.zyngier@arm.com,
catalin.marinas@arm.com, will.deacon@arm.com,
linux-kernel@vger.kernel.org, takahiro.akashi@linaro.org,
kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64: invalidate TLB before turning MMU on
Date: Thu, 13 Dec 2018 10:44:02 +0000
Message-ID: <1b150a95-2b80-2be3-0b77-599404f882dc@arm.com>
In-Reply-To: <20181213052259.56352-1-cai@lca.pw>
Hi Qian,
On 13/12/2018 05:22, Qian Cai wrote:
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads on each core. Each 2-core share a same
> L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> L3 cache.
>
> It turned out that this was due to the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) from the first
> kernel before turning the MMU on in the second kernel which caused this
> instruction hung,
This is a great find, thanks for debugging this!
The kernel should already handle this, as we don't trust the bootloader to clean
up either.
In arch/arm64/mm/proc.S::__cpu_setup()
|/*
| * __cpu_setup
| *
| * Initialise the processor for turning the MMU on. Return in x0 the
| * value of the SCTLR_EL1 register.
| */
| .pushsection ".idmap.text", "awx"
| ENTRY(__cpu_setup)
| tlbi vmalle1 // Invalidate local TLB
| dsb nsh
This is called from stext, which then branches to __primary_switch(), which
calls __enable_mmu(), where you see this problem. It shouldn't be possible to
allocate new TLB entries between these points...
Do you have CONFIG_RANDOMIZE_BASE disabled? With that option enabled,
__enable_mmu() is called twice; the extra TLB maintenance is in
__primary_switch.
(if it works with this turned off, it points to the extra off/tlbi/on sequence).
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..5196f3d729de 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> msr ttbr0_el1, x2 // load TTBR0
> msr ttbr1_el1, x1 // load TTBR1
> isb
> + dsb nshst
> + tlbi vmalle1 // invalidate TLB
> + dsb nsh
> + isb
> msr sctlr_el1, x0
> isb
The overall change here is that we do extra maintenance later.
Can you move this around to bisect where the TLB entries are either coming
from or failing to be invalidated?
Do your first and kdump kernels have the same VA_BITS/PAGE_SIZE?
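One way to answer the config questions above is to compare the two kernels'
.config files directly. The following is only a rough sketch: the helper names
(parse_config, diff_configs) and the sample config strings are made up for
illustration, and where the two config files live on a given system is left to
the reader. The option names themselves are the ones from arch/arm64/Kconfig.

```python
import re

# Options relevant to the questions above (names from arch/arm64/Kconfig).
KEYS = (
    "CONFIG_ARM64_VA_BITS",
    "CONFIG_ARM64_4K_PAGES",
    "CONFIG_ARM64_16K_PAGES",
    "CONFIG_ARM64_64K_PAGES",
    "CONFIG_RANDOMIZE_BASE",
)


def parse_config(text):
    """Return {option: value} for KEYS; options that are not set map to 'n'."""
    found = {key: "n" for key in KEYS}
    for line in text.splitlines():
        # "# CONFIG_FOO is not set" lines start with '#' and never match.
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2)
    return found


def diff_configs(first_text, kdump_text):
    """Return {option: (first_value, kdump_value)} for options that differ."""
    first = parse_config(first_text)
    kdump = parse_config(kdump_text)
    return {k: (first[k], kdump[k]) for k in KEYS if first[k] != kdump[k]}


if __name__ == "__main__":
    # Hypothetical sample contents, not taken from any real system.
    first = "CONFIG_ARM64_VA_BITS=48\nCONFIG_ARM64_64K_PAGES=y\nCONFIG_RANDOMIZE_BASE=y\n"
    kdump = "CONFIG_ARM64_VA_BITS=48\nCONFIG_ARM64_64K_PAGES=y\n# CONFIG_RANDOMIZE_BASE is not set\n"
    for option, (a, b) in diff_configs(first, kdump).items():
        print(f"{option}: first={a} kdump={b}")
```

An empty result means the two kernels agree on VA_BITS, page size and
CONFIG_RANDOMIZE_BASE, which would rule out a mismatch there as the cause.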
As a stab in the dark, (totally untested):
------------------------------%<------------------------------
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 2c75b0b903ae..a5f3b7314bda 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
  */
 	.pushsection ".idmap.text", "awx"
 ENTRY(__cpu_setup)
-	tlbi	vmalle1				// Invalidate local TLB
-	dsb	nsh
-
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			// Enable FP/ASIMD
 	mov	x0, #1 << 12			// Reset mdscr_el1 and disable
@@ -465,5 +462,10 @@ ENTRY(__cpu_setup)
 1:
 #endif	/* CONFIG_ARM64_HW_AFDBM */
 	msr	tcr_el1, x10
+	isb
+
+	tlbi	vmalle1				// Invalidate local TLB
+	dsb	nsh
+
 	ret					// return to head.S
 ENDPROC(__cpu_setup)
------------------------------%<------------------------------
Thanks,
James