From: Qian Cai <cai@lca.pw>
To: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Marc Zyngier <marc.zyngier@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
AKASHI Takahiro <takahiro.akashi@linaro.org>,
James Morse <james.morse@arm.com>,
Bhupesh SHARMA <bhupesh.linux@gmail.com>,
kexec mailing list <kexec@lists.infradead.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] arm64: invalidate TLB before turning MMU on
Date: Thu, 13 Dec 2018 08:39:46 -0500
Message-ID: <1544708386.18411.13.camel@lca.pw>
In-Reply-To: <CACi5LpP+NHa1WQg78xm1y1YM0KXnFJ72pw+YrizxR5y7yt4AiQ@mail.gmail.com>
On Thu, 2018-12-13 at 11:10 +0530, Bhupesh Sharma wrote:
> Hi Qian Cai,
>
> On Thu, Dec 13, 2018 at 10:53 AM Qian Cai <cai@lca.pw> wrote:
> >
> > On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> > dump just hung. It has 4 threads on each core. Each 2-core share a same
> > L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> > L3 cache.
> >
> > It turned out that this was due to the TLB contained stale entries (or
> > uninitialized junk which just happened to look valid) from the first
> > kernel before turning the MMU on in the second kernel which caused this
> > instruction hung,
> >
> > msr sctlr_el1, x0
> >
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> > arch/arm64/kernel/head.S | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 4471f570a295..5196f3d729de 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> > msr ttbr0_el1, x2 // load TTBR0
> > msr ttbr1_el1, x1 // load TTBR1
> > isb
> > + dsb nshst
> > + tlbi vmalle1 // invalidate TLB
> > + dsb nsh
> > + isb
>
> This will be executed both for the primary and kdump kernel, right? I
> don't think we really want to invalidate the TLB when booting the
> primary kernel.
> It would be too slow and considering that we need to minimize boot
> timings on embedded arm64 devices, I think it would not be a good
> idea.
Yes, it will be executed for the first kernel as well. As James mentioned, it
needs to be done anyway to invalidate TLB entries that the bootloader might
have used.
>
> > msr sctlr_el1, x0
> > isb
> > /*
> > --
> > 2.17.2 (Apple Git-113)
> >
>
> Also did you check this issue I reported on the HPE apollo machines
> some days back with the kdump kernel boot
> <https://www.spinics.net/lists/kexec/msg21750.html>.
> Can you please confirm that you are not facing the same issue (as I
> suspect from reading your earlier Bug Report) on the HPE apollo
> machine. Also adding 'earlycon' to the bootargs being passed to the
> kdump kernel you can see if you are able to atleast get some console
> output from the kdump kernel.
No, I did not encounter the problem you mentioned here.