From: Qian Cai <cai@lca.pw>
To: "AKASHI, Takahiro" <takahiro.akashi@linaro.org>,
James Morse <james.morse@arm.com>,
Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: arm64: kdump broken on a large CPU system
Date: Wed, 12 Dec 2018 17:37:04 -0500
Message-ID: <1544654224.18411.11.camel@lca.pw>
In-Reply-To: <e4a3456b-6a75-4564-a49f-0532d0b35726@lca.pw>
On Tue, 2018-12-11 at 23:39 -0500, Qian Cai wrote:
> [+ kexec@lists.infradead.org]
>
> The debugging progress so far...
>
> Waiting up to 5 minutes for other CPUs to stop in crash_smp_send_stop() made
> no difference.
>
> With the "dev" branch of this tree [1], it is possible to print messages from
> purgatory by passing something like "--port=0x602B0000
> --port-lsr=0x602B0000,0x80" to kexec. However, even enable_dcache() in
> setup_arch() hangs seemingly forever on this machine (it works fine on another
> arm64 server with Cortex-A72). After removing only enable_dcache() /
> disable_dcache() from setup_arch() etc., without removing the printf() lines,
> it did print:
>
> I'm in purgatory
> purgatory: entry=0000000090080000
> purgatory: dtb=0000000092d50000
> purgatory: D-cache Enabled before SHA verification
> purgatory: D-cache Disabled after SHA verification
>
> So this confirmed that it must hang somewhere in arm64/kernel/head.S (.stext)
> or in the early part of start_kernel() before earlycon is initialized.
>
> I also confirmed that passing nr_cpus=64 to the first kernel again makes
> everything work with this new kexec.
>
> Since enable_dcache() hangs as well, I suspect this has something to do with
> enabling the MMU (i.e., .stext -> __primary_switch -> __enable_mmu), coupled
> with some sort of per-CPU data where the number of CPUs matters.
I am still debugging the hang when enabling the MMU (enable_dcache) in
purgatory [1], which may provide some clues about the later hang in the 2nd
kernel:
    dsb nshst
    tlbi alle2
    dsb nsh
    isb
    bl get_ips_bits
    lsl x1, x0, #TCR_IPS_EL2_SHIFT
    orr x1, x1, x7
    mov x0, x6
    ldr x2, =MEMORY_ATTRIBUTES
    msr mair_el2, x2
    msr tcr_el2, x1
    msr ttbr0_el2, x0
    isb
    mrs x0, sctlr_el2
    ldr x3, =SCTLR_ELx_FLAGS
    orr x0, x0, x3
    msr sctlr_el2, x0 <--- hung right on this instruction.

Without CONFIG_ARM64_VHE (i.e., running at EL1), it gets through
enable_dcache(), but the 2nd kernel still hangs somewhere later:
    dsb nshst
    tlbi vmalle1
    dsb nsh
    isb
    bl get_ips_bits
    lsl x1, x0, #TCR_IPS_EL1_SHIFT
    orr x1, x1, x7
    mov x0, x6
    ldr x2, =MEMORY_ATTRIBUTES
    msr mair_el1, x2
    msr tcr_el1, x1
    msr ttbr0_el1, x0
    isb
    mrs x0, sctlr_el1
    ldr x3, =SCTLR_ELx_FLAGS
    orr x0, x0, x3
    msr sctlr_el1, x0
    isb
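
For reference, a minimal C model (not the purgatory code itself) of the values
the two sequences above write. It uses the architectural bit positions
SCTLR_ELx.M (bit 0), .C (bit 2) and .I (bit 12), TCR_EL1.IPS at bits [34:32],
and TCR_EL2.PS at bits [18:16] (the layout used when HCR_EL2.E2H is 0). That
SCTLR_ELx_FLAGS is exactly M|C|I, and that TCR_IPS_EL1_SHIFT and
TCR_IPS_EL2_SHIFT are 32 and 16, are assumptions; the *_ASSUMED names are
hypothetical and should be checked against cache.S.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural SCTLR_ELx bits (ARMv8-A). */
#define SCTLR_M (UINT64_C(1) << 0)   /* stage 1 MMU enable */
#define SCTLR_C (UINT64_C(1) << 2)   /* data/unified cache enable */
#define SCTLR_I (UINT64_C(1) << 12)  /* instruction cache enable */

/* Hypothetical stand-in for SCTLR_ELx_FLAGS: assumed to be M|C|I. */
#define SCTLR_FLAGS_ASSUMED (SCTLR_M | SCTLR_C | SCTLR_I)

/* Assumed shifts: TCR_EL1.IPS is bits [34:32]; TCR_EL2.PS is bits [18:16]
 * in the E2H == 0 layout. */
#define TCR_IPS_EL1_SHIFT_ASSUMED 32
#define TCR_IPS_EL2_SHIFT_ASSUMED 16

/* Mirrors "lsl x1, x0, #shift; orr x1, x1, x7": ips_bits is the value
 * returned by get_ips_bits (x0), tcr_base the other TCR bits held in x7. */
static uint64_t compose_tcr(uint64_t ips_bits, unsigned shift, uint64_t tcr_base)
{
    assert(shift < 64);
    return (ips_bits << shift) | tcr_base;
}

/* Mirrors "mrs x0, sctlr; orr x0, x0, x3; msr sctlr, x0": the flags are
 * ORed into the current value, so bits already set are preserved. */
static uint64_t compose_sctlr(uint64_t sctlr, uint64_t flags)
{
    return sctlr | flags;
}
```

For example, with get_ips_bits returning 5 (a 48-bit PA range) and no other TCR
bits, compose_tcr(5, TCR_IPS_EL1_SHIFT_ASSUMED, 0) gives 0x500000000, and
setting the assumed flags on a zero SCTLR gives 0x1005.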

One data point about this system is that it has 4 threads per core. Each pair
of cores shares the same L1 and L2 caches, so each L1/L2 is shared by 8 CPUs.
All CPUs share the same L3 cache.

Hence, I wonder whether this is caused by incomplete cache/TLB invalidation
leaving stale entries (or uninitialised junk that just happens to look valid)
present before the MMU is turned on.

[1] https://github.com/pratyushanand/kexec-tools/blob/devel/purgatory/arch/arm64/cache.S
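
To make the sharing arithmetic above concrete, a small C sketch of that
topology (4 threads per core, 2 cores per shared L1/L2, so 8 logical CPUs per
group). It assumes logical CPU numbers are consecutive within a group, which
may not match how this machine actually enumerates CPUs; the helper names are
hypothetical.

```c
#include <assert.h>

/* Topology as described above; consecutive numbering is an assumption. */
#define THREADS_PER_CORE 4
#define CORES_PER_L2     2
#define CPUS_PER_L2      (THREADS_PER_CORE * CORES_PER_L2)  /* 8 */

/* Hypothetical helper: L1/L2 group that logical CPU 'cpu' belongs to. */
static int l2_group(int cpu)
{
    assert(cpu >= 0);
    return cpu / CPUS_PER_L2;
}

/* Hypothetical helper: number of L1/L2 groups touched with nr_cpus=n
 * (integer ceiling of n / CPUS_PER_L2). */
static int l2_groups_for(int nr_cpus)
{
    assert(nr_cpus > 0);
    return (nr_cpus + CPUS_PER_L2 - 1) / CPUS_PER_L2;
}
```

Under that assumption, nr_cpus=64 and nr_cpus=96 span 8 and 12 L1/L2 groups,
nr_cpus=127 spans 16, and nr_cpus=256 spans all 32.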
>
> Right now, I think I need a way to print directly to the pl011 serial console
> while debugging this assembly code, something like CONFIG_DEBUG_LL for arm64,
> so that I can locate exactly where it hangs. Otherwise, I am shooting in the
> dark.
>
> [1] https://github.com/pratyushanand/kexec-tools
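
For the pl011 idea quoted above, a minimal C sketch of a polled PL011 transmit
routine, the kind of helper a CONFIG_DEBUG_LL-style early print would need.
The register offsets (UARTDR at 0x00, UARTFR at 0x18) and the TXFF flag (bit 5
of UARTFR) come from the PL011 TRM. The base pointer is a parameter; whether
the 0x602B0000 address passed to --port above is this machine's PL011 is an
assumption, and in head.S before the MMU is on the same loop would likely have
to be written in assembly against the physical address.

```c
#include <assert.h>
#include <stdint.h>

/* PL011 register offsets (bytes) and flag bits. */
#define UART_DR      0x00        /* data register */
#define UART_FR      0x18        /* flag register */
#define UART_FR_TXFF (1u << 5)   /* transmit FIFO full */

/* Send one character, spinning while the TX FIFO is full.
 * 'base' points at the UART's register window (assumed 32-bit accesses). */
static void pl011_putc(volatile uint32_t *base, char c)
{
    assert(base != 0);
    while (base[UART_FR / 4] & UART_FR_TXFF)
        ;   /* busy-wait: no interrupts this early in boot */
    base[UART_DR / 4] = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string, emitting CR before each LF for serial
 * terminals. */
static void pl011_puts(volatile uint32_t *base, const char *s)
{
    for (; *s; s++) {
        if (*s == '\n')
            pl011_putc(base, '\r');
        pl011_putc(base, *s);
    }
}
```

Using a plain array in place of the MMIO window (the flag register then reads
as 0, so the FIFO never looks full) lets the routine be exercised off-target.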
>
> === original email ===
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
> hangs (on 4.20-rc6 as well as 4.18). It was confirmed that execution went as
> far as entering __cpu_soft_restart():
>
> __crash_kexec
> machine_kexec
> cpu_soft_restart
> restart
> __cpu_soft_restart
>
> Earlycon was enabled, but there was no output from the 2nd kernel, so it was
> pretty much stuck in the assembly code in arm64/kernel/head.S or in the early
> part of start_kernel() before earlycon was initialized.
>
> It turned out this has something to do with nr_cpus in the 1st kernel, even
> though the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> crashkernel=512M and 768M.
>
> nr_cpus <= 96   GOOD (2nd kernel was up in 2-3 mins.)
> nr_cpus=127     BAD  (2nd kernel was NOT up after 10 mins.)
> nr_cpus=256     BAD  (2nd kernel was NOT up after 1 hour.)
>
> I also tested with and without CONFIG_ARM64_VHE (i.e., el2_switch); it made no
> difference.
>
> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
>
> I am still figuring out a way to debug the assembly code to find where it
> actually hangs, but the server is hooked up to a conserver that cannot
> generate any sysrq, and I have no shell access to the conserver, so it seems
> difficult to use kgdb or kdb in this case.
>
> CPU information:
>
> # lscpu
> Architecture: aarch64
> Byte Order: Little Endian
> CPU(s): 256
> On-line CPU(s) list: 0-255
> Thread(s) per core: 4
> Core(s) per socket: 32
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: Cavium
> Model: 1
> Model name: ThunderX2 99xx
> Stepping: 0x1
> BogoMIPS: 400.00
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-127
> NUMA node1 CPU(s): 128-255
> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec