From: Qian Cai <cai@lca.pw>
To: "AKASHI, Takahiro" <takahiro.akashi@linaro.org>,
	James Morse <james.morse@arm.com>,
	Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: arm64: kdump broken on a large CPU system
Date: Wed, 12 Dec 2018 17:37:04 -0500
Message-ID: <1544654224.18411.11.camel@lca.pw>
In-Reply-To: <e4a3456b-6a75-4564-a49f-0532d0b35726@lca.pw>

On Tue, 2018-12-11 at 23:39 -0500, Qian Cai wrote:
> [+ kexec@lists.infradead.org]
> 
> The debugging progress so far...
> 
> Waiting up to 5 minutes for other CPUs to stop in crash_smp_send_stop() made
> no difference.
> 
> With the "dev" branch of this tree [1], it is possible to print messages from
> purgatory by passing something like "--port=0x602B0000
> --port-lsr=0x602B0000,0x80" to kexec. However, even enable_dcache() in
> setup_arch() hangs indefinitely on this machine (it works fine on another
> arm64 server, a Cortex-A72). After removing only the enable_dcache() /
> disable_dcache() calls from setup_arch() etc., while keeping the printf()
> lines, it did print:
> 
> I'm in purgatory
> purgatory: entry=0000000090080000
> purgatory: dtb=0000000092d50000
> purgatory: D-cache Enabled before SHA verification
> purgatory: D-cache Disabled after SHA verification
> 
> So this confirms that it must hang somewhere in arm64/kernel/head.S (.stext)
> or in the early part of start_kernel(), before earlycon is initialized.
> 
> I also confirmed that passing nr_cpus=64 to the first kernel again makes
> everything work fine with this new kexec.
> 
> Since enable_dcache() hangs as well, I suspect this has something to do with
> enabling the MMU (i.e., .stext -> __primary_switch -> __enable_mmu), coupled
> with some sort of per-CPU data where the number of CPUs matters.

Still debugging a hang when enabling the MMU (enable_dcache) in purgatory [1],
which may provide some clues about the later hang in the 2nd kernel.

dsb	nshst
tlbi	alle2
dsb	nsh
isb
bl	get_ips_bits
lsl	x1, x0, #TCR_IPS_EL2_SHIFT
orr	x1, x1, x7
mov	x0, x6
ldr	x2, =MEMORY_ATTRIBUTES
msr	mair_el2, x2
msr	tcr_el2, x1
msr	ttbr0_el2, x0
isb
mrs	x0, sctlr_el2
ldr	x3, =SCTLR_ELx_FLAGS
orr	x0, x0, x3
msr	sctlr_el2, x0     <--- hung right on this instruction.

Without CONFIG_ARM64_VHE (i.e., running in EL1), it is able to run
enable_dcache(), but it still hangs later somewhere in the 2nd kernel.

dsb	nshst
tlbi	vmalle1
dsb	nsh
isb
bl	get_ips_bits
lsl	x1, x0, #TCR_IPS_EL1_SHIFT
orr	x1, x1, x7
mov	x0, x6
ldr	x2, =MEMORY_ATTRIBUTES
msr	mair_el1, x2
msr	tcr_el1, x1
msr	ttbr0_el1, x0
isb
mrs	x0, sctlr_el1
ldr	x3, =SCTLR_ELx_FLAGS
orr	x0, x0, x3
msr	sctlr_el1, x0
isb
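
For readers following the two sequences, here is a minimal C model of the
register values they compute just before the final msr. It is only a sketch:
the shift values are the architectural field positions (TCR_EL1.IPS at bits
[34:32], TCR_EL2.PS at bits [18:16] in the non-VHE layout) and are assumed to
match the macros in cache.S, and the three SCTLR bits are an illustrative
subset, not the actual definition of SCTLR_ELx_FLAGS. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural field positions (assumed to match the macros in cache.S). */
#define TCR_IPS_EL1_SHIFT 32u /* TCR_EL1.IPS is bits [34:32] */
#define TCR_IPS_EL2_SHIFT 16u /* TCR_EL2.PS is bits [18:16], non-VHE layout */

/* Illustrative subset of SCTLR_ELx bits; the real SCTLR_ELx_FLAGS may differ. */
#define SCTLR_ELX_M (UINT64_C(1) << 0)  /* MMU enable */
#define SCTLR_ELX_C (UINT64_C(1) << 2)  /* data cache enable */
#define SCTLR_ELX_I (UINT64_C(1) << 12) /* instruction cache enable */

/* Models "lsl x1, x0, #SHIFT; orr x1, x1, x7". */
uint64_t compose_tcr(uint64_t ips_bits, unsigned shift, uint64_t base_flags)
{
    assert(ips_bits <= 7); /* the physical-size field is 3 bits wide */
    return (ips_bits << shift) | base_flags;
}

/* Models "mrs x0, sctlr_elX; orr x0, x0, x3": existing bits are preserved. */
uint64_t compose_sctlr(uint64_t current, uint64_t flags)
{
    return current | flags;
}
```

The point of the model is that the write to SCTLR is the first moment the
translation tables and TLB contents are actually used, which is why a hang on
that exact instruction points at translation state rather than at the
register arithmetic.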

One data point about this system is that it has 4 threads per core. Each pair
of cores shares the same L1 and L2 caches, so each set of caches is shared by
8 CPUs. All CPUs share the same L3 cache.
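
The CPU counts in this description can be checked against the lscpu output
quoted further down. A small sketch of the arithmetic follows; the function
names are hypothetical, and the "two cores per cache group" figure is taken
from the paragraph above rather than from lscpu.

```c
#include <assert.h>

/* Logical CPUs sharing one L1/L2 set, per the description above. */
unsigned cpus_per_cache_group(unsigned threads_per_core, unsigned cores_per_group)
{
    assert(threads_per_core > 0 && cores_per_group > 0);
    return threads_per_core * cores_per_group;
}

/* Total logical CPUs from the lscpu topology fields. */
unsigned total_cpus(unsigned sockets, unsigned cores_per_socket,
                    unsigned threads_per_core)
{
    return sockets * cores_per_socket * threads_per_core;
}
```

With 4 threads per core and 2 cores per group this gives 8 CPUs per group,
and 2 sockets of 32 cores with 4 threads each gives the 256 CPUs reported.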

Hence, I wonder if this is caused by incomplete cache/TLB invalidation leaving
stale entries (or uninitialised junk that just happens to look valid) present
before the MMU is turned on.

[1] https://github.com/pratyushanand/kexec-tools/blob/devel/purgatory/arch/arm64/cache.S

> 
> Right now, I think I need to find a way to print directly to the pl011 serial
> console while debugging that assembly code, something like CONFIG_DEBUG_LL
> for arm64, so it can be used to locate exactly where it hangs. Otherwise, I
> am shooting in the dark.
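
A polled PL011 write is small enough to drop into early code. The following
is a minimal sketch only, not what purgatory or the kernel actually does: the
register offsets (UARTDR at 0x00, UARTFR at 0x18) and the TXFF flag (bit 5)
are from the PL011 register map, while pl011_putc() and pl011_puts() are
hypothetical names, and the base address is an assumption that must be the
UART's address as seen by the code (physical while the MMU is off).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PL011_DR_OFFSET 0x00u     /* UARTDR: data register */
#define PL011_FR_OFFSET 0x18u     /* UARTFR: flag register */
#define PL011_FR_TXFF   (1u << 5) /* transmit FIFO full */

/* Write one character by polling the flag register. */
void pl011_putc(volatile uint32_t *base, char c)
{
    assert(base != NULL);
    while (base[PL011_FR_OFFSET / 4] & PL011_FR_TXFF)
        ; /* spin until there is room in the TX FIFO */
    base[PL011_DR_OFFSET / 4] = (uint32_t)(unsigned char)c;
}

/* Write a NUL-terminated string one character at a time. */
void pl011_puts(volatile uint32_t *base, const char *s)
{
    while (*s)
        pl011_putc(base, *s++);
}
```

Because it touches nothing but two device registers, a routine like this can
be called between individual steps of the MMU-enable path to see how far
execution gets. It assumes the firmware has already configured the UART.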
> 
> [1] https://github.com/pratyushanand/kexec-tools
> 
> === original email ===
> 
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump
> just hangs (on 4.20-rc6 as well as 4.18). It was confirmed that execution
> got as far as entering __cpu_soft_restart():
> 
> __crash_kexec
>   machine_kexec
>     cpu_soft_restart
>       restart
>         __cpu_soft_restart
> 
> earlycon was enabled but there was no output from the 2nd kernel, so it is
> most likely stuck somewhere in the assembly code in arm64/kernel/head.S or
> in the early part of start_kernel(), before earlycon is initialized.
> 
> It turns out this has something to do with nr_cpus in the 1st kernel,
> although the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> crashkernel=512M and crashkernel=768M.
> 
> nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
> nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
> nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)
> 
> Testing with and without CONFIG_ARM64_VHE (i.e., el2_switch) made no
> difference.
> 
> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
> 
> I am still figuring out a way to debug the assembly code to find where it
> actually hangs, but the server is hooked up to a conserver that cannot
> generate any sysrq, and I have no shell access to the conserver, so it seems
> a bit difficult to use kgdb or kdb in this case.
> 
> CPU information,
> 
> # lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  4
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           Cavium
> Model:               1
> Model name:          ThunderX2 99xx
> Stepping:            0x1
> BogoMIPS:            400.00
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-127
> NUMA node1 CPU(s):   128-255
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Thread overview:
2018-12-10 22:30 arm64: kdump broken on a large CPU system Qian Cai
2018-12-11 10:09 ` Marc Zyngier
2018-12-11 11:34   ` James Morse
2018-12-12  2:51     ` AKASHI, Takahiro
2018-12-12  4:39       ` Qian Cai
2018-12-12 22:37         ` Qian Cai [this message]
2018-12-13  5:22           ` [PATCH] arm64: invalidate TLB before turning MMU on Qian Cai
2018-12-13  5:40             ` Bhupesh Sharma
2018-12-13 13:39               ` Qian Cai
2018-12-13 10:44             ` James Morse
2018-12-13 13:44               ` Qian Cai
2018-12-14  4:08             ` [PATCH v2] arm64: invalidate TLB just " Qian Cai
2018-12-14  5:01               ` Bhupesh Sharma
2018-12-14 12:54                 ` Qian Cai
2018-12-14  7:23               ` Ard Biesheuvel
2018-12-15  1:53                 ` Qian Cai
2019-01-10 20:00                   ` Bhupesh Sharma