From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
Date: Wed, 20 Jun 2018 15:42:58 +0100
Message-ID: <20180620144257.GB27776@arm.com>
In-Reply-To: <5B2A6218.3030201@hisilicon.com>
Hi Wei,
On Wed, Jun 20, 2018 at 10:18:00PM +0800, Wei Xu wrote:
> We have observed that a KVM guest sometimes fails to boot because of a
> kernel stack overflow if KPTI is enabled on a HiSilicon arm64 platform.
>
> We also tested with different kernel versions and found that it only
> happens if KPTI and KVM (enable-kvm & cpu=host) are enabled on the
> guest.
> The detailed results are in the table below.
>
> +---------+----------+--------+------------+-------------------+
> | host    |host KPTI | guest  | guest KPTI | kvm guest         |
> | kernel  |enabled   | kernel | enabled    | booting result    |
> +---------+----------+--------+------------+-------------------+
> | 4.17    | Y        | 4.17   | Y          | stack overflow    |
> +---------+----------+--------+------------+-------------------+
> | 4.17    | Y        | 4.16   | NA         | OK                |
> +---------+----------+--------+------------+-------------------+
> | 4.16    | NA       | 4.17   | Y          | stack overflow    |
> +---------+----------+--------+------------+-------------------+
> | 4.16    | NA       | 4.16   | NA         | OK                |
> +---------+----------+--------+------------+-------------------+
>
> A simple workaround is adding this platform to the "kpti_safe_list".
> But that does not really resolve the issue.
> Could you please share any hints on how to resolve this kind of issue?
> Thanks!
>
> Another issue we found is that "kpti_install_ng_mappings" will be invoked
> even if "kpti=off" has been added to the kernel command line. Is that expected?
> This is because "kpti" is not an *early* param, so "init_cpu_features" will
> be invoked before the param is parsed.
That sounds like a straightforward bug, which means we should use
early_param instead of __setup. I assume that doesn't fix your crash,
though?
> The command we are using to run the guest is:
>
> ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>     -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image \
>     -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
>     -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>
> The log is as below:
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
> [ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun 15 21:39:52 CST 2018
^^^ This is reproducible with vanilla v4.17 and defconfig, right?
> [ 0.038859] SMP: Total of 1 processors activated.
> [ 0.039338] CPU features: detected: GIC system register CPU interface
> [ 0.039988] CPU features: detected: Privileged Access Never
> [ 0.040560] CPU features: detected: User Access Override
> [ 0.041093] CPU features: detected: RAS Extension Support
> [ 0.042947] Insufficient stack space to handle exception!
> [ 0.042949] ESR: 0x96000046 -- DABT (current EL)
> [ 0.043963] FAR: 0xffff0000093a80e0
> [ 0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> [ 0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
> [ 0.058572] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> [ 0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
> [ 0.073138] Hardware name: linux,dummy-virt (DT)
> [ 0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> [ 0.082661] pc : el1_sync+0x0/0xb0
> [ 0.086152] lr : kpti_install_ng_mappings+0x120/0x214
Can you use scripts/faddr2line to find out which line of code the lr is
pointing at, please? It would be interesting to know if we managed to
install the idmap.
Hmm, I wonder if this is at all related to RAS, since we've just enabled
that and if we take a fault whilst rewriting swapper then we're going to
get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
Will
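
The ESR and FAR values in the quoted log can be decoded by hand. The
following is a minimal Python sketch, not part of the original message,
that splits the reported ESR into its fields and checks where the FAR
falls relative to the reported task stack. The field positions follow
the ESR_ELx layout for data aborts in the Arm architecture manual; the
helper names decode_esr and offset_in_range are hypothetical, and only
the translation-fault status codes are listed.

```python
# Decode ESR 0x96000046 and locate FAR 0xffff0000093a80e0 from the log.
# Assumption: only data-abort fields are handled; this is not kernel code.

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# DFSC encodings for translation faults only (levels 0 to 3).
DFSC_NAMES = {
    0x04: "Translation fault, level 0",
    0x05: "Translation fault, level 1",
    0x06: "Translation fault, level 2",
    0x07: "Translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not decoded by this sketch"),
        "il": il,
        "write": bool((iss >> 6) & 0x1),   # WnR, bit 6
        "dfsc": iss & 0x3F,                # fault status code, bits [5:0]
        "dfsc_name": DFSC_NAMES.get(iss & 0x3F, "not decoded by this sketch"),
    }


def offset_in_range(addr, low, high):
    """Return addr - low if low <= addr < high, else None."""
    return addr - low if low <= addr < high else None


if __name__ == "__main__":
    info = decode_esr(0x96000046)
    print(hex(info["ec"]), info["ec_name"])
    print("write access:", info["write"])
    print(hex(info["dfsc"]), info["dfsc_name"])

    # Values copied from the "FAR" and "Task stack" lines of the log.
    off = offset_in_range(0xffff0000093a80e0,
                          0xffff0000093a8000, 0xffff0000093ac000)
    print("FAR offset from bottom of task stack:", hex(off))
```

Under these assumptions the ESR decodes to a write access that took a
level 2 translation fault at the current exception level, matching the
"DABT (current EL)" annotation in the log, and the FAR lies 0xe0 bytes
above the lower bound of the reported task stack range.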