KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.


From: xuwei5@hisilicon.com (Wei Xu)
To: linux-arm-kernel@lists.infradead.org
Subject: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
Date: Wed, 27 Jun 2018 14:22:03 +0100	[thread overview]
Message-ID: <5B338F7B.9070500@hisilicon.com> (raw)
In-Reply-To: <20180626174746.GO23375@arm.com>

Hi Will,

On 2018/6/26 18:47, Will Deacon wrote:
> Hi Wei,
> 
> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>> Today I tried the kernel 4.18-rc2(defconfig, no change on top) with qemu
>> 2.12.0.
>> The guest sometimes still failed to boot. But the crash reason is different.
>> Could you please share any hint?
>> Thanks!
>>
>> The guest boot log is as below:
>> ===========================
>>
>>     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>>         -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
>>         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
>>         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>>
>>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
> 
> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !

Sorry, I should have pointed out in my previous mail that I changed the
default value of CONFIG_NR_CPUS via menuconfig.
That is why the version string shows "-dirty".

> 
>>     [    0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
>>     [    0.048991] Mem abort info:
>>     [    0.049267]   ESR = 0x96000004
>>     [    0.049567]   Exception class = DABT (current EL), IL = 32 bits
>>     [    0.050146]   SET = 0, FnV = 0
>>     [    0.050446]   EA = 0, S1PTW = 0
>>     [    0.050754] Data abort info:
>>     [    0.051038]   ISV = 0, ISS = 0x00000004
>>     [    0.051921]   CM = 0, WnR = 0
>>     [    0.054936] [0000000000000288] user address but active_mm is swapper
>>     [    0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>     [    0.067080] Modules linked in:
>>     [    0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
>>     [    0.078745] Hardware name: linux,dummy-virt (DT)
>>     [    0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
>>     [    0.088258] pc : kpti_install_ng_mappings+0x154/0x214
>>     [    0.093319] lr : kpti_install_ng_mappings+0x120/0x214
>>     [    0.098483] sp : ffff0000093fbce0
>>     [    0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
>>     [    0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
>>     [    0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
>>     [    0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
>>     [    0.123392] x21: ffff00000923b000 x20: 0000000000000000
>>     [    0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
>>     [    0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
>>     [    0.139513] x15: 000000007dff5000 x14: 000000007dff5000
>>     [    0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
>>     [    0.150329] x11: 000000007dff7000 x10: 0000000000000000
>>     [    0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
>>     [    0.161042] x7 : 0000000000000000 x6 : 000000004123c000
>>     [    0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
>>     [    0.171860] x3 : 0000000000000000 x2 : 000000004123b000
>>     [    0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
> 
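For reference, the "Mem abort info" and "Data abort info" lines above are fields unpacked from ESR = 0x96000004. A minimal C sketch of that decoding, assuming the ARMv8-A ESR_ELx layout for data aborts (the struct and function names are illustrative, not kernel APIs):

```c
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value for a data abort.
 * Bit positions assume the ARMv8-A data abort ISS encoding. */
struct dabt_info {
    unsigned ec;    /* [31:26] exception class */
    unsigned il;    /* [25]    1 => 32-bit instruction */
    unsigned iss;   /* [24:0]  instruction-specific syndrome */
    unsigned isv;   /* [24]    syndrome valid */
    unsigned set;   /* [12:11] synchronous error type */
    unsigned fnv;   /* [10]    FAR not valid */
    unsigned ea;    /* [9]     external abort */
    unsigned cm;    /* [8]     fault came from cache maintenance */
    unsigned s1ptw; /* [7]     stage 2 fault during stage 1 walk */
    unsigned wnr;   /* [6]     1 => write, 0 => read */
    unsigned dfsc;  /* [5:0]   data fault status code */
};

/* Split a raw ESR value into the fields the kernel prints. */
static struct dabt_info decode_esr(uint32_t esr)
{
    struct dabt_info d;

    d.ec    = (esr >> 26) & 0x3f;
    d.il    = (esr >> 25) & 0x1;
    d.iss   = esr & 0x1ffffff;
    d.isv   = (esr >> 24) & 0x1;
    d.set   = (esr >> 11) & 0x3;
    d.fnv   = (esr >> 10) & 0x1;
    d.ea    = (esr >> 9) & 0x1;
    d.cm    = (esr >> 8) & 0x1;
    d.s1ptw = (esr >> 7) & 0x1;
    d.wnr   = (esr >> 6) & 0x1;
    d.dfsc  = esr & 0x3f;
    return d;
}
```

Applied to 0x96000004, this gives EC 0x25 (data abort taken without a change in exception level), IL 1, WnR 0 (a read) and DFSC 0x04, which is a level 0 translation fault; this matches the log.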
> So looking at the disassembly, we access idmap_t0sz as part of
> cpu_install_idmap() and it looks like we push its page address to the
> stack:
> 
>>        0xffff000008091ffc <+128>:   adrp    x3, 0xffff000009096000 <early_node_cpu_hwid+1440>
> 
> [...]
> 
>>        0xffff000008092044 <+200>:   str     x3, [x29,#96]
> 
> Then after we've come back from the asm call, we want to access idmap_t0sz
> again as part of cpu_uninstall_idmap() so we pop it back off:
> 
>>        0xffff0000080920cc <+336>:   ldr     x3, [x29,#96]
>>        0xffff0000080920d0 <+340>:   ldr     x0, [x3,#648]
> 
> And this access is the one that faults, because we popped off NULL.
> 
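The offsets in this analysis can be checked against the register dump. Here is a small C sketch of the arithmetic. It uses only values from the quoted disassembly and registers; the macro and helper names are illustrative. The spill slot is x29 + 96, and the faulting load reads x3 + 648, where 648 is 0x288.

```c
#include <stdint.h>

/* Constants copied from the quoted disassembly. */
#define IDMAP_T0SZ_PAGE   0xffff000009096000ULL /* result of "adrp x3, ..." */
#define SPILL_OFFSET      96u                   /* "str/ldr x3, [x29,#96]" */
#define IDMAP_T0SZ_OFFSET 648u                  /* "ldr x0, [x3,#648]" */

/* Stack slot that x3 is spilled to before the asm call and reloaded from after it. */
static uint64_t spill_slot(uint64_t x29)
{
    return x29 + SPILL_OFFSET;
}

/* Address dereferenced by the faulting load for a given reloaded x3. */
static uint64_t t0sz_load_address(uint64_t x3)
{
    return x3 + IDMAP_T0SZ_OFFSET;
}
```

With x29 = 0xffff0000093fbce0 from the dump, the slot is 0xffff0000093fbd40. If the page address had been reloaded correctly, the load would target 0xffff000009096288, which the disassembly implies is idmap_t0sz in this build. With the reloaded x3 = 0, the load targets 0x288, exactly the faulting address.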

Thanks for your kind explanation!

> So actually, rather than faulting on the stack access, we're managing to
> load zeroes from somewhere, so it could still be indicative of page table
> corruption for the stack mapping.
> 
> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> replacing:
> 
> 	dc      civac, cur_\()\type\()p
> 
> with:
> 
> 	dc      ivac, cur_\()\type\()p
> 
> please? Only do this for the guest kernel, not the host. KVM will upgrade
> the clean to a clean+invalidate, so it's interesting to see if this has
> an effect on the behaviour.
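The experiment relies on the guest's `dc ivac` not reaching the caches unchanged. Below is a minimal C model of that behaviour. It assumes the ARMv8-A rule that, when stage 2 translation is enabled (HCR_EL2.VM == 1, as it is for a KVM guest), an EL1 data-cache invalidate by VA is performed as clean+invalidate. The enum and function are hypothetical and model the behaviour only; they are not a real API.

```c
#include <stdbool.h>

/* Data-cache maintenance-by-VA operations discussed in this thread. */
enum dc_op { DC_CVAC, DC_IVAC, DC_CIVAC };

/* Operation that effectively reaches the caches when the guest kernel
 * issues `op` at EL1.  Assumption: with stage 2 enabled, an invalidate
 * is upgraded to clean+invalidate so the guest cannot discard dirty
 * data; clean and clean+invalidate are unchanged. */
static enum dc_op effective_dc_op(enum dc_op op, bool stage2_enabled)
{
    if (stage2_enabled && op == DC_IVAC)
        return DC_CIVAC;
    return op;
}
```

Under this assumption, the `civac` and `ivac` variants reach the hardware as the same operation inside the guest. Only `cvac` differs.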

I changed only the guest kernel. The guest still failed to boot, and the
log was the same as in my last mail.

But if I change it to cvac in the guest, as below, boot is reasonably stable:
	dc      cvac, cur_\()\type\()p

I have raised this with our SoC team and hope we can find the root cause.
Do you have any further suggestions?
Thanks!

Best Regards,
Wei

> 
> Will
> 
>
