Linux KVM/arm64 development list
From: Marc Zyngier <marc.zyngier@arm.com>
To: Shanker Donthineni <shankerd@codeaurora.org>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: Intermittent guest kernel crashes with v4.5-rc6.
Date: Wed, 2 Mar 2016 15:09:48 +0000
Message-ID: <56D7023C.7050309@arm.com>
In-Reply-To: <56D6FFDE.9050704@codeaurora.org>

On 02/03/16 14:59, Shanker Donthineni wrote:
> Hi Marc,
> 
> Thanks for your quick reply.
> 
> On 03/02/2016 08:16 AM, Marc Zyngier wrote:
>> On 02/03/16 13:56, Shanker Donthineni wrote:
>>> For some reason the v4.5-rc6 kernel is not stable for guest machines on
>>> Qualcomm server platforms. We are getting IABT translation faults while
>>> booting the guest kernel. The problem disappears with the following code
>>> snippet (inserting a "dsb ish" instruction just before switching to the
>>> EL1 guest). I am using the v4.5-rc6 kernel for both host and guest
>>> machines.
>>>
>>> Please let me know if you have any thoughts or ideas for tracing this
>>> problem.
>>>
>>> --- a/arch/arm64/kvm/hyp/entry.S
>>> +++ b/arch/arm64/kvm/hyp/entry.S
>>> @@ -88,6 +88,7 @@ ENTRY(__guest_enter)
>>>           ldp     x0, x1, [sp], #16
>>>
>>>           // Do not touch any register after this!
>>> +       dsb ish
>>>           eret
>>>    ENDPROC(__guest_enter)
>>>
>>>
>>> I am using the QEMU command below to launch the guest machine:
>>>
>>> qemu-system-aarch64 -machine type=virt,accel=kvm,gic-version=3  \
>>> -cpu "host" -smp cpus=1,maxcpus=1 -m 256M -serial stdio \
>>> -kernel /boot/Image -initrd /boot/rootfs.cpio.gz \
>>> -append 'earlycon=earlycon=pl011,0x09000000  \
>>> console=ttyAMA0,115200 root=/dev/ram'
>>>
>>>
>>> Guest machine crash log messages:
>>>
>>> [    0.000000] Booting Linux on physical CPU 0x0
>>> [    0.000000] Boot CPU: AArch64 Processor [510f2811]
>>> [    0.000000] Bad mode in Synchronous Abort handler detected, code 0x8600000f -- IABT (current EL)
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.rc6+
>>> [    0.000000] task: ffffffc000d52200 ti: ffffffc000d44000 task.ti: ffffffc000d44000
>>> [    0.000000] PC is at early_init_dt_scan_root+0x28/0x94
>>> [    0.000000] LR is at of_scan_flat_dt+0x9c/0xd0
>>> [    0.000000] pc : [<ffffffc000cb32e8>] lr : [<ffffffc000cb3248>] pstate: 800003c5
>>> [    0.000000] sp : ffffffc000d47e80
>>> [    0.000000] x29: ffffffc000d47e80 x28: 0000000000000000
>>>
>> If you're getting a prefetch abort, it would be interesting to find out
>> what instruction is there, whether the page is mapped at stage-2 or not,
>> what are the stage-2 permissions... Basically, a full description of the
>> memory state.
>>
>> Also, does it work if you do a "dsb ishst" instead?
>>
>> Thanks,
>>
>> 	M.
> 
> Most of the time it faults at ldr/str instructions. I have verified the
> stage-1 page and the corresponding stage-2 page attributes (SH, AP, PERM),
> PA, etc. after the IABT, and everything matches perfectly. I am very
> confident that the stage-1/stage-2 MMU page tables are correct.
> 
> A "dsb ishst" instruction also fixes the problem.
> 
> One more interesting observation: if I retry the instruction fetch that
> caused the IABT, the second fetch succeeds and I don't see an IABT. I used
> the experimental code below to test.
> 
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -346,6 +346,7 @@ el1_sync:
>          b.eq    el1_undef
>          cmp     x24, #ESR_ELx_EC_BREAKPT_CUR    // debug exception in EL1
>          b.ge    el1_dbg
> +       kernel_exit 1
>          b       el1_inv
>   el1_da:
> 
> 

OK, that's pretty scary, especially considering that we don't have a DSB
on that path. Do you ever see it exploding at EL0?
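
For reference, the abort code in the quoted log (0x8600000f) can be split
into its ESR_ELx fields. The following is only a rough sketch, assuming the
usual AArch64 layout (EC in bits [31:26], IL in bit 25, fault status code in
ISS bits [5:0]); decode_esr() and both lookup tables are made up for
illustration, are not kernel code, and cover only a few encodings:

```python
# Sketch: split an AArch64 ESR_ELx value into the fields relevant to an abort.
# Hypothetical helper; the tables list only a handful of encodings.

EC_NAMES = {
    0x20: "instruction abort, lower EL",
    0x21: "instruction abort, current EL",
    0x24: "data abort, lower EL",
    0x25: "data abort, current EL",
}

# Upper four bits of the 6-bit fault status code; the low two bits give
# the translation table level for these fault types.
FAULT_TYPES = {
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Return EC, IL and fault status for an abort-class ESR_ELx value."""
    ec = (esr >> 26) & 0x3F   # exception class, bits [31:26]
    il = (esr >> 25) & 0x1    # instruction length bit, bit 25
    fsc = esr & 0x3F          # fault status code, ISS bits [5:0]
    fault = FAULT_TYPES.get(fsc >> 2)
    if fault is not None:
        status = f"{fault}, level {fsc & 0x3}"
    else:
        status = f"other (FSC {fsc:#04x})"
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other"),
        "il": il,
        "fsc": fsc,
        "status": status,
    }


if __name__ == "__main__":
    print(decode_esr(0x8600000F))
```

With these tables, 0x8600000f comes out as EC 0x21 (instruction abort at the
current EL, matching the "IABT (current EL)" in the log) with a fault status
code of 0x0f, which the table above labels a level 3 permission fault.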

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
