Linux KVM/arm64 development list
From: Christopher Covington <cov@codeaurora.org>
To: Shanker Donthineni <shankerd@codeaurora.org>,
	Marc Zyngier <marc.zyngier@arm.com>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: Intermittent guest kernel crashes with v4.5-rc6.
Date: Mon, 18 Apr 2016 11:56:35 -0400
Message-ID: <571503B3.6060001@codeaurora.org>
In-Reply-To: <56DE48B6.4060705@codeaurora.org>

On 03/07/2016 10:36 PM, Shanker Donthineni wrote:
> On 03/03/2016 08:38 AM, Marc Zyngier wrote:
>> On 03/03/16 14:26, Shanker Donthineni wrote:
>>> On 03/03/2016 08:03 AM, Marc Zyngier wrote:
>>>> On 03/03/16 13:25, Shanker Donthineni wrote:
>>>>> On 03/02/2016 11:35 AM, Marc Zyngier wrote:
>>>>>> On 02/03/16 15:48, Shanker Donthineni wrote:
>>>>>>
>>>>>>> We haven't started running heavy workloads in VMs. So far we
>>>>>>> have noticed this random behavior only during guest
>>>>>>> kernel boot (at EL1).
>>>>>>>
>>>>>>> We didn't see this problem on the 4.3 kernel. Do you think it is
>>>>>>> related to TLB conflicts?
>>>>>> I cannot imagine why a DSB would solve a TLB conflict. But the fact that
>>>>>> you didn't see it crashing on 4.3 is a good indication that something
>>>>>> else is at play.
>>>>>>
>>>>>> In 4.5, we've rewritten a large part of KVM in C, which has changed the
>>>>>> ordering of the various accesses a lot. It could be that a latent
>>>>>> problem is now exposed more widely.
>>>>>>
>>>>>> Can you try moving this DSB around and find out what is the earliest
>>>>>> point where it solves this problem? Some sort of bisection?
>>>>> The furthest I can move 'dsb ishst' up is to the beginning of
>>>>> __guest_enter(), but not outside of this function.
>>>>>
>>>>> I don't understand why the code below fails; the branch
>>>>> instruction seems to be causing problems.
>>>>>
>>>>>     /* Jump in the fire! */
>>>>> +  dsb(ishst);
>>>>>     exit_code = __guest_enter(vcpu, host_ctxt);
>>>>>     /* And we're baaack! */
>>>> That's very worrying. I can't see how the branch can have an influence
>>>> on the DSB (nor why the DSB has an influence on the rest of the
>>>> execution, btw).
>>>>
>>>> What if you replace the DSB with an ISB? Do you observe a similar
>>>> behaviour (works if the barrier is in __guest_enter, but not if it is
>>>> outside)?
>>> I have already tried an ISB, without success. In another
>>> experiment, I flushed the stage-2 TLBs before calling
>>> __guest_enter(), and that fixed the problem.
>> I suspected something like that. But it is such a massive hammer that it
>> will hide any sort of subtle bug (HW *and* SW).
>>
>>>> Another thing worth looking at is what happened just before we decided
>>>> to get back into the guest. Or to put it differently, what was the
>>>> reason to exit in the first place. Was it a Stage-2 fault by any chance?
>>> I will collect as much debug data as possible and send you the
>>> results. I went through your refactored KVM C code and did not
>>> find anything suspicious. I am thinking that maybe Qualcomm CPUs
>>> have very aggressive prefetch logic that is causing the problem.
>> OK. Please keep me posted about your findings. Also maybe involving some
>> HW people would be a good idea (running something in an emulator, for
>> example...).

This has been confirmed to be a hardware defect with a firmware workaround.

Regards,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
