public inbox for linux-kernel@vger.kernel.org
From: Liran Alon <LIRAN.ALON@ORACLE.COM>
To: Gonglei <arei.gonglei@huawei.com>,
	pbonzini@redhat.com, rkrcmar@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	weidong.huang@huawei.com
Subject: Re: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten
Date: Thu, 14 Dec 2017 15:01:42 +0200
Message-ID: <5A327636.3050307@ORACLE.COM>
In-Reply-To: <1513254206-25344-1-git-send-email-arei.gonglei@huawei.com>



On 14/12/17 14:23, Gonglei wrote:
> We hit a bug in our tests while running PCMark 10 in a Windows 7 VM.
> The VM got stuck and the wallclock hung after several minutes of running
> PCMark 10 in it.
> It is quite easy to reproduce the bug with upstream KVM and QEMU.
>
> We found that KVM cannot inject any RTC irq into the VM after it hangs; it fails to
> deliver the irq in ioapic_set_irq() because the RTC irq is still pending in ioapic->irr.
>
> static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
>                    int irq_level, bool line_status)
> {
> ...
>           if (!irq_level) {
>                    ioapic->irr &= ~mask;
>                    ret = 1;
>                    goto out;
>           }
> ...
>           if ((edge && old_irr == ioapic->irr) ||
>               (!edge && entry.fields.remote_irr)) {
>                    ret = 0;
>                    goto out;
>           }
>
> According to the RTC spec, after the RTC raises a high-level irq, the OS reads CMOS
> register C to clear the irq flag and pull the irq pin low.
>
> In QEMU, we emulate the read in cmos_ioport_read(),
> but the guest OS first issues a write to select which register will be read
> next; we use s->cmos_index to record the register selected by that write.
>
> But in our test, we found a situation in which a vCPU fails to read
> RTC_REG_C and so never clears the irq. This can happen when two vCPUs are
> writing/reading registers at the same time. For example, vcpu0 is trying to read
> RTC_REG_C, so it writes RTC_REG_C first, which sets s->cmos_index to RTC_REG_C,
> but before it reads register C, vcpu1 goes to read RTC_YEAR and
> changes s->cmos_index to RTC_YEAR with its own write.
> The next read by vcpu0 then returns RTC_YEAR. In this case, we miss
> calling qemu_irq_lower(s->irq) to clear the irq. After this, KVM never injects the RTC irq again,
> and the Windows VM hangs.
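
To make the stuck state described above concrete, here is a minimal user-space sketch of the edge-triggered path in the quoted ioapic_set_irq() excerpt. It is not the KVM code: struct toy_ioapic, toy_set_irq() and toy_rtc_eoi() are made-up names, only the IRR bitmap is modelled, and RTC_GSI is assumed to be 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTC_GSI 8u	/* assumption: the RTC line is GSI 8 */

/* Toy stand-in for struct kvm_ioapic: only the IRR bitmap is modelled. */
struct toy_ioapic {
	uint32_t irr;
};

/*
 * Edge-triggered lines only. Returns 1 when the line is lowered or a new
 * edge is accepted, 0 when the request is dropped because the IRR bit was
 * already set (old_irr == irr in the quoted code).
 */
static int toy_set_irq(struct toy_ioapic *io, unsigned int irq, bool level)
{
	uint32_t mask, old_irr;

	assert(irq < 32);
	mask = UINT32_C(1) << irq;

	if (!level) {
		io->irr &= ~mask;
		return 1;
	}

	old_irr = io->irr;
	io->irr |= mask;
	return old_irr != io->irr;
}

/* The one-line change proposed in the patch: drop the RTC bit on EOI. */
static void toy_rtc_eoi(struct toy_ioapic *io)
{
	io->irr &= ~(UINT32_C(1) << RTC_GSI);
}
```

In this model, a second toy_set_irq(&io, RTC_GSI, true) without an intervening lowering of the line returns 0 every time, which is the hang. Calling toy_rtc_eoi() in between lets the next edge through again.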

If I understood correctly, this looks to me like a race-condition bug in
the Windows guest kernel. On real hardware this race condition will also
cause RTC_YEAR to be read instead of RTC_REG_C.
The guest kernel should make sure that two CPUs do not attempt to read a
CMOS register in parallel, as they can overwrite each other's cmos_index.
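
The interleaving can be replayed deterministically with a small user-space model of the index/data port pair. This is only a sketch under assumptions: struct toy_cmos and the toy_* helpers are made-up names, the register contents are arbitrary, and the only device behavior modelled is that reading register C clears it and drops the irq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MC146818 register numbers. */
#define RTC_YEAR  0x09
#define RTC_REG_C 0x0c

struct toy_cmos {
	uint8_t index;		/* last value written to the index port */
	uint8_t regs[128];
	bool irq_raised;	/* state of the irq line */
};

/* Models a write to the index port: selects the register to read next. */
static void toy_write_index(struct toy_cmos *s, uint8_t reg)
{
	assert(reg < sizeof(s->regs));
	s->index = reg;
}

/* Models a read of the data port; reading register C clears it and the irq. */
static uint8_t toy_read_data(struct toy_cmos *s)
{
	uint8_t val = s->regs[s->index];

	if (s->index == RTC_REG_C) {
		s->regs[RTC_REG_C] = 0;
		s->irq_raised = false;
	}
	return val;
}

/* Replays the race; returns true if the irq is still raised afterwards. */
static bool toy_replay_race(void)
{
	struct toy_cmos s = { 0 };

	s.regs[RTC_YEAR] = 0x17;	/* arbitrary value */
	s.regs[RTC_REG_C] = 0x80;	/* interrupt flag set */
	s.irq_raised = true;

	toy_write_index(&s, RTC_REG_C);	/* vcpu0 selects register C */
	toy_write_index(&s, RTC_YEAR);	/* vcpu1 selects the year */
	(void)toy_read_data(&s);	/* vcpu0 reads, but gets the year */
	(void)toy_read_data(&s);	/* vcpu1 reads the year */

	return s.irq_raised;
}
```

toy_replay_race() returns true: neither read touched register C, so nothing lowered the irq. The same would hold for any device with a shared index register, emulated or not.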

See for example how the Linux kernel avoids this kind of issue
in rtc_cmos_read() (arch/x86/kernel/rtc.c) by grabbing a cmos_lock.
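
The general idea is to make the index write and the data read one critical section. A rough user-space sketch follows; it is not the kernel's implementation. toy_cmos_lock, toy_index, toy_regs and toy_cmos_read() are hypothetical names, a C11 atomic_flag spinlock stands in for whatever lock the guest kernel uses, and array accesses stand in for the port I/O.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RTC_YEAR  0x09
#define RTC_REG_C 0x0c

static atomic_flag toy_cmos_lock = ATOMIC_FLAG_INIT;
static uint8_t toy_index;	/* shared index register */
static uint8_t toy_regs[128];	/* register file behind the data port */

/* Select and read a register as one unit, so no other CPU can retarget it. */
static uint8_t toy_cmos_read(uint8_t reg)
{
	uint8_t val;

	assert(reg < sizeof(toy_regs));

	while (atomic_flag_test_and_set_explicit(&toy_cmos_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	toy_index = reg;		/* stands in for the index-port write */
	val = toy_regs[toy_index];	/* stands in for the data-port read */

	atomic_flag_clear_explicit(&toy_cmos_lock, memory_order_release);
	return val;
}
```

With every reader going through toy_cmos_read(), a caller asking for RTC_REG_C can no longer have its index replaced before its read completes.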

>
> Let's clear the RTC bit in the IRR when the corresponding EOI is received to avoid the issue.

Can you elaborate a bit more on why it makes sense to put such a workaround
in KVM code instead of declaring this a guest kernel bug?

Regards,
-Liran

>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>    Thanks to Paolo for providing a good solution. :)
>
>   arch/x86/kvm/ioapic.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 4e822ad..5022d63 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -160,6 +160,7 @@ static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu)
>   {
>   	if (test_and_clear_bit(vcpu->vcpu_id,
>   			       ioapic->rtc_status.dest_map.map)) {
> +		ioapic->irr &= ~(1 << RTC_GSI);
>   		--ioapic->rtc_status.pending_eoi;
>   		rtc_status_pending_eoi_check_valid(ioapic);
>   	}
>

Thread overview: 6+ messages
2017-12-14 12:23 [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten Gonglei
2017-12-14 13:01 ` Liran Alon [this message]
2017-12-14 13:15   ` Gonglei (Arei)
2017-12-14 14:26     ` Quan Xu
2017-12-14 13:18   ` Paolo Bonzini
2017-12-25  7:29 ` Wanpeng Li
