From mboxrd@z Thu Jan  1 00:00:00 1970
From: Gleb Natapov <gleb@minantech.com>
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to
 update vm irq routing table
Date: Thu, 28 Nov 2013 11:20:35 +0200
Message-ID: <20131128092034.GB4609@kernel.org>
References: <D3E216785288A145B7BC975F83A2ED10448ADCCB@SZXEMA510-MBS.china.huawei.com>
 <52949847.6020908@redhat.com>
 <CAEbWaipAnmoa=gMbB1aNb=btU6LgYXhcnmQWHJ_89m4yvw6Dug@mail.gmail.com>
 <5294A68F.6060301@redhat.com>
 <CAF950W+-UiX6xv4vYmnxji9aWDU8ds2rnx7JugnbHQWJdCCD-Q@mail.gmail.com>
 <5294B461.5000405@redhat.com>
 <5294B634.4050801@cloudius-systems.com>
 <20131126150357.GA20352@redhat.com>
 <5294BD21.4010701@cloudius-systems.com>
 <D3E216785288A145B7BC975F83A2ED10448BBBB4@SZXEMA510-MBS.china.huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity <avi@cloudius-systems.com>,
	Gleb Natapov <gleb@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Avi Kivity <avi.kivity@gmail.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	KVM <kvm@vger.kernel.org>, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jinxin (F)" <jinxin712@huawei.com>,
	Luonengjun <luonengjun@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Zanghongyong <zanghongyong@huawei.com>
To: "Zhanghaoyu (A)" <haoyu.zhang@huawei.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-bk0-f43.google.com ([209.85.214.43]:42808 "EHLO
	mail-bk0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750859Ab3K1JUk (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 28 Nov 2013 04:20:40 -0500
Received: by mail-bk0-f43.google.com with SMTP id mz12so3672210bkb.16
        for <kvm@vger.kernel.org>; Thu, 28 Nov 2013 01:20:39 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <D3E216785288A145B7BC975F83A2ED10448BBBB4@SZXEMA510-MBS.china.huawei.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Thu, Nov 28, 2013 at 09:14:22AM +0000, Zhanghaoyu (A) wrote:
> >>>>>      No, this would be exactly the same code that is running now:
> >>>>>
> >>>>>              mutex_lock(&kvm->irq_lock);
> >>>>>              old = kvm->irq_routing;
> >>>>>              kvm_irq_routing_update(kvm, new);
> >>>>>              mutex_unlock(&kvm->irq_lock);
> >>>>>
> >>>>>              synchronize_rcu();
> >>>>>              kfree(old);
> >>>>>              return 0;
> >>>>>
> >>>>>      Except that the kfree would run in the call_rcu kernel thread instead of
> >>>>>      the vcpu thread.  But the vcpus already see the new routing table after
> >>>>>      the rcu_assign_pointer that is in kvm_irq_routing_update.
> >>>>>
> >>>>> I understood the proposal was also to eliminate the 
> >>>>> synchronize_rcu(), so while new interrupts would see the new 
> >>>>> routing table, interrupts already in flight could pick up the old one.
> >>>> Isn't that always the case with RCU?  (See my answer above: "the 
> >>>> vcpus already see the new routing table after the rcu_assign_pointer 
> >>>> that is in kvm_irq_routing_update").
> >>> With synchronize_rcu(), you have the additional guarantee that any 
> >>> parallel accesses to the old routing table have completed.  Since we 
> >>> also trigger the irq from rcu context, you know that after
> >>> synchronize_rcu() you won't get any interrupts to the old destination 
> >>> (see kvm_set_irq_inatomic()).
> >> We do not have this guaranty for other vcpus that do not call 
> >> synchronize_rcu(). They may still use outdated routing table while a 
> >> vcpu or iothread that performed table update sits in synchronize_rcu().
> >>
> >
> >Consider this guest code:
> >
> >   write msi entry, directing the interrupt away from this vcpu
> >   nop
> >   memset(&idt, 0, sizeof(idt));
> >
> >Currently, this code will never trigger a triple fault.  With the change to call_rcu(), it may.
> >
> >Now it may be that the guest does not expect this to work (PCI writes are posted; and interrupts can be delayed indefinitely by the pci fabric), 
> >but we don't know if there's a path that guarantees the guest something that we're taking away with this change.
> 
> In native environment, if a CPU's LAPIC's IRR and ISR have been pending many interrupts, then OS perform zeroing this CPU's IDT before receiving interrupts,
> will the same problem happen?
> 
This is just an example. OS can ensure that there is no other pending interrupt by some other means.

--
			Gleb.