From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gleb Natapov
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to
 update vm irq routing table
Date: Tue, 26 Nov 2013 18:38:57 +0200
Message-ID: <20131126163857.GE20352@redhat.com>
References: <52949847.6020908@redhat.com> <5294A68F.6060301@redhat.com>
 <5294B461.5000405@redhat.com> <5294B634.4050801@cloudius-systems.com>
 <20131126150357.GA20352@redhat.com> <5294BC3B.6070902@redhat.com>
 <20131126162414.GC20352@redhat.com> <5294CC03.7050801@cloudius-systems.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini, Avi Kivity, "Huangweidong (C)", KVM,
 "Michael S. Tsirkin", "Jinxin (F)", "Zhanghaoyu (A)", Luonengjun,
 "qemu-devel@nongnu.org", Zanghongyong
To: Avi Kivity
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:55716 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932249Ab3KZQjE
 (ORCPT); Tue, 26 Nov 2013 11:39:04 -0500
Content-Disposition: inline
In-Reply-To: <5294CC03.7050801@cloudius-systems.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, Nov 26, 2013 at 06:27:47PM +0200, Avi Kivity wrote:
> On 11/26/2013 06:24 PM, Gleb Natapov wrote:
> >On Tue, Nov 26, 2013 at 04:20:27PM +0100, Paolo Bonzini wrote:
> >>On 26/11/2013 16:03, Gleb Natapov wrote:
> >>>>>>>>>I understood the proposal was also to eliminate the synchronize_rcu(),
> >>>>>>>>>so while new interrupts would see the new routing table, interrupts
> >>>>>>>>>already in flight could pick up the old one.
> >>>>>>>Isn't that always the case with RCU? (See my answer above: "the vcpus
> >>>>>>>already see the new routing table after the rcu_assign_pointer that is
> >>>>>>>in kvm_irq_routing_update").
> >>>>>With synchronize_rcu(), you have the additional guarantee that any
> >>>>>parallel accesses to the old routing table have completed. Since we
> >>>>>also trigger the irq from rcu context, you know that after
> >>>>>synchronize_rcu() you won't get any interrupts to the old
> >>>>>destination (see kvm_set_irq_inatomic()).
> >>>We do not have this guarantee for other vcpus that do not call
> >>>synchronize_rcu(). They may still use an outdated routing table while
> >>>the vcpu or iothread that performed the table update sits in
> >>>synchronize_rcu().
> >>Avi's point is that, after the VCPU resumes execution, you know that no
> >>interrupt will be sent to the old destination, because
> >>kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast) is
> >>also called within the RCU read-side critical section.
> >>
> >>Without synchronize_rcu you could have
> >>
> >>    VCPU writes to routing table
> >>                                       e = entry from IRQ routing table
> >>    kvm_irq_routing_update(kvm, new);
> >>    VCPU resumes execution
> >>                                       kvm_set_msi_irq(e, &irq);
> >>                                       kvm_irq_delivery_to_apic_fast();
> >>
> >>where the entry is stale but the VCPU has already resumed execution.
> >>
> >So how is it different from what we have now:
> >
> >disable_irq()
> >VCPU writes to routing table
> >                                      e = entry from IRQ routing table
> >                                      kvm_set_msi_irq(e, &irq);
> >                                      kvm_irq_delivery_to_apic_fast();
> >kvm_irq_routing_update(kvm, new);
> >synchronize_rcu()
> >VCPU resumes execution
> >enable_irq()
> >receive stale irq
> >
> >
> Suppose the guest did not disable_irq() and enable_irq(), but
> instead had a pci read where you have the enable_irq(). After the
> read you cannot have a stale irq (assuming the read flushes the irq
> all the way to the APIC).

There may still be a race between the PCI read and an MSI already
registered in the IRR. I do not believe such a read can undo IRR changes.

--
			Gleb.