Date: Tue, 26 Nov 2013 18:38:57 +0200
From: Gleb Natapov
To: Avi Kivity
Cc: "Huangweidong (C)", KVM, "Michael S. Tsirkin", "Zhanghaoyu (A)",
	Luonengjun, "qemu-devel@nongnu.org", Zanghongyong, Avi Kivity,
	Paolo Bonzini, "Jinxin (F)"
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm
	to update vm irq routing table
Message-ID: <20131126163857.GE20352@redhat.com>
In-Reply-To: <5294CC03.7050801@cloudius-systems.com>
References: <52949847.6020908@redhat.com> <5294A68F.6060301@redhat.com>
	<5294B461.5000405@redhat.com> <5294B634.4050801@cloudius-systems.com>
	<20131126150357.GA20352@redhat.com> <5294BC3B.6070902@redhat.com>
	<20131126162414.GC20352@redhat.com> <5294CC03.7050801@cloudius-systems.com>

On Tue, Nov 26, 2013 at 06:27:47PM +0200, Avi Kivity wrote:
> On 11/26/2013 06:24 PM, Gleb Natapov wrote:
> >On Tue, Nov 26, 2013 at 04:20:27PM +0100, Paolo Bonzini wrote:
> >>On 26/11/2013 16:03, Gleb Natapov wrote:
> >>>>>>>>>I understood the proposal was also to eliminate the synchronize_rcu(),
> >>>>>>>>>so while new interrupts would see the new routing table, interrupts
> >>>>>>>>>already in flight could pick up the old one.
> >>>>>>>Isn't that always the case with RCU?  (See my answer above: "the vcpus
> >>>>>>>already see the new routing table after the rcu_assign_pointer that is
> >>>>>>>in kvm_irq_routing_update").
> >>>>>With synchronize_rcu(), you have the additional guarantee that any
> >>>>>parallel accesses to the old routing table have completed.  Since we
> >>>>>also trigger the irq from rcu context, you know that after
> >>>>>synchronize_rcu() you won't get any interrupts to the old
> >>>>>destination (see kvm_set_irq_inatomic()).
> >>>We do not have this guarantee for other vcpus that do not call
> >>>synchronize_rcu().  They may still use an outdated routing table while
> >>>the vcpu or iothread that performed the table update sits in
> >>>synchronize_rcu().
> >>Avi's point is that, after the VCPU resumes execution, you know that no
> >>interrupt will be sent to the old destination because
> >>kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast) is
> >>also called within the RCU read-side critical section.
> >>
> >>Without synchronize_rcu you could have
> >>
> >>    VCPU writes to routing table
> >>                                       e = entry from IRQ routing table
> >>    kvm_irq_routing_update(kvm, new);
> >>    VCPU resumes execution
> >>                                       kvm_set_msi_irq(e, &irq);
> >>                                       kvm_irq_delivery_to_apic_fast();
> >>
> >>where the entry is stale but the VCPU has already resumed execution.
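For reference, the RCU usage being debated looks roughly like the
following C sketch.  It is a simplified illustration, not the actual
kvm code: the struct definitions are elided, and deliver_via() is a
hypothetical stand-in for the delivery path.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    /* Writer side, shaped like kvm_irq_routing_update(): publish the
     * new table, then wait until no reader can still hold the old one. */
    void update_routing(struct kvm *kvm, struct kvm_irq_routing_table *new)
    {
            struct kvm_irq_routing_table *old = kvm->irq_routing;

            rcu_assign_pointer(kvm->irq_routing, new);
            synchronize_rcu();      /* all readers of "old" are done */
            kfree(old);             /* only now is freeing "old" safe */
    }

    /* Reader side, shaped like kvm_set_irq_inatomic(): a reader that
     * entered the critical section before the update still sees, and
     * delivers through, the old table. */
    int route_and_deliver(struct kvm *kvm, int gsi)
    {
            struct kvm_irq_routing_table *table;
            int ret;

            rcu_read_lock();
            table = rcu_dereference(kvm->irq_routing);
            ret = deliver_via(table, gsi);  /* hypothetical helper */
            rcu_read_unlock();
            return ret;
    }

Dropping the synchronize_rcu() keeps the publication (new readers see
the new table immediately) but removes the completion guarantee: a
reader already inside rcu_read_lock() can still deliver through the old
table after the writer has moved on, which is exactly the stale-delivery
window shown in the interleaving above.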
> >So how is it different from what we have now:
> >
> >disable_irq()
> >VCPU writes to routing table
> >                                     e = entry from IRQ routing table
> >                                     kvm_set_msi_irq(e, &irq);
> >                                     kvm_irq_delivery_to_apic_fast();
> >kvm_irq_routing_update(kvm, new);
> >synchronize_rcu()
> >VCPU resumes execution
> >enable_irq()
> >receive stale irq
> >
>
> Suppose the guest did not disable_irq() and enable_irq(), but
> instead had a pci read where you have the enable_irq().  After the
> read you cannot have a stale irq (assuming the read flushes the irq
> all the way to the APIC).
There may still be a race between the pci read and an MSI being
registered in the IRR.  I do not believe such a read can undo IRR
changes.

--
			Gleb.
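For reference, the IRR point in the last paragraph, as a C sketch
(hypothetical types and helpers, not the actual kvm code): delivering
an MSI ultimately sets a bit in the destination APIC's IRR, and nothing
on a pci-read path clears that bit, so a read can order in-flight
delivery but cannot retract one that has already landed.

    #include <linux/bitops.h>

    struct lapic {
            unsigned long irr[4];   /* 256 pending-vector bits (IRR) */
    };

    /* What kvm_irq_delivery_to_apic_fast() boils down to for a
     * fixed-destination MSI: mark the vector pending, kick the vcpu. */
    static void deliver_msi(struct lapic *apic, unsigned int vector)
    {
            set_bit(vector, apic->irr);  /* vector is now pending */
            kick_vcpu(apic);             /* hypothetical wakeup helper */
    }

Once set_bit() has run against an entry from a stale routing table, the
vector stays pending at the old destination; a subsequent pci read can
only guarantee ordering for deliveries that have not yet happened.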