From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42769)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VlLUV-0002NB-2A for qemu-devel@nongnu.org;
	Tue, 26 Nov 2013 11:28:06 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1VlLUN-0003wm-Jt for qemu-devel@nongnu.org;
	Tue, 26 Nov 2013 11:27:59 -0500
Received: from mail-bk0-f53.google.com ([209.85.214.53]:34101)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VlLUN-0003wT-Dh for qemu-devel@nongnu.org;
	Tue, 26 Nov 2013 11:27:51 -0500
Received: by mail-bk0-f53.google.com with SMTP id na10so2668354bkb.40 for ;
	Tue, 26 Nov 2013 08:27:50 -0800 (PST)
Message-ID: <5294CC03.7050801@cloudius-systems.com>
Date: Tue, 26 Nov 2013 18:27:47 +0200
From: Avi Kivity
MIME-Version: 1.0
References: <52949847.6020908@redhat.com> <5294A68F.6060301@redhat.com>
	<5294B461.5000405@redhat.com> <5294B634.4050801@cloudius-systems.com>
	<20131126150357.GA20352@redhat.com> <5294BC3B.6070902@redhat.com>
	<20131126162414.GC20352@redhat.com>
In-Reply-To: <20131126162414.GC20352@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to
	update vm irq routing table
To: Gleb Natapov, Paolo Bonzini
Cc: "Huangweidong (C)", KVM, "Michael S. Tsirkin", "Zhanghaoyu (A)",
	Luonengjun, "qemu-devel@nongnu.org", Zanghongyong, Avi Kivity,
	"Jinxin (F)"

On 11/26/2013 06:24 PM, Gleb Natapov wrote:
> On Tue, Nov 26, 2013 at 04:20:27PM +0100, Paolo Bonzini wrote:
>> On 26/11/2013 16:03, Gleb Natapov wrote:
>>>>>>>>> I understood the proposal was also to eliminate the
>>>>>>>>> synchronize_rcu(), so while new interrupts would see the new
>>>>>>>>> routing table, interrupts already in flight could pick up the
>>>>>>>>> old one.
>>>>>>> Isn't that always the case with RCU?  (See my answer above: "the
>>>>>>> vcpus already see the new routing table after the
>>>>>>> rcu_assign_pointer that is in kvm_irq_routing_update").
>>>>> With synchronize_rcu(), you have the additional guarantee that any
>>>>> parallel accesses to the old routing table have completed.  Since
>>>>> we also trigger the irq from rcu context, you know that after
>>>>> synchronize_rcu() you won't get any interrupts to the old
>>>>> destination (see kvm_set_irq_inatomic()).
>>> We do not have this guarantee for other vcpus that do not call
>>> synchronize_rcu().  They may still use the outdated routing table
>>> while the vcpu or iothread that performed the table update sits in
>>> synchronize_rcu().
>> Avi's point is that, after the VCPU resumes execution, you know that
>> no interrupt will be sent to the old destination because
>> kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast)
>> is also called within the RCU read-side critical section.
>>
>> Without synchronize_rcu you could have
>>
>>     VCPU writes to routing table
>>                                        e = entry from IRQ routing table
>>     kvm_irq_routing_update(kvm, new);
>>     VCPU resumes execution
>>                                        kvm_set_msi_irq(e, &irq);
>>                                        kvm_irq_delivery_to_apic_fast();
>>
>> where the entry is stale but the VCPU has already resumed execution.
>>
> So how is it different from what we have now:
>
>     disable_irq()
>     VCPU writes to routing table
>                                        e = entry from IRQ routing table
>                                        kvm_set_msi_irq(e, &irq);
>                                        kvm_irq_delivery_to_apic_fast();
>     kvm_irq_routing_update(kvm, new);
>     synchronize_rcu()
>     VCPU resumes execution
>     enable_irq()
>     receive stale irq
>

Suppose the guest did not disable_irq() and enable_irq(), but instead
had a PCI read where you have the enable_irq().  After the read you
cannot have a stale irq (assuming the read flushes the irq all the way
to the APIC).
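
For reference, a minimal sketch of the publish-then-wait pattern the
thread is debating.  This is simplified and hypothetical, not the
actual KVM code: the type irq_routing_table and the functions
update_routing()/inject() are made up for illustration, while the RCU
primitives are the real kernel API.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct irq_routing_table;  /* opaque in this sketch */

    static struct irq_routing_table __rcu *irq_rt;

    /* Updater side (the routing-table update path): */
    void update_routing(struct irq_routing_table *new)
    {
        struct irq_routing_table *old;

        old = rcu_dereference_protected(irq_rt, 1);
        rcu_assign_pointer(irq_rt, new);  /* new readers see the new table */
        synchronize_rcu();                /* wait out readers of the old one */
        kfree(old);                       /* no reader can still hold 'old' */
    }

    /* Reader side (the interrupt-injection path): */
    void inject(int gsi)
    {
        struct irq_routing_table *rt;

        rcu_read_lock();
        rt = rcu_dereference(irq_rt);
        /* ... look up gsi in rt and deliver to the APIC ... */
        rcu_read_unlock();
    }

Dropping synchronize_rcu(), as proposed, still lets new readers see the
new table immediately after rcu_assign_pointer(), but a reader that
fetched its entry before the assignment may deliver through the stale
entry after the VCPU has resumed, which is exactly Paolo's interleaving
above.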
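
Avi's closing point, as a guest-side sketch.  Everything here is
hypothetical (the register offset, the function, the device layout);
the one load-bearing assumption, stated in the thread, is that a read
from the device flushes any in-flight interrupt all the way to the APIC:

    #include <linux/io.h>
    #include <linux/types.h>

    #define MSI_DATA_REG 0x5c  /* hypothetical device register offset */

    void guest_retarget_msi(void __iomem *regs, u32 new_data)
    {
        /* No disable_irq()/enable_irq() bracket here. */
        writel(new_data, regs + MSI_DATA_REG);  /* reprogram the MSI; the
                                                   host rebuilds its routing
                                                   table in response */
        (void)readl(regs + MSI_DATA_REG);       /* assumed to flush any
                                                   in-flight irq to the APIC;
                                                   no stale irq after this */
    }

With the read as the synchronization point, the stale irq that arrives
after enable_irq() in Gleb's trace cannot occur past the readl().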