From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5294BD61.7080904@cloudius-systems.com>
Date: Tue, 26 Nov 2013 17:25:21 +0200
From: Avi Kivity
MIME-Version: 1.0
References: <52949847.6020908@redhat.com> <5294A68F.6060301@redhat.com>
 <5294B461.5000405@redhat.com> <5294B634.4050801@cloudius-systems.com>
 <20131126150357.GA20352@redhat.com> <5294BC3B.6070902@redhat.com>
In-Reply-To: <5294BC3B.6070902@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
List-Id: qemu-devel.nongnu.org
To: Paolo Bonzini, Gleb Natapov
Cc: "Huangweidong (C)", KVM, "Michael S. Tsirkin", "Zhanghaoyu (A)",
 Luonengjun, "qemu-devel@nongnu.org", Zanghongyong, Avi Kivity,
 "Jinxin (F)"

On 11/26/2013 05:20 PM, Paolo Bonzini wrote:
> On 26/11/2013 16:03, Gleb Natapov wrote:
>>>>>>>> I understood the proposal was also to eliminate the
>>>>>>>> synchronize_rcu(), so while new interrupts would see the new
>>>>>>>> routing table, interrupts already in flight could pick up the
>>>>>>>> old one.
>>>>>> Isn't that always the case with RCU?
>>>>>> (See my answer above: "the vcpus already see the new routing
>>>>>> table after the rcu_assign_pointer that is in
>>>>>> kvm_irq_routing_update").
>>>> With synchronize_rcu(), you have the additional guarantee that any
>>>> parallel accesses to the old routing table have completed. Since we
>>>> also trigger the irq from rcu context, you know that after
>>>> synchronize_rcu() you won't get any interrupts to the old
>>>> destination (see kvm_set_irq_inatomic()).
>> We do not have this guarantee for other vcpus that do not call
>> synchronize_rcu(). They may still use the outdated routing table
>> while the vcpu or iothread that performed the table update sits in
>> synchronize_rcu().
> Avi's point is that, after the VCPU resumes execution, you know that
> no interrupt will be sent to the old destination because
> kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast)
> is also called within the RCU read-side critical section.
>
> Without synchronize_rcu you could have:
>
>     VCPU writes to routing table
>     e = entry from IRQ routing table
>     kvm_irq_routing_update(kvm, new);
>     VCPU resumes execution
>     kvm_set_msi_irq(e, &irq);
>     kvm_irq_delivery_to_apic_fast();
>
> where the entry is stale but the VCPU has already resumed execution.
>
> If we want that guarantee, we need to use a different synchronization
> mechanism than the global RCU. QRCU would work; readers are not
> wait-free, but only when there is a concurrent synchronize_qrcu,
> which should be rare.

An alternative path is to convince ourselves that the hardware does not
provide the guarantees that the current code provides, and so we can
relax them.