From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@cloudius-systems.com>
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update
 vm irq routing table
Date: Tue, 26 Nov 2013 17:25:21 +0200
Message-ID: <5294BD61.7080904@cloudius-systems.com>
References: <D3E216785288A145B7BC975F83A2ED10448ADCCB@SZXEMA510-MBS.china.huawei.com> <52949847.6020908@redhat.com> <CAEbWaipAnmoa=gMbB1aNb=btU6LgYXhcnmQWHJ_89m4yvw6Dug@mail.gmail.com> <5294A68F.6060301@redhat.com> <CAF950W+-UiX6xv4vYmnxji9aWDU8ds2rnx7JugnbHQWJdCCD-Q@mail.gmail.com> <5294B461.5000405@redhat.com> <5294B634.4050801@cloudius-systems.com> <20131126150357.GA20352@redhat.com> <5294BC3B.6070902@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity <avi.kivity@gmail.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	KVM <kvm@vger.kernel.org>, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jinxin (F)" <jinxin712@huawei.com>,
	"Zhanghaoyu (A)" <haoyu.zhang@huawei.com>,
	Luonengjun <luonengjun@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Zanghongyong <zanghongyong@huawei.com>
To: Paolo Bonzini <pbonzini@redhat.com>, Gleb Natapov <gleb@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-bk0-f42.google.com ([209.85.214.42]:42264 "EHLO
	mail-bk0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754368Ab3KZPZ1 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 26 Nov 2013 10:25:27 -0500
Received: by mail-bk0-f42.google.com with SMTP id w11so2691500bkz.1
        for <kvm@vger.kernel.org>; Tue, 26 Nov 2013 07:25:25 -0800 (PST)
In-Reply-To: <5294BC3B.6070902@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/26/2013 05:20 PM, Paolo Bonzini wrote:
> Il 26/11/2013 16:03, Gleb Natapov ha scritto:
>>>>>>>> I understood the proposal was also to eliminate the synchronize_rcu(),
>>>>>>>> so while new interrupts would see the new routing table, interrupts
>>>>>>>> already in flight could pick up the old one.
>>>>>> Isn't that always the case with RCU?  (See my answer above: "the vcpus
>>>>>> already see the new routing table after the rcu_assign_pointer that is
>>>>>> in kvm_irq_routing_update").
>>>> With synchronize_rcu(), you have the additional guarantee that any
>>>> parallel accesses to the old routing table have completed.  Since we
>>>> also trigger the irq from rcu context, you know that after
>>>> synchronize_rcu() you won't get any interrupts to the old
>>>> destination (see kvm_set_irq_inatomic()).
>> We do not have this guarantee for other vcpus that do not call
>> synchronize_rcu(). They may still use an outdated routing table while a vcpu
>> or iothread that performed the table update sits in synchronize_rcu().
> Avi's point is that, after the VCPU resumes execution, you know that no
> interrupt will be sent to the old destination because
> kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast) is
> also called within the RCU read-side critical section.
>
> Without synchronize_rcu you could have
>
>      VCPU writes to routing table
>                                         e = entry from IRQ routing table
>      kvm_irq_routing_update(kvm, new);
>      VCPU resumes execution
>                                         kvm_set_msi_irq(e, &irq);
>                                         kvm_irq_delivery_to_apic_fast();
>
> where the entry is stale but the VCPU has already resumed execution.
>
> If we want to ensure this, we need to use a different mechanism for
> synchronization than the global RCU.  QRCU would work; readers are not
> wait-free, but they only have to wait when there is a concurrent
> synchronize_qrcu, which should be rare.
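A minimal sketch of the two-phase reader tracking that QRCU/SRCU-style schemes rely on, assuming writers are serialized by an outer lock (not shown) and all atomics use the default sequentially consistent ordering. This is not the kernel's QRCU or SRCU code; every name here (`sketch_qrcu`, `sketch_read_lock`, `sketch_synchronize`, the one-field `routing_table`) is hypothetical. A reader registers in the current phase and retries only if an update flips the phase underneath it, so readers lose wait-freedom only while an update is in progress.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IRQ routing table: one MSI destination. */
struct routing_table {
    int dest_apic_id;
};

/* Hypothetical two-phase grace-period tracker (illustrative only). */
struct sketch_qrcu {
    atomic_uint completed;                  /* bumped per update; low bit = phase */
    atomic_int ctr[2];                      /* readers inside each phase */
    _Atomic(struct routing_table *) table;  /* currently published table */
};

/* Reader entry: register in the current phase. If an update flipped the
 * phase between the load and the increment, back out and retry. */
static int sketch_read_lock(struct sketch_qrcu *q)
{
    for (;;) {
        unsigned int c = atomic_load(&q->completed);
        int idx = c & 1;
        atomic_fetch_add(&q->ctr[idx], 1);
        if (atomic_load(&q->completed) == c)
            return idx;
        atomic_fetch_sub(&q->ctr[idx], 1);  /* raced with an update */
    }
}

static struct routing_table *sketch_read_table(struct sketch_qrcu *q)
{
    return atomic_load(&q->table);
}

static void sketch_read_unlock(struct sketch_qrcu *q, int idx)
{
    int prev = atomic_fetch_sub(&q->ctr[idx], 1);
    assert(prev > 0);                       /* unlock must pair with a lock */
    (void)prev;
}

/* Writer, step 1: publish the new table, then open a new phase.
 * Returns the phase whose readers may still hold the old table. */
static int sketch_update_start(struct sketch_qrcu *q,
                               struct routing_table *new_table)
{
    atomic_store(&q->table, new_table);
    return atomic_fetch_add(&q->completed, 1) & 1;
}

/* Writer, step 2: true once every reader of the old phase has left. */
static bool sketch_old_readers_done(struct sketch_qrcu *q, int old_idx)
{
    return atomic_load(&q->ctr[old_idx]) == 0;
}

/* Blocking update: once this returns, no reader can still be using the
 * previous table, so it may be freed and the VCPU resumed. */
static void sketch_synchronize(struct sketch_qrcu *q,
                               struct routing_table *new_table)
{
    int old_idx = sketch_update_start(q, new_table);
    while (!sketch_old_readers_done(q, old_idx))
        sched_yield();
}
```

The updater only waits for readers that entered before its phase flip; readers arriving afterwards land in the other counter and see the new table, so they cannot starve the update. Real QRCU/SRCU implementations differ in detail, and this sketch ignores memory reclamation entirely.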

An alternative path is to convince ourselves that the hardware does not 
provide the guarantees that the current code provides, and so we can 
relax them.
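To make concrete what relaxing the guarantee would permit, here is a minimal single-threaded replay of the interleaving quoted above. It is only an illustration under simplifying assumptions: a routing table is reduced to the destination APIC id of one MSI route, and `replay` and `struct replay_result` are hypothetical names. In both runs the interrupt still reaches the old destination; the difference is whether that can happen after the VCPU that changed the routing has resumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one replay of the quoted interleaving. */
struct replay_result {
    int delivered_to;          /* APIC id the MSI was delivered to */
    bool stale_after_resume;   /* old destination used after VCPU resumed */
};

/* Hypothetical single-threaded replay; wait_for_readers models whether
 * the updater waits for a grace period before the VCPU resumes. */
static struct replay_result replay(bool wait_for_readers)
{
    const int old_dest = 1, new_dest = 2;
    int published = old_dest;        /* table visible to new readers */
    int readers_in_flight = 0;
    bool vcpu_resumed = false;

    /* Reader: e = entry from IRQ routing table (read-side section open) */
    readers_in_flight++;
    int e = published;

    /* Updater: kvm_irq_routing_update(kvm, new) */
    published = new_dest;

    /* Without a grace period the VCPU resumes immediately; with one it
     * must wait until no reader is still in flight. */
    if (!wait_for_readers || readers_in_flight == 0)
        vcpu_resumed = true;

    /* Reader: kvm_set_msi_irq(e, &irq); kvm_irq_delivery_to_apic_fast() */
    struct replay_result r = {
        .delivered_to = e,
        .stale_after_resume = vcpu_resumed && e != published,
    };
    readers_in_flight--;             /* read-side section closed */
    assert(readers_in_flight == 0);

    /* A waiting updater is released only at this point. */
    return r;
}
```

Calling `replay(false)` reports a stale delivery after the VCPU resumed, while `replay(true)` delivers to the same old destination but before the resume, which is the ordering the current synchronize_rcu() provides.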
