From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@cloudius-systems.com>
Subject: Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update
 vm irq routing table
Date: Tue, 26 Nov 2013 17:24:17 +0200
Message-ID: <5294BD21.4010701@cloudius-systems.com>
References: <D3E216785288A145B7BC975F83A2ED10448ADCCB@SZXEMA510-MBS.china.huawei.com> <52949847.6020908@redhat.com> <CAEbWaipAnmoa=gMbB1aNb=btU6LgYXhcnmQWHJ_89m4yvw6Dug@mail.gmail.com> <5294A68F.6060301@redhat.com> <CAF950W+-UiX6xv4vYmnxji9aWDU8ds2rnx7JugnbHQWJdCCD-Q@mail.gmail.com> <5294B461.5000405@redhat.com> <5294B634.4050801@cloudius-systems.com> <20131126150357.GA20352@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Avi Kivity <avi.kivity@gmail.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	KVM <kvm@vger.kernel.org>, "Michael S. Tsirkin" <mst@redhat.com>,
	"Jinxin (F)" <jinxin712@huawei.com>,
	"Zhanghaoyu (A)" <haoyu.zhang@huawei.com>,
	Luonengjun <luonengjun@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Zanghongyong <zanghongyong@huawei.com>
To: Gleb Natapov <gleb@redhat.com>
In-Reply-To: <20131126150357.GA20352@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/26/2013 05:03 PM, Gleb Natapov wrote:
> On Tue, Nov 26, 2013 at 04:54:44PM +0200, Avi Kivity wrote:
>> On 11/26/2013 04:46 PM, Paolo Bonzini wrote:
>>> On 26/11/2013 15:36, Avi Kivity wrote:
>>>>      No, this would be exactly the same code that is running now:
>>>>
>>>>              mutex_lock(&kvm->irq_lock);
>>>>              old = kvm->irq_routing;
>>>>              kvm_irq_routing_update(kvm, new);
>>>>              mutex_unlock(&kvm->irq_lock);
>>>>
>>>>              synchronize_rcu();
>>>>              kfree(old);
>>>>              return 0;
>>>>
>>>>      Except that the kfree would run in the call_rcu kernel thread instead of
>>>>      the vcpu thread.  But the vcpus already see the new routing table after
>>>>      the rcu_assign_pointer that is in kvm_irq_routing_update.
>>>>
>>>> I understood the proposal was also to eliminate the synchronize_rcu(),
>>>> so while new interrupts would see the new routing table, interrupts
>>>> already in flight could pick up the old one.
>>> Isn't that always the case with RCU?  (See my answer above: "the vcpus
>>> already see the new routing table after the rcu_assign_pointer that is
>>> in kvm_irq_routing_update").
>> With synchronize_rcu(), you have the additional guarantee that any
>> parallel accesses to the old routing table have completed.  Since we
>> also trigger the irq from rcu context, you know that after
>> synchronize_rcu() you won't get any interrupts to the old
>> destination (see kvm_set_irq_inatomic()).
> We do not have this guaranty for other vcpus that do not call
> synchronize_rcu(). They may still use outdated routing table while a vcpu
> or iothread that performed table update sits in synchronize_rcu().
>

Consider this guest code:

   write msi entry, directing the interrupt away from this vcpu
   nop
   memset(&idt, 0, sizeof(idt));

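Currently, this code will never trigger a triple fault: the MSI write
does not complete until synchronize_rcu() has returned, so by the time
the IDT is cleared no interrupt can still be routed to this vcpu.  With
the change to call_rcu(), it may, because an interrupt already in flight
can still use the old routing table and arrive after the memset.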

Now it may be that the guest does not expect this to work (PCI writes
are posted, and interrupts can be delayed indefinitely by the PCI
fabric), but we don't know whether there is a path that guarantees the
guest something that we would be taking away with this change.
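
To make the ordering concern concrete, here is a minimal single-threaded
toy model in userspace C.  It is not kernel code and not the actual KVM
update path: every name in it (struct vm, irq_begin, update_sync,
update_deferred, run_guest, ...) is hypothetical, the grace period is
modelled by simply finishing the one in-flight interrupt, and a "fault"
just counts an interrupt that reaches vcpu 0 after its IDT was cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy single-threaded model, NOT kernel code; all names are hypothetical. */

struct routing { int dest_vcpu; };

struct vm {
    struct routing *table;     /* table that new interrupts will see */
    struct routing *in_flight; /* table sampled by an unfinished interrupt */
    struct routing *deferred;  /* old table queued for a deferred free */
    bool idt_valid;            /* false once vcpu 0 has zeroed its IDT */
    int faults;                /* interrupts hitting vcpu 0 with no IDT */
};

static struct routing *routing_new(int dest)
{
    struct routing *r = malloc(sizeof(*r));
    assert(r);
    r->dest_vcpu = dest;
    return r;
}

/* An interrupt enters its read-side section and samples the table. */
static void irq_begin(struct vm *vm) { vm->in_flight = vm->table; }

/* The in-flight interrupt is delivered via the table it sampled. */
static void irq_finish(struct vm *vm)
{
    if (!vm->in_flight)
        return;
    if (vm->in_flight->dest_vcpu == 0 && !vm->idt_valid)
        vm->faults++;
    vm->in_flight = NULL;
}

/* Models the synchronize_rcu() path: wait for old readers, then free. */
static void update_sync(struct vm *vm, struct routing *next)
{
    struct routing *old = vm->table;
    vm->table = next;
    irq_finish(vm);            /* stands in for waiting out the grace period */
    free(old);
}

/* Models the call_rcu() path: publish and return at once, free later. */
static void update_deferred(struct vm *vm, struct routing *next)
{
    vm->deferred = vm->table;
    vm->table = next;
}

/* End of the grace period: late readers finish, deferred free runs. */
static void grace_period_end(struct vm *vm)
{
    irq_finish(vm);
    free(vm->deferred);
    vm->deferred = NULL;
}

/* Guest sequence from above: reroute away from vcpu 0, then zero the IDT. */
static int run_guest(bool deferred)
{
    struct vm vm = { .table = routing_new(0), .idt_valid = true };
    int faults;

    irq_begin(&vm);                        /* interrupt already in flight */
    if (deferred)
        update_deferred(&vm, routing_new(1));
    else
        update_sync(&vm, routing_new(1));
    vm.idt_valid = false;                  /* memset(&idt, 0, sizeof(idt)) */
    grace_period_end(&vm);                 /* late delivery, if any */
    faults = vm.faults;
    free(vm.table);
    return faults;
}
```

In this model run_guest(false) reports no fault, because the in-flight
interrupt is drained before the update returns, while run_guest(true)
reports one, because the interrupt that sampled the old table is only
delivered after the IDT has been cleared.  Whether a real guest can rely
on the first behaviour is exactly the open question.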