From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [RFC] create a single workqueue for each vm to update vm irq
 routing table
Date: Tue, 26 Nov 2013 18:24:45 +0200
Message-ID: <20131126162445.GB24806@redhat.com>
References: <D3E216785288A145B7BC975F83A2ED10448ADCCB@SZXEMA510-MBS.china.huawei.com>
 <52949847.6020908@redhat.com>
 <20131126125610.GM959@redhat.com>
 <20131126160536.GA23007@redhat.com>
 <20131126161427.GB20352@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"Zhanghaoyu (A)" <haoyu.zhang@huawei.com>,
	KVM <kvm@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Eric Blake <eblake@redhat.com>,
	Luonengjun <luonengjun@huawei.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	"Jinxin (F)" <jinxin712@huawei.com>,
	Zanghongyong <zanghongyong@huawei.com>
To: Gleb Natapov <gleb@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:1701 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932224Ab3KZQVh (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 26 Nov 2013 11:21:37 -0500
Content-Disposition: inline
In-Reply-To: <20131126161427.GB20352@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
> On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
> > > > Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto:
> > > > > When the guest sets irq smp_affinity, a VMEXIT occurs and the vcpu thread returns from the hypervisor to QEMU via ioctl; the vcpu thread then asks the hypervisor to update the irq routing table.
> > > > > In kvm_set_irq_routing, synchronize_rcu is called, so the current vcpu thread blocks for a long time waiting for an RCU grace period, and during this period the vcpu cannot service the VM,
> > > > > so interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced either.
> > > > > This is unacceptable in some real-time scenarios, e.g. telecom.
> > > > > 
> > > > > So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
> > > > > and let the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for an RCU grace period.
> > > > > I have implemented a raw patch and tested it in our telecom environment; the above problem disappeared.
> > > > 
> > > > I don't think a workqueue is even needed.  You just need to use call_rcu
> > > > to free "old" after releasing kvm->irq_lock.
> > > > 
> > > > What do you think?
> > > > 
> > > It should be rate limited somehow. Since it is guest-triggerable, a guest may
> > > cause the host to allocate a lot of memory this way.
> > 
> > The checks in __call_rcu() should handle this, I think.  These keep a per-CPU
> > counter, which can be adjusted via rcutree.blimit, which defaults
> > to taking evasive action if more than 10K callbacks are waiting on a
> > given CPU.
> > 
> > 
> Documentation/RCU/checklist.txt has:
> 
>         An especially important property of the synchronize_rcu()
>         primitive is that it automatically self-limits: if grace periods
>         are delayed for whatever reason, then the synchronize_rcu()
>         primitive will correspondingly delay updates.  In contrast,
>         code using call_rcu() should explicitly limit update rate in
>         cases where grace periods are delayed, as failing to do so can
>         result in excessive realtime latencies or even OOM conditions.

I just asked Paul what this means.

> --
> 			Gleb.