From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Paul E. McKenney"
Subject: Re: [PATCHv2] KVM: optimize apic interrupt delivery
Date: Tue, 26 Nov 2013 11:35:06 -0800
Message-ID: <20131126193506.GE4137@linux.vnet.ibm.com>
References: <20120911141023.GB26031@redhat.com> <20120911171300.GJ4257@linux.vnet.ibm.com> <20120911223337.GA28821@redhat.com> <20120912010334.GK4257@linux.vnet.ibm.com> <50503D92.7090108@redhat.com> <20120912123441.GQ20907@redhat.com> <505081E9.8080505@redhat.com> <20120912124426.GR20907@redhat.com> <20120912151354.GO4257@linux.vnet.ibm.com> <20131126162402.GA24806@redhat.com>
In-Reply-To: <20131126162402.GA24806@redhat.com>
Reply-To: paulmck@linux.vnet.ibm.com
To: "Michael S. Tsirkin"
Cc: Gleb Natapov, kvm@vger.kernel.org, mtosatti@redhat.com

On Tue, Nov 26, 2013 at 06:24:13PM +0200, Michael S. Tsirkin wrote:
> On Wed, Sep 12, 2012 at 08:13:54AM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 12, 2012 at 03:44:26PM +0300, Gleb Natapov wrote:
> > > On Wed, Sep 12, 2012 at 03:36:57PM +0300, Avi Kivity wrote:
> > > > On 09/12/2012 03:34 PM, Gleb Natapov wrote:
> > > > > On Wed, Sep 12, 2012 at 10:45:22AM +0300, Avi Kivity wrote:
> > > > >> On 09/12/2012 04:03 AM, Paul E. McKenney wrote:
> > > > >> >> > > Paul, I'd like to check something with you here:
> > > > >> >> > > this function can be triggered by userspace,
> > > > >> >> > > any number of times; we allocate
> > > > >> >> > > a 2K chunk of memory that is later freed by
> > > > >> >> > > kfree_rcu.
> > > > >> >> > >
> > > > >> >> > > Is there a risk of DOS if RCU is delayed while
> > > > >> >> > > lots of memory is queued up in this way?
> > > > >> >> > > If yes is this a generic problem with kfree_rcu
> > > > >> >> > > that should be addressed in core kernel?
> > > > >> >> >
> > > > >> >> > There is indeed a risk.
> > > > >> >>
> > > > >> >> In our case it's a 2K object.  Is it a practical risk?
> > > > >> >
> > > > >> > How many kfree_rcu()s per second can a given user cause to happen?
> > > > >>
> > > > >> Not much more than a few hundred thousand per second per process
> > > > >> (normal operation is zero).
> > > > >>
> > > > > I managed to do 21466 per second.
> > > >
> > > > Strange, why so slow?
> > > >
> > > Because ftrace buffer overflows :)  With bigger buffer I get 169940.
> >
> > Ah, good, should not be a problem.  In contrast, if you ran kfree_rcu()
> > in a tight loop, you could probably do in excess of 100M per CPU per
> > second.  Now -that- might be a problem.
> >
> > Well, it -might- be a problem if you somehow figured out how to allocate
> > memory that quickly in a steady-state manner.  ;-)
> >
> > > > >> Good idea.  Michael, it should be easy to modify kvm-unit-tests
> > > > >> to write to the APIC ID register in a loop.
> > > > >>
> > > > > I did.  Memory consumption does not grow on otherwise idle host.
> >
> > Very good -- the checks in __call_rcu(), which is common code invoked
> > by kfree_rcu(), seem to be doing their job, then.  These do keep a
> > per-CPU counter, which can be adjusted via rcutree.blimit, which
> > defaults to taking evasive action if more than 10K callbacks are
> > waiting on a given CPU.
> >
> > My concern was that you might be overrunning that limit in way less
> > than a grace period (as in about a hundred microseconds).  My concern
> > was of course unfounded -- it takes several grace periods to push 10K
> > callbacks through.
> >
> > 							Thanx, Paul
>
> Gleb noted that Documentation/RCU/checklist.txt has this text:
>
> 	An especially important property of the synchronize_rcu()
> 	primitive is that it automatically self-limits: if grace periods
> 	are delayed for whatever reason, then the synchronize_rcu()
> 	primitive will correspondingly delay updates.  In contrast,
> 	code using call_rcu() should explicitly limit update rate in
> 	cases where grace periods are delayed, as failing to do so can
> 	result in excessive realtime latencies or even OOM conditions.
>
> If call_rcu is self-limiting maybe this should be documented ...

It would be more accurate to say that it takes some measures to limit
the damage -- you can overwhelm these measures if you try hard enough.
And I guess I could say something to that effect.  ;-)

							Thanx, Paul