From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gleb Natapov
Subject: Re: [PATCHv2] KVM: optimize apic interrupt delivery
Date: Wed, 27 Nov 2013 10:00:09 +0200
Message-ID: <20131127080009.GO959@redhat.com>
References: <20120911171300.GJ4257@linux.vnet.ibm.com>
 <20120911223337.GA28821@redhat.com>
 <20120912010334.GK4257@linux.vnet.ibm.com>
 <50503D92.7090108@redhat.com>
 <20120912123441.GQ20907@redhat.com>
 <505081E9.8080505@redhat.com>
 <20120912124426.GR20907@redhat.com>
 <20120912151354.GO4257@linux.vnet.ibm.com>
 <20131126162402.GA24806@redhat.com>
 <20131126193506.GE4137@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Michael S. Tsirkin" , kvm@vger.kernel.org, mtosatti@redhat.com
To: "Paul E. McKenney"
Return-path: 
Received: from mx1.redhat.com ([209.132.183.28]:42470 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751952Ab3K0IA0
 (ORCPT ); Wed, 27 Nov 2013 03:00:26 -0500
Content-Disposition: inline
In-Reply-To: <20131126193506.GE4137@linux.vnet.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID: 

On Tue, Nov 26, 2013 at 11:35:06AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 26, 2013 at 06:24:13PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Sep 12, 2012 at 08:13:54AM -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 12, 2012 at 03:44:26PM +0300, Gleb Natapov wrote:
> > > > On Wed, Sep 12, 2012 at 03:36:57PM +0300, Avi Kivity wrote:
> > > > > On 09/12/2012 03:34 PM, Gleb Natapov wrote:
> > > > > > On Wed, Sep 12, 2012 at 10:45:22AM +0300, Avi Kivity wrote:
> > > > > >> On 09/12/2012 04:03 AM, Paul E. McKenney wrote:
> > > > > >> >> > > Paul, I'd like to check something with you here:
> > > > > >> >> > > this function can be triggered by userspace,
> > > > > >> >> > > any number of times; we allocate
> > > > > >> >> > > a 2K chunk of memory that is later freed by
> > > > > >> >> > > kfree_rcu.
> > > > > >> >> > >
> > > > > >> >> > > Is there a risk of DOS if RCU is delayed while
> > > > > >> >> > > lots of memory is queued up in this way?
> > > > > >> >> > > If yes, is this a generic problem with kfree_rcu
> > > > > >> >> > > that should be addressed in the core kernel?
> > > > > >> >> >
> > > > > >> >> > There is indeed a risk.
> > > > > >> >>
> > > > > >> >> In our case it's a 2K object. Is it a practical risk?
> > > > > >> >
> > > > > >> > How many kfree_rcu()s per second can a given user cause to happen?
> > > > > >>
> > > > > >> Not much more than a few hundred thousand per second per process (normal
> > > > > >> operation is zero).
> > > > > >>
> > > > > > I managed to do 21466 per second.
> > > > >
> > > > > Strange, why so slow?
> > > > >
> > > > Because ftrace buffer overflows :) With bigger buffer I get 169940.
> > >
> > > Ah, good, should not be a problem. In contrast, if you ran kfree_rcu() in
> > > a tight loop, you could probably do in excess of 100M per CPU per second.
> > > Now -that- might be a problem.
> > >
> > > Well, it -might- be a problem if you somehow figured out how to allocate
> > > memory that quickly in a steady-state manner. ;-)
> > >
> > > > > >> Good idea. Michael, it should be easy to modify kvm-unit-tests to write
> > > > > >> to the APIC ID register in a loop.
> > > > > >>
> > > > > > I did. Memory consumption does not grow on otherwise idle host.
> > >
> > > Very good -- the checks in __call_rcu(), which is common code invoked by
> > > kfree_rcu(), seem to be doing their job, then. These do keep a per-CPU
> > > counter, which can be adjusted via rcutree.blimit, which defaults
> > > to taking evasive action if more than 10K callbacks are waiting on a
> > > given CPU.
> > >
> > > My concern was that you might be overrunning that limit in way less
> > > than a grace period (as in about a hundred microseconds). My concern
> > > was of course unfounded -- it takes several grace periods to push 10K
> > > callbacks through.
> > >
> > > 							Thanx, Paul
> >
> > Gleb noted that Documentation/RCU/checklist.txt has this text:
> >
> > 	An especially important property of the synchronize_rcu()
> > 	primitive is that it automatically self-limits: if grace periods
> > 	are delayed for whatever reason, then the synchronize_rcu()
> > 	primitive will correspondingly delay updates. In contrast,
> > 	code using call_rcu() should explicitly limit update rate in
> > 	cases where grace periods are delayed, as failing to do so can
> > 	result in excessive realtime latencies or even OOM conditions.
> >
> > If call_rcu is self-limiting maybe this should be documented ...
>
> It would be more accurate to say that it takes some measures to limit
> the damage -- you can overwhelm these measures if you try hard enough.
The question is: is it safe to have a call_rcu() without any additional
rate limiting on a user-triggerable code path?

> And I guess I could say something to that effect. ;-)
>
> 							Thanx, Paul

-- 
			Gleb.