From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753380AbYHXEzu (ORCPT );
	Sun, 24 Aug 2008 00:55:50 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751519AbYHXEzn (ORCPT );
	Sun, 24 Aug 2008 00:55:43 -0400
Received: from gw.goop.org ([64.81.55.164]:59527 "EHLO mail.goop.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751420AbYHXEzm (ORCPT );
	Sun, 24 Aug 2008 00:55:42 -0400
Message-ID: <48B0E9CA.4040808@goop.org>
Date: Sat, 23 Aug 2008 21:55:38 -0700
From: Jeremy Fitzhardinge
User-Agent: Thunderbird 2.0.0.16 (X11/20080723)
MIME-Version: 1.0
To: Andi Kleen
CC: "Paul E. McKenney" , Christoph Lameter , Pekka Enberg ,
	Ingo Molnar , Nick Piggin , "Pallipadi, Venkatesh" ,
	Suresh Siddha , Jens Axboe , Rusty Russell ,
	Linux Kernel Mailing List
Subject: Re: [PATCH 2/2] smp_call_function: use rwlocks on queues rather
	than rcu
References: <48AE0883.6050701@goop.org> <20080822062800.GQ14110@elte.hu>
	<84144f020808220006n25d684b1n9db306ddc4f58c4c@mail.gmail.com>
	<48AEC6B2.1080701@linux-foundation.org>
	<20080822151156.GA6744@linux.vnet.ibm.com>
	<48AEF3FD.70906@linux-foundation.org>
	<20080822182915.GG6744@linux.vnet.ibm.com>
	<20080822183346.GS23334@one.firstfloor.org>
	<48AF0702.8040303@goop.org>
	<20080823073457.GV23334@one.firstfloor.org>
In-Reply-To: <20080823073457.GV23334@one.firstfloor.org>
X-Enigmail-Version: 0.95.7
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Andi Kleen wrote:
> On Fri, Aug 22, 2008 at 11:35:46AM -0700, Jeremy Fitzhardinge wrote:
>
>> Andi Kleen wrote:
>>
>>> Right now my impression is that it is not well understood why
>>> the kmalloc makes the IPI that much slower.
>>> In theory a kmalloc shouldn't be all that slow, it's essentially
>>> just a "disable interrupts; unlink object from cpu cache; enable
>>> interrupts" with some window dressing. kfree() is similar.
>>>
>>> Does it bounce a cache line on freeing perhaps?
>>>
>> I think it's just an assumption that it would be slower. Has anyone
>> measured it?
>>
>
> It's likely slower than no kmalloc because there will be more
> instructions executed, the question is just how much.
>
>
>> (Note: The measurements I posted do not cover this path, because it
>> was on a two cpu system, and it was always using the call-single
>> path.)
>>
>
> Ah so it was already 25% slower even without kmalloc? I thought
> that was with already. That doesn't sound good. Any idea where that
> slowdown comes from?

Just longer code path, I think. It calls the generic
smp_call_function_mask(), which then does a popcount on the cpu mask
(which it needs to do anyway), sees only one bit set, and then punts
to the smp_call_function_single() path.

If we maintained a cpus_online count, then we could fast-path the call
to smp_call_function_single() in the two core/cpu case more efficiently
(would still need to scan the mask to extract the cpu number).

Or alternatively, maybe it isn't actually worth special casing
smp_call_function_single() with a multi-queue smp_call_function_mask()
implementation?

    J