From: Christoph Lameter
Date: Fri, 22 Aug 2008 09:01:22 -0500
To: Pekka Enberg
CC: Ingo Molnar, Jeremy Fitzhardinge, Nick Piggin, Andi Kleen, "Pallipadi, Venkatesh", Suresh Siddha, Jens Axboe, Rusty Russell, Linux Kernel Mailing List, "Paul E. McKenney"
Subject: Re: [PATCH 2/2] smp_call_function: use rwlocks on queues rather than rcu

Pekka Enberg wrote:
> Hi Ingo,
>
> On Fri, Aug 22, 2008 at 9:28 AM, Ingo Molnar wrote:
>> * Jeremy Fitzhardinge wrote:
>>
>>> RCU can only control the lifetime of allocated memory blocks, which
>>> forces all the call structures to be allocated. This is expensive
>>> compared to allocating them on the stack, which is the common case for
>>> synchronous calls.
>>>
>>> This patch takes a different approach. Rather than using RCU, the
>>> queues are managed under rwlocks. Adding or removing from the queue
>>> requires holding the lock for writing, but multiple CPUs can walk the
>>> queues to process function calls under read locks. In the common
>>> case, where the structures are stack allocated, the calling CPU need
>>> only wait for its call to be done, take the lock for writing and
>>> remove the call structure.
>>>
>>> Lock contention - particularly write vs read - is reduced by using
>>> multiple queues.
>>
>> hm, is there any authoritative data on what is cheaper on a big box, a
>> full-blown MESI cache miss that occurs for every reader in this new
>> fastpath, or a local SLAB/SLUB allocation+free that occurs with the
>> current RCU approach?
>
> Christoph might have an idea about it.

It's on the stack, which is presumably hot, so no cache miss? If it's
async then presumably we do not need to wait, so it's okay to call an
allocator.

Generally: the larger the box (longer cacheline acquisition latencies)
and the higher the contention (cannot get the cacheline because of
contention), the better a slab allocation will do compared to taking a
cacheline miss. RCU is problematic because it lets cachelines go cold.
A hot cacheline that is frequently read and written by the same CPU is
a very good thing for performance.