Date: Thu, 20 May 2010 18:21:54 -0400
From: Chris Mason <chris.mason@oracle.com>
To: Peter Zijlstra
Cc: Ingo Molnar, axboe@kernel.dk, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] reduce runqueue lock contention
Message-ID: <20100520222154.GC20946@think>
References: <20100520204810.GA19188@think> <1274389786.1674.1653.camel@laptop>
In-Reply-To: <1274389786.1674.1653.camel@laptop>
User-Agent: Mutt/1.5.20 (2009-06-14)
List-ID: <linux-kernel.vger.kernel.org>

On Thu, May 20, 2010 at 11:09:46PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-05-20 at 16:48 -0400, Chris Mason wrote:
> >
> > This is more of a starting point than a patch, but it is something I've
> > been meaning to look at for a long time.  Many different workloads end
> > up hammering very hard on try_to_wake_up, to the point where the
> > runqueue locks dominate CPU profiles.
> Right, so one of the things that I considered was to make p->state an
> atomic_t and replace the initial stage of try_to_wake_up() with
> something like:
>
> int try_to_wake_up(struct task *p, unsigned int mask, wake_flags)
> {
> 	int state = atomic_read(&p->state);
>
> 	do {
> 		if (!(state & mask))
> 			return 0;
>
> 		state = atomic_cmpxchg(&p->state, state, TASK_WAKING);
> 	} while (state != TASK_WAKING);
>
> 	/* do this pending queue + ipi thing */
>
> 	return 1;
> }
>
> Also, I think we might want to put that atomic single linked list thing
> into some header (using atomic_long_t or so), because I have a similar
> thing living in kernel/perf_event.c, that needs to queue things from NMI
> context.

So I've done three of these cmpxchg lists recently... but they have all
been a little different.  I went back and forth a bunch of times about
using a list_head based thing instead to avoid the walk for list append.
I really don't like the walk.

But what makes this one unique is that I'm using a cmpxchg on the list
pointer in the task struct to take ownership of that task struct.  That
is how I avoid concurrent lockless enqueues.  Your fiddling with
p->state above would let me avoid that.

-chris