From: Tejun Heo
Date: Fri, 20 Feb 2009 12:17:16 +0900
To: Ingo Molnar
Cc: rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org, linux-kernel@vger.kernel.org, hpa@zytor.com, jeremy@goop.org, cpw@sgi.com
Subject: Re: [PATCHSET x86/core/percpu] implement dynamic percpu allocator
Message-ID: <499E20BC.4020408@kernel.org>

Hello, Ingo.

Ingo Molnar wrote:
> * Tejun Heo wrote:
>
>> Tejun Heo wrote:
>>> One trick we can do is to reserve the initial chunk in non-vmalloc
>>> area so that at least the static cpu ones and whatever gets
>>> allocated in the first chunk is served by regular large page
>>> mappings. Given that those are most frequent visited ones, this
>>> could be a nice compromise - no noticeable penalty for usual cases
>>> yet allowing scalability for unusual cases. If this is something
>>> which can be agreed on, I'll pursue this.
>>
>> I've given more thought to this and it actually will solve
>> most of issues for non-NUMA but it can't be done for NUMA.
>> Any better ideas?
>
> It could be allocated via NUMA-aware bootmem allocations.

Hmmm... not really. Here's what I was planning to do on non-NUMA.

Allocate the first chunk using alloc_bootmem(). After setting up each
unit, keep only the initialized static area plus enough free space for
common cases, and give the rest back by calling free_bootmem(). Then
mark the returned space as used in the chunk map so the percpu
allocator never hands it out. This allows a sane chunk size and
scalability without adding TLB pressure, so it's actually pretty sweet.

Unfortunately, this doesn't really work for NUMA because we don't
control how NUMA addresses are laid out, so we can't allocate a
contiguous, NUMA-correct chunk without remapping. And if we remap, we
can't give what's left back to the allocator: giving back the original
address doubles TLB usage, and giving back the remapped address breaks
__pa()/__va(). :-(

Thanks.

--
tejun
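For illustration only, here is a toy model of the non-NUMA first-chunk plan from the message: allocate every unit up front, keep the static area plus a reserve, hand the tail of each unit back, and mark that tail as used in the chunk map. It assumes a chunk map in which each entry is an area size (positive means free, negative means in use) with one layout shared by all units. The class `FirstChunk`, its fields, the sizes, and the first-fit `alloc()` are hypothetical; Python lists stand in for alloc_bootmem()/free_bootmem(). This is a sketch under those assumptions, not the kernel code.

```python
from dataclasses import dataclass, field


@dataclass
class FirstChunk:
    """Hypothetical model of the first percpu chunk (not kernel code)."""
    unit_size: int      # bytes allocated per CPU unit at boot ("alloc_bootmem()")
    static_size: int    # initialized static percpu area
    reserve_size: int   # free space kept for common dynamic allocations
    nr_units: int
    # Area sizes within one unit: positive = free, negative = in use.
    area_map: list = field(default_factory=list)
    # (offset, size) ranges handed back, one per unit ("free_bootmem()").
    returned: list = field(default_factory=list)

    def setup(self):
        keep = self.static_size + self.reserve_size
        if keep > self.unit_size:
            raise ValueError("static area plus reserve exceeds unit size")
        tail = self.unit_size - keep
        for cpu in range(self.nr_units):
            unit_base = cpu * self.unit_size
            # The static area would be copied into
            # [unit_base, unit_base + static_size) here.
            if tail:
                self.returned.append((unit_base + keep, tail))
        self.area_map = [-self.static_size, self.reserve_size]
        if tail:
            # Returned space is marked as used so it is never allocated.
            self.area_map.append(-tail)
        return self

    def alloc(self, size):
        """First-fit scan of the map; returns an offset within a unit or None."""
        off = 0
        for i, area in enumerate(self.area_map):
            if area >= size:
                rest = area - size
                # Split the free area: the head becomes used, the rest stays free.
                self.area_map[i:i + 1] = [-size] + ([rest] if rest else [])
                return off
            off += abs(area)
        return None


chunk = FirstChunk(unit_size=64 << 10, static_size=20 << 10,
                   reserve_size=8 << 10, nr_units=4).setup()
print(chunk.area_map)     # [-20480, 8192, -36864]
print(chunk.alloc(4096))  # 20480: served from the kept reserve
print(chunk.alloc(8192))  # None: the returned tail is never handed out
```

The design point the model captures is that the given-back tail stays in the map as an in-use area. The allocator's view of the unit layout therefore stays consistent while the memory itself belongs to the page allocator again.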