From: Tejun Heo <tj@kernel.org>
To: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Lameter <cl@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, x86@kernel.org,
linux-arch@vger.kernel.org, hpa@zytor.com, tglx@linutronix.de
Subject: Re: [PATCHSET] percpu: generalize first chunk allocators and improve lpage NUMA support
Date: Wed, 01 Jul 2009 21:53:06 +0900 [thread overview]
Message-ID: <4A4B5C32.1010009@kernel.org> (raw)
In-Reply-To: <20090701122312.GP6760@one.firstfloor.org>
Hello, Andi.
Andi Kleen wrote:
> On Wed, Jul 01, 2009 at 07:21:57PM +0900, Tejun Heo wrote:
>>> using possible per cpu data I picked in current git: icmp.c
>> I was talking about percpu allocator proper. Yeap, the major work
>> would be in auditing and converting for_each_possible_cpu() users.
>
> and testing. that's the hard part. cpu hotplug is normally not well
> tested. Any code change that requires lots of new code for
> it will be a problem because that code will then likely bitrot.
It would be nice to have something to test cpu on/offlining
automatically. Something which keeps bringing cpus up and down as the
system goes through stress testing.
[--snip--]
>> too difficult. I wouldn't know for sure before I actually try tho.
>
> I think it's clear that you haven't tried yet :)
No I haven't yet. I had a pretty good idea about how to implement it
in percpu allocator but haven't really looked at the users. So, yeap,
it's quite possible that I'm under estimating the problem. Oh well,
let's see. Thanks for the warnings.
> I wrote quite a few per cpu callback handlers over the years and
> in my experience they are all nasty code with subtle races. The problem
> is that instead of having a single subset init function which
> is just single threaded and doesn't need to worry about races
> you now have multi threaded init, which tends to be a can of worms.
I tried a couple (didn't end up sending them out) and yeah they could
be quite painful. Bringing up part usually isn't as painful as the
other way tho.
> I think a far saner strategy than rewriting every user of DEFINE_PER_CPU,
> ending up with lots of badly tested code is to:
But I don't think it would be that drastic. Most users are quite
simple.
> - Fix the few large size percpu pigs that are problematic today to
> allocate in a callback.
> - Then once the per cpu data in all configurations is <200k (better
> <100 in the non debug builds) again just keep pre-allocating like we
> always did
> - Possibly adjust the vmalloc area on 32bit based on the possible
> CPU map at the cost of the direct mapping, to make sure there's
> always enough mapping space.
I think it's something we eventually need to do. There already are
cases where lack of scalable and performant percpu allocation leads to
design restrictions and between many core cpus and virtualization the
requirements are becoming more varied.
Thanks.
--
tejun
next prev parent reply other threads:[~2009-07-01 12:51 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-24 13:30 [PATCHSET] percpu: generalize first chunk allocators and improve lpage NUMA support Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 01/10] x86: make pcpu_chunk_addr_search() matching stricter Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 02/10] percpu: drop @unit_size from embed first chunk allocator Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 03/10] x86,percpu: generalize 4k " Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 04/10] percpu: make 4k first chunk allocator map memory Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 05/10] x86,percpu: generalize lpage first chunk allocator Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 06/10] percpu: simplify pcpu_setup_first_chunk() Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 07/10] percpu: reorder a few functions in mm/percpu.c Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 08/10] percpu: drop pcpu_chunk->page[] Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 09/10] percpu: allow non-linear / sparse cpu -> unit mapping Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 13:30 ` [PATCH 10/10] percpu: teach large page allocator about NUMA Tejun Heo
2009-06-24 13:30 ` Tejun Heo
2009-06-24 23:55 ` [PATCHSET] percpu: generalize first chunk allocators and improve lpage NUMA support Andrew Morton
2009-06-25 0:02 ` Andi Kleen
2009-06-25 0:13 ` H. Peter Anvin
2009-06-25 9:19 ` Andi Kleen
2009-06-25 14:18 ` H. Peter Anvin
2009-06-25 19:54 ` Andi Kleen
2009-06-25 20:15 ` H. Peter Anvin
2009-06-25 20:26 ` Andi Kleen
2009-06-26 0:40 ` Tejun Heo
2009-06-26 2:02 ` H. Peter Anvin
2009-06-26 6:54 ` Andi Kleen
2009-06-25 2:35 ` Tejun Heo
2009-06-25 9:20 ` Ingo Molnar
2009-06-29 23:20 ` Christoph Lameter
2009-06-29 23:39 ` Andrew Morton
2009-06-30 14:24 ` Christoph Lameter
2009-06-30 19:15 ` Ingo Molnar
2009-06-30 19:39 ` Christoph Lameter
2009-06-30 20:21 ` Scott Lurndal
2009-06-30 21:31 ` Ingo Molnar
2009-06-30 22:16 ` Christoph Lameter
2009-06-30 22:31 ` Ingo Molnar
2009-06-30 22:40 ` Andi Kleen
2009-07-01 0:48 ` Tejun Heo
2009-06-30 22:55 ` Christoph Lameter
2009-06-30 22:55 ` Christoph Lameter
2009-06-30 23:07 ` Ingo Molnar
2009-06-30 23:18 ` Christoph Lameter
2009-06-30 23:30 ` Ingo Molnar
2009-07-01 6:34 ` Andi Kleen
2009-06-30 23:20 ` Tejun Heo
2009-06-30 23:31 ` Ingo Molnar
2009-06-30 23:34 ` H. Peter Anvin
2009-07-01 6:42 ` Andi Kleen
2009-07-01 10:21 ` Tejun Heo
2009-07-01 12:23 ` Andi Kleen
2009-07-01 12:53 ` Tejun Heo [this message]
2009-07-01 13:11 ` Andi Kleen
2009-07-01 17:33 ` Christoph Lameter
2009-07-01 22:42 ` Tejun Heo
2009-07-03 23:14 ` Tejun Heo
2009-07-03 23:14 ` Tejun Heo
2009-07-13 10:12 ` [PATCH 04/10] percpu: make 4k first chunk allocator map memory David Howells
2009-07-15 3:17 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A4B5C32.1010009@kernel.org \
--to=tj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=cl@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).