public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org,
	linux-kernel@vger.kernel.org, hpa@zytor.com, jeremy@goop.org,
	cpw@sgi.com, nickpiggin@yahoo.com.au, ink@jurassic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk	allocation
Date: Tue, 24 Feb 2009 23:37:24 +0900	[thread overview]
Message-ID: <49A40624.4000100@kernel.org> (raw)
In-Reply-To: <20090224141217.GA17287@elte.hu>

Hello, Ingo.

Ingo Molnar wrote:
> * Tejun Heo <tj@kernel.org> wrote:
> 
>> What's missing is unification of static and dynamic accessors 
>> and thus the faster accessors - percpu_read() and friends - 
>> for dynamic ones. This will be the next round of patches.
> 
> Ok, good - we are in agreement then and i'll wait for those 
> patches.

Whooray!

> And i think i finally decoded the real source of the disconnect
> :-)
> 
> It's still about this restriction:
> 
> +       /*
> +        * If large page isn't supported, there's no benefit in doing
> +        * this.  Also, embedding allocation doesn't play well with
> +        * NUMA.
> +        */
> +       if (!cpu_has_pse || pcpu_need_numa())
> +               return -EINVAL;
> 
> This is what makes no sense (why force the static percpu area 
> into 4K mappings on NUMA).

No, the first allocator tried is remap allocator which will do the 2MB
remapping thing if NUMA.  If not, it gives its way to embedding
allocator which only kicks in for pse && !numa.  The 4k thing is just
the last resort.  We might as well kill it and make it

 if (numa)
	do remap
 else
	do embed
 panic if failed

The 4k thing is the final fallback for cases where pse isn't
supported.

> You do it because i think you misunderstood my original 2MB TLB 
> static area suggestion. setup_pcpu_embed() does this now:
> 
> +       pcpue_ptr = pcpu_alloc_bootmem(0, num_possible_cpus() * pcpue_unit_size,
> +                                      PAGE_SIZE);
> 
> That is not NUMA-friendly indeed.

NUMA uses 2MB remapping.  Non-NUMA uses embedding.  If you're annoyed
about the embedding allocator, we can drop it but given that most of
the machines in the wild are non-NUMA and the code to do the trick is
quite simple, I think it justifies its existence.

> What should be done instead is to up the unit size to 2MB as i 
> suggested, and to allocate 2MB sized and 2MB aligned 
> numa-correct area for each CPU, via bootmem.

YES, the posted code does EXACTLY that for NUMA!!!!

> To quote my original mail:
> 
>>> - allocate the static percpu area using bootmem-alloc, but
>>>   using a 2MB alignment parameter and a 2MB aligned size. Then
>>>   we can remap it to some convenient and undisturbed virtual
>>>   memory area, using 2MB TLBs. [*]
> 
> I.e. each individual 2MB allocated largepage can then be 
> remapped as a 2MB TLB to the high (vmalloc) area. Followed by 
> ordinary 4K mappings for regular percpu_alloc() pages.
> 
> ( and the partial, unused pages within this initial chunk are 
>   returned to bootmem. )

I did understand that the first time around.

> That will be NUMA-friendly and i suspect we should also use it 
> on SMP just to get that aspect of the code tested better.

For testing coverage, we can make a debug parameter or something but
think about it.  The embedding allocator is ~100 lines of well
commented code which is dropped once init is complete and it always
saves a 2MB TLB entry for all the non-NUMA machines out there.  It is
a very low cost optimization for >90% of machines out there.

> Do _not_ allocate the units together in one bootmem allocation 
> because that's not NUMA-friendly.

Again, it doesn't do that for NUMA.

-- 
tejun

  reply	other threads:[~2009-02-24 14:38 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-24  3:11 [PATCHSET x86/core/percpu] improve the first percpu chunk allocation Tejun Heo
2009-02-24  3:11 ` [PATCH 01/10] percpu: fix pcpu_chunk_struct_size Tejun Heo
2009-02-24  3:11 ` [PATCH 02/10] bootmem: clean up arch-specific bootmem wrapping Tejun Heo
2009-02-24 11:30   ` Johannes Weiner
2009-02-24 11:39     ` Tejun Heo
2009-02-24  3:11 ` [PATCH 03/10] bootmem: reorder interface functions and add a missing one Tejun Heo
2009-02-24  3:11 ` [PATCH 04/10] vmalloc: add @align to vm_area_register_early() Tejun Heo
2009-02-24  3:11 ` [PATCH 05/10] x86: update populate_extra_pte() and add populate_extra_pmd() Tejun Heo
2009-02-24  3:11 ` [PATCH 06/10] percpu: remove unit_size power-of-2 restriction Tejun Heo
2009-02-24  3:11 ` [PATCH 07/10] percpu: give more latitude to arch specific first chunk initialization Tejun Heo
2009-02-24  3:11 ` [PATCH 08/10] x86: separate out setup_pcpu_4k() from setup_per_cpu_areas() Tejun Heo
2009-02-24  3:11 ` [PATCH 09/10] x86: add embedding percpu first chunk allocator Tejun Heo
2009-02-24  3:11 ` [PATCH 10/10] x86: add remapping " Tejun Heo
2009-02-24  9:57 ` [PATCHSET x86/core/percpu] improve the first percpu chunk allocation Ingo Molnar
2009-02-24 11:48   ` Tejun Heo
2009-02-24 12:40     ` Ingo Molnar
2009-02-24 13:27       ` Tejun Heo
2009-02-24 14:12         ` Ingo Molnar
2009-02-24 14:37           ` Tejun Heo [this message]
2009-02-24 15:15             ` Ingo Molnar
2009-02-24 23:33               ` Tejun Heo
2009-03-04  0:03             ` Rusty Russell
2009-03-04  0:15               ` H. Peter Anvin
2009-03-04  0:50                 ` Ingo Molnar
2009-02-24 12:51     ` Ingo Molnar
2009-02-24 14:47       ` Tejun Heo
2009-02-24 15:19         ` Ingo Molnar
2009-02-24 15:30           ` Nick Piggin
2009-02-24 13:02     ` Ingo Molnar
2009-02-24 14:40       ` Tejun Heo
2009-02-24 20:17 ` Ingo Molnar
2009-02-24 20:51   ` Ingo Molnar
2009-02-24 21:02     ` Yinghai Lu
2009-02-24 21:12     ` [PATCH] x86: check range in reserve_early() -v2 Yinghai Lu
2009-02-24 21:16     ` [PATCHSET x86/core/percpu] improve the first percpu chunk allocation Ingo Molnar
2009-02-25  2:09       ` [PATCH x86/core/percpu 1/2] x86, percpu: fix minor bugs in setup_percpu.c Tejun Heo
2009-02-25  2:10       ` [PATCH x86/core/percpu 2/2] x86: convert cacheflush macros inline functions Tejun Heo
2009-02-25  2:23       ` [PATCHSET x86/core/percpu] improve the first percpu chunk allocation Tejun Heo
2009-02-25  2:56         ` Tejun Heo
2009-02-25 12:59         ` Ingo Molnar
2009-02-25 13:43           ` WARNING: at include/linux/percpu.h:159 __create_workqueue_key+0x1f6/0x220() Ingo Molnar
2009-02-26  2:03             ` [PATCH core/percpu] percpu: fix too low alignment restriction on UP Tejun Heo
2009-02-26  3:26               ` Ingo Molnar
2009-02-25  6:40       ` [PATCHSET x86/core/percpu] improve the first percpu chunk allocation Rusty Russell
2009-02-25 12:54         ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49A40624.4000100@kernel.org \
    --to=tj@kernel.org \
    --cc=cpw@sgi.com \
    --cc=hpa@zytor.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox