Re: regarding the x86_64 zero-based percpu patches

public inbox for linux-kernel@vger.kernel.org

From: Tejun Heo <tj@kernel.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Ingo Molnar <mingo@elte.hu>,
	travis@sgi.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	steiner@sgi.com, Hugh Dickins <hugh@veritas.com>
Subject: Re: regarding the x86_64 zero-based percpu patches
Date: Wed, 14 Jan 2009 12:58:56 +0900
Message-ID: <496D6300.9070402@kernel.org>
In-Reply-To: <m1k58zx305.fsf@frodo.ebiederm.org>

Hello, Eric.

Eric W. Biederman wrote:
> Tejun Heo <tj@kernel.org> writes:
>> I don't know.  I think it's a dangerous thing which can be avoided.
>> If there's no other solution, then we might have to live with it but I
>> don't see the winning benefit of such design over per-cpu virtual
>> mapping.
> 
> It isn't incompatible with a per-cpu virtual mapping.  It allows the
> possibility of each cpu reusing the same chunk of virtual address
> space for per cpu memory.
> 
> On x86_64 and other architectures with enough address space bits it allows
> us to share the large pages that we use for the normal memory mapping with
> the ones for per cpu access.
> 
> I definitely think the work of combining the pda and the percpu areas
> into a common area is worthwhile.

Yeah, it's gonna be necessary regardless of which way we go.

> I think it would be nice if the percpu area could grow and were not
> a fixed size at boot time, but I'm not particularly convinced it has to.

The main problem is that the areas need to be congruent (a percpu
variable must sit at the same offset in every cpu's area), which
basically mandates that they be contiguous.  The three alternatives on
the table are...

1. Just reserve memory from the get-go.  Simplest.  No additional TLB
   pressure, but memory is likely to be wasted and, more importantly,
   scalability suffers because the size is fixed at boot.

2. Reserve address space and map memory as necessary.  We can be much
   more generous about reserving address space, especially on 64bit
   machines, and can probably mostly forget about scalability issues
   there.  However, getting things just right for address-space
   constrained 32bit might not be too easy, but then again nothing
   really is scalable on 32bit these days, so we can probably live
   with a boot time parameter or something.

   Another issue is added TLB pressure, as it's likely to consume 4K
   TLB entries in addition to the 2MB TLB entries used by the default
   kernel mapping.  The TLB pressure could mostly be avoided if the
   percpu area were large enough to justify 2MB page allocations, but
   it isn't.

3. Do realloc().  This doesn't impose scalability issues or add to TLB
   pressure, but it does constrain how the percpu variables can be used
   (eg. pointers into the area can't be held across a realloc) and
   introduces some possibility of scary once-in-a-blue-moon,
   never-reproducible bugs.  Maybe such possibility can be reduced by
   putting some restrictions on the interface, but I don't know.  It
   still scares me.
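For illustration only, option #2 can be mimicked in user space. The idea is to mmap() a PROT_NONE, MAP_NORESERVE region that reserves address space for every cpu up front, then mprotect() more pages to read/write as the percpu area grows. This sketch assumes POSIX/Linux mmap semantics. SKETCH_NR_CPUS, SKETCH_UNIT_RESERVE and the sketch_* functions are hypothetical names, not kernel interfaces. An in-kernel version would instead map pages into a reserved kernel virtual range. The sketch also shows the congruence requirement: a variable at offset off lives at the same offset in every cpu's unit, and growing never moves anything.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define SKETCH_NR_CPUS      4             /* hypothetical number of cpus */
#define SKETCH_UNIT_RESERVE (1UL << 20)   /* hypothetical address space reserved per cpu */

static char *base;        /* start of the reserved region */
static size_t committed;  /* bytes currently accessible in each cpu's unit */
static size_t page_size;

/* Congruence: the variable at offset 'off' sits at the same offset in every cpu's unit. */
static void *sketch_per_cpu_ptr(size_t off, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS && off < SKETCH_UNIT_RESERVE);
    return base + (size_t)cpu * SKETCH_UNIT_RESERVE + off;
}

/* Reserve address space for all units up front; nothing is accessible yet. */
static int sketch_reserve(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, SKETCH_NR_CPUS * SKETCH_UNIT_RESERVE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    base = p;
    committed = 0;
    return 0;
}

/* Grow every unit to at least 'size' bytes; existing contents never move. */
static int sketch_grow(size_t size)
{
    size_t want = (size + page_size - 1) / page_size * page_size;
    if (want > SKETCH_UNIT_RESERVE)
        return -1;                /* per-cpu reservation exhausted */
    if (want <= committed)
        return 0;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        char *start = base + (size_t)cpu * SKETCH_UNIT_RESERVE + committed;
        if (mprotect(start, want - committed, PROT_READ | PROT_WRITE) != 0)
            return -1;
    }
    committed = want;
    return 0;
}
```

The usage pattern is to reserve once with sketch_reserve() and call sketch_grow() whenever a new percpu variable pushes the unit size up. A variable is then addressed with sketch_per_cpu_ptr(off, cpu). Already-committed pages never move, so pointers stay valid across growth, which is what option #3 cannot guarantee. When the per-unit reservation runs out, sketch_grow() fails, which mirrors the 32bit address space concern.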

Hmm... IIUC, the biggest drawback of #2 is the added TLB pressure,
right?  What if we reserve percpu allocations in 2MB chunks?  ie. use
4K mappings but always allocate the percpu pages from aligned 2MB
chunks.  That way it won't waste 2MB per cpu, and although it will use
additional 4K TLB entries, it will free up 2MB TLB entries.
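Here is a minimal sketch of the 2MB-chunk idea using a user-space stand-in. posix_memalign() hands out 2MB-aligned chunks, and 4K percpu pages are carved from them in order, so a new chunk is taken only when the previous one is full. The struct, constants and function names are hypothetical. A kernel version would pull chunks from the page allocator and map the 4K pages into the percpu virtual area.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE            4096UL          /* small page size */
#define SKETCH_CHUNK           (2UL << 20)     /* 2MB: the large page size */
#define SKETCH_PAGES_PER_CHUNK (SKETCH_CHUNK / SKETCH_PAGE)  /* 512 */
#define SKETCH_MAX_CHUNKS      64

/* Hypothetical pool of 2MB-aligned chunks that 4K percpu pages are carved from. */
struct sketch_pool {
    void *chunks[SKETCH_MAX_CHUNKS];
    size_t nr_chunks;
    size_t next_page;   /* index of the next free 4K page in the newest chunk */
};

/* Return one 4K page; grab a fresh aligned 2MB chunk only when the current one is full. */
static void *sketch_alloc_page(struct sketch_pool *pool)
{
    if (pool->nr_chunks == 0 || pool->next_page == SKETCH_PAGES_PER_CHUNK) {
        void *fresh;
        if (pool->nr_chunks == SKETCH_MAX_CHUNKS ||
            posix_memalign(&fresh, SKETCH_CHUNK, SKETCH_CHUNK) != 0)
            return NULL;
        assert((uintptr_t)fresh % SKETCH_CHUNK == 0);
        pool->chunks[pool->nr_chunks++] = fresh;
        pool->next_page = 0;
    }
    char *chunk = pool->chunks[pool->nr_chunks - 1];
    return chunk + pool->next_page++ * SKETCH_PAGE;
}

/* How many 2MB chunks nr_cpus units of unit_size bytes occupy when packed as 4K pages. */
static size_t sketch_chunks_needed(size_t nr_cpus, size_t unit_size)
{
    size_t pages_per_cpu = (unit_size + SKETCH_PAGE - 1) / SKETCH_PAGE;
    size_t total_pages = nr_cpus * pages_per_cpu;
    return (total_pages + SKETCH_PAGES_PER_CHUNK - 1) / SKETCH_PAGES_PER_CHUNK;
}

/* Release every chunk the pool has taken. */
static void sketch_pool_free(struct sketch_pool *pool)
{
    for (size_t i = 0; i < pool->nr_chunks; i++)
        free(pool->chunks[i]);
    pool->nr_chunks = 0;
    pool->next_page = 0;
}
```

For example, sketch_chunks_needed(64, 64 * 1024) is 2. Sixty-four 64KB units fill 1024 4K pages, which fit in two 2MB chunks (4MB). Backing each cpu with its own 2MB page would instead take 64 chunks (128MB). The 64KB unit size is an assumed figure, not one from this thread.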

Thanks.

-- 
tejun

Thread overview: 16+ messages
     [not found] <49649814.4040005@kernel.org>
     [not found] ` <20090107120225.GA30651@elte.hu>
2009-01-07 12:13   ` regarding the x86_64 zero-based percpu patches Tejun Heo
2009-01-10  6:46     ` Rusty Russell
2009-01-12 17:23       ` Christoph Lameter
2009-01-12 17:44         ` Eric W. Biederman
2009-01-12 19:00           ` Christoph Lameter
2009-01-13  0:33           ` Tejun Heo
2009-01-13  3:01             ` Eric W. Biederman
2009-01-13  3:14               ` Tejun Heo
2009-01-13  4:07                 ` Eric W. Biederman
2009-01-14  3:58                   ` Tejun Heo [this message]
2009-01-15  1:47                     ` Rusty Russell
2009-01-15  1:49                   ` Rusty Russell
2009-01-15 20:26                     ` Christoph Lameter
2009-01-15  1:34           ` Rusty Russell
2009-01-15 13:55             ` Ingo Molnar
2009-01-15 20:27             ` Christoph Lameter
