Re: [RFC] kernel/pid.c pid allocation wierdness

public inbox for linux-kernel@vger.kernel.org

From: ebiederm@xmission.com (Eric W. Biederman)
To: Pavel Emelianov <xemul@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	Serge Hallyn <serue@us.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@tv-sign.ru>,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [RFC] kernel/pid.c pid allocation wierdness
Date: Wed, 14 Mar 2007 08:12:35 -0600
Message-ID: <m16493rk18.fsf@ebiederm.dsl.xmission.com>
In-Reply-To: <45F7A4B3.5040005@sw.ru> (Pavel Emelianov's message of "Wed, 14 Mar 2007 10:30:59 +0300")

Pavel Emelianov <xemul@sw.ru> writes:

> Hi.
>
> I'm looking at how alloc_pid() works and can't understand
> one (simple/stupid) thing.
>
> It first kmem_cache_alloc()-s a struct pid, then calls
> alloc_pidmap() and at the end it takes the global pidmap_lock
> to add the new pid to the hash.
>
> The question is - why does alloc_pidmap() use at least
> two atomic ops and potentially loop to find a zero bit
> in pidmap? Why not call alloc_pidmap() under pidmap_lock
> and find a zero bit in pidmap w/o any loops and atomics?
>
> The same goes for free_pid(). Am I missing something?

Well, as far as I can tell, that is just the way the code
evolved.

Looking at the history: when I started working on this code,
alloc_pidmap was the allocation function, and its locking behaved
pretty much as it does today (except that it didn't disable irqs).

To add the allocation of struct pid, I added alloc_pid as a
wrapper, left alloc_pidmap alone, and added the hash table
manipulation code.  I know this results in fairly short hold times
for pidmap_lock, which is moderately important for a global lock.
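A minimal user-space sketch of that split, assuming a pthread mutex in
place of pidmap_lock and a plain atomic counter standing in for
alloc_pidmap (the real cyclic bitmap search is not modelled here).
All names (struct pid_obj, alloc_pid_obj, alloc_number, hash_lock) are
hypothetical.  The only point is the ordering: memory and the pid
number are obtained outside the global lock, and the lock covers
nothing but the hash chain update.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define HASH_SIZE 16u   /* illustrative; must be a power of two */

struct pid_obj {
    int nr;
    struct pid_obj *next;   /* hash chain link */
};

static struct pid_obj *pid_hash[HASH_SIZE];
static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic int next_nr = 1;

/* Hypothetical stand-in for alloc_pidmap(): a trivial counter,
 * NOT a cyclic bitmap search. */
static int alloc_number(void)
{
    return atomic_fetch_add(&next_nr, 1);
}

static unsigned hash_nr(int nr)
{
    return (unsigned)nr & (HASH_SIZE - 1);
}

struct pid_obj *alloc_pid_obj(void)
{
    /* 1. Memory allocation happens outside the global lock. */
    struct pid_obj *p = malloc(sizeof(*p));
    if (!p)
        return NULL;

    /* 2. So does choosing the number. */
    p->nr = alloc_number();
    assert(p->nr > 0);

    /* 3. The global lock covers only the hash chain update. */
    unsigned h = hash_nr(p->nr);
    pthread_mutex_lock(&hash_lock);
    p->next = pid_hash[h];
    pid_hash[h] = p;
    pthread_mutex_unlock(&hash_lock);
    return p;
}

struct pid_obj *find_pid_obj(int nr)
{
    pthread_mutex_lock(&hash_lock);
    struct pid_obj *p = pid_hash[hash_nr(nr)];
    while (p && p->nr != nr)
        p = p->next;
    pthread_mutex_unlock(&hash_lock);
    return p;
}
```

Lookups here also take the mutex for simplicity; a kernel
implementation is free to use a cheaper read-side scheme.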

We loop in alloc_pidmap because of what we are trying to do.  Simply
returning the first free pid number would have bad effects on user
space (for example, a program still holding an old pid could end up
signalling an unrelated new process), so we have the simple
requirement that we don't reuse pid numbers for as long as is
practical.  We achieve that by doing full walks through the pid space
before we consider a pid again, so each search has to start from
where the last one left off.  In addition, we may have multiple pages
of bitmap to traverse (when our pid limit is high) and those pages
are not physically contiguous.
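A rough user-space model of that search, assuming C11 atomics in
place of the kernel bit operations.  The page size, the number of
pages and the range of low pids skipped after wraparound are
illustrative values, and every name here (try_claim, alloc_pid_nr,
free_pid_nr, init_pidmap) is hypothetical.  It shows why the loop
exists: the search starts just past the last pid handed out, wraps
around the whole space, and hops between separately allocated bitmap
pages, claiming a bit with an atomic fetch-or so no lock is needed.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Illustrative sizes, not the kernel's configuration. */
#define PAGE_BITS      4096                 /* bits in one bitmap page */
#define NR_PAGES       4                    /* number of bitmap pages */
#define PID_MAX        (NR_PAGES * PAGE_BITS)
#define RESERVED       300                  /* low pids skipped after wraparound */
#define WORD_BITS      (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_PER_PAGE (PAGE_BITS / WORD_BITS)

struct bitmap_page {
    _Atomic unsigned long words[WORDS_PER_PAGE];
};

/* Each page is allocated separately, so pages need not be contiguous. */
static struct bitmap_page *pages[NR_PAGES];
/* Last pid handed out: a hint for where the next search begins. */
static _Atomic int last_pid = 0;

int init_pidmap(void)
{
    for (int i = 0; i < NR_PAGES; i++) {
        pages[i] = malloc(sizeof(*pages[i]));
        if (!pages[i])
            return -1;
        for (size_t w = 0; w < WORDS_PER_PAGE; w++)
            atomic_init(&pages[i]->words[w], 0UL);
    }
    atomic_store(&last_pid, 0);
    return 0;
}

/* Atomically set the bit for nr; returns 1 if this caller set it. */
static int try_claim(int nr)
{
    _Atomic unsigned long *word =
        &pages[nr / PAGE_BITS]->words[(nr % PAGE_BITS) / WORD_BITS];
    unsigned long mask = 1UL << ((nr % PAGE_BITS) % WORD_BITS);
    return !(atomic_fetch_or(word, mask) & mask);
}

/* Cyclic search: start after last_pid, wrap to RESERVED at PID_MAX. */
int alloc_pid_nr(void)
{
    int nr = atomic_load(&last_pid) + 1;

    for (int tries = 0; tries < PID_MAX; tries++) {
        if (nr >= PID_MAX)
            nr = RESERVED;
        if (try_claim(nr)) {
            /* Racy hint: concurrent callers may overwrite each other. */
            atomic_store(&last_pid, nr);
            return nr;
        }
        nr++;
    }
    return -1;  /* every pid in the cycle is in use */
}

void free_pid_nr(int nr)
{
    assert(nr > 0 && nr < PID_MAX);
    _Atomic unsigned long *word =
        &pages[nr / PAGE_BITS]->words[(nr % PAGE_BITS) / WORD_BITS];
    atomic_fetch_and(word, ~(1UL << ((nr % PAGE_BITS) % WORD_BITS)));
}
```

It scans one bit at a time for clarity; a real allocator would skip
whole words (or pages) that are already full.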

So while I wouldn't call alloc_pidmap perfect, it does seem to be
reasonable.

From what I can tell, for the low number of pids that we usually have,
the pid hash table seems near optimal.

If we do dig into this more, we should consider a radix tree to hold
the pid values.  That could replace both the pid map and the hash
table, gracefully handle both large and small pid counts, might
be a smidgen simpler, could possibly be more space efficient, and would
more easily handle multiple pid namespaces.  The downside of using a
radix tree is that it looks like it will have more cache misses for
the normal pid map size, and it is yet another change that we would
need to validate.

Eric


Thread overview: 14+ messages
2007-03-14  7:30 [RFC] kernel/pid.c pid allocation wierdness Pavel Emelianov
2007-03-14 14:12 ` Eric W. Biederman [this message]
2007-03-14 15:03   ` William Lee Irwin III
2007-03-14 16:54     ` Eric W. Biederman
2007-03-15 20:26       ` William Lee Irwin III
2007-03-16 13:04         ` Eric W. Biederman
2007-03-16 19:46           ` William Lee Irwin III
2007-03-16 21:18             ` Eric W. Biederman
2007-03-14 15:33   ` Oleg Nesterov
2007-03-16 10:57     ` Pavel Emelianov
2007-03-16 11:37       ` Eric Dumazet
2007-03-16 11:58         ` Pavel Emelianov
2007-03-16 11:40       ` Dmitry Adamushko
2007-03-14 14:43 ` William Lee Irwin III
