Re: [RFC] kernel/pid.c pid allocation wierdness

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Dumazet <dada1@cosmosbay.com>
To: Pavel Emelianov <xemul@sw.ru>
Cc: Oleg Nesterov <oleg@tv-sign.ru>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	Serge Hallyn <serue@us.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [RFC] kernel/pid.c pid allocation wierdness
Date: Fri, 16 Mar 2007 12:37:47 +0100	[thread overview]
Message-ID: <200703161237.48014.dada1@cosmosbay.com> (raw)
In-Reply-To: <45FA7823.2040104@sw.ru>

On Friday 16 March 2007 11:57, Pavel Emelianov wrote:
> Oleg Nesterov wrote:
> > On 03/14, Eric W. Biederman wrote:
> >> Pavel Emelianov <xemul@sw.ru> writes:
> >>> Hi.
> >>>
> >>> I'm looking at how alloc_pid() works and can't understand
> >>> one (simple/stupid) thing.
> >>>
> >>> It first kmem_cache_alloc()-s a strct pid, then calls
> >>> alloc_pidmap() and at the end it taks a global pidmap_lock()
> >>> to add new pid to hash.
> >
> > We need some global lock. pidmap_lock is already here, and it is
> > only used to protect pidmap->page allocation. Iow, it is almost
> > unused. So it was very natural to re-use it while implementing
> > pidrefs.
> >
> >>> The question is - why does alloc_pidmap() use at least
> >>> two atomic ops and potentially loop to find a zero bit
> >>> in pidmap? Why not call alloc_pidmap() under pidmap_lock
> >>> and find zero pid in pidmap w/o any loops and atomics?
> >
> > Currently we search for zero bit lockless, why do you want
> > to do it under spin_lock ?
>
> Search isn't lockless. Look:
>
> while (1) {
>    if (!test_and_set_bit(...)) {
>        atomic_dec(&nr_free);
>        return pid;
>    }
>    ...
> }
>
> we use two atomic operations to find and set a bit in a map.

The finding of the zero bit is done without lock. (Search/lookup)

Then , the reservation of the found bit (test_and_set_bit) is done, and 
decrement of nr_free. It may fail because the search was done lockless.

Finding a zero bit in a 4096 bytes array may consume about 6000 cycles on 
modern hardware. Much more on SMP/NUMA machines, or on machines where 
PAGE_SIZE is 64K instead of 4K :)

You don't want to hold pidmad_lock for so long period.

next prev parent reply	other threads:[~2007-03-16 11:37 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-14  7:30 [RFC] kernel/pid.c pid allocation wierdness Pavel Emelianov
2007-03-14 14:12 ` Eric W. Biederman
2007-03-14 15:03   ` William Lee Irwin III
2007-03-14 16:54     ` Eric W. Biederman
2007-03-15 20:26       ` William Lee Irwin III
2007-03-16 13:04         ` Eric W. Biederman
2007-03-16 19:46           ` William Lee Irwin III
2007-03-16 21:18             ` Eric W. Biederman
2007-03-14 15:33   ` Oleg Nesterov
2007-03-16 10:57     ` Pavel Emelianov
2007-03-16 11:37       ` Eric Dumazet [this message]
2007-03-16 11:58         ` Pavel Emelianov
2007-03-16 11:40       ` Dmitry Adamushko
2007-03-14 14:43 ` William Lee Irwin III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200703161237.48014.dada1@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=containers@lists.osdl.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@tv-sign.ru \
    --cc=serue@us.ibm.com \
    --cc=sukadev@us.ibm.com \
    --cc=xemul@sw.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.