public inbox for linux-kernel@vger.kernel.org
From: William Lee Irwin III <wli@holomorphy.com>
To: Pavel Emelianov <xemul@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	Serge Hallyn <serue@us.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] kernel/pid.c pid allocation weirdness
Date: Wed, 14 Mar 2007 07:43:58 -0700
Message-ID: <20070314144358.GS2986@holomorphy.com>
In-Reply-To: <45F7A4B3.5040005@sw.ru>

On Wed, Mar 14, 2007 at 10:30:59AM +0300, Pavel Emelianov wrote:
> I'm looking at how alloc_pid() works and can't understand
> one (simple/stupid) thing.
> It first kmem_cache_alloc()-s a struct pid, then calls
> alloc_pidmap(), and at the end it takes the global pidmap_lock
> to add the new pid to the hash.
> The question is: why does alloc_pidmap() use at least
> two atomic ops and potentially loop to find a zero bit
> in the pidmap? Why not call alloc_pidmap() under pidmap_lock
> and find a zero pid in the pidmap without any loops or atomics?
> The same goes for free_pid(). Am I missing something?

pidmap_lock protects the ->page elements of the pidmap array. The
bitmap is not protected by it. It was intended to be as lockless as
possible, so the lock there essentially stands in for a cmpxchg().
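
For concreteness, the page-installation path has roughly this shape
(a condensed sketch in the style of kernel/pid.c of that era, not the
verbatim source); the page is allocated outside the lock, and
pidmap_lock covers only the O(1) publish of ->page:

	if (unlikely(!map->page)) {
		void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);

		/*
		 * Publish the freshly-allocated page, or free it if
		 * another cpu raced with us and installed one first.
		 */
		spin_lock_irq(&pidmap_lock);
		if (map->page)
			kfree(page);
		else
			map->page = page;
		spin_unlock_irq(&pidmap_lock);
		if (unlikely(!map->page))
			break;	/* allocation failed */
	}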

A loop of some kind is strictly necessary regardless; in this lockless
case concurrent bitmap updates can trigger looping. It's very important
that only O(1) operations happen under the lock. These operations are
installing freshly-allocated pidmap pages, inserting a struct pid into
a hashtable collision chain, and removing a struct pid from a hashtable
collision chain. Traversals of hashtable collision chains are lockless
as per RCU. In any event, the atomic bit operations allow purely
lockless bitmap updates as well as purely lockless bitmap reads.
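
To illustrate, the allocation fast path is shaped roughly like this
(a simplified sketch; assume find_next_offset() wraps
find_next_zero_bit() over the pidmap page and mk_pid() converts a
map/offset pair back into a pid number):

	if (likely(atomic_read(&map->nr_free))) {
		do {
			/*
			 * The lone atomic op: try to claim the bit.
			 * A concurrent allocator may win the race, in
			 * which case we rescan from the next zero bit;
			 * that rescan is the loop that cannot be
			 * avoided.
			 */
			if (!test_and_set_bit(offset, map->page)) {
				atomic_dec(&map->nr_free);
				return mk_pid(map, offset);
			}
			offset = find_next_offset(map, offset);
		} while (offset < BITS_PER_PAGE);
	}

Note that the scan itself (find_next_zero_bit() inside
find_next_offset()) takes no lock at all; only the claim is atomic.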

Essentially the idioms you're noticing are all for SMP scalability; in
particular, pid allocation used to cause enormous stress on tasklist_lock
that would trigger NMI-based deadlock detectors. Backing out such
optimizations is tantamount to making the systems affected by that (i.e.
any with enough CPUs) crash that way again.


-- wli


Thread overview: 14+ messages
2007-03-14  7:30 [RFC] kernel/pid.c pid allocation weirdness Pavel Emelianov
2007-03-14 14:12 ` Eric W. Biederman
2007-03-14 15:03   ` William Lee Irwin III
2007-03-14 16:54     ` Eric W. Biederman
2007-03-15 20:26       ` William Lee Irwin III
2007-03-16 13:04         ` Eric W. Biederman
2007-03-16 19:46           ` William Lee Irwin III
2007-03-16 21:18             ` Eric W. Biederman
2007-03-14 15:33   ` Oleg Nesterov
2007-03-16 10:57     ` Pavel Emelianov
2007-03-16 11:37       ` Eric Dumazet
2007-03-16 11:58         ` Pavel Emelianov
2007-03-16 11:40       ` Dmitry Adamushko
2007-03-14 14:43 ` William Lee Irwin III [this message]
