public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Salman <sqazi@google.com>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org,
	peterz@infradead.org, tytso@google.com,
	torvalds@linux-foundation.org, walken@google.com,
	Chen Liqin <liqin.chen@sunplusct.com>,
	Lennox Wu <lennox.wu@gmail.com>
Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.
Date: Mon, 14 Jun 2010 16:58:51 -0700	[thread overview]
Message-ID: <20100614165851.6bdfe485.akpm@linux-foundation.org> (raw)
In-Reply-To: <20100611224902.5039.60134.stgit@bumblebee1.mtv.corp.google.com>

On Fri, 11 Jun 2010 15:49:54 -0700
Salman <sqazi@google.com> wrote:

> A program that repeatedly forks and waits is susceptible to having the
> same pid repeated, especially when it competes with another instance of the
> same program.  This is really bad for bash implementation.  Furthermore,
> many shell scripts assume that pid numbers will not be used for some length
> of time.
> 
> Race Description:
>
> ...
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index e9fd8c1..fbbd5f6 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -122,6 +122,43 @@ static void free_pidmap(struct upid *upid)
>  	atomic_inc(&map->nr_free);
>  }
>  
> +/*
> + * If we started walking pids at 'base', is 'a' seen before 'b'?
> + */
> +static int pid_before(int base, int a, int b)
> +{
> +	/*
> +	 * This is the same as saying
> +	 *
> +	 * (a - base + MAXUINT) % MAXUINT < (b - base + MAXUINT) % MAXUINT
> +	 * and that mapping orders 'a' and 'b' with respect to 'base'.
> +	 */
> +	return (unsigned)(a - base) < (unsigned)(b - base);
> +}

pid.c uses an exotic mix of `int' and `pid_t' to represent pids.  `int'
seems to preponderate.

> +/*
> + * We might be racing with someone else trying to set pid_ns->last_pid.
> + * We want the winner to have the "later" value, because if the
> + * "earlier" value prevails, then a pid may get reused immediately.
> + *
> + * Since pids rollover, it is not sufficient to just pick the bigger
> + * value.  We have to consider where we started counting from.
> + *
> + * 'base' is the value of pid_ns->last_pid that we observed when
> + * we started looking for a pid.
> + *
> + * 'pid' is the pid that we eventually found.
> + */
> +static void set_last_pid(struct pid_namespace *pid_ns, int base, int pid)
> +{
> +	int prev;
> +	int last_write = base;
> +	do {
> +		prev = last_write;
> +		last_write = cmpxchg(&pid_ns->last_pid, prev, pid);
> +	} while ((prev != last_write) && (pid_before(base, last_write, pid)));
> +}

<gets distracted>

hm.  For a long time cmpxchg() wasn't available on all architectures. 
That _seems_ to have been fixed.

arch/score assumes that cmpxchg() operates on unsigned longs.

arch/powerpc plays the necessary games to make 4- and 8-byte scalars work.

ia64 handles 1, 2, 4 and 8-byte quantities.

arm handles 1, 2 and 4-byte scalars.

as does blackfin.

So from the few architectures I looked at, it seems that we do indeed
handle cmpxchg() on all architectures although not very consistently. 
arch/score will blow up if someone tries to use cmpxchg() on 1- or
2-byte scalars.

<looks at the consumers>

infiniband deos cmpxchg() on u64*'s, which will blow up on many
architectures.

Using

	grep -r '[ 	]cmpxchg[^_]' . | grep -v /arch/

I can't see any cmpxchg() callers in truly generic code.  lockdep and
kernel/trace/ring_buffer.c aren't used on the more remote
architectures, I think.

Traditionally, atomic_cmpxchg() was the safe and portable one to use.



  parent reply	other threads:[~2010-06-14 23:59 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-11 17:17 [PATCH] Fix a race in pid generation that causes pids to be reused immediately Salman
2010-06-11 17:44 ` Linus Torvalds
2010-06-11 22:49   ` Salman
2010-06-11 23:07     ` Linus Torvalds
2010-06-14 23:58     ` Andrew Morton [this message]
2010-06-15  0:56       ` tytso
2010-06-15  1:55         ` Andrew Morton
2010-06-15  3:26           ` Paul Mackerras
2010-06-15  4:21             ` Andrew Morton
2010-06-15  4:38               ` Eric Dumazet
2010-06-15  6:57               ` Benjamin Herrenschmidt
2010-06-15  7:25               ` Paul Mackerras
2010-06-15 12:56           ` tytso
2010-06-15 13:06             ` Kyle McMartin
2010-06-15 14:35           ` Peter Zijlstra
2010-06-15 19:37             ` Andrew Morton
2010-06-15 18:35     ` [PATCH 0/1] (Was: Fix a race in pid generation that causes pids to be reused immediately.) Oleg Nesterov
2010-06-15 18:35       ` [PATCH 1/1] pids: alloc_pidmap: remove the unnecessary boundary checks Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2010-06-10 21:24 [PATCH] Fix a race in pid generation that causes pids to be reused immediately Salman
2010-06-10 20:09 Salman
2010-06-10 20:38 ` tytso
2010-06-10 21:04   ` Salman Qazi
2010-06-09 21:00 Salman
2010-06-09 21:21 ` Linus Torvalds
2010-06-09 21:33   ` Peter Zijlstra
2010-06-09 22:20     ` Linus Torvalds
2010-06-09 22:27       ` Linus Torvalds
2010-06-10  0:08         ` Salman Qazi
2010-06-10  0:20           ` Linus Torvalds
     [not found]             ` <AANLkTilXJ0X2qxD9cNTlLayKzySEZu1HEZUWu--Go8kw@mail.gmail.com>
2010-06-10  5:55               ` Salman Qazi
2010-06-10 16:39                 ` Linus Torvalds
2010-06-09  6:24 Salman
2010-06-09  6:53 ` Andi Kleen
2010-06-09  9:48 ` Ingo Molnar
2010-06-09 15:39   ` Linus Torvalds
2010-06-09 15:50     ` tytso
2010-06-09 16:06       ` Linus Torvalds
2010-06-09 17:10         ` tytso
2010-06-09 17:23           ` Linus Torvalds
2010-06-09 17:25             ` Linus Torvalds
2010-06-09 17:34               ` tytso
2010-06-09 17:43                 ` Linus Torvalds
2010-06-09 17:47                   ` tytso
2010-06-09 18:09                     ` Salman Qazi
2010-06-09 11:49 ` Michel Lespinasse
2010-06-09 12:37   ` tytso
2010-06-09 12:17 ` tytso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100614165851.6bdfe485.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=lennox.wu@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liqin.chen@sunplusct.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=sqazi@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@google.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox