public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	David Howells <dhowells@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	paulmck@linux.vnet.ibm.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	Jiri Olsa <jolsa@redhat.com>
Subject: Re: [PATCH 2/2] CRED: Fix __task_cred()'s lockdep check and banner comment
Date: Thu, 5 Aug 2010 20:16:52 +0200	[thread overview]
Message-ID: <20100805181652.GA27671@redhat.com> (raw)
In-Reply-To: <AANLkTikavQKkvmdY5=8jh8KSV7vit=WcmDHWjnZfVWnv@mail.gmail.com>

On 08/05, Linus Torvalds wrote:
>
> On Thu, Aug 5, 2010 at 12:19 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> >
> > No.  When we send a signal to multiple processes it needs to be an
> > atomic operation so that kill -KILL -pgrp won't let processes escape.
> > It is what posix specifies, it is what real programs expect, and it
> > is the useful semantic in userspace.
>
> Ok. However, in that case, it's not really about the whole list
> traversal, it's a totally separate thing, and it's really sad that we
> end up using the (rather hot) tasklist_lock for something like that.
> With the dcache/inode locks basically going away, I think
> tasklist_lock ends up being one of the few hot locks left.
>
> Wouldn't it be much nicer to:
>  - make it clear that all the "real" signal locking can rely on RCU
>  - use a separate per-pgrp lock that ends up being the one that gives
> the signal _semantic_ meaning?
>
> That would automatically document why we get the lock too, which
> certainly isn't clear from the code as-is.
>
> The per-pgrp lock might be something as simple as a silly hash that
> just spreads out the process groups over some random number of simple
> spinlocks.

I still think we can avoid the new lock and rely on RCU in
kill_something_info() and kill_pgrp().

I am attaching my old email below. It talks about pgrp, however I
think kill_something_info() is almost the same thing.

Oleg.

On 12/07, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 12/05, Eric W. Biederman wrote:
> >>
> >> Atomically sending signal to every member of a process group, is the
> >> big fly in the ointment I am aware of.  Last time I looked I could
> >> not see how to convert it rcu.
> >
> > I am not sure, but iirc we can do this lockless (under rcu_lock).
> > We need to modify pid_link to use list_entry and attach_pid() should
> > add the new task to the end. Of course we need more changes, but
> > (again iirc) this is not too hard.
>
> The problem is that even adding to the end of the list, we could run
> into a deleted entry and not see the new end of the list.
>
> Suppose when we start iterating the list we have:
>
>   A -> B -> C -> D
>
> Then someone deletes some of the entries while we are iterating the list.
>
> A ->
>  B' -> C' -> D'
>
> We will continue on traversing through the deleted entries.
>
> Then someone adds a new entry to the end of the list.
>
> A-> N
>
> Since we are at B', C' or D' we will never see the new entry on the
> end of the list.

Yes, but who can add the new entry?

Let's forget about setpgrp/etc for the moment, I think we have "races"
with or without tasklist. Say, setpgrp() can add the new process to the
already "killed" pgrp.

Then, I think the only important case is SIGKILL/SIGSTOP (or other
signals which can't be blockes/ignored). We must kill/stop the entire
pgrp, we must not race with fork() and miss a child.

In this case I _think_ rcu_read_lock() is enough,

	rcu_read_lock()

	list_for_each_entry_rcu(task, pid->tasks[PIDTYPE_PGID)
		group_send_sig_info(sig, task);

	rcu_read_unlock();

except group_send_sig_info() can race with mt-exec, but this is simple
to fix.

If we send a signal (not necessary SIGKILL) to a process P, we must see
all childs which were forked by P, both send_signal() and copy_process()
take the same ->siglock, we must see the result of list_add_tail_rcu().
And, after we sent SIGKILL/SIGSTOP, it can't fork the new child.

If list_for_each_entry() does not see the exited process P, this means
we see the result of list_del_rcu(). But this also means we must the
the result of the previous list_add_rcu().

IOW, fork+exit means list_add_rcu() + wmb() + list_del_rcu(), if we
don't see the new entry on list, we must see the new one, right?

(I am ignoring the case when list_for_each_entry_rcu() sees a process
 P but lock_task_sighand(P) fails, I think this is the same as if we
 we missed P)

Now suppose a signal is blocked/ignored or has a handler. In this case
we can miss a child, but I think this is OK, we can pretend the new
child was forked after kill_pgrp() completes. Say, this child C was
forked by some process P. We can miss C only if it was forked after
we already sent the signal to P.

However. I do not pretend the reasoning above is "complete", and
perhaps I missed something else.

> Additionally we have the other possibility that if a child is forking
> we send the signal to the parent after the child forks away but before
> the child joins whichever list we are walking, and we complete our
> traversal without seeing the child.

Not sure I understand... But afaics this case is covered above.
->siglock should serialize this, copy_process() does attach_pid()
under this lock.

Oleg.


  reply	other threads:[~2010-08-05 18:20 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-29 11:45 [PATCH 1/2] CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials David Howells
2010-07-29 11:45 ` [PATCH 2/2] CRED: Fix __task_cred()'s lockdep check and banner comment David Howells
2010-08-02 20:40   ` Paul E. McKenney
2010-08-03  0:55     ` Tetsuo Handa
2010-08-03  9:34       ` David Howells
2010-08-03 16:07         ` Linus Torvalds
2010-08-03 17:48           ` Thomas Gleixner
2010-08-04 13:17           ` Oleg Nesterov
2010-08-04 14:01             ` David Howells
2010-08-04 15:08               ` Oleg Nesterov
2010-08-04 15:22                 ` David Howells
2010-08-04 15:44                   ` Oleg Nesterov
2010-08-05  7:19           ` Eric W. Biederman
2010-08-05 16:14             ` Linus Torvalds
2010-08-05 18:16               ` Oleg Nesterov [this message]
2010-08-05 20:13               ` Eric W. Biederman
2010-08-05 20:26                 ` Linus Torvalds
2010-08-05 21:20                   ` Eric W. Biederman
2010-08-04  0:38         ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100805181652.GA27671@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox