From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Christoph Hellwig <hch@infradead.org>,
Nick Piggin <npiggin@suse.de>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [rfc] "fair" rw spinlocks
Date: Wed, 9 Dec 2009 22:22:20 -0800
Message-ID: <20091210062220.GC6720@linux.vnet.ibm.com>
In-Reply-To: <20091209153709.GA13192@redhat.com>
On Wed, Dec 09, 2009 at 04:37:09PM +0100, Oleg Nesterov wrote:
> On 12/07, Eric W. Biederman wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> writes:
> >
> > > On 12/05, Eric W. Biederman wrote:
> > >>
> > >> Atomically sending signal to every member of a process group, is the
> > >> big fly in the ointment I am aware of. Last time I looked I could
> > >> not see how to convert it to RCU.
> > >
> > > I am not sure, but iirc we can do this lockless (under rcu_lock).
> > > We need to modify pid_link to use list_entry and attach_pid() should
> > > add the new task to the end. Of course we need more changes, but
> > > (again iirc) this is not too hard.
> >
> > The problem is that even adding to the end of the list, we could run
> > into a deleted entry and not see the new end of the list.
> >
> > Suppose when we start iterating the list we have:
> >
> > A -> B -> C -> D
> >
> > Then someone deletes some of the entries while we are iterating the list.
> >
> > A ->
> > B' -> C' -> D'
> >
> > We will continue on traversing through the deleted entries.
> >
> > Then someone adds a new entry to the end of the list.
> >
> > A-> N
> >
> > Since we are at B', C' or D' we will never see the new entry on the
> > end of the list.
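
A minimal userspace sketch of the scenario quoted above, assuming a toy
singly linked list in which deleted nodes keep their own next pointers
(as list_del_rcu() leaves them for readers still in flight). The names
node, unlink_after(), append_tail(), walk_from() and
reader_misses_new_tail() are made up for illustration and are not kernel
APIs. There is no real concurrency here, only a replay of one interleaving.

```c
#include <stddef.h>
#include <string.h>

/* Toy list node; 'next' is left intact when the node is unlinked. */
struct node {
	char name;
	struct node *next;
};

/* Unlink everything after 'prev' without touching the removed nodes. */
static void unlink_after(struct node *prev)
{
	prev->next = NULL;
}

/* Append 'n' at the tail of the list reachable from 'head'. */
static void append_tail(struct node *head, struct node *n)
{
	struct node *p = head;

	while (p->next)
		p = p->next;
	n->next = NULL;
	p->next = n;
}

/* Continue a traversal from 'pos', recording visited names in 'out'. */
static size_t walk_from(const struct node *pos, char *out, size_t cap)
{
	size_t n = 0;

	if (cap == 0)
		return 0;
	for (; pos && n + 1 < cap; pos = pos->next)
		out[n++] = pos->name;
	out[n] = '\0';
	return n;
}

/* Replay the interleaving; returns 1 if the in-flight reader misses N. */
static int reader_misses_new_tail(void)
{
	struct node d = { 'D', NULL }, c = { 'C', &d };
	struct node b = { 'B', &c }, a = { 'A', &b };
	struct node n = { 'N', NULL };
	const struct node *reader;
	char seen[8];

	reader = a.next;	/* reader has advanced from A to B */
	unlink_after(&a);	/* B, C, D deleted: the list is now just A */
	append_tail(&a, &n);	/* the list is now A -> N */

	walk_from(reader, seen, sizeof(seen));	/* reader walks B, C, D */
	return strchr(seen, 'N') == NULL;
}
```

In this replay the reader, already standing on a deleted entry, finishes
its walk over B, C and D and never reaches N, although N was added at the
tail.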
>
> Yes, but who can add the new entry?
>
> Let's forget about setpgrp/etc for the moment, I think we have "races"
> with or without tasklist. Say, setpgrp() can add the new process to the
> already "killed" pgrp.
>
> Then, I think the only important case is SIGKILL/SIGSTOP (or other
> signals which can't be blocked/ignored). We must kill/stop the entire
> pgrp, we must not race with fork() and miss a child.
>
> In this case I _think_ rcu_read_lock() is enough,
>
> 	rcu_read_lock();
>
> 	list_for_each_entry_rcu(task, pid->tasks[PIDTYPE_PGID])
> 		group_send_sig_info(sig, task);
>
> 	rcu_read_unlock();
>
> except group_send_sig_info() can race with mt-exec, but this is simple
> to fix.
>
> If we send a signal (not necessarily SIGKILL) to a process P, we must see
> all children which were forked by P: both send_signal() and copy_process()
> take the same ->siglock, so we must see the result of list_add_tail_rcu().
> And, after we sent SIGKILL/SIGSTOP, it can't fork the new child.
>
> If list_for_each_entry() does not see the exited process P, this means
> we see the result of list_del_rcu(). But this also means we must see
> the result of the previous list_add_rcu().
>
> IOW, fork+exit means list_add_rcu() + wmb() + list_del_rcu(), if we
> don't see the old entry on list, we must see the new one, right?
>
> (I am ignoring the case when list_for_each_entry_rcu() sees a process
> P but lock_task_sighand(P) fails, I think this is the same as if we
> missed P)
>
> Now suppose a signal is blocked/ignored or has a handler. In this case
> we can miss a child, but I think this is OK, we can pretend the new
> child was forked after kill_pgrp() completes. Say, this child C was
> forked by some process P. We can miss C only if it was forked after
> we already sent the signal to P.
>
> However. I do not pretend the reasoning above is "complete", and
> perhaps I missed something else.
My main concern would be "fork storms", where each CPU in a large
system is spawning children in a pgrp that some other CPU is attempting
to kill. The CPUs spawning children might be able to keep ahead of
the single CPU, so that the pgrp never is completely killed.

Enlisting the aid of the CPUs doing the spawning (e.g., by having them
consult a list of signals being sent) prevents this fork-storm scenario.

Thanx, Paul
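
A rough single-threaded model of the fork-storm concern and of the
"consult a list of signals being sent" idea, under loud assumptions: the
struct pgrp_model, its pending_sig slot (standing in for the list of
signals being sent), and the helpers spawn_child(), begin_kill(),
kill_one() and kill_with_storm() are all hypothetical and are not kernel
interfaces. Locking is omitted; a real version would need something like
->siglock to serialize the check in spawn_child() against begin_kill().

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 64	/* arbitrary cap so the model stays bounded */

/* Hypothetical process-group model; not a kernel structure. */
struct pgrp_model {
	int pending_sig;	/* 0 = none, else the signal being sent */
	size_t nr_members;	/* live members of the group */
};

/* Spawner side: consult the pending signal before adding a child. */
static bool spawn_child(struct pgrp_model *pg)
{
	if (pg->pending_sig != 0)
		return false;	/* group is being killed: refuse to add */
	if (pg->nr_members >= MAX_MEMBERS)
		return false;
	pg->nr_members++;
	return true;
}

/* Killer side, step 1: publish the signal so spawners stop adding. */
static void begin_kill(struct pgrp_model *pg, int sig)
{
	pg->pending_sig = sig;
}

/* Killer side, step 2: remove one member; false if none are left. */
static bool kill_one(struct pgrp_model *pg)
{
	if (pg->nr_members == 0)
		return false;
	pg->nr_members--;
	return true;
}

/*
 * Interleave each kill with two spawn attempts. Returns the number of
 * kill steps needed to empty the group, or -1 if the group is still
 * populated after 'budget' steps.
 */
static long kill_with_storm(struct pgrp_model *pg, int sig, bool publish,
			    long budget)
{
	long steps = 0;

	if (publish)
		begin_kill(pg, sig);
	while (pg->nr_members > 0) {
		if (steps++ >= budget)
			return -1;
		kill_one(pg);
		spawn_child(pg);
		spawn_child(pg);
	}
	return steps;
}
```

With the signal published first, a four-member group is emptied in four
steps. Without it, the two spawn attempts per kill keep the group
populated until the step budget runs out.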
> > Additionally we have the other possibility that if a child is forking
> > we send the signal to the parent after the child forks away but before
> > the child joins whichever list we are walking, and we complete our
> > traversal without seeing the child.
>
> Not sure I understand... But afaics this case is covered above.
> ->siglock should serialize this, copy_process() does attach_pid()
> under this lock.
>
> Oleg.
>