From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Don Zickus <dzickus@redhat.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Ingo Molnar <mingo@elte.hu>,
Jerome Marchand <jmarchan@redhat.com>,
Mandeep Singh Baines <msb@google.com>,
Roland McGrath <roland@redhat.com>,
linux-kernel@vger.kernel.org, stable@kernel.org,
"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: while_each_thread() under rcu_read_lock() is broken?
Date: Tue, 22 Jun 2010 15:12:26 -0700
Message-ID: <20100622221226.GP2290@linux.vnet.ibm.com>
In-Reply-To: <20100622212357.GA19670@redhat.com>
On Tue, Jun 22, 2010 at 11:23:57PM +0200, Oleg Nesterov wrote:
> On 06/21, Paul E. McKenney wrote:
> >
> > Indeed, the tough part is figuring out when you are done given that things
> > can come and go at will. Some additional tricks, in no particular order:
> >
> > 1. Always start at the group leader.
>
> We can't. We have users which start at the arbitrary thread.
OK.
> > 2. Maintain a separate task structure that flags the head of the
> > list. This separate structure is freed one RCU grace period
> > following the disappearance of the current group leader.
>
> Even simpler, we can just add list_head into signal_struct. I thought
> about this, but this breaks thread_group_empty (this is fixable) and,
> again, I'd like very much to avoid adding new fields into task_struct
> or signal_struct.
Understood.
> > > Well, another field in task_struct...
> >
> > Yeah, would be good to avoid this. Not sure it can be avoided, though.
>
> Why? I think next_thread_careful() from
> http://marc.info/?l=linux-kernel&m=127714242731448
> should work.
>
> If the caller holds tasklist or siglock, this change has no effect.
>
> If the caller does while_each_thread() under rcu_read_lock(), then
> it is OK to break the loop earlier than we do now. The lockless
> while_each_thread() works in a "best effort" manner anyway, if it
> races with exit_group() or exec() it can miss some/most/all sub-threads
> (including the new leader) with or without this change.
>
> Yes, zap_threads() needs additional fixes. But I think it is better
> to complicate a couple of lockless callers (or just change them
> to take tasklist) which must not miss an "interesting" thread.
Is it the case that creating a new group leader from an existing group
always destroys the old group? It certainly is the case for exec().
In my earlier emails, I was assuming that it was possible to create a
new thread group without destroying the old one, and that the thread
group leader might leave the thread group and form a new one, so that a
new thread group leader would be selected for the old group. I suspect
that I was confused. ;-)

Anyway, if creating a new thread group implies destroying the old one,
and if the thread group leader cannot be concurrently creating a new
thread group and traversing the old one, then yes, I do believe your
code at http://marc.info/?l=linux-kernel&m=127714242731448 will work.

Assuming that the call to next_thread_careful(t) in the definition of
while_each_thread() is replaced with next_thread_careful(g,t).

And give or take memory barriers.

The implied memory barrier mentioned in the comment in your example code
is the spin_lock_irqsave() and spin_unlock_irqrestore() in free_pid(),
which is called from __change_pid() which is called from detach_pid()?
On some platforms, code may be reordered from both sides into the
resulting critical section, so you actually need two non-overlapping
lock-based critical sections to guarantee full memory-barrier semantics.
And the atomic_inc() in free_pidmap() is not required to provide
memory-barrier semantics, and does not do so on all platforms.

Or does the implied memory barrier correspond to the first of three calls
to detach_pid() in __unhash_process()? (The above analysis assumes that
it corresponds to the last of the three.)
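
To make the next_thread_careful(g,t) idea concrete, here is a minimal
single-threaded user-space sketch. It is not the code from the marc.info
link above: struct toy_task, the alive flag, and every toy_* name are
hypothetical stand-ins invented for illustration, and the memory barriers
discussed above are deliberately left out.

```c
#include <stdbool.h>

/* Toy stand-in for task_struct: only what the traversal needs. */
struct toy_task {
	struct toy_task *next;	/* models the circular thread_group list */
	bool alive;		/* models "still hashed" (assumed liveness check) */
};

/* Plain successor, with no check on the starting point. */
static struct toy_task *toy_next_thread(struct toy_task *t)
{
	return t->next;
}

/*
 * Hypothetical two-argument variant: if the starting task g has been
 * unhashed, return g so that the "!= g" loop condition ends the walk.
 * A real lockless version would also need ordering between the list
 * read and the liveness check; this single-threaded model omits it.
 */
static struct toy_task *toy_next_thread_careful(struct toy_task *g,
						struct toy_task *t)
{
	t = toy_next_thread(t);
	return g->alive ? t : g;
}

#define toy_while_each_thread(g, t) \
	while (((t) = toy_next_thread_careful((g), (t))) != (g))

/* Count the tasks visited when the walk starts at g. */
static int toy_count_threads(struct toy_task *g)
{
	struct toy_task *t = g;
	int n = 0;

	do {
		n++;
	} toy_while_each_thread(g, t);
	return n;
}

/* Models list_del_rcu(): bypass victim, but leave victim->next intact. */
static void toy_unhash(struct toy_task *prev, struct toy_task *victim)
{
	prev->next = victim->next;
	victim->alive = false;
}
```

In this model, once the starting task has been unhashed, a walk that began
at it stops early instead of circling the remaining tasks forever. That is
the "best effort" behavior described in the quoted text: the loop may miss
threads, but it terminates.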
> > > > o Do the de_thread() incrementally. So if the list is tasks A,
> > > > B, and C, in that order, and if we are de-thread()ing B,
> > > > then make A's pointer refer to C,
> > >
> > > This breaks while_each_thread() under tasklist/siglock. It must
> > > see all unhashed tasks.
> >
> > Could de_thread() hold those locks in order to avoid that breakage?
>
> How can it hold, say, siglock? We need to wait a grace period.
> To clarify. de_thread() kills all threads except the group_leader,
> so we have only 2 threads: group_leader A and B.
>
> If we add synchronize_rcu() before release_task(leader) (as Roland
> suggested), then we don't need to change A's pointer. This probably
> fixes while_each_thread() in the common case. But this disallows
> the tricks like rcu_lock_break().
>
>
> And. Whatever we do with de_thread(), this can't fix the lockless
> while_each_thread(not_a_group_leader, t). I do not know if there is
> any user which does this though.
> fastpath_timer_check()->thread_group_cputimer() does this, but this
> is wrong and we already have the patch which removes it.
Indeed. Suppose that the starting task is the one immediately preceding
the thread group leader. You get a pointer to the task in question
and traverse to the next task (the thread group leader), during which
time the thread group leader does exec() and maybe a pthread_create()
or two. Oops! You are now traversing the wrong thread group!

There are ways of fixing this, but all the ones I know of require more
fields in the task structure, so best if we don't need to start somewhere
other than a group leader.
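
The race above can be replayed in a small single-threaded user-space
model. This is only a sketch under stated assumptions: struct ring_task
and the ring_* helpers are hypothetical names, "exec" is modeled as the
leader dropping every other thread from the ring while the dropped
threads keep their stale next pointers, and no RCU or locking is modeled.

```c
/* Toy stand-in for a task on the circular thread-group list. */
struct ring_task {
	struct ring_task *next;
};

/*
 * Walk the ring from start, giving up after max_steps hops.
 * Returns the number of tasks visited if the walk gets back to start,
 * or -1 if it never does within max_steps.
 */
static int ring_walk(struct ring_task *start, int max_steps)
{
	struct ring_task *t = start;
	int n = 0;

	do {
		if (n == max_steps)
			return -1;
		n++;
		t = t->next;
	} while (t != start);
	return n;
}

/*
 * Models exec() by the leader: all other threads leave the ring, but
 * their own next pointers are left untouched (stale).
 */
static void ring_exec(struct ring_task *leader)
{
	leader->next = leader;
}

/* Models pthread_create(): link a new thread right after the leader. */
static void ring_add_thread(struct ring_task *leader,
			    struct ring_task *new_task)
{
	new_task->next = leader->next;
	leader->next = new_task;
}
```

With a ring of leader A plus B and C, a walk that starts at C returns to
C after three tasks. If A then "execs" and gains two new threads, a walk
starting at the stale C steps into A's new ring and never sees C again,
so ring_walk() reports -1. Without the max_steps bound, which the kernel
loop does not have, that walk would not terminate.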
Thanx, Paul