Re: while_each_thread() under rcu_read_lock() is broken?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@redhat.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Don Zickus <dzickus@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@elte.hu>,
	Jerome Marchand <jmarchan@redhat.com>,
	Mandeep Singh Baines <msb@google.com>,
	Roland McGrath <roland@redhat.com>,
	linux-kernel@vger.kernel.org, stable@kernel.org,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: while_each_thread() under rcu_read_lock() is broken?
Date: Wed, 23 Jun 2010 17:24:21 +0200	[thread overview]
Message-ID: <20100623152421.GA8445@redhat.com> (raw)
In-Reply-To: <20100622221226.GP2290@linux.vnet.ibm.com>

On 06/22, Paul E. McKenney wrote:
>
> On Tue, Jun 22, 2010 at 11:23:57PM +0200, Oleg Nesterov wrote:
> > On 06/21, Paul E. McKenney wrote:
> > >
> > > Yeah, would be good to avoid this.  Not sure it can be avoided, though.
> >
> > Why? I think next_thread_careful() from
> > 	http://marc.info/?l=linux-kernel&m=127714242731448
> > should work.
> >
> > If the caller holds tasklist or siglock, this change has no effect.
> >
> > If the caller does while_each_thread() under rcu_read_lock(), then
> > it is OK to break the loop earlier than we do now. The lockless
> > while_each_thread() works in a "best effort" manner anyway, if it
> > races with exit_group() or exec() it can miss some/most/all sub-threads
> > (including the new leader) with or without this change.
> >
> > Yes, zap_threads() needs additional fixes. But I think it is better
> > to complicate a couple of lockless callers (or just change them
> > to take tasklist) which must not miss an "interesting" thread.
>
> Is it the case that creating a new group leader from an existing group
> always destroys the old group?  It certainly is the case for exec().

Yes. And only exec() can change the leader.

> Anyway, if creating a new thread group implies destroying the old one,
> and if the thread group leader cannot be concurrently creating a new
> thread group and traversing the old one, then yes, I do believe your
> code at http://marc.info/?l=linux-kernel&m=127714242731448 will work.
>
> Assuming that the call to next_thread_careful(t) in the definition of
> while_each_thread() is replaced with next_thread_careful(g,t).

Great.

> And give or take memory barriers.
>
> The implied memory barrier mentioned in the comment in your example code
> is the spin_lock_irqsave() and spin_unlock_irqrestore()

Argh. No, no, no. I meant unlock + lock, not lock + unlock. But when
I look at __change_pid() now I do not see the 2nd lock() after
unlock(pidmap_lock). I was wrong, thanks.

And, even if I was correct it is not nice to rely on the internals
of detach_pid(). If we use next_thread_careful(), __unhash_process()
needs wmb.

Thanks!

> > And. Whatever we do with de_thread(), this can't fix the lockless
> > while_each_thread(not_a_group_leader, t). I do not know if there is
> > any user which does this though.
> > fastpath_timer_check()->thread_group_cputimer() does this, but this
> > is wrong and we already have the patch which removes it.
>
> Indeed.  Suppose that the starting task is the one immediately preceding
> the task group leader.  You get a pointer to the task in question
> and traverse to the next task (the task group leader), during which
> time the thread group leader does exec() and maybe a pthread_create()
> or two.  Oops!  You are now now traversing the wrong thread group!

Yes, but there is another more simple and much more feasible race.
while_each_thread(non_leader, t) will loop forever if non_leader
simply exits and passes __unhash_process(). After that next_thread(t)
can never return non_leader.

> There are ways of fixing this, but all the ones I know of require more
> fields in the task structure,

Just in case, I hope that next_thread_careful() handles this case too.

> so best if we don't need to start somewhere
> other than a group leader.

(I assume, you mean the lockless case)

I am not sure. Because it is not trivial to enforce this rule even if
we add a fat comment. Please look at check_hung_uninterruptible_tasks().
It does do_each_thread/while_each_thread and thus it always starts at
the leader. But in fact this is not true due to rcu_lock_break().

Or proc_task_readdir(). It finds the leader, does get_task_struct(leader)
and drops rcu lock. After that first_tid() takes rcu_read_lock() again
and does the open coded while_each_thread(). At first glance this case
looks fine, "for (pos = leader; nr > 0; --nr)" can't loop forever.
But in fact it is not:

	- proc_task_readdir() finds the leader L, does get_task_struct()
	  and drops rcu lock.

	  Suppose that filp->f_pos >= 2.
          Suppose also that the previous tid cached in filp->f_version
          is no longer valid.

	  The caller is preempted.

	- a sub-thread T does exec, and does release_task(L).

	  But L->thread_group->next == T, so far everything is good

	- T spawns other sub-threads (so that get_nr_threads() > nr)
	  and exits.

	- grace period passes, T's task_struct is freed/unmapped/reused

	- proc_task_readdir() resumes and calls first_tid(L).

	  next_thread(L) returns T == nowhere

It is very possible that I missed something here, my only point is
that I think it would be safer to assume nothing about the leaderness.

Oleg.

next prev parent reply	other threads:[~2010-06-23 15:26 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-18 19:02 [PATCH] fix the racy check_hung_uninterruptible_tasks()->rcu_lock_break() logic Oleg Nesterov
2010-06-18 19:34 ` while_each_thread() under rcu_read_lock() is broken? Oleg Nesterov
2010-06-18 21:08   ` Roland McGrath
2010-06-18 22:37     ` Oleg Nesterov
2010-06-18 22:33   ` Paul E. McKenney
2010-06-21 17:09     ` Oleg Nesterov
2010-06-21 17:44       ` Oleg Nesterov
2010-06-21 18:00         ` Oleg Nesterov
2010-06-21 19:02         ` Roland McGrath
2010-06-21 20:06           ` Oleg Nesterov
2010-06-21 21:19             ` Eric W. Biederman
2010-06-22 14:34               ` Oleg Nesterov
2010-07-08 23:59             ` Roland McGrath
2010-07-09  0:41               ` Paul E. McKenney
2010-07-09  1:01                 ` Roland McGrath
2010-07-09 16:18                   ` Paul E. McKenney
2010-06-21 20:51       ` Paul E. McKenney
2010-06-21 21:22         ` Eric W. Biederman
2010-06-21 21:38           ` Paul E. McKenney
2010-06-22 21:23         ` Oleg Nesterov
2010-06-22 22:12           ` Paul E. McKenney
2010-06-23 15:24             ` Oleg Nesterov [this message]
2010-06-24 18:07               ` Paul E. McKenney
2010-06-24 18:50                 ` Chris Friesen
2010-06-24 22:00                   ` Oleg Nesterov
2010-06-25  0:08                     ` Eric W. Biederman
2010-06-25  3:42                       ` Paul E. McKenney
2010-06-25 10:08                       ` Oleg Nesterov
2010-07-09  0:52                       ` Roland McGrath
2010-06-24 21:14                 ` Roland McGrath
2010-06-25  3:37                   ` Paul E. McKenney
2010-07-09  0:41                     ` Roland McGrath
2010-06-24 21:57                 ` Oleg Nesterov
2010-06-25  3:41                   ` Paul E. McKenney
2010-06-25  9:55                     ` Oleg Nesterov
2010-06-28 23:43                       ` Paul E. McKenney
2010-06-29 13:05                         ` Oleg Nesterov
2010-06-29 15:34                           ` Paul E. McKenney
2010-06-29 17:54                             ` Oleg Nesterov
2010-06-19  5:00   ` Mandeep Baines
2010-06-19  5:35     ` Frederic Weisbecker
2010-06-19 15:44       ` Mandeep Baines
2010-06-19 19:19     ` Oleg Nesterov
2010-06-18 20:11 ` [PATCH] fix the racy check_hung_uninterruptible_tasks()->rcu_lock_break() logic Frederic Weisbecker
2010-06-18 20:38 ` Mandeep Singh Baines

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100623152421.GA8445@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dzickus@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=fweisbec@gmail.com \
    --cc=jmarchan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=msb@google.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=roland@redhat.com \
    --cc=stable@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.