From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>, Ben Blum <bblum@google.com>,
Jiri Slaby <jirislaby@gmail.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Li Zefan <lizf@cn.fujitsu.com>, Miao Xie <miaox@cn.fujitsu.com>,
Paul Menage <menage@google.com>,
"Rafael J. Wysocki" <rjw@sisk.pl>, Tejun Heo <tj@kernel.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/6] sched/cpusets fixes, more changes are needed
Date: Thu, 25 Mar 2010 16:46:16 +0100
Message-ID: <20100325154616.GA9773@redhat.com>
In-Reply-To: <1269512531.12097.67.camel@laptop>
On 03/25, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 19:09 +0100, Oleg Nesterov wrote:
> > On 03/24, Peter Zijlstra wrote:
> > >
> > > On Mon, 2010-03-15 at 10:09 +0100, Oleg Nesterov wrote:
> > > >
> > > > - do_fork() clears PF_STARTING and then calls wake_up_new_task()
> > > > which finally does s/WAKING/RUNNING.
> > > >
> > > > But. Nobody can take rq->lock in between. This means a signal
> > > > from irq (quite possible with CLONE_THREAD) or another rt
> > > > thread which preempts us can lockup.
> > >
> > > Hmm, the signal case might indeed be a problem, however I cannot see how
> > > the RT thread can be a problem because until we do wake_up_new_task()
> > > the child will not be runnable and can thus not be preempted.
> >
> > Indeed, but I meant the _parent_ can be preempted ;)
>
> I still can't see how that would be a problem..
The parent P does do_fork(); copy_process() returns the new child C with
TASK_WAKING and PF_STARTING set.

do_fork() clears PF_STARTING, but TASK_WAKING is still set, and C is
already visible to user-space.

An rt-thread X preempts P and calls ttwu() (say, it sends a signal to C).

ttwu() loops in task_rq_lock() "forever", because TASK_WAKING is still
set.
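
A minimal user-space sketch of this scenario, not kernel code. The names
model_task_rq_lock() and model_wake_up_new_task() and the on_spin callback
are hypothetical; the callback stands for "whatever else gets to run while
the waker spins". With no callback (X has preempted P, so P never reaches
wake_up_new_task()) the loop never sees the state change.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for this sketch; not the kernel's definitions. */
enum task_state { TASK_RUNNING, TASK_WAKING };

struct task {
	enum task_state state;
};

/*
 * Model of the retry loop: the locker refuses to proceed while the task
 * is TASK_WAKING. Returns the number of retries needed before the lock
 * could be taken, or -1 if max_spins ran out (the "forever" case).
 */
static int model_task_rq_lock(struct task *p, int max_spins,
			      void (*on_spin)(struct task *))
{
	assert(p != NULL && max_spins >= 0);

	for (int spins = 0; spins < max_spins; spins++) {
		if (p->state != TASK_WAKING)
			return spins;
		if (on_spin)
			on_spin(p);	/* someone else makes progress */
	}
	return -1;
}

/* The last step of the fork path in this model: s/WAKING/RUNNING. */
static void model_wake_up_new_task(struct task *p)
{
	p->state = TASK_RUNNING;
}
```

Calling model_task_rq_lock(&c, 1000, NULL) on a WAKING child models X
spinning while P is preempted; passing model_wake_up_new_task as the
callback models P being able to run and finish the fork.
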
> > > The reason we have that TASK_WAKING stuff for fork is because
> > > wake_up_new_task() needs p->cpus_allowed to be stable
> >
> > Sure! But it is very easy to change wake_up_new_task() to set TASK_WAKING
> > like ttwu() does. Of course, this needs raw_spin_lock_irq(rq->lock) for
> > a moment, but afaics that is all?
>
> My patch does that.
OK, that is what I meant. Now, why can't sched_fork() just set TASK_RUNNING?
This way a "spurious" wakeup can do nothing with the new child, and we
have no problems with cpuset, which needs rq->lock for set_cpus_allowed().
> > > So the below patch makes select_task_rq_fair unlock the rq when needed,
> > > and then puts all ->select_task_rq() calls under rq->lock.
Yes, I thought about this too. I tried to preserve the current
"->select_task_rq() is called without rq->lock" logic.
> > > This should allow us to remove the TASK_WAKING thing from fork
Confused. Why can't we simply remove TASK_WAKING from fork without any
changes except in wake_up_new_task() ?
> > > which in turn allows us to remove the PF_STARTING check in
> > > task_is_waking.
Even if I do not think I understand sched.c above, I'd say we must do
this in any case ;)
> I was still looking at removing the TASK_WAKING check from
> task_rq_lock()
Peter, I can't apply your patch due to rejects (will try again later),
but at first glance, it makes TASK_WAKING unneeded? Since we do not
drop the lock after we set TASK_WAKING, why do we need this state at
all?

Aha... select_task_rq_fair() can drop the lock, yes? Well, in this
case select_task_rq_fair() can probably set TASK_WAKING before unlock?

I like the current idea of calling select_task_rq() without rq->lock, but
of course this is up to you.
However, once again, can't we make a simpler patch?

	- remove PF_STARTING from task_is_waking()

	- change sched_fork() to set RUNNING instead of WAKING

	- change wake_up_new_task() to set WAKING under rq->lock

This looks simpler to me, and allows ttwu() to drop rq->lock right
after it sets WAKING.
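
The three steps above can be sketched as a small user-space model, not
a kernel patch. Everything here is an assumption made for illustration:
the model_* names, the boolean "lock" (which only asserts correct
lock/unlock pairing), the per-CPU rq array, and the round-robin CPU
choice that stands in for ->select_task_rq().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for this sketch; not the kernel's definitions. */
enum task_state { TASK_RUNNING, TASK_WAKING };

struct rq { bool locked; };
struct task { enum task_state state; int cpu; };

static void rq_lock(struct rq *rq)   { assert(!rq->locked); rq->locked = true; }
static void rq_unlock(struct rq *rq) { assert(rq->locked);  rq->locked = false; }

/* Step 1: "waking" depends on the state alone, no PF_STARTING check. */
static bool model_task_is_waking(const struct task *p)
{
	return p->state == TASK_WAKING;
}

/* Step 2: the child leaves fork as RUNNING instead of WAKING. */
static void model_sched_fork(struct task *p)
{
	p->state = TASK_RUNNING;
}

/* Hypothetical CPU choice; stands in for ->select_task_rq(). */
static int model_select_task_rq(const struct task *p, int ncpus)
{
	assert(ncpus > 0);
	return (p->cpu + 1) % ncpus;
}

/* Step 3: WAKING is set under rq->lock, which is then dropped. */
static void model_wake_up_new_task(struct task *p, struct rq *rqs, int ncpus)
{
	struct rq *rq = &rqs[p->cpu];

	rq_lock(rq);
	p->state = TASK_WAKING;
	rq_unlock(rq);

	/* The CPU is chosen without holding any rq lock. */
	p->cpu = model_select_task_rq(p, ncpus);

	rq = &rqs[p->cpu];
	rq_lock(rq);
	p->state = TASK_RUNNING;
	rq_unlock(rq);
}
```

In this model the child is never WAKING between sched_fork() and
wake_up_new_task(), so nothing that looks at it in that window has a
reason to spin.
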
Another change which seems reasonable is to allow ttwu() to take rq->lock
even if WAKING is set; it can do nothing but check task->state in this case.
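
A rough user-space sketch of that last idea, under stated assumptions:
the state bit values, the boolean "lock" and the name
model_try_to_wake_up() are all made up for illustration. The waker takes
the lock unconditionally, and a WAKING task simply fails the state check
instead of making the waker spin.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state bits; the values are assumptions of this sketch. */
#define TASK_RUNNING		0x000u
#define TASK_INTERRUPTIBLE	0x001u
#define TASK_WAKING		0x100u

struct rq { bool locked; };
struct task { unsigned int state; };

static void rq_lock(struct rq *rq)   { assert(!rq->locked); rq->locked = true; }
static void rq_unlock(struct rq *rq) { assert(rq->locked);  rq->locked = false; }

/*
 * Model of a ttwu() that does not wait for TASK_WAKING to clear.
 * Returns true only if this call performed the wakeup.
 */
static bool model_try_to_wake_up(struct task *p, struct rq *rq,
				 unsigned int wake_mask)
{
	bool woken = false;

	rq_lock(rq);
	if (p->state & wake_mask) {
		/* The task sleeps in a state the caller wants to wake. */
		p->state = TASK_RUNNING;
		woken = true;
	}
	/* Otherwise (e.g. TASK_WAKING) there is nothing to do. */
	rq_unlock(rq);

	return woken;
}
```

As long as the wake mask never contains the WAKING bit, a second waker
returns immediately and leaves the task to whoever set WAKING.
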
What do you think?
Oleg.