Re: [patch] sched: fix b5d9d734 blunder in task_new_fair()

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Mike Galbraith <efault@gmx.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [patch] sched: fix b5d9d734 blunder in task_new_fair()
Date: Fri, 27 Nov 2009 09:57:44 +0100	[thread overview]
Message-ID: <1259312264.7747.4.camel@marge.simson.net> (raw)
In-Reply-To: <1259311550.6110.241.camel@marge.simson.net>

On Fri, 2009-11-27 at 09:45 +0100, Mike Galbraith wrote:
> Off list

Guess not, mouse went click on the way to save menu.  Oh well, was going
to spare thousands of mailboxes another confused /me message, but too
late now.

> On Thu, 2009-11-26 at 17:26 +0100, Peter Zijlstra wrote:
> > On Sun, 2009-11-22 at 13:07 +0100, Mike Galbraith wrote:
> > > @@ -2589,16 +2588,10 @@ void sched_fork(struct task_struct *p, i
> > >          */
> > >         p->prio = current->normal_prio;
> > >  
> > > -       if (!rt_prio(p->prio))
> > > +       if (!task_has_rt_policy(p))
> > >                 p->sched_class = &fair_sched_class;
> > >  
> > > -#ifdef CONFIG_SMP
> > > -       cpu = p->sched_class->select_task_rq(p, SD_BALANCE_FORK, 0);
> > > -#endif
> > > -       local_irq_save(flags);
> > > -       update_rq_clock(cpu_rq(cpu));
> > > -       set_task_cpu(p, cpu);
> > > -       local_irq_restore(flags);
> > > +       __set_task_cpu(p, cpu); 
> > 
> > OK, so I figured out why it was in sched_fork() and not in
> > wake_up_new_task().
> > 
> > It is because in sched_fork() the new task isn't in the tasklist yet and
> > can therefore not race with any other migration logic.
> 
> All the raciness I'm fretting over probably just doesn't matter much.
> Things aren't exploding.  Maybe min_vruntime is the only thing I should
> be worrying about.  That concern is in-flight deltas of SCHED_IDLE
> magnitude, ie cross cpu "fuzziness" on a very large scale.
> 
> However :-/  (aw sh*t, here i go again.  aaaaOOOOOgah! dive! dive!;)
> 
> WRT affinity, sched_class, nice level fretting, that can all change from
> userland at any instant that you do not hold the task's runqueue lock
> and the tasklist lock is not held by somebody to keep them from getting
> a task reference to start the ball rolling.  As soon as you drop the
> runqueue lock, userland can acquire, and change whatever it likes under
> you, so afaikt, we can call the wrong sched_class method etc etc.
> 
> 3f029d3 agrees fully wrt sched_class at least:
>     In addition, we fix a race condition where we try to access
>     current->sched_class without holding the rq->lock.  This is
>     technically racy, as the sched-class could change out from
>     under us.  Instead, we reference the per-rq post_schedule
>     variable with the runqueue unlocked, but with preemption
>     disabled to see if we need to reacquire the rq->lock.
> 
> The only thing that really changes with the unlocked _rummaging_ is that
> we now can't count on nr_running/load on the task's current runqueue,
> sched_class etc while you're rummaging, ALL state is fuzzy, instead of
> only most.
> 
> However, I don't think we can even count on the task remaining untouched
> while in TASK_WAKING state, and that might be a bigger deal.
> 
> afaikt, userland can migrate the task you're in the process of waking
> while you're off rummaging around looking for a place to put it, like
> so:  Wakee is on the tasklist, can be accessed by userland.  We wouldn't
> be in ttwu either were it not.  We're waking, we set task state to
> TASK_WAKING, release the lock, userland acquires, nobody but ttwu has
> ever heard of a TASK_WAKING, so it happily changes task's affinity,
> migrates the sleeping task to the one and only (pins) correct runqueue,
> sets task cpu etc, releases the lock, and goes home.  We finish
> rummaging, do NOT check for an intervening affinity change, instead, we
> do an unlocked scribble over what userland just wrote, resetting cpu and
> vruntime to a now illegal cpu, and activate.  I'm not seeing any
> inhibitor for this scenario.
> 
> When I moved fork balancing runqueue selection to the much more logical
> wakeup and enqueue time, vs copy and fiddle time, I didn't fix anything,
> I merely duplicated the races that are now in ttwu.
> 
> No matter were we do the selection, we can race with userland if the
> darn thing isn't locked all the while.  With .31 ttwu locking, there is
> no race, because nobody can get to the task struct.  If target cpu
> changes via rq selection, we set cpu, _then_ unlock, at which point
> userland or whomever may override _our_ decision, but we never write
> after re-acquiring, so intervening changes, if any, stay intact.
> 
> With an exec, userland policy/affinity change will deactivate/activate
> or do a migration call.  We don't have the thing locked while we're
> rummaging, so what keeps sched_class from changing after we evaluated,
> so we call the wrong method, and then do our own migration call?
> 
> /me is still pretty befuddled, and haven't even crawled over PI.
> 
> I flat don't see how we can do this race free, unless we put every task
> in some untouchable state while we're rummaging, and teach everything
> having to do with migration about that state.
> 
> 	-Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

next prev parent reply	other threads:[~2009-11-27  8:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-22 12:07 [patch] sched: fix b5d9d734 blunder in task_new_fair() Mike Galbraith
2009-11-24 13:51 ` Peter Zijlstra
2009-11-24 17:07   ` Mike Galbraith
2009-11-24 17:35     ` Peter Zijlstra
2009-11-24 17:54       ` Peter Zijlstra
2009-11-24 18:21         ` Mike Galbraith
2009-11-24 18:27           ` Peter Zijlstra
2009-11-24 18:36             ` Mike Galbraith
2009-11-25  6:57             ` Mike Galbraith
2009-11-25  9:51               ` Peter Zijlstra
2009-11-25 13:09                 ` Mike Galbraith
2009-11-26 16:26 ` Peter Zijlstra
2009-11-27  8:45   ` Mike Galbraith
2009-11-27  8:57     ` Mike Galbraith [this message]
2009-11-27 11:55   ` Mike Galbraith
2009-11-27 12:21     ` Peter Zijlstra
2009-11-27 12:38       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1259312264.7747.4.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox