From: Mike Galbraith <efault@gmx.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [patch] sched: fix b5d9d734 blunder in task_new_fair()
Date: Fri, 27 Nov 2009 09:57:44 +0100
Message-ID: <1259312264.7747.4.camel@marge.simson.net>
In-Reply-To: <1259311550.6110.241.camel@marge.simson.net>
On Fri, 2009-11-27 at 09:45 +0100, Mike Galbraith wrote:
> Off list
Guess not, mouse went click on the way to save menu. Oh well, was going
to spare thousands of mailboxes another confused /me message, but too
late now.
> On Thu, 2009-11-26 at 17:26 +0100, Peter Zijlstra wrote:
> > On Sun, 2009-11-22 at 13:07 +0100, Mike Galbraith wrote:
> > > @@ -2589,16 +2588,10 @@ void sched_fork(struct task_struct *p, i
> > >  	 */
> > >  	p->prio = current->normal_prio;
> > >
> > > -	if (!rt_prio(p->prio))
> > > +	if (!task_has_rt_policy(p))
> > >  		p->sched_class = &fair_sched_class;
> > >
> > > -#ifdef CONFIG_SMP
> > > -	cpu = p->sched_class->select_task_rq(p, SD_BALANCE_FORK, 0);
> > > -#endif
> > > -	local_irq_save(flags);
> > > -	update_rq_clock(cpu_rq(cpu));
> > > -	set_task_cpu(p, cpu);
> > > -	local_irq_restore(flags);
> > > +	__set_task_cpu(p, cpu);
> >
> > OK, so I figured out why it was in sched_fork() and not in
> > wake_up_new_task().
> >
> > It is because in sched_fork() the new task isn't in the tasklist yet and
> > can therefore not race with any other migration logic.
>
> All the raciness I'm fretting over probably just doesn't matter much.
> Things aren't exploding. Maybe min_vruntime is the only thing I should
> be worrying about. That concern is in-flight deltas of SCHED_IDLE
> magnitude, i.e. cross-cpu "fuzziness" on a very large scale.
>
> However :-/ (aw sh*t, here i go again. aaaaOOOOOgah! dive! dive!;)
>
> WRT the affinity, sched_class and nice level fretting: all of that can
> change from userland at any instant when you do not hold the task's
> runqueue lock and nobody holds the tasklist lock to keep userland from
> getting a task reference to start the ball rolling. As soon as you drop
> the runqueue lock, userland can acquire it and change whatever it likes
> under you, so afaikt, we can call the wrong sched_class method etc.
>
> 3f029d3 agrees fully wrt sched_class at least:
> In addition, we fix a race condition where we try to access
> current->sched_class without holding the rq->lock. This is
> technically racy, as the sched-class could change out from
> under us. Instead, we reference the per-rq post_schedule
> variable with the runqueue unlocked, but with preemption
> disabled to see if we need to reacquire the rq->lock.
>
> The only thing that really changes with the unlocked _rummaging_ is that
> we now can't count on nr_running/load on the task's current runqueue,
> sched_class etc. while we're rummaging. ALL state is fuzzy, instead of
> only most.
>
> However, I don't think we can even count on the task remaining untouched
> while in TASK_WAKING state, and that might be a bigger deal.
>
> afaikt, userland can migrate the task you're in the process of waking
> while you're off rummaging around looking for a place to put it, like
> so: Wakee is on the tasklist, so it can be accessed by userland (we
> wouldn't be in ttwu either were it not). We're waking, so we set task
> state to TASK_WAKING and release the lock. Userland acquires it. Nobody
> but ttwu has ever heard of TASK_WAKING, so userland happily changes the
> task's affinity, migrates the sleeping task to the one and only (pins)
> correct runqueue, sets task cpu etc, releases the lock, and goes home.
> We finish rummaging and do NOT check for an intervening affinity change.
> Instead, we do an unlocked scribble over what userland just wrote,
> resetting cpu and vruntime to a now illegal cpu, and activate. I'm not
> seeing any inhibitor for this scenario.
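The interleaving described above can be replayed step by step in a toy,
single-threaded model. This is only a sketch under loud assumptions: every
name in it (toy_task, toy_select_cpu, toy_userland_pin, toy_wakeup) is
hypothetical and none of it is kernel code. The "lock" is implied by the
order of the statements, and runqueue selection is reduced to "lowest
allowed CPU". The recheck flag shows what an affinity recheck before the
final write would change. It is not a claim about how the scheduler does
or should fix this.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel symbols. */
enum toy_state { TOY_SLEEPING, TOY_WAKING, TOY_RUNNING };

struct toy_task {
	enum toy_state state;
	int cpu;              /* CPU whose runqueue the task belongs to */
	unsigned int allowed; /* bitmask of CPUs the task may run on */
};

static bool toy_cpu_allowed(const struct toy_task *t, int cpu)
{
	return (t->allowed >> cpu) & 1u;
}

/* Pick the lowest allowed CPU; stands in for runqueue selection. */
static int toy_select_cpu(const struct toy_task *t)
{
	int cpu = 0;

	assert(t->allowed != 0);
	while (!toy_cpu_allowed(t, cpu))
		cpu++;
	return cpu;
}

/* What userland does while the waker holds no lock: pin and migrate. */
static void toy_userland_pin(struct toy_task *t, int cpu)
{
	t->allowed = 1u << cpu;
	t->cpu = cpu;
}

/*
 * Replay the interleaving in order. Returns true if the task ends up
 * on a CPU that its affinity mask permits.
 */
bool toy_wakeup(bool recheck)
{
	struct toy_task t = { TOY_SLEEPING, 3, 0xfu }; /* CPUs 0-3 allowed */
	int target;

	t.state = TOY_WAKING;        /* waker: mark, then "drop the lock" */
	target = toy_select_cpu(&t); /* waker: unlocked rummaging, picks 0 */

	toy_userland_pin(&t, 2);     /* userland: pins the task to CPU 2 */

	/* waker: commit the decision made before userland intervened */
	if (recheck && !toy_cpu_allowed(&t, target))
		target = toy_select_cpu(&t); /* redo against the new mask */
	t.cpu = target;
	t.state = TOY_RUNNING;

	return toy_cpu_allowed(&t, t.cpu);
}
```

Calling toy_wakeup(false) leaves the task on CPU 0 although only CPU 2 is
allowed, which is the scribble in question. toy_wakeup(true) lands on CPU 2.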
>
> When I moved fork balancing runqueue selection to the much more logical
> wakeup and enqueue time, vs copy and fiddle time, I didn't fix anything,
> I merely duplicated the races that are now in ttwu.
>
> No matter where we do the selection, we can race with userland if the
> darn thing isn't locked all the while. With .31 ttwu locking, there is
> no race, because nobody can get to the task struct. If target cpu
> changes via rq selection, we set cpu, _then_ unlock, at which point
> userland or whomever may override _our_ decision, but we never write
> after re-acquiring, so intervening changes, if any, stay intact.
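A toy, single-threaded replay of that ordering, as a sketch only. All names
here (toy_rq_lock, toy_task, toy_set_cpu, toy_locked_wakeup) are
hypothetical and not kernel symbols. The lock is a plain flag whose only
job is to let assert() check that every write to the task's cpu happens
with the lock held. The waker commits its choice under the lock and never
writes again, so whatever userland does afterwards stays intact.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel symbols. */
struct toy_rq_lock { bool held; };

struct toy_task {
	int cpu;              /* CPU the task is assigned to */
	unsigned int allowed; /* bitmask of permitted CPUs */
};

static void toy_lock(struct toy_rq_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_unlock(struct toy_rq_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Every write to the task's cpu requires the lock to be held. */
static void toy_set_cpu(struct toy_rq_lock *l, struct toy_task *t, int cpu)
{
	assert(l->held);
	t->cpu = cpu;
}

/* Returns the task's final CPU after a wakeup followed by a userland pin. */
int toy_locked_wakeup(void)
{
	struct toy_rq_lock lock = { false };
	struct toy_task t = { 3, 0xfu }; /* starts on CPU 3, CPUs 0-3 allowed */

	toy_lock(&lock);
	toy_set_cpu(&lock, &t, 0);       /* waker: select and commit under lock */
	toy_unlock(&lock);               /* waker never writes t.cpu again */

	toy_lock(&lock);                 /* userland: may now override */
	t.allowed = 1u << 2;
	toy_set_cpu(&lock, &t, 2);
	toy_unlock(&lock);

	return t.cpu;
}
```

toy_locked_wakeup() returns 2: userland's later decision wins, and the task
sits on a CPU its mask permits.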
>
> With an exec, userland policy/affinity change will deactivate/activate
> or do a migration call. We don't have the thing locked while we're
> rummaging, so what keeps sched_class from changing after we evaluated
> it, so that we call the wrong method, and then do our own migration call?
>
> /me is still pretty befuddled, and hasn't even crawled over PI.
>
> I flat don't see how we can do this race free, unless we put every task
> in some untouchable state while we're rummaging, and teach everything
> having to do with migration about that state.
>
> -Mike