From: Mike Galbraith <efault@gmx.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [patch] sched: fix b5d9d734 blunder in task_new_fair()
Date: Fri, 27 Nov 2009 09:57:44 +0100 [thread overview]
Message-ID: <1259312264.7747.4.camel@marge.simson.net> (raw)
In-Reply-To: <1259311550.6110.241.camel@marge.simson.net>
On Fri, 2009-11-27 at 09:45 +0100, Mike Galbraith wrote:
> Off list
Guess not, mouse went click on the way to save menu. Oh well, was going
to spare thousands of mailboxes another confused /me message, but too
late now.
> On Thu, 2009-11-26 at 17:26 +0100, Peter Zijlstra wrote:
> > On Sun, 2009-11-22 at 13:07 +0100, Mike Galbraith wrote:
> > > @@ -2589,16 +2588,10 @@ void sched_fork(struct task_struct *p, i
> > > */
> > > p->prio = current->normal_prio;
> > >
> > > - if (!rt_prio(p->prio))
> > > + if (!task_has_rt_policy(p))
> > > p->sched_class = &fair_sched_class;
> > >
> > > -#ifdef CONFIG_SMP
> > > - cpu = p->sched_class->select_task_rq(p, SD_BALANCE_FORK, 0);
> > > -#endif
> > > - local_irq_save(flags);
> > > - update_rq_clock(cpu_rq(cpu));
> > > - set_task_cpu(p, cpu);
> > > - local_irq_restore(flags);
> > > + __set_task_cpu(p, cpu);
> >
> > OK, so I figured out why it was in sched_fork() and not in
> > wake_up_new_task().
> >
> > It is because in sched_fork() the new task isn't in the tasklist yet and
> > can therefore not race with any other migration logic.
>
> All the raciness I'm fretting over probably just doesn't matter much.
> Things aren't exploding. Maybe min_vruntime is the only thing I should
> be worrying about. That concern is in-flight deltas of SCHED_IDLE
> magnitude, ie cross cpu "fuzziness" on a very large scale.
>
> However :-/ (aw sh*t, here i go again. aaaaOOOOOgah! dive! dive!;)
>
> WRT affinity, sched_class, nice level fretting, that can all change from
> userland at any instant that you do not hold the task's runqueue lock
> and the tasklist lock is not held by somebody to keep them from getting
> a task reference to start the ball rolling. As soon as you drop the
> runqueue lock, userland can acquire, and change whatever it likes under
> you, so afaikt, we can call the wrong sched_class method etc etc.
>
> 3f029d3 agrees fully wrt sched_class at least:
> In addition, we fix a race condition where we try to access
> current->sched_class without holding the rq->lock. This is
> technically racy, as the sched-class could change out from
> under us. Instead, we reference the per-rq post_schedule
> variable with the runqueue unlocked, but with preemption
> disabled to see if we need to reacquire the rq->lock.
>
> The only thing that really changes with the unlocked _rummaging_ is that
> we now can't count on nr_running/load on the task's current runqueue,
> sched_class etc while you're rummaging, ALL state is fuzzy, instead of
> only most.
>
> However, I don't think we can even count on the task remaining untouched
> while in TASK_WAKING state, and that might be a bigger deal.
>
> afaikt, userland can migrate the task you're in the process of waking
> while you're off rummaging around looking for a place to put it, like
> so: Wakee is on the tasklist, can be accessed by userland. We wouldn't
> be in ttwu either were it not. We're waking, we set task state to
> TASK_WAKING, release the lock, userland acquires, nobody but ttwu has
> ever heard of a TASK_WAKING, so it happily changes task's affinity,
> migrates the sleeping task to the one and only (pins) correct runqueue,
> sets task cpu etc, releases the lock, and goes home. We finish
> rummaging, do NOT check for an intervening affinity change, instead, we
> do an unlocked scribble over what userland just wrote, resetting cpu and
> vruntime to a now illegal cpu, and activate. I'm not seeing any
> inhibitor for this scenario.
>
> When I moved fork balancing runqueue selection to the much more logical
> wakeup and enqueue time, vs copy and fiddle time, I didn't fix anything,
> I merely duplicated the races that are now in ttwu.
>
> No matter were we do the selection, we can race with userland if the
> darn thing isn't locked all the while. With .31 ttwu locking, there is
> no race, because nobody can get to the task struct. If target cpu
> changes via rq selection, we set cpu, _then_ unlock, at which point
> userland or whomever may override _our_ decision, but we never write
> after re-acquiring, so intervening changes, if any, stay intact.
>
> With an exec, userland policy/affinity change will deactivate/activate
> or do a migration call. We don't have the thing locked while we're
> rummaging, so what keeps sched_class from changing after we evaluated,
> so we call the wrong method, and then do our own migration call?
>
> /me is still pretty befuddled, and haven't even crawled over PI.
>
> I flat don't see how we can do this race free, unless we put every task
> in some untouchable state while we're rummaging, and teach everything
> having to do with migration about that state.
>
> -Mike
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2009-11-27 8:57 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-22 12:07 [patch] sched: fix b5d9d734 blunder in task_new_fair() Mike Galbraith
2009-11-24 13:51 ` Peter Zijlstra
2009-11-24 17:07 ` Mike Galbraith
2009-11-24 17:35 ` Peter Zijlstra
2009-11-24 17:54 ` Peter Zijlstra
2009-11-24 18:21 ` Mike Galbraith
2009-11-24 18:27 ` Peter Zijlstra
2009-11-24 18:36 ` Mike Galbraith
2009-11-25 6:57 ` Mike Galbraith
2009-11-25 9:51 ` Peter Zijlstra
2009-11-25 13:09 ` Mike Galbraith
2009-11-26 16:26 ` Peter Zijlstra
2009-11-27 8:45 ` Mike Galbraith
2009-11-27 8:57 ` Mike Galbraith [this message]
2009-11-27 11:55 ` Mike Galbraith
2009-11-27 12:21 ` Peter Zijlstra
2009-11-27 12:38 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1259312264.7747.4.camel@marge.simson.net \
--to=efault@gmx.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.