Re: [rfc] forked kernel task and mm structures imbalanced on NUMA

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Lee Schermerhorn <lee.schermerhorn@hp.com>
Subject: Re: [rfc] forked kernel task and mm structures imbalanced on NUMA
Date: Tue, 1 Jun 2010 18:41:57 +1000	[thread overview]
Message-ID: <20100601084157.GS9453@laptop> (raw)
In-Reply-To: <1275380202.27810.26214.camel@twins>

On Tue, Jun 01, 2010 at 10:16:42AM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-01 at 17:33 +1000, Nick Piggin wrote:
> > Another problem I found when testing this patch is that the scheduler
> > has some issues of its own when balancing. This is improved by
> > traversing the sd groups starting from a different spot each time, so
> > processes get sprinkled around the nodes a bit better.
> 
> Right, makes sense. And I think we could merge that group iteration
> without much problems.
> 
> Your alternative placement for sched_exec() seems to make sense too, the
> earlier we do that the better the memory allocations will be.
> 
> Your changes to sched_fork() and wake_up_new_task() made my head hurt a
> bit -- but that's not your fault. I'm not quite sure why you're changing
> that though.

Well because we indeed don't want to select a new CPU for it unless
there has been a cpus_allowed race in the meantime.

 
> The addition of sched_fork_suggest_cpu() to select a target node seems
> to make sense, but since you then call fork balancing a second time we
> have a chance of ending up on a totally different node all together.
> 
> So I think it would make sense to rework the fork balancing muck to be
> called only once and stick with its decision.

Just need to close that race somehow. AFAIKS we can't use TASK_WAKING
because that must not be preempted?

Otherwise I don't see a problem with just taking another balance on
the extremely rare case of task failure.


 
> One thing that would make the whole fork path much easier is fully
> ripping out that child_runs_first mess for CONFIG_SMP, I think its been
> disabled by default for long enough, and its always been broken in the
> face of fork balancing anyway.

Interesting problem. vfork is nice for fork+exec, but it's a bit
restrictive.

 
> So basically we have to move fork balancing back to sched_fork(), I'd
> have to again look at wth happens to ->cpus_allowed, but I guess it
> should be fixable, and I don't think we should care overly much about
> cpu-hotplug.

No more than simply getting it right. Simply calling into the balancer
again seems to be the simplest way to do it.

 
> A specific code comment:
> 
> > @@ -2550,14 +2561,16 @@ void wake_up_new_task(struct task_struct
> >          * We set TASK_WAKING so that select_task_rq() can drop rq->lock
> >          * without people poking at ->cpus_allowed.
> >          */
> > -       cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
> > -       set_task_cpu(p, cpu);
> > -
> > -       p->state = TASK_RUNNING;
> > -       task_rq_unlock(rq, &flags);
> > +       if (!cpumask_test_cpu(cpu, &p->cpus_allowed)) {
> > +               p->state = TASK_WAKING;
> > +               cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
> > +               set_task_cpu(p, cpu);
> > +               p->state = TASK_RUNNING;
> > +               task_rq_unlock(rq, &flags);
> > +               rq = task_rq_lock(p, &flags);
> > +       }
> >  #endif
> 
> That's iffy because p->cpus_allowed isn't stable when we're not holding
> the task's current rq->lock or p->state is not TASK_WAKING.
> 

Oop, yeah missed that. Half hearted attempt to avoid more rq locks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-06-01  8:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-01  7:33 [rfc] forked kernel task and mm structures imbalanced on NUMA Nick Piggin
2010-06-01  8:16 ` Peter Zijlstra
2010-06-01  8:41   ` Nick Piggin [this message]
2010-06-01  9:05     ` Peter Zijlstra
2010-06-01  8:49   ` Peter Zijlstra
2010-06-01 15:48 ` Andi Kleen
2010-06-01 15:59   ` Nick Piggin
2010-06-01 16:20     ` Andi Kleen
2010-06-01 16:31       ` Nick Piggin
2010-06-01 16:00 ` Peter Zijlstra
2010-06-01 16:02   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100601084157.GS9453@laptop \
    --to=npiggin@suse.de \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).