public inbox for linux-kernel@vger.kernel.org
From: Eric Paris <eparis@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, efault@gmx.de
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs
Date: Tue, 22 Dec 2009 01:29:42 -0500
Message-ID: <1261463382.2923.1.camel@localhost>
In-Reply-To: <1261471720.4937.9.camel@laptop>

On Tue, 2009-12-22 at 09:48 +0100, Peter Zijlstra wrote:
> On Mon, 2009-12-21 at 19:17 -0500, Eric Paris wrote:
> > Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
> > I'm exploding in the scheduler.  I'm running (and building) kernel
> > f7b84a6ba7eaeba4e1df8feddca1473a7db369a5.  There are three distinct
> > signatures of problems.  Some boots I'll see all 3 of these failures,
> > sometimes only 1 or 2 of them.  That's the reason they are kinda split
> > up in dmesg.

With the patch reverted, the kernel appears to have built with no
oopses, circular locking reports, or BUGs.  The only thing in dmesg
during the build was:

DMA-API: debugging out of memory - disabling

I'm going to do it a number more times to be sure, but I had a 100%
failure rate before.  Sounds like this patch has got to go.

-Eric

> > 
> > 1) gcc/3141 is trying to acquire lock:
> >  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
> > 
> > but task is already holding lock:
> >  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
> 
> This is due to the pagefault happening while holding the rq->lock, so
> it's an artefact of 3).
> 
> > 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
> 
> Worrying, but probably due to the same problem as 3)
> 
> > 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
> >       kernel/sched_fair.c
> 
> Right, hard to tell where exactly it goes bang, but could you please try
> reverting the below patch.
> 
> What I suspect happens is that we hit the task_cpu(p)==cpu case, we then
> don't do __set_task_cpu()->set_task_rq(), which sets the group
> scheduling pointers (you seem to have cgroup scheduling enabled).
> 
> If those pointers are wild all kinds of interesting bits can happen,
> including 3) and possibly 2).
> 
> If this revert doesn't help, could you please also provide the output of
> addr2line -e vmlinux <FAULT_IP> ?
> 
> ---
> commit 738d2be4301007f054541c5c4bf7fb6a361c9b3a
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Wed Dec 16 18:04:42 2009 +0100
> 
>     sched: Simplify set_task_cpu()
>     
>     Rearrange code a bit now that its a simpler function.
>     
>     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Cc: Mike Galbraith <efault@gmx.de>
>     LKML-Reference: <20091216170518.269101883@chello.nl>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f92ce63..8a2bfd3 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2034,11 +2034,8 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
>  	return delta < (s64)sysctl_sched_migration_cost;
>  }
>  
> -
>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  {
> -	int old_cpu = task_cpu(p);
> -
>  #ifdef CONFIG_SCHED_DEBUG
>  	/*
>  	 * We should never call set_task_cpu() on a blocked task,
> @@ -2049,11 +2046,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  
>  	trace_sched_migrate_task(p, new_cpu);
>  
> -	if (old_cpu != new_cpu) {
> -		p->se.nr_migrations++;
> -		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS,
> -				     1, 1, NULL, 0);
> -	}
> +	if (task_cpu(p) == new_cpu)
> +		return;
> +
> +	p->se.nr_migrations++;
> +	perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
>  
>  	__set_task_cpu(p, new_cpu);
>  }
> 
> 
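
For anyone following along, here is a minimal user-space sketch of the
failure mode Peter suspects above.  It is not kernel code: struct task,
struct cfs_rq, group_rq[], model_set_task_cpu_raw() and
group_pointer_left_unset() are hypothetical stand-ins, and the model
assumes a task whose cpu field already equals new_cpu while its group
scheduling pointer has not been set yet.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a per-CPU group runqueue. */
struct cfs_rq { int cpu; };

/* Hypothetical stand-in for the few task fields that matter here. */
struct task {
	int cpu;
	struct cfs_rq *cfs_rq;		/* models the group scheduling pointer */
	unsigned long nr_migrations;
};

static struct cfs_rq group_rq[NR_CPUS] = { { 0 }, { 1 } };

/* Models __set_task_cpu(): refreshes the group pointer, then the cpu. */
static void model_set_task_cpu_raw(struct task *p, int new_cpu)
{
	p->cfs_rq = &group_rq[new_cpu];
	p->cpu = new_cpu;
}

/* Shape before 738d2be: only the accounting is conditional. */
static void set_task_cpu_original(struct task *p, int new_cpu)
{
	if (p->cpu != new_cpu)
		p->nr_migrations++;

	model_set_task_cpu_raw(p, new_cpu);
}

/* Shape after 738d2be: the early return also skips the pointer refresh. */
static void set_task_cpu_simplified(struct task *p, int new_cpu)
{
	if (p->cpu == new_cpu)
		return;

	p->nr_migrations++;
	model_set_task_cpu_raw(p, new_cpu);
}

/*
 * Builds a task that already sits on CPU 1 with an unset group pointer,
 * runs the given setter with new_cpu == 1, and returns 1 if the pointer
 * is still NULL afterwards, 0 otherwise.
 */
static int group_pointer_left_unset(void (*setter)(struct task *, int))
{
	struct task t = { .cpu = 1, .cfs_rq = NULL, .nr_migrations = 0 };

	assert(setter != NULL);
	setter(&t, 1);
	return t.cfs_rq == NULL;
}
```

In this model, group_pointer_left_unset(set_task_cpu_simplified) returns
1 and group_pointer_left_unset(set_task_cpu_original) returns 0, which
is the difference the revert restores.  Whether the real crash takes
exactly this path is still Peter's suspicion, not something the sketch
proves.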




Thread overview: 12+ messages
2009-12-22  0:17 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs Eric Paris
2009-12-22  5:42 ` Xiaotian Feng
2009-12-22  7:19   ` Américo Wang
2009-12-22  7:41     ` Xiaotian Feng
2009-12-22  7:50       ` Américo Wang
2009-12-22  8:34         ` Xiaotian Feng
2009-12-22  8:48 ` Peter Zijlstra
2009-12-22  6:29   ` Eric Paris [this message]
2009-12-22 14:41     ` [PATCH] sched: Peter Zijlstra
2009-12-22 14:43     ` [PATCH] sched: Revert 738d2be, Simplify set_task_cpu() Peter Zijlstra
2009-12-23  9:06       ` [tip:sched/urgent] sched: Revert 738d2be, simplify set_task_cpu() tip-bot for Peter Zijlstra
2009-12-22 11:31   ` 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs Arjan van de Ven
