public inbox for linux-kernel@vger.kernel.org
* 2.6.23-rc7-mm1:  panic in scheduler
@ 2007-09-24 21:12 Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-24 21:12 UTC
  To: linux-kernel; +Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra

I looked around on the MLs for mention of this, but didn't find anything
that appeared to match.

Platform:  HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]

With the 2.6.23-rc7-mm1 broken-out patches, the panic occurs once git-sched.patch is pushed:

Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[0]: Oops 8813272891392 [1]
Modules linked in: scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 0, CPU 14, comm:              swapper
psr : 0000101008522030 ifs : 8000000000000002 ip  : [<a0000001003014e0>]    Not tainted
ip is at rb_next+0x0/0x140
unat: 0000000000000000 pfs : 0000000000000308 rsc : 0000000000000003
rnat: 8000000000000012 bsps: 000000000001003e pr  : 6609a840599519a5
ldrs: 0000000000000000 ccv : 0000000000000002 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100078dc0 b6  : a000000100074a40 b7  : a000000100078e00
f6  : 1003e0000000000000000 f7  : 1003e0000000000400000
f8  : 1003e000000002aaaaaab f9  : 1003e0000000d43798a2b
f10 : 1003e35e9970b967dd8b9 f11 : 1003e0000000000000002
r1  : a000000100bc0920 r2  : e0000760000577f0 r3  : e000076000057f10
r8  : fffffffffffffff0 r9  : 0000000000000002 r10 : e000076000057780
r11 : 0000000000000000 r12 : e00007004160fe10 r13 : e000070041608000
r14 : 0000000000000000 r15 : 000000000000000e r16 : 00000007f6c30a22
r17 : e000070041608040 r18 : a0000001008383a8 r19 : a000000100078e00
r20 : e000076000055bb8 r21 : e000076000055bb0 r22 : e000076000057ed0
r23 : 00000000000f4240 r24 : a0000001009e0440 r25 : e000070041608bb4
r26 : 0000000000000000 r27 : 0000000000000000 r28 : e000076000057f80
r29 : 00000000000002e7 r30 : 0000000000000000 r31 : e000076000057780

Call Trace:
 [<a000000100014f60>] show_stack+0x80/0xa0
                                sp=e00007004160f9e0 bsp=e000070041609008
 [<a000000100015bf0>] show_regs+0x870/0x8a0
                                sp=e00007004160fbb0 bsp=e000070041608fa8
 [<a00000010003d170>] die+0x190/0x300
                                sp=e00007004160fbb0 bsp=e000070041608f60
 [<a000000100071bc0>] ia64_do_page_fault+0x780/0xa80
                                sp=e00007004160fbb0 bsp=e000070041608f08
 [<a00000010000b5c0>] ia64_leave_kernel+0x0/0x270
                                sp=e00007004160fc40 bsp=e000070041608f08
 [<a0000001003014e0>] rb_next+0x0/0x140
                                sp=e00007004160fe10 bsp=e000070041608ef8
 [<a000000100078dc0>] __dequeue_entity+0x80/0xc0
                                sp=e00007004160fe10 bsp=e000070041608ec8
 [<a000000100078e60>] pick_next_task_fair+0x60/0x180
                                sp=e00007004160fe10 bsp=e000070041608e98
 [<a0000001006a5880>] schedule+0x340/0x19c0
                                sp=e00007004160fe10 bsp=e000070041608cc0
 [<a000000100014cb0>] cpu_idle+0x290/0x3e0
                                sp=e00007004160fe30 bsp=e000070041608c50
 [<a000000100066020>] start_secondary+0x380/0x5a0
                                sp=e00007004160fe30 bsp=e000070041608c00
 [<a0000001006abca0>] __kprobes_text_end+0x6c0/0x6f0
                                sp=e00007004160fe30 bsp=e000070041608c00


Taking a quick look at [__]{en|de}queue_entity() and the functions they call,
I see something suspicious in set_leftmost() in sched_fair.c:

static inline void
set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
        struct sched_entity *se;

        cfs_rq->rb_leftmost = leftmost;
        if (leftmost)
                se = rb_entry(leftmost, struct sched_entity, run_node);
}

Missing code? Corrupt patch?
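rb_entry() is the kernel's container_of() applied to rbtree nodes: it subtracts the offset of the embedded run_node to recover the enclosing sched_entity and has no side effects. A minimal userspace sketch, assuming trimmed stand-in structs (the real struct sched_entity and struct cfs_rq carry many more fields) and hypothetical helper names set_leftmost_old()/set_leftmost_new(), shows that once `se` is discarded the helper reduces to the single store into rb_leftmost:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member pointer to the start
 * of the enclosing struct. (The kernel version adds a typeof() type check.) */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical, trimmed stand-ins for the kernel structures. */
struct rb_node {
	struct rb_node *rb_left, *rb_right;
};

struct sched_entity {
	unsigned long long vruntime;	/* precedes run_node, so its offset is nonzero */
	struct rb_node run_node;
};

struct cfs_rq {
	struct rb_node *rb_leftmost;
};

/* Shape of the pre-patch helper: se is computed, then never read. */
void set_leftmost_old(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
	struct sched_entity *se = NULL;	/* initialised only to keep the sketch warning-free */

	cfs_rq->rb_leftmost = leftmost;
	if (leftmost)
		se = rb_entry(leftmost, struct sched_entity, run_node);
	(void)se;			/* the computed pointer is simply dropped */
}

/* The single store that remains once the dead code is removed. */
void set_leftmost_new(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
	cfs_rq->rb_leftmost = leftmost;
}
```

For an entity `ent`, `rb_entry(&ent.run_node, struct sched_entity, run_node)` yields `&ent`, and calling either helper with the same arguments leaves `rb_leftmost` identical, for both a real node and NULL.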

Config available on request, but there don't seem to be many scheduler
config options. A few that might apply:

# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y


Regards,
Lee Schermerhorn




* Re: 2.6.23-rc7-mm1:  panic in scheduler
@ 2007-09-24 21:30 ` Kamalesh Babulal
From: Kamalesh Babulal @ 2007-09-24 21:30 UTC
  To: Lee Schermerhorn; +Cc: linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra

Lee Schermerhorn wrote:
> I looked around on the MLs for mention of this, but didn't find anything
> that appeared to match.
> 
> Platform:  HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]
> 
> 2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:
> 
> Unable to handle kernel NULL pointer dereference (address 0000000000000000)
> swapper[0]: Oops 8813272891392 [1]
> Call Trace:
>  [<a000000100014f60>] show_stack+0x80/0xa0
>                                 sp=e00007004160f9e0 bsp=e000070041609008
>  [<a000000100015bf0>] show_regs+0x870/0x8a0
>                                 sp=e00007004160fbb0 bsp=e000070041608fa8
>  [<a00000010003d170>] die+0x190/0x300
>                                 sp=e00007004160fbb0 bsp=e000070041608f60
>  [<a000000100071bc0>] ia64_do_page_fault+0x780/0xa80
>                                 sp=e00007004160fbb0 bsp=e000070041608f08
>  [<a00000010000b5c0>] ia64_leave_kernel+0x0/0x270
>                                 sp=e00007004160fc40 bsp=e000070041608f08
>  [<a0000001003014e0>] rb_next+0x0/0x140
>                                 sp=e00007004160fe10 bsp=e000070041608ef8
>  [<a000000100078dc0>] __dequeue_entity+0x80/0xc0
>                                 sp=e00007004160fe10 bsp=e000070041608ec8
>  [<a000000100078e60>] pick_next_task_fair+0x60/0x180
>                                 sp=e00007004160fe10 bsp=e000070041608e98
>  [<a0000001006a5880>] schedule+0x340/0x19c0
>                                 sp=e00007004160fe10 bsp=e000070041608cc0
>  [<a000000100014cb0>] cpu_idle+0x290/0x3e0
>                                 sp=e00007004160fe30 bsp=e000070041608c50
>  [<a000000100066020>] start_secondary+0x380/0x5a0
>                                 sp=e00007004160fe30 bsp=e000070041608c00
>  [<a0000001006abca0>] __kprobes_text_end+0x6c0/0x6f0
>                                 sp=e00007004160fe30 bsp=e000070041608c00
> 

Exactly the same call trace is produced on an IA64 Madison (up to 9M cache) with 8 CPUs.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


* Re: 2.6.23-rc7-mm1:  panic in scheduler
@ 2007-09-24 22:24 ` Ingo Molnar
From: Ingo Molnar @ 2007-09-24 22:24 UTC
  To: Lee Schermerhorn
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Dmitry Adamushko


* Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> Taking a quick look at [__]{en|de}queue_entity() and the functions 
> they call, I see something suspicious in set_leftmost() in 
> sched_fair.c:
> 
> static inline void
> set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
> {
>         struct sched_entity *se;
> 
>         cfs_rq->rb_leftmost = leftmost;
>         if (leftmost)
>                 se = rb_entry(leftmost, struct sched_entity, run_node);
> }
> 
> Missing code? Corrupt patch?

Could you pull this git tree on top of a -rc7 (or later) upstream tree:

  git-pull git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

Does that solve the crash?

The set_leftmost() code above used to be larger, and those bits are 
now indeed mostly dead code. I've queued up a clean-up patch for that; 
see the patch below. It should not affect correctness, though, so if 
you can still trigger the crash with the latest sched-devel.git tree, 
we'd like to know about it.

	Ingo

------------------->
Subject: sched: remove set_leftmost()
From: Ingo Molnar <mingo@elte.hu>

Lee Schermerhorn noticed that set_leftmost() contains dead code;
remove it.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -124,16 +124,6 @@ max_vruntime(u64 min_vruntime, u64 vrunt
 	return min_vruntime;
 }
 
-static inline void
-set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
-{
-	struct sched_entity *se;
-
-	cfs_rq->rb_leftmost = leftmost;
-	if (leftmost)
-		se = rb_entry(leftmost, struct sched_entity, run_node);
-}
-
 static inline s64
 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
@@ -175,7 +165,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq, 
 	 * used):
 	 */
 	if (leftmost)
-		set_leftmost(cfs_rq, &se->run_node);
+		cfs_rq->rb_leftmost = &se->run_node;
 
 	rb_link_node(&se->run_node, parent, link);
 	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
@@ -185,7 +175,7 @@ static void
 __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if (cfs_rq->rb_leftmost == &se->run_node)
-		set_leftmost(cfs_rq, rb_next(&se->run_node));
+		cfs_rq->rb_leftmost = rb_next(&se->run_node);
 
 	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
 }
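With set_leftmost() gone, the leftmost cache is maintained by plain assignments. Assuming the usual CFS insertion descent, where a `leftmost` flag is cleared as soon as the walk turns right, __enqueue_entity() caches the new node only if it landed at the far left, and __dequeue_entity() advances the cache with rb_next() before rb_erase() unlinks the node. The sketch below mimics that bookkeeping in userspace on a plain, unbalanced binary search tree with parent pointers; toy_node, toy_rq and the toy_* functions are hypothetical stand-ins, and the red-black rebalancing done by rb_insert_color() and rb_erase() is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: toy_node plays rb_node plus the entity key,
 * toy_rq plays cfs_rq (tree root plus the rb_leftmost cache). */
struct toy_node {
	unsigned long long key;		/* e.g. a vruntime-based key */
	struct toy_node *left, *right, *parent;
};

struct toy_rq {
	struct toy_node *root;
	struct toy_node *leftmost;	/* cached minimum, NULL when empty */
};

/* In-order successor: the role rb_next() plays. */
struct toy_node *toy_next(struct toy_node *n)
{
	struct toy_node *p;

	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	while ((p = n->parent) && n == p->right)
		n = p;
	return p;
}

/* Insert; the new node is leftmost only if the descent never went right.
 * Equal keys go right, so they queue behind existing entries. */
void toy_enqueue(struct toy_rq *rq, struct toy_node *n)
{
	struct toy_node **link = &rq->root, *parent = NULL;
	int leftmost = 1;

	while (*link) {
		parent = *link;
		if (n->key < parent->key) {
			link = &parent->left;
		} else {
			link = &parent->right;
			leftmost = 0;
		}
	}
	if (leftmost)
		rq->leftmost = n;
	n->left = n->right = NULL;
	n->parent = parent;
	*link = n;
}

/* Replace subtree u with subtree v in u's parent (no rebalancing). */
static void toy_transplant(struct toy_rq *rq, struct toy_node *u, struct toy_node *v)
{
	if (!u->parent)
		rq->root = v;
	else if (u == u->parent->left)
		u->parent->left = v;
	else
		u->parent->right = v;
	if (v)
		v->parent = u->parent;
}

/* Unbalanced BST delete; stands in for rb_erase(). */
static void toy_erase(struct toy_rq *rq, struct toy_node *z)
{
	if (!z->left) {
		toy_transplant(rq, z, z->right);
	} else if (!z->right) {
		toy_transplant(rq, z, z->left);
	} else {
		struct toy_node *y = z->right;	/* z's in-order successor */

		while (y->left)
			y = y->left;
		if (y->parent != z) {
			toy_transplant(rq, y, y->right);
			y->right = z->right;
			y->right->parent = y;
		}
		toy_transplant(rq, z, y);
		y->left = z->left;
		y->left->parent = y;
	}
}

/* Same order as __dequeue_entity(): update the cache, then unlink. */
void toy_dequeue(struct toy_rq *rq, struct toy_node *n)
{
	if (rq->leftmost == n)
		rq->leftmost = toy_next(n);
	toy_erase(rq, n);
}
```

Enqueueing keys 30, 10, 20, 40 and 5 caches the node with key 5; dequeuing it moves the cache to the node with key 10, while dequeuing a non-leftmost node such as 30 leaves the cache untouched. The successor is looked up before the node is unlinked because erasing rewires the links around it.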


* Re: 2.6.23-rc7-mm1: panic in scheduler
@ 2007-09-25  7:02   ` Balbir Singh
From: Balbir Singh @ 2007-09-25  7:02 UTC
  To: Kamalesh Babulal
  Cc: Lee Schermerhorn, linux-kernel, Andrew Morton, Ingo Molnar,
	Peter Zijlstra

On 9/25/07, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Exactly the same call trace is produced on an IA64 Madison (up to 9M cache) with 8 CPUs.
> --

Hi, Kamalesh,

Can you reproduce the problem, or could you share the steps to
reproduce it?

Thanks,
Balbir


* Re: 2.6.23-rc7-mm1: panic in scheduler
@ 2007-09-25  8:02     ` Kamalesh Babulal
From: Kamalesh Babulal @ 2007-09-25  8:02 UTC
  To: Balbir Singh
  Cc: Lee Schermerhorn, linux-kernel, Andrew Morton, Ingo Molnar,
	Peter Zijlstra

Balbir Singh wrote:
> On 9/25/07, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>> Exactly the same call trace is produced on an IA64 Madison (up to 9M cache) with 8 CPUs.
>> --
> 
> Hi, Kamalesh,
> 
> Can you reproduce the problem, or could you share the steps to
> reproduce it?
> 
> Thanks,
> Balbir
> -

Hi Balbir,

Yes, I am able to reproduce the problem. It can be reproduced
using ltprunall.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


* Re: 2.6.23-rc7-mm1: panic in scheduler
@ 2007-09-25 13:58       ` Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-25 13:58 UTC
  To: Kamalesh Babulal
  Cc: Balbir Singh, linux-kernel, Andrew Morton, Ingo Molnar,
	Peter Zijlstra

On Tue, 2007-09-25 at 13:32 +0530, Kamalesh Babulal wrote:
> Balbir Singh wrote:
> > On 9/25/07, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> >> Exactly the same call trace is produced on an IA64 Madison (up to 9M cache) with 8 CPUs.
> >> --
> > 
> > Hi, Kamalesh,
> > 
> > Can you reproduce the problem, or could you share the steps to
> > reproduce it?
> > 
> > Thanks,
> > Balbir
> > -
> 
> Hi Balbir,
> 
> Yes, I am able to reproduce the problem. It can be reproduced
> using ltprunall.
> 

I see the problem just trying to boot. I have yet to successfully boot
2.6.23-rc7-mm1 on my platform. [But I'll try Ingo's dev tree real soon
now...]

Lee


