* 2.6.23-rc7-mm1: panic in scheduler
@ 2007-09-24 21:12 Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-24 21:12 UTC
To: linux-kernel; +Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra
I looked around on the MLs for mention of this, but didn't find anything
that appeared to match.
Platform: HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]
2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:
Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[0]: Oops 8813272891392 [1]
Modules linked in: scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore
Pid: 0, CPU 14, comm: swapper
psr : 0000101008522030 ifs : 8000000000000002 ip : [<a0000001003014e0>] Not tainted
ip is at rb_next+0x0/0x140
unat: 0000000000000000 pfs : 0000000000000308 rsc : 0000000000000003
rnat: 8000000000000012 bsps: 000000000001003e pr : 6609a840599519a5
ldrs: 0000000000000000 ccv : 0000000000000002 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100078dc0 b6 : a000000100074a40 b7 : a000000100078e00
f6 : 1003e0000000000000000 f7 : 1003e0000000000400000
f8 : 1003e000000002aaaaaab f9 : 1003e0000000d43798a2b
f10 : 1003e35e9970b967dd8b9 f11 : 1003e0000000000000002
r1 : a000000100bc0920 r2 : e0000760000577f0 r3 : e000076000057f10
r8 : fffffffffffffff0 r9 : 0000000000000002 r10 : e000076000057780
r11 : 0000000000000000 r12 : e00007004160fe10 r13 : e000070041608000
r14 : 0000000000000000 r15 : 000000000000000e r16 : 00000007f6c30a22
r17 : e000070041608040 r18 : a0000001008383a8 r19 : a000000100078e00
r20 : e000076000055bb8 r21 : e000076000055bb0 r22 : e000076000057ed0
r23 : 00000000000f4240 r24 : a0000001009e0440 r25 : e000070041608bb4
r26 : 0000000000000000 r27 : 0000000000000000 r28 : e000076000057f80
r29 : 00000000000002e7 r30 : 0000000000000000 r31 : e000076000057780
Call Trace:
 [<a000000100014f60>] show_stack+0x80/0xa0
                                sp=e00007004160f9e0 bsp=e000070041609008
 [<a000000100015bf0>] show_regs+0x870/0x8a0
                                sp=e00007004160fbb0 bsp=e000070041608fa8
 [<a00000010003d170>] die+0x190/0x300
                                sp=e00007004160fbb0 bsp=e000070041608f60
 [<a000000100071bc0>] ia64_do_page_fault+0x780/0xa80
                                sp=e00007004160fbb0 bsp=e000070041608f08
 [<a00000010000b5c0>] ia64_leave_kernel+0x0/0x270
                                sp=e00007004160fc40 bsp=e000070041608f08
 [<a0000001003014e0>] rb_next+0x0/0x140
                                sp=e00007004160fe10 bsp=e000070041608ef8
 [<a000000100078dc0>] __dequeue_entity+0x80/0xc0
                                sp=e00007004160fe10 bsp=e000070041608ec8
 [<a000000100078e60>] pick_next_task_fair+0x60/0x180
                                sp=e00007004160fe10 bsp=e000070041608e98
 [<a0000001006a5880>] schedule+0x340/0x19c0
                                sp=e00007004160fe10 bsp=e000070041608cc0
 [<a000000100014cb0>] cpu_idle+0x290/0x3e0
                                sp=e00007004160fe30 bsp=e000070041608c50
 [<a000000100066020>] start_secondary+0x380/0x5a0
                                sp=e00007004160fe30 bsp=e000070041608c00
 [<a0000001006abca0>] __kprobes_text_end+0x6c0/0x6f0
                                sp=e00007004160fe30 bsp=e000070041608c00
Taking a quick look at [__]{en|de}queue_entity() and the functions they call,
I see something suspicious in set_leftmost() in sched_fair.c:
static inline void
set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
	struct sched_entity *se;

	cfs_rq->rb_leftmost = leftmost;
	if (leftmost)
		se = rb_entry(leftmost, struct sched_entity, run_node);
}
Missing code? corrupt patch?
Config available on request, but there don't seem to be many scheduler
config options. A few that might apply:
SCHED_SMT is not set
SCHED_DEBUG=y
SCHEDSTATS=y
Regards,
Lee Schermerhorn

* Re: 2.6.23-rc7-mm1: panic in scheduler
From: Kamalesh Babulal @ 2007-09-24 21:30 UTC
To: Lee Schermerhorn; +Cc: linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra

Lee Schermerhorn wrote:
> I looked around on the MLs for mention of this, but didn't find anything
> that appeared to match.
>
> Platform: HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]
>
> 2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:
>
> Unable to handle kernel NULL pointer dereference (address 0000000000000000)
> [...]

Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

* Re: 2.6.23-rc7-mm1: panic in scheduler
From: Balbir Singh @ 2007-09-25 7:02 UTC
To: Kamalesh Babulal
Cc: Lee Schermerhorn, linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra

On 9/25/07, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.
> --

Hi, Kamalesh,

Could you please reproduce the problem or share the steps to reproduce
the problem?

Thanks,
Balbir

* Re: 2.6.23-rc7-mm1: panic in scheduler
From: Kamalesh Babulal @ 2007-09-25 8:02 UTC
To: Balbir Singh
Cc: Lee Schermerhorn, linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra

Balbir Singh wrote:
> On 9/25/07, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>> Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.
>> --
>
> Hi, Kamalesh,
>
> Could you please reproduce the problem or share the steps to reproduce
> the problem?
>
> Thanks,
> Balbir
> -

Hi Balbir,

Yes, I am able to reproduce the problem. The problem can be reproduced
using the ltprunall.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

* Re: 2.6.23-rc7-mm1: panic in scheduler
From: Lee Schermerhorn @ 2007-09-25 13:58 UTC
To: Kamalesh Babulal
Cc: Balbir Singh, linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra

On Tue, 2007-09-25 at 13:32 +0530, Kamalesh Babulal wrote:
> Balbir Singh wrote:
> > On 9/25/07, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> >> Exactly the same call trace is produced on IA64 Madison (up to 9M cache) with 8 CPUs.
> >> --
> >
> > Hi, Kamalesh,
> >
> > Could you please reproduce the problem or share the steps to reproduce
> > the problem?
> >
> > Thanks,
> > Balbir
> > -
>
> Hi Balbir,
>
> Yes, I am able to reproduce the problem. The problem can be reproduced
> using the ltprunall.
>

I see the problem just trying to boot. I have yet to successfully boot
23-rc7-mm1 on my platform. [But I'll try Ingo's dev tree real soon now...]

Lee

* Re: 2.6.23-rc7-mm1: panic in scheduler
From: Ingo Molnar @ 2007-09-24 22:24 UTC
To: Lee Schermerhorn
Cc: linux-kernel, Andrew Morton, Ingo Molnar, Peter Zijlstra, Dmitry Adamushko

* Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> Taking a quick look at [__]{en|de}queue_entity() and the functions
> they call, I see something suspicious in set_leftmost() in
> sched_fair.c:
>
> static inline void
> set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
> {
> 	struct sched_entity *se;
>
> 	cfs_rq->rb_leftmost = leftmost;
> 	if (leftmost)
> 		se = rb_entry(leftmost, struct sched_entity, run_node);
> }
>
> Missing code? corrupt patch?

Could you pull this git tree on top of a -rc7 (or later) upstream tree:

  git-pull git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

Does that solve the crash?

The above set_leftmost() code used to be larger, and those bits are now
indeed mostly dead code. I've queued up a clean-up patch for that (see
the patch below). It should not impact correctness, though, so if you
can still trigger the crash with the latest sched-devel.git tree, we'd
like to know about it.

	Ingo

------------------->
Subject: sched: remove set_leftmost()
From: Ingo Molnar <mingo@elte.hu>

Lee Schermerhorn noticed that set_leftmost() contains dead code;
remove it.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -124,16 +124,6 @@ max_vruntime(u64 min_vruntime, u64 vrunt
 	return min_vruntime;
 }
 
-static inline void
-set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
-{
-	struct sched_entity *se;
-
-	cfs_rq->rb_leftmost = leftmost;
-	if (leftmost)
-		se = rb_entry(leftmost, struct sched_entity, run_node);
-}
-
 static inline s64
 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
@@ -175,7 +165,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq,
 	 * used):
 	 */
 	if (leftmost)
-		set_leftmost(cfs_rq, &se->run_node);
+		cfs_rq->rb_leftmost = &se->run_node;
 
 	rb_link_node(&se->run_node, parent, link);
 	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
@@ -185,7 +175,7 @@ static void
 __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if (cfs_rq->rb_leftmost == &se->run_node)
-		set_leftmost(cfs_rq, rb_next(&se->run_node));
+		cfs_rq->rb_leftmost = rb_next(&se->run_node);
 
 	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
 }