* 2048 CPU system panic while sitting idle.
@ 2012-06-04 10:18 Robin Holt
From: Robin Holt @ 2012-06-04 10:18 UTC
  To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel

I had a 1024-core / 2048-thread system which had been running aim7 for
more than an hour and was then sitting idle when it panicked.

###30286.149797 (28156.419861)| BUG: unable to handle kernel NULL pointer dereference at 000000000000008d
   30286.169399 (    0.019602)| IP: [<ffffffff8105f071>] load_balance+0xb0/0xfa2
   30286.169515 (    0.000116)| PGD 0
   30286.169547 (    0.000032)| Oops: 0002 [#1] SMP
   30286.169604 (    0.000057)| xpc : all partitions have deactivated
   30286.179717 (    0.010113)| CPU 1246
   30286.179763 (    0.000046)| Modules linked in:
   30286.179812 (    0.000049)|
   30286.179830 (    0.000018)| Pid: 0, comm: swapper/1246 Not tainted 3.4.0-holt-09547-gfb21aff-dirty #26 Intel Corp. Stoutland Platform
   30286.189405 (    0.009575)| RIP: 0010:[<ffffffff8105f071>]  [<ffffffff8105f071>] load_balance+0xb0/0xfa2
   30286.199995 (    0.010590)| RSP: 0018:ffff8b5ffedc3c10  EFLAGS: 00010206
   30286.200113 (    0.000118)| RAX: 00000000000004de RBX: ffff8b5ff8bce400 RCX: 0000000000000012
   30286.200260 (    0.000147)| RDX: ffff8b5ffedd1480 RSI: ffffffff81a7fe7e RDI: ffff88207daabcee
   30286.209437 (    0.009177)| RBP: ffff8b5ffedc3e50 R08: ffff8b5ffedc3e84 R09: ffff8b5ffedc3e38
   30286.219981 (    0.010544)| R10: ffff88203f12be58 R11: 0000000000000010 R12: 0000000000000000
   30286.220134 (    0.000153)| R13: 000000010088ccc1 R14: 0000000000000000 R15: ffff8b5ff8bce400
   30286.229418 (    0.009284)| FS:  0000000000000000(0000) GS:ffff8b5ffedc0000(0000) knlGS:0000000000000000
   30286.239960 (    0.010542)| CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   30286.249945 (    0.009985)| CR2: 000000000000008d CR3: 0000000001a0b000 CR4: 00000000000007e0
   30286.250090 (    0.000145)| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   30286.259423 (    0.009333)| DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
   30286.259562 (    0.000139)| Process swapper/1246 (pid: 0, threadinfo ffff88203f12a000, task ffff88203f128280)
   30286.270024 (    0.010462)| Stack:
   30286.270176 (    0.000152)|  ffff8b5ffedc3c20 0000000000011480 0000000000011480 0000000000011480
   30286.270335 (    0.000159)|  0000000000011478 fffffffffffffff8 ffff8b5ffedc3c40 ffff8b5ffedc3c40
   30286.309938 (    0.039603)|  ffff8b5ffedc3c70 ffffffff81008f64 00000000000004de ffff8b5ffedd1e80
   30286.310090 (    0.000152)| Call Trace:
   30286.310128 (    0.000038)|  <IRQ>
   30286.310157 (    0.000029)|  [<ffffffff81008f64>] ? native_sched_clock+0x40/0x8b
   30286.310262 (    0.000105)|  [<ffffffff81008fc6>] ? sched_clock+0x17/0x1b
   30286.310354 (    0.000092)|  [<ffffffff8105ecea>] ? enqueue_task_fair+0x2a8/0x3f8
   30286.310461 (    0.000107)|  [<ffffffff8105bad0>] ? wake_up_process+0x10/0x12
   30286.310560 (    0.000099)|  [<ffffffff81008fc6>] ? sched_clock+0x17/0x1b
   30286.319949 (    0.009389)|  [<ffffffff81060049>] rebalance_domains+0xe6/0x156
   30286.330017 (    0.010068)|  [<ffffffff810603e3>] run_rebalance_domains+0x47/0x164
   30286.330184 (    0.000167)|  [<ffffffff8103cf84>] __do_softirq+0x9a/0x147
   30286.330282 (    0.000098)|  [<ffffffff81473a4c>] call_softirq+0x1c/0x30
   30286.339961 (    0.009679)|  [<ffffffff81004489>] do_softirq+0x61/0xbf
   30286.340132 (    0.000171)|  [<ffffffff8103ccd4>] irq_exit+0x43/0xb0
   30286.349978 (    0.009846)|  [<ffffffff8101da3a>] smp_apic_timer_interrupt+0x86/0x94
   30286.350177 (    0.000199)|  [<ffffffff814730fa>] apic_timer_interrupt+0x6a/0x70
   30286.360010 (    0.009833)|  <EOI>
   30286.360093 (    0.000083)|  [<ffffffff8105c604>] ? sched_clock_cpu+0xd3/0xde
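
For readers decoding the "Oops: 0002" line above: on x86 the number is the page-fault error code, and its low bits say what kind of access faulted. The following is a minimal sketch of that decoding. The names pf_error, decode_pf_error and print_pf_error are hypothetical helpers for illustration, not kernel functions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Decoded view of the low bits of an x86 page-fault error code. */
struct pf_error {
	bool protection;  /* bit 0: 1 = protection violation, 0 = page not present */
	bool write;       /* bit 1: 1 = write access, 0 = read access */
	bool user;        /* bit 2: 1 = fault taken in user mode, 0 = kernel mode */
	bool reserved;    /* bit 3: 1 = reserved bit set in a page-table entry */
	bool instr_fetch; /* bit 4: 1 = fault on an instruction fetch */
};

/* Hypothetical helper: split the error code into its flag bits. */
static struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e = {
		.protection  = (code & 0x01) != 0,
		.write       = (code & 0x02) != 0,
		.user        = (code & 0x04) != 0,
		.reserved    = (code & 0x08) != 0,
		.instr_fetch = (code & 0x10) != 0,
	};
	return e;
}

/* Hypothetical helper: print a one-line summary of the error code. */
static void print_pf_error(unsigned long code)
{
	struct pf_error e = decode_pf_error(code);

	printf("%04lx: %s, %s, %s mode\n", code,
	       e.protection ? "protection violation" : "page not present",
	       e.write ? "write" : "read",
	       e.user ? "user" : "kernel");
}
```

Calling print_pf_error(0x0002) with the value from this report describes a kernel-mode write to a page that was not present, which fits a store through a near-NULL pointer (CR2 is 000000000000008d).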
   

Disassembly of section .text:

00000000000026f1 <load_balance>:
load_balance():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/kernel/sched/fair.c:4216
    26f1:       55                      push   %rbp
...
load_balance():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/kernel/sched/fair.c:4229
    277b:       89 8d 68 ff ff ff       mov    %ecx,-0x98(%rbp)
bitmap_copy():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/include/linux/bitmap.h:186
    2781:       48 63 0d 00 00 00 00    movslq 0x0(%rip),%rcx        # 2788 <load_balance+0x97>
load_balance():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/kernel/sched/fair.c:4229
    2788:       89 85 58 ff ff ff       mov    %eax,-0xa8(%rbp)
    278e:       48 89 95 60 ff ff ff    mov    %rdx,-0xa0(%rbp)
bitmap_copy():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/include/linux/bitmap.h:186
    2795:       48 83 c1 3f             add    $0x3f,%rcx
    2799:       48 c1 f9 03             sar    $0x3,%rcx
    279d:       48 83 e1 f8             and    $0xfffffffffffffff8,%rcx
    27a1:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
update_sg_lb_stats():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/kernel/sched/fair.c:3642
    27a3:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
load_balance():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/kernel/sched/fair.c:4233
    27aa:       8b 85 4c fe ff ff       mov    -0x1b4(%rbp),%eax
    27b0:       41 ff 44 87 70          incl   0x70(%r15,%rax,4)
update_sg_lb_stats():
/data/lwork/gulag1b/holt/nate-test/uv/linux-2.6/linux/kernel/sched/fair.c:3642
    27b5:       48 89 bd d8 fd ff ff    mov    %rdi,-0x228(%rbp)
target_load():
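
To line the oops up with the disassembly above, the "load_balance+0xb0" offset from the IP line can be added to the start of load_balance in the object file. This is a small sketch of that arithmetic; symbol_plus_offset and symbol_start_from_ip are hypothetical names, and the result is only meaningful if the disassembled object matches the kernel that crashed.

```c
#include <stdint.h>

/* Address of "symbol+offset" given where the symbol starts. */
static uint64_t symbol_plus_offset(uint64_t symbol_start, uint64_t offset)
{
	return symbol_start + offset;
}

/* Start of the symbol given a reported IP and its offset into the symbol. */
static uint64_t symbol_start_from_ip(uint64_t ip, uint64_t offset)
{
	return ip - offset;
}
```

With the values in this report, symbol_plus_offset(0x26f1, 0xb0) gives 0x27a1, which is the "rep movsb" line attributed to bitmap_copy() in the listing. symbol_start_from_ip(0xffffffff8105f071, 0xb0) gives 0xffffffff8105efc1 as the runtime start of load_balance.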


fair.c:
4213 static int load_balance(int this_cpu, struct rq *this_rq,
4214                         struct sched_domain *sd, enum cpu_idle_type idle,
4215                         int *balance)
4216 {
4217         int ld_moved, active_balance = 0;
4218         struct sched_group *group;
4219         struct rq *busiest;
4220         unsigned long flags;
4221         struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
4222 
4223         struct lb_env env = {
4224                 .sd             = sd,
4225                 .dst_cpu        = this_cpu,
4226                 .dst_rq         = this_rq,
4227                 .idle           = idle,
4228                 .loop_break     = sched_nr_migrate_break,
4229         };
4230 
4231         cpumask_copy(cpus, cpu_active_mask);
4232 
4233         schedstat_inc(sd, lb_count[idle]);
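
The cpumask_copy() at line 4231 appears in the disassembly as the inlined bitmap_copy(), where the add $0x3f / sar $0x3 / and $0xfffffffffffffff8 sequence computes the byte count for the rep movsb. The sketch below reproduces that calculation in two forms, assuming bitmap_copy() sizes its copy as whole longs (which is what the instruction sequence suggests). Both function names are hypothetical, and the 2048-bit figure used in the note afterwards is an assumption, since the report does not state the configured cpumask width.

```c
#include <stddef.h>

/* Bits in one unsigned long on the build machine (64 on x86_64). */
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))

/* C-level form: number of longs needed for nbits, times sizeof(long). */
static size_t bitmap_copy_len(int nbits)
{
	size_t longs = ((size_t)nbits + SKETCH_BITS_PER_LONG - 1)
		       / SKETCH_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}

/* Same value as computed by the listed instructions for a 64-bit long. */
static long bitmap_copy_len_asm(int nbits)
{
	long rcx = nbits;  /* movslq: sign-extend the int bit count */

	rcx += 0x3f;       /* add $0x3f: round up to the next 64 bits */
	rcx >>= 3;         /* sar $0x3: bits to bytes */
	rcx &= ~7L;        /* and $-8: truncate to a multiple of 8 bytes */
	return rcx;
}
```

If the mask were 2048 bits wide, both functions return 256 bytes. Since rep movsb counts down in RCX, the RCX value of 0x12 in the register dump would be the number of bytes still to copy when the fault hit.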


I am just rushing out for the day and wanted to report this problem
before going.

A quick glance at it did not make any sense to me, so I really have
nothing more to contribute at this time.

Thanks,
Robin
