From: Ben Hutchings <benh@debian.org>
To: Frede_Feuerstein@gmx.net, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <peterz@infradead.org>
Cc: 603229@bugs.debian.org, LKML <linux-kernel@vger.kernel.org>
Subject: Scheduler grouping failure; division by zero in select_task_rq_fair
Date: Sun, 28 Nov 2010 20:14:26 +0000
Message-ID: <1290975266.3292.316.camel@localhost>
In-Reply-To: <1290920436.4255.1025.camel@localhost>
On Sun, 2010-11-28 at 06:00 +0100, Frede Feuerstein wrote:
[...]
> > The division by zero appears to be a result of getting bad information
> > from the firmware about the groups of processors.
>
> Well, technically a division error always is a result of bad data fed to
> that division. I rather meant, that this is the point to backtrace the
> error.
> Though the bios of the w2100z is known for some problems, the cpus are
> reported correctly by the bios and it is the latest version (R01-B5-S1).
>
> > I realise that this
> > same bad information did not previously result in a crash, but I (and
> > the upstream developers) need to know what that information is before we
> > can understand how this can be avoided.
>
> Are there any means to gather more information? Tell me and I shall do
> it.
I think this is now enough information.
Ingo, Peter, the output from scheduler domain/group setup was:
[ 0.536554] CPU0 attaching sched-domain:
[ 0.540004] domain 0: span 0-1 level MC
[ 0.548002] groups: 0 1
[ 0.560003] domain 1: span 0-3 level NODE
[ 0.568002] groups:
[ 0.574179] ERROR: domain->cpu_power not set
[ 0.576002]
[ 0.580002] ERROR: groups don't span domain->span
[ 0.584004] CPU1 attaching sched-domain:
[ 0.588007] domain 0: span 0-1 level MC
[ 0.596002] groups: 1 0 (cpu_power = 1023)
[ 0.612002] ERROR: parent span is not a superset of domain->span
[ 0.616003] domain 1: span 1-3 level CPU
[ 0.624002] groups: 1 (cpu_power = 2048) 2-3 (cpu_power = 2048)
[ 0.644003] domain 2: span 0-3 level NODE
[ 0.652004] groups: 1-3 (cpu_power = 4096)
[ 0.668002] ERROR: domain->cpu_power not set
[ 0.672002]
[ 0.676002] ERROR: groups don't span domain->span
[ 0.680004] CPU2 attaching sched-domain:
[ 0.684003] domain 0: span 2-3 level MC
[ 0.692003] groups: 2 3
[ 0.704003] domain 1: span 1-3 level CPU
[ 0.712003] groups: 2-3 (cpu_power = 2048) 1 (cpu_power = 2048)
[ 0.736003] domain 2: span 0-3 level NODE
[ 0.744003] groups: 1-3 (cpu_power = 4096)
[ 0.760003] ERROR: domain->cpu_power not set
[ 0.764003]
[ 0.768003] ERROR: groups don't span domain->span
[ 0.772004] CPU3 attaching sched-domain:
[ 0.776003] domain 0: span 2-3 level MC
[ 0.784003] groups: 3 2
[ 0.794183] domain 1: span 1-3 level CPU
[ 0.800003] groups: 2-3 (cpu_power = 2048) 1 (cpu_power = 2048)
[ 0.822183] domain 2: span 0-3 level NODE
[ 0.828003] groups: 1-3 (cpu_power = 4096)
[ 0.842180] ERROR: domain->cpu_power not set
[ 0.844003]
[ 0.848003] ERROR: groups don't span domain->span
and the oops is:
[ 0.852154] divide error: 0000 [#1] SMP
[ 0.856002] last sysfs file:
[ 0.856002] CPU 1
[ 0.856002] Modules linked in:
[ 0.856002] Pid: 2, comm: kthreadd Not tainted 2.6.32-5-amd64 #1 W1100z/2100z
[ 0.856002] RIP: 0010:[<ffffffff810416e9>] [<ffffffff810416e9>] select_task_rq_fair+0x665/0x800
[ 0.856002] RSP: 0018:ffff88003fdb7c90 EFLAGS: 00010046
[ 0.856002] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 0.856002] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
[ 0.856002] RBP: ffff88004120fd50 R08: 0000000000000000 R09: ffff88007f98f0b0
[ 0.856002] R10: 0000000000000000 R11: 00000000000252d0 R12: ffff88007f98f060
[ 0.856002] R13: ffff88007f98f070 R14: ffffffffffffffff R15: 0000000000015780
[ 0.856002] FS: 0000000000000000(0000) GS:ffff880041200000(0000) knlGS:0000000000000000
[ 0.856002] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 0.856002] CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
[ 0.856002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.856002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.856002] Process kthreadd (pid: 2, threadinfo ffff88003fdb6000, task ffff88003fdc8710)
[ 0.856002] Stack:
[ 0.856002] 0000000000015780 0000000000015780 0000000000015780 0000000000015780
[ 0.856002] <0> 0000000000015780 0000000000015788 0000000000015788 ffffffff8146c260
[ 0.856002] <0> 0000000800000000 ffff88007f9b0000 ffff880041215780 0000000081317f88
[ 0.856002] Call Trace:
[ 0.856002] [<ffffffff8104d2b2>] ? copy_process+0x1007/0x115f
[ 0.856002] [<ffffffff810475f4>] ? select_task_rq+0xb/0x3e
[ 0.856002] [<ffffffff8104b53b>] ? wake_up_new_task+0x35/0xf6
[ 0.856002] [<ffffffff8104d65e>] ? do_fork+0x254/0x31e
[ 0.856002] [<ffffffff81041aa9>] ? pick_next_task_fair+0xca/0xd6
[ 0.856002] [<ffffffff8104802b>] ? finish_task_switch+0x3a/0xaf
[ 0.856002] [<ffffffff81011b42>] ? kernel_thread+0x82/0xe0
[ 0.856002] [<ffffffff810648c8>] ? kthread+0x0/0x81
[ 0.856002] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 0.856002] [<ffffffff8106488d>] ? kthreadd+0xb1/0xec
[ 0.856002] [<ffffffff814f3140>] ? early_idt_handler+0x0/0x71
[ 0.856002] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 0.856002] [<ffffffff814f3140>] ? early_idt_handler+0x0/0x71
[ 0.856002] [<ffffffff810dfda5>] ? do_set_mempolicy+0x128/0x13a
[ 0.856002] [<ffffffff810647dc>] ? kthreadd+0x0/0xec
[ 0.856002] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 0.856002] Code: 00 02 00 00 4c 89 ef 48 63 d2 e8 0f c6 14 00 3b 05 ad 33 49 00 89 c2 0f 8c 6f ff ff ff 41 8b 4c 24 08 48 c1 e3 0a 31 d2 48 89 d8 <48> f7 f1 83 bc 24 a8 00 00 00 00 48 89 c1 75 22 4c 39 f0 73 15
[ 0.856002] RIP [<ffffffff810416e9>] select_task_rq_fair+0x665/0x800
[ 0.856002] RSP <ffff88003fdb7c90>
[ 0.856002] ---[ end trace a22d306b065d4a66 ]---
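For what it's worth, my reading of the Code: bytes is that the faulting
instruction (48 f7 f1, "div %rcx") is preceded by a 32-bit load from
8(%r12) into %ecx and a left shift of %rbx by 10, and RCX is 0 in the
register dump. That looks like a load value scaled by 1024 and divided by
a group's cpu_power, which would tie the oops to the "cpu_power not set"
errors above. The following is only a user-space sketch of that arithmetic
and of the two span checks the setup code complains about; the struct and
function names are made up for illustration and are not the kernel's, and
the guard in scaled_load() is not a proposed fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the load scale is 1 << 10, matching the shift by 10 in the Code: bytes. */
#define LOAD_SCALE_SHIFT 10

/* Hypothetical stand-in for a scheduler group (not the kernel's struct). */
struct group_sketch {
    uint64_t cpumask;   /* bit n set => CPU n belongs to the group */
    uint32_t cpu_power; /* 0 models "cpu_power not set" */
};

/* Build a mask covering CPUs first..last inclusive (last must be < 64). */
static uint64_t cpu_range(unsigned first, unsigned last)
{
    uint64_t mask = 0;
    for (unsigned cpu = first; cpu <= last; cpu++)
        mask |= (uint64_t)1 << cpu;
    return mask;
}

/* Models "groups don't span domain->span": the union of all group masks
 * has to equal the domain's span. Returns 1 if it does, 0 otherwise. */
static int groups_span_domain(const struct group_sketch *groups, size_t n,
                              uint64_t span)
{
    uint64_t covered = 0;
    for (size_t i = 0; i < n; i++)
        covered |= groups[i].cpumask;
    return covered == span;
}

/* Models "parent span is not a superset of domain->span": every CPU in
 * the child span has to be present in the parent span. */
static int parent_is_superset(uint64_t parent, uint64_t child)
{
    return (child & ~parent) == 0;
}

/* The arithmetic that appears to fault: (load << 10) / cpu_power.
 * Returns 0 and stores the result, or -1 when cpu_power is zero,
 * which is the case that traps as a divide error in the oops. */
static int scaled_load(uint64_t load, uint32_t cpu_power, uint64_t *out)
{
    if (cpu_power == 0)
        return -1;
    *out = (load << LOAD_SCALE_SHIFT) / cpu_power;
    return 0;
}
```

Fed with the values logged for CPU1, parent_is_superset() fails for
domain 0 (span 0-1) against domain 1 (span 1-3), groups_span_domain()
fails for the NODE domain (span 0-3, only group 1-3 listed), and
scaled_load() with a cpu_power of 0 is the division that cannot be done.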
There's more information in the bug log at <http://bugs.debian.org/603229>.
If you think this has been fixed since 2.6.32 (I didn't see any relevant
changes) then we have a package of 2.6.36 which Frede can test.
Ben.
--
Ben Hutchings, Debian Developer and kernel team member