* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
[not found] <BF1FE1855350A0479097B3A0D2A80EE0020AE8AD@hdsmsx402.hd.intel.com>
@ 2004-01-29 23:31 ` Len Brown
2004-01-30 0:37 ` Alessandro Suardi
0 siblings, 1 reply; 12+ messages in thread
From: Len Brown @ 2004-01-29 23:31 UTC (permalink / raw)
To: Alessandro Suardi
Cc: Matt Domsch, Dmitry Torokhov, Andrew Morton, linux-kernel, linux-acpi

Alessandro,
Looks like you've identified a regression, probably in ACPI.

Please test the 1st patch attached to this bug report
http://bugzilla.kernel.org/show_bug.cgi?id=1766

If it doesn't address the problem, please file an additional bug report
per below.

thanks,
-Len

ps. The divide-by zero symptom should be addressed by Dominik's update,
now in the ACPI tree and thus the next -mm patch.

pps. How to file a bug against ACPI:

http://bugzilla.kernel.org/
Category: Power Management, Component: ACPI
Please attach dmesg -s40000 output (or serial console log if dmesg unavailable)
Please attach the output from acpidmp, available in /usr/sbin/, or in pmtools:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

On Wed, 2004-01-28 at 17:32, Alessandro Suardi wrote:
> Matt Domsch wrote:
> > On Tue, Jan 27, 2004 at 11:37:55PM -0500, Dmitry Torokhov wrote:
> >
> >>>Divide by zero. Looks like ACPI is now passing bad values into the
> >>>frequency change notifier.
...
> , I'd like to remind that this works
> perfectly prior to the 20031203 ACPI patch. Indeed, this is what
> 2.6.1 vanilla says in that area:
>
> cpufreq: CPU0 - ACPI performance management activated.
> cpufreq: *P0: 1800 MHz, 0 mW, 250 uS
> cpufreq: P1: 1200 MHz, 0 mW, 250 uS
>
> Attaching the gzipped dmesg for my 2.6.1 boot - let me know if
> you want anyway dmidecode output and DSDT; for this latter I'll
> have to ask for instructions (or is the output of a simple
> 'cat /proc/acpi/dsdt' enough ?).
>
> --alessandro
>
> "Two rivers run too deep
> The seasons change and so do I"
> (U2, "Indian Summer Sky")
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-29 23:31 ` 2.6.2-rc2-bk1 oopses on boot (ACPI patch) Len Brown
@ 2004-01-30 0:37 ` Alessandro Suardi
0 siblings, 0 replies; 12+ messages in thread
From: Alessandro Suardi @ 2004-01-30 0:37 UTC (permalink / raw)
To: Len Brown
Cc: Matt Domsch, Dmitry Torokhov, Andrew Morton, linux-kernel, linux-acpi

Len Brown wrote:
> Alessandro,
> Looks like you've identified a regression, probably in ACPI.
>
> Please test the 1st patch attached to this bug report
> http://bugzilla.kernel.org/show_bug.cgi?id=1766

The patch you mention fixes my problem - tested over 2.6.2-rc2-bk3.

> If it doesn't address the problem, please file an additional bug report
> per below.

Thanks for the instructions, I really appreciate. Keep up the great work !

Ciao,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")
* 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
@ 2004-01-28 2:15 Alessandro Suardi
2004-01-28 2:42 ` Andrew Morton
2004-01-28 3:06 ` Linus Torvalds
0 siblings, 2 replies; 12+ messages in thread
From: Alessandro Suardi @ 2004-01-28 2:15 UTC (permalink / raw)
To: linux-kernel, linux-acpi; +Cc: Andrew Morton
Already reported, but I'll do so once again, since it looks like
in a short while I won't be able to boot official kernels in my
current config...
Original report here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
Please advise whether I should give up cpufreq for now - I really
don't want to bang my head against a wall.
Thanks in advance,
--alessandro
"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 2:15 Alessandro Suardi
@ 2004-01-28 2:42 ` Andrew Morton
2004-01-28 3:10 ` Linus Torvalds
2004-01-28 4:37 ` Dmitry Torokhov
2004-01-28 3:06 ` Linus Torvalds
1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2004-01-28 2:42 UTC (permalink / raw)
To: Alessandro Suardi; +Cc: linux-kernel, linux-acpi

Alessandro Suardi <alessandro.suardi@oracle.com> wrote:
>
> Already reported, but I'll do so once again, since it looks like
> in a short while I won't be able to boot official kernels in my
> current config...
>
> Original report here:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html

Divide by zero. Looks like ACPI is now passing bad values into the
frequency change notifier.

Does this make the oops go away?

diff -puN drivers/cpufreq/cpufreq.c~cpufreq-workaround drivers/cpufreq/cpufreq.c
--- 25/drivers/cpufreq/cpufreq.c~cpufreq-workaround 2004-01-27 18:36:05.000000000 -0800
+++ 25-akpm/drivers/cpufreq/cpufreq.c 2004-01-27 18:36:42.000000000 -0800
@@ -928,6 +928,11 @@ void cpufreq_notify_transition(struct cp
 		return;   /* Only valid if we're in the resume process where
 			   * everyone knows what CPU frequency we are at */
 
+	if (freqs->new == 0) {
+		printk("%s: avoiding div-by-zero\n", __FUNCTION__);
+		return;
+	}
+
 	down_read(&cpufreq_notifier_rwsem);
 	switch (state) {
 	case CPUFREQ_PRECHANGE:

_
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 2:42 ` Andrew Morton
@ 2004-01-28 3:10 ` Linus Torvalds
2004-01-28 3:19 ` Alessandro Suardi
2004-01-28 4:37 ` Dmitry Torokhov
1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-01-28 3:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alessandro Suardi, linux-kernel, linux-acpi

On Tue, 27 Jan 2004, Andrew Morton wrote:
>
> Divide by zero. Looks like ACPI is now passing bad values into the
> frequency change notifier.
>
> Does this make the oops go away?

Other values will still cause divide-by-zero (any divisor in 0..9 will do
it). Besides, we're dividing with _old_, not new, so that's the one we
should likely check.

Linus
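Editorial sketch, not part of the original message: Linus's point that a check for exactly zero is too narrow can be illustrated in plain user-space C. The sketch assumes a scaling helper that pre-divides its divisor by a small constant before dividing, which is one way a range such as 0..9 could all end up as zero. The constant `PRESCALE` and the names `divisor_is_safe` and `scale_prescaled` are hypothetical and were chosen only to match the range quoted above; they are not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pre-scaling constant. Chosen only to match the "0..9"
 * range mentioned in the thread; not taken from the kernel source. */
#define PRESCALE 10u

/* True if 'div' is still non-zero after being pre-scaled. */
static bool divisor_is_safe(unsigned int div)
{
	return div / PRESCALE != 0;
}

/* Scale 'value' by mult/div after pre-dividing both factors by PRESCALE.
 * Returns false (and leaves *out untouched) when the pre-scaled divisor
 * would be zero, instead of trapping with a divide error. */
static bool scale_prescaled(unsigned long value, unsigned int div,
			    unsigned int mult, unsigned long *out)
{
	unsigned int d = div / PRESCALE;
	unsigned int m = mult / PRESCALE;

	if (d == 0)
		return false;

	/* Whole part plus the scaled remainder. */
	*out = (value / d) * m + (value % d) * m / d;
	return true;
}
```

Under these assumptions a guard of the form `div == 0` would let a divisor of 1 through 9 pass and still fault, so the check has to be applied to the value that is actually used as the divisor, after any pre-scaling.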
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 3:10 ` Linus Torvalds
@ 2004-01-28 3:19 ` Alessandro Suardi
0 siblings, 0 replies; 12+ messages in thread
From: Alessandro Suardi @ 2004-01-28 3:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-acpi

Linus Torvalds wrote:
>
> On Tue, 27 Jan 2004, Andrew Morton wrote:
>
>>Divide by zero. Looks like ACPI is now passing bad values into the
>>frequency change notifier.
>>
>>Does this make the oops go away?
>
>
> Other values will still cause divide-by-zero (any divisor in 0..9 will do
> it). Besides, we're dividing with _old_, not new, so that's the one we
> should likely check.
>
> Linus

Indeed... I get two of the debug printks from the patch, but in
the end I still oops due to a div-by-zero with EIP in
time_cpufreq_notifier.

I'll try and look into Linus' suggestion about printing out stuff
from adjust_jiffies() in cpufreq.c and will report later.

Thanks,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 2:42 ` Andrew Morton
2004-01-28 3:10 ` Linus Torvalds
@ 2004-01-28 4:37 ` Dmitry Torokhov
2004-01-28 13:37 ` Matt Domsch
1 sibling, 1 reply; 12+ messages in thread
From: Dmitry Torokhov @ 2004-01-28 4:37 UTC (permalink / raw)
To: Andrew Morton, Alessandro Suardi; +Cc: linux-kernel, linux-acpi

On Tuesday 27 January 2004 09:42 pm, Andrew Morton wrote:
> Alessandro Suardi <alessandro.suardi@oracle.com> wrote:
> > Already reported, but I'll do so once again, since it looks like
> > in a short while I won't be able to boot official kernels in my
> > current config...
> >
> > Original report here:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
>
> Divide by zero. Looks like ACPI is now passing bad values into the
> frequency change notifier.

It is a common problem with Dell's DSDT implementation which does not
follow ACPI spec and it's been going on for ages. From the original
report:

cpufreq: CPU0 - ACPI performance management activated
cpufreq: *P0: 1Mhz, 0 mW, 0 uS
cpufreq: P1: 0Mhz, 0 mW, 0 uS
divide error: 0000 [#1]

As you can see all data is bogus... Patching DSDT cures it for sure,
sometimes CONFIG_ACPI_RELAXED_AML helps as well.

I suppose ACPI P-states driver could check frequencies/latencies and
refuse to activate if they are bogus.

--
Dmitry
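Editorial sketch, not part of the original message: the idea of refusing to activate on bogus P-state data could look roughly like the user-space C below. Everything here is an assumption made for illustration: `struct pstate`, `pstates_plausible` and the `MIN_PLAUSIBLE_MHZ` floor are hypothetical names and values, not the ACPI driver's real data structures or thresholds. Only the frequency is checked, because the thread does not say what latency values should count as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical floor below which a reported core frequency is treated
 * as bogus. The value is an assumption for this sketch only. */
#define MIN_PLAUSIBLE_MHZ 10u

/* Hypothetical, simplified view of one performance state as printed in
 * the boot log: frequency, power and transition latency. */
struct pstate {
	unsigned int mhz;
	unsigned int mw;
	unsigned int latency_us;
};

/* Returns true only if the table is non-empty and every state reports a
 * frequency at or above the floor. Power and latency are deliberately
 * not checked here (0 mW appears in the working 2.6.1 log as well). */
static bool pstates_plausible(const struct pstate *table, size_t count)
{
	size_t i;

	if (table == NULL || count == 0)
		return false;

	for (i = 0; i < count; i++) {
		if (table[i].mhz < MIN_PLAUSIBLE_MHZ)
			return false;
	}
	return true;
}
```

With the two tables quoted in this thread, the failing one (1 MHz and 0 MHz) would be rejected and the 2.6.1 one (1800 MHz and 1200 MHz) would be accepted, so a driver using a check of this kind could decline to register instead of handing zero frequencies to the notifier chain.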
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 4:37 ` Dmitry Torokhov
@ 2004-01-28 13:37 ` Matt Domsch
2004-01-28 22:32 ` Alessandro Suardi
0 siblings, 1 reply; 12+ messages in thread
From: Matt Domsch @ 2004-01-28 13:37 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Andrew Morton, Alessandro Suardi, linux-kernel, linux-acpi

On Tue, Jan 27, 2004 at 11:37:55PM -0500, Dmitry Torokhov wrote:
> > Divide by zero. Looks like ACPI is now passing bad values into the
> > frequency change notifier.
>
> It is a common problem with Dell's DSDT implementation which does not
> follow ACPI spec and it's been going on for ages. From the original
> report:
>
> cpufreq: CPU0 - ACPI performance management activated
> cpufreq: *P0: 1Mhz, 0 mW, 0 uS
> cpufreq: P1: 0Mhz, 0 mW, 0 uS
> divide error: 0000 [#1]
>
> As you can see all data is bogus... Patching DSDT cures it for sure,
> sometimes CONFIG_ACPI_RELAXED_AML helps as well.

Please send me your DSDT and output of dmidecode, and ideally what a
proper DSDT should show in this case (I'm not familiar enough with
what all the various ACPI tables should contain), and I'll take it up
with the BIOS programmers for that platform.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 13:37 ` Matt Domsch
@ 2004-01-28 22:32 ` Alessandro Suardi
0 siblings, 0 replies; 12+ messages in thread
From: Alessandro Suardi @ 2004-01-28 22:32 UTC (permalink / raw)
To: Matt Domsch; +Cc: Dmitry Torokhov, Andrew Morton, linux-kernel, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 1523 bytes --]

Matt Domsch wrote:
> On Tue, Jan 27, 2004 at 11:37:55PM -0500, Dmitry Torokhov wrote:
>
>>>Divide by zero. Looks like ACPI is now passing bad values into the
>>>frequency change notifier.
>>
>>It is a common problem with Dell's DSDT implementation which does not
>>follow ACPI spec and it's been going on for ages. From the original
>>report:
>>
>>cpufreq: CPU0 - ACPI performance management activated
>> cpufreq: *P0: 1Mhz, 0 mW, 0 uS
>> cpufreq: P1: 0Mhz, 0 mW, 0 uS
>> divide error: 0000 [#1]
>>
>>As you can see all data is bogus... Patching DSDT cures it for sure,
>>sometimes CONFIG_ACPI_RELAXED_AML helps as well.
>
>
> Please send me your DSDT and output of dmidecode, and ideally what a
> proper DSDT should show in this case (I'm not familiar enough with
> what all the various ACPI tables should contain), and I'll take it up
> with the BIOS programmers for that platform.

While appreciating your offer, I'd like to remind that this works
perfectly prior to the 20031203 ACPI patch. Indeed, this is what
2.6.1 vanilla says in that area:

cpufreq: CPU0 - ACPI performance management activated.
cpufreq: *P0: 1800 MHz, 0 mW, 250 uS
cpufreq: P1: 1200 MHz, 0 mW, 250 uS

Attaching the gzipped dmesg for my 2.6.1 boot - let me know if
you want anyway dmidecode output and DSDT; for this latter I'll
have to ask for instructions (or is the output of a simple
'cat /proc/acpi/dsdt' enough ?).

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")

[-- Attachment #2: dmesg.out.gz --]
[-- Type: application/x-gzip, Size: 4489 bytes --]
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 2:15 Alessandro Suardi
2004-01-28 2:42 ` Andrew Morton
@ 2004-01-28 3:06 ` Linus Torvalds
2004-01-28 3:40 ` Alessandro Suardi
1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-01-28 3:06 UTC (permalink / raw)
To: Alessandro Suardi
Cc: linux-kernel, linux-acpi, Andrew Morton, Dominik Brodowski, Dave Jones

On Wed, 28 Jan 2004, Alessandro Suardi wrote:
>
> Already reported, but I'll do so once again, since it looks like
> in a short while I won't be able to boot official kernels in my
> current config...
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html

Can you make adjust_jiffies() print out its arguments (it's in
drivers/cpufreq/cpufreq.c).

It looks like cpufreq_scale() gets a divide-by-zero or an overflow on one
of

l_p_j_ref, l_p_j_ref_freq, ci->new

and just printing out those values would be interesting.

That said, the code is crap anyway. It does various divides without
actually testing for any sanity at all, and tries to "avoid overflow" by
totally bogus methods, instead of just using the 64-bit do_div64().

Dominic? Dave? Suggestions about nicer failure modes?

Linus
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 3:06 ` Linus Torvalds
@ 2004-01-28 3:40 ` Alessandro Suardi
2004-01-28 16:14 ` Dominik Brodowski
0 siblings, 1 reply; 12+ messages in thread
From: Alessandro Suardi @ 2004-01-28 3:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-acpi, Andrew Morton, Dominik Brodowski, Dave Jones

Linus Torvalds wrote:
>
> On Wed, 28 Jan 2004, Alessandro Suardi wrote:
>
>>Already reported, but I'll do so once again, since it looks like
>> in a short while I won't be able to boot official kernels in my
>> current config...
>>
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
>
>
> Can you make adjust_jiffies() print out its arguments (it's in
> drivers/cpufreq/cpufreq.c).
>
> It looks like cpufreq_scale() gets a divide-by-zero or an overflow on one
> of
>
> l_p_j_ref, l_p_j_ref_freq, ci->new
>
> and just printing out those values would be interesting.

Assuming the late hour (hmm, early by now) hasn't crossed my eyes
entirely the three above entities are %lu, %u, %u... so this line

printk("CPUFREQ DEBUG: [%lu] [%u] [%u]\n", l_p_j_ref, l_p_j_ref_freq, ci->new);

as both first and last instruction in adjust_jiffies() turns
up the same values, which are 1773568, 1, 0.

Side-note, since master penguin is looking... after the oops
all SysRq stuff keeps working - except Alt-SysRq-B; the atkbd.c
code tells me the keyboard says "too many keys pressed". K, T, P
just do their job fine. (yeah, okay, Alt-SysRq-O prints Power Off
but obviously doesn't).

Thanks,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")
* Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)
2004-01-28 3:40 ` Alessandro Suardi
@ 2004-01-28 16:14 ` Dominik Brodowski
0 siblings, 0 replies; 12+ messages in thread
From: Dominik Brodowski @ 2004-01-28 16:14 UTC (permalink / raw)
To: Alessandro Suardi, Linus Torvalds
Cc: linux-kernel, linux-acpi, Andrew Morton, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 1783 bytes --]

On Wed, Jan 28, 2004 at 04:40:32AM +0100, Alessandro Suardi wrote:
> printk("CPUFREQ DEBUG: [%lu] [%u] [%u]\n", l_p_j_ref, l_p_j_ref_freq,
> ci->new);
>
> as both first and last instruction in adjust_jiffies() turns
> up the same values, which are 1773568, 1, 0.

The ACPI tables report totally bogus CPU frequencies -- 1 and 0 MHz. I'm
surprised this differs between 2.6.0 and 2.6.x-mm... Len, any idea?

On Tue, Jan 27, 2004 at 07:06:41PM -0800, Linus Torvalds wrote:
>
>
> On Wed, 28 Jan 2004, Alessandro Suardi wrote:
> >
> > Already reported, but I'll do so once again, since it looks like
> > in a short while I won't be able to boot official kernels in my
> > current config...
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
>
> Can you make adjust_jiffies() print out its arguments (it's in
> drivers/cpufreq/cpufreq.c).
>
> It looks like cpufreq_scale() gets a divide-by-zero or an overflow on one
> of
>
> l_p_j_ref, l_p_j_ref_freq, ci->new
>
> and just printing out those values would be interesting.
>
> That said, the code is crap anyway.
> It does various divides without
> actually testing for any sanity at all,

CPUfreq and the CPUfreq timing code _need_ to rely on the CPU frequencies
being reported by the drivers. If they're wrong all timing will be
wrong[1]... Nonetheless, a fix for the acpi driver which aborts on such
"zero" MHz reports has already been sent to Len for reviewal [2].

> and tries to "avoid overflow" by
> totally bogus methods, instead of just using the 64-bit do_div64().

Agreed, will fix it.

Dominik

[1] Especially as the pmtmr also uses tsc for the delay() routines...
[2] http://marc.theaimsgroup.com/?l=acpi4linux&m=107421039607335&w=2

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
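Editorial sketch, not part of the original message: the suggestion to scale through a 64-bit intermediate rather than pre-dividing the operands can be shown in user-space C. The function name `scale_u64` and its signature are hypothetical, and this is not the fix that was eventually merged. It only shows that a 32-bit value multiplied by a 32-bit factor always fits in 64 bits, so the one remaining failure case is a divisor of exactly zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compute value * mult / div using a 64-bit intermediate.
 * A 32-bit by 32-bit product cannot overflow 64 bits, so no operand has
 * to be pre-divided. Returns false only for div == 0 and then leaves
 * *out untouched. */
static bool scale_u64(uint32_t value, uint32_t div, uint32_t mult,
		      uint64_t *out)
{
	if (div == 0)
		return false;

	*out = (uint64_t)value * mult / div;
	return true;
}
```

Fed the values from the debug printk above (1773568, 1, 0), this variant would not fault, but it would return 0, which is still a meaningless loops-per-jiffy figure. That is consistent with Dominik's remark that the timing code has to rely on the drivers reporting sane frequencies, so a safer divide complements validation in the ACPI driver and does not replace it.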
end of thread, other threads:[~2004-01-30 0:38 UTC | newest]
Thread overview: 12+ messages
[not found] <BF1FE1855350A0479097B3A0D2A80EE0020AE8AD@hdsmsx402.hd.intel.com>
2004-01-29 23:31 ` 2.6.2-rc2-bk1 oopses on boot (ACPI patch) Len Brown
2004-01-30 0:37 ` Alessandro Suardi
2004-01-28 2:15 Alessandro Suardi
2004-01-28 2:42 ` Andrew Morton
2004-01-28 3:10 ` Linus Torvalds
2004-01-28 3:19 ` Alessandro Suardi
2004-01-28 4:37 ` Dmitry Torokhov
2004-01-28 13:37 ` Matt Domsch
2004-01-28 22:32 ` Alessandro Suardi
2004-01-28 3:06 ` Linus Torvalds
2004-01-28 3:40 ` Alessandro Suardi
2004-01-28 16:14 ` Dominik Brodowski