* Re: Hardware bug or kernel bug?
From: David Johnson @ 2006-10-13 16:24 UTC
  To: Jarek Poplawski; +Cc: Linux Kernel, netdev

On Friday 13 October 2006 14:06, Jarek Poplawski wrote:
>
> Probably - but only with networking. So I'd try with this debugging
> like in my first reply plus maybe 2.6.19-rc1 (e1000 - btw. I hope
> this other tested card was different model - and locking improved)
> and resend conclusions to netdev@vger.kernel.org.
>

OK, I built a 2.6.19-rc1 kernel with a minimal config as you describe, and I
cannot reproduce the reboots with this kernel. My .config:
http://www.david-web.co.uk/download/config

The other NIC I tried was a D-Link DL10050-based card which I think uses the
dl2k module.

I tried to reproduce the problem under Windows (2k), which didn't reboot but
did still suffer from it, I believe. Randomly during an scp transfer (using
the PuTTY scp client) Windows will lock up for about 30 seconds, making an
entry in the event log indicating that there was a time-out talking to the
IDE controller, then continuing. Could the same thing be happening in Linux?
If Linux can't talk to the IDE controller when trying to write to disk, how
does it handle that?

Regards,
David.
* Re: Hardware bug or kernel bug?
From: Alan Cox @ 2006-10-13 17:11 UTC
  To: David Johnson; +Cc: Jarek Poplawski, Linux Kernel, netdev

On Fri, 2006-10-13 at 17:24 +0100, David Johnson wrote:
> IDE controller, then continuing. Could the same thing be happening in Linux?
> If Linux can't talk to the IDE controller when trying to write to disk, how
> does it handle that?

It will time out and then retry the command. It's not the most ideal
situation to end up in, but I'd expect to see a DMA timeout and a retry or
two in the log, not a crash.
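Alan's answer describes the general shape of the error path: the command times out, the timeout is logged, and the command is retried, rather than the machine crashing. The sketch below shows only that generic bounded timeout-and-retry pattern. It is not the kernel's IDE code, and `CommandTimeout`, `issue` and `max_retries` are hypothetical names chosen for illustration.

```python
import logging

log = logging.getLogger("retry-sketch")


class CommandTimeout(Exception):
    """Raised by a (hypothetical) command issuer when the device does not respond in time."""


def run_with_retry(issue, max_retries=2):
    """Run issue(); on timeout, log it and try again, up to max_retries extra times.

    Returns whatever issue() returns. If every attempt times out, a final
    CommandTimeout is raised so the caller can report an I/O error.
    """
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return issue()
        except CommandTimeout:
            log.warning("command timeout (attempt %d of %d)", attempt, attempts)
    raise CommandTimeout(f"command failed after {attempts} attempts")


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    state = {"calls": 0}

    def flaky_command():
        # Hypothetical device: times out on the first try, succeeds on the second.
        state["calls"] += 1
        if state["calls"] == 1:
            raise CommandTimeout()
        return "written"

    print(run_with_retry(flaky_command))
```

The point of the bound is that a device which never answers eventually produces a reportable error in the log instead of an endless hang.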
* Re: Hardware bug or kernel bug?
From: Jarek Poplawski @ 2006-10-16 10:25 UTC
  To: David Johnson; +Cc: Linux Kernel, netdev

On Fri, Oct 13, 2006 at 05:24:39PM +0100, David Johnson wrote:
> On Friday 13 October 2006 14:06, Jarek Poplawski wrote:
> >
> > Probably - but only with networking. So I'd try with this debugging
> > like in my first reply plus maybe 2.6.19-rc1 (e1000 - btw. I hope
> > this other tested card was different model - and locking improved)
> > and resend conclusions to netdev@vger.kernel.org.
> >
>
> OK, I built a 2.6.19-rc1 kernel with a minimal config as you describe, and I
> cannot reproduce the reboots with this kernel. My .config:
> http://www.david-web.co.uk/download/config

I've seen more minimal "minimal" configs, but if it works, it's 50% of
success.

> The other NIC I tried was a D-Link DL10050-based card which I think uses the
> dl2k module.
>
> I tried to reproduce the problem under Windows (2k), which didn't reboot but
> did still suffer from it, I believe. Randomly during an scp transfer (using
> the PuTTY scp client) Windows will lock up for about 30 seconds, making an
> entry in the event log indicating that there was a time-out talking to the
> IDE controller, then continuing. Could the same thing be happening in Linux?
> If Linux can't talk to the IDE controller when trying to write to disk, how
> does it handle that?

Was this lock-up effect visible during the above 2.6.19-rc1 tests?

If not, I'd try to continue Linux debugging:
- is 2.6.19-rc1 working with "normal" config (use make oldconfig
  to "upgrade" .config),
- is 2.6.17 working with "minimal" config (use make oldconfig),
- changing one or two options at a time, try to find which one makes
  the effect return (acpi, smp...).

Regards,
Jarek P.
PS: Sorry for the late reply - I was offline.
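Jarek's last step (change one or two options at a time until the reboots return) is a search over the options that differ between a working ("minimal") and a failing ("normal") .config. A minimal sketch, assuming the usual .config line forms `CONFIG_FOO=value` and `# CONFIG_FOO is not set`, a hypothetical `reproduces(subset)` callback standing in for "build, boot and run the scp test", and exactly one responsible option. It bisects the differing options rather than toggling them one at a time, so it needs roughly log2(n) test boots. In practice each candidate config should still be run through `make oldconfig`, because options have dependencies.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value string; options marked "is not set" map to None."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = None
    return opts


def differing_options(good, bad):
    """Sorted list of options whose value differs between the two parsed configs."""
    keys = set(good) | set(bad)
    return sorted(k for k in keys if good.get(k) != bad.get(k))


def bisect_culprit(candidates, reproduces):
    """Return the single option whose "bad" setting triggers the problem, or None.

    reproduces(subset) is hypothetical: it should build the good config with the
    options in `subset` switched to their bad values, boot it, run the test and
    return True if the reboot happens.
    """
    suspects = list(candidates)
    if not suspects or not reproduces(suspects):
        return None  # the differing options alone do not trigger it
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        suspects = half if reproduces(half) else suspects[len(suspects) // 2 :]
    return suspects[0]


if __name__ == "__main__":
    good = parse_config("CONFIG_SMP=y\n# CONFIG_ACPI is not set\n")
    bad = parse_config("CONFIG_SMP=y\nCONFIG_ACPI=y\n")
    diff = differing_options(good, bad)
    # Toy predicate in place of a real build/boot/test cycle.
    print(bisect_culprit(diff, lambda subset: "CONFIG_ACPI" in subset))
```

If more than one option is involved, the single-culprit assumption breaks and the one-or-two-at-a-time approach from the mail is the safer fallback.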
* Re: Hardware bug or kernel bug?
From: David Johnson @ 2006-10-16 14:32 UTC
  To: Jarek Poplawski; +Cc: Linux Kernel, netdev

On Monday 16 October 2006 11:25, Jarek Poplawski wrote:
>
> Was this lock-up effect visible during the above 2.6.19-rc1 tests?

No, I've not seen anything in Linux other than the reboots, which are instant
without any preceding lock-up.

> If not, I'd try to continue Linux debugging:
> - is 2.6.19-rc1 working with "normal" config (use make oldconfig
>   to "upgrade" .config),

With 2.6.19-rc1 and a normal config, I get the reboots as usual.

> - is 2.6.17 working with "minimal" config (use make oldconfig),

Yes.

> - changing one or two options at a time, try to find which one makes
>   the effect return (acpi, smp...).

I've found the culprit - CPU Frequency Scaling.
With it enabled I get the reboots, with it disabled I don't. That's the same
with every kernel version I've tried (2.6.19-rc1+rc2, 2.6.17.13 & CentOS'
2.6.9). The system was using the p4-clockmod driver and the ondemand governor.

I'm still not sure exactly what the problem is - the reboots only happen in
the circumstances I've mentioned and are not triggered by changes in clock
speed alone - but disabling cpufreq seems to make it go away...

Thanks for your help,
David.
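To confirm which cpufreq driver and governor are in use (p4-clockmod and ondemand in this case), and that cpufreq is really gone after disabling it, the `scaling_driver` and `scaling_governor` files under `/sys/devices/system/cpu/cpuN/cpufreq/` can be read. A minimal sketch; the root directory is a parameter (an assumption made so it can be tested against a fake tree), and a CPU without those files is treated as not having cpufreq active.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def cpufreq_info(cpu_root=SYSFS_CPU):
    """Return {cpu_name: (driver, governor)} for CPUs that expose cpufreq in sysfs."""
    info = {}
    for cpufreq_dir in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        driver_file = cpufreq_dir / "scaling_driver"
        governor_file = cpufreq_dir / "scaling_governor"
        if not (driver_file.is_file() and governor_file.is_file()):
            continue  # no active cpufreq driver for this CPU
        info[cpufreq_dir.parent.name] = (
            driver_file.read_text().strip(),
            governor_file.read_text().strip(),
        )
    return info


if __name__ == "__main__":
    found = cpufreq_info()
    if not found:
        print("cpufreq not active (no scaling_driver/scaling_governor in sysfs)")
    for cpu, (driver, governor) in found.items():
        print(f"{cpu}: driver={driver} governor={governor}")
```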
* Re: Hardware bug or kernel bug?
From: Jarek Poplawski @ 2006-10-17 7:10 UTC
  To: David Johnson; +Cc: Linux Kernel, netdev

On Mon, Oct 16, 2006 at 03:32:38PM +0100, David Johnson wrote:
...
> I've found the culprit - CPU Frequency Scaling.
> With it enabled I get the reboots, with it disabled I don't. That's the same
> with every kernel version I've tried (2.6.19-rc1+rc2, 2.6.17.13 & CentOS'
> 2.6.9). The system was using the p4-clockmod driver and the ondemand governor.
>
> I'm still not sure exactly what the problem is - the reboots only happen in
> the circumstances I've mentioned and are not triggered by changes in clock
> speed alone - but disabling cpufreq seems to make it go away...

I see you devoted a lot of work and time to this testing, and it will surely
help people who read this to diagnose similar problems. But I think it could
be even more valuable if you'd try (after some rest!) to find out whether
turning on "Enable CPUfreq debugging" and adding cpufreq.debug=<value> to the
kernel command line (with <value> chosen according to the help screen)
produces any error messages that could be sent to bugzilla and/or the cpufreq
maintainer.

Best regards,
Jarek P.
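The `<value>` mentioned above is, assuming the CONFIG_CPU_FREQ_DEBUG help text of 2.6-era kernels, a bitmask: add 1 for CPUfreq core debugging, 2 for driver debugging and 4 for governor debugging, so `cpufreq.debug=7` turns all three on. Those bit values are an assumption here and should be checked against the help screen of the kernel actually being built; the hypothetical helper below only shows the arithmetic.

```python
# Assumed bit values, as described in the CONFIG_CPU_FREQ_DEBUG help text of
# 2.6-era kernels; verify against the help screen of the kernel being built.
CPUFREQ_DEBUG_BITS = {"core": 1, "driver": 2, "governor": 4}


def cpufreq_debug_param(*areas):
    """Build the cpufreq.debug=<value> kernel command-line argument (hypothetical helper)."""
    value = 0
    for area in areas:
        value |= CPUFREQ_DEBUG_BITS[area]  # OR in each requested area's bit
    return f"cpufreq.debug={value}"


if __name__ == "__main__":
    print(cpufreq_debug_param("core", "driver", "governor"))  # cpufreq.debug=7
```

For a driver/governor problem like this one (p4-clockmod with ondemand), enabling at least the driver and governor bits seems the most relevant choice.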
Thread overview: 5+ messages
[not found] <20061013085605.GA1690@ff.dom.local>
[not found] ` <200610131256.54546.dj@david-web.co.uk>
[not found] ` <20061013130648.GC1690@ff.dom.local>
2006-10-13 16:24 ` Hardware bug or kernel bug? David Johnson
2006-10-13 17:11 ` Alan Cox
2006-10-16 10:25 ` Jarek Poplawski
2006-10-16 14:32 ` David Johnson
2006-10-17 7:10 ` Jarek Poplawski