All of lore.kernel.org
 help / color / mirror / Atom feed
From: Charles-Edouard Ruault <ce@ruault.com>
To: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] timer-irq-driven soft-watchdog, cleanups
Date: Mon, 13 Mar 2006 15:48:30 +0100	[thread overview]
Message-ID: <4415863E.5060704@ruault.com> (raw)
In-Reply-To: <58cb370e0602171415q1b040e7er1e7849212f95f2b7@mail.gmail.com>

Bartlomiej Zolnierkiewicz wrote:

>On 2/17/06, Ingo Molnar <mingo@elte.hu> wrote:
>  
>
>>* Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
>>
>>    
>>
>>>Sorry but I have enough more high priority issues to take care of and
>>>I'm not going to spend any more time on soft lockups even if they are
>>>really problems in IDE subsystem.  If this is not fixed before 2.6.16
>>>I'm submitting patch to Linus making DETECT_SOFTLOCKUP depend on
>>>"CONFIG_IDE=n"... at least users will be able to use their systems
>>>instead of seeing lockups.
>>>      
>>>
>>i have lots of IDE based systems (they dont use PIO though) and i'm not
>>seeing these. I'll oppose such a patch if it's to hide genuine issues -
>>the 10 seconds tolerance is already generous i think. I'll of course fix
>>any false positives which are the fault of the softlockup-watchdog, but
>>from your mails it appears to me that the IDE warnings are indeed
>>genuine.
>>
>>If the source of the delay is hard to fix you can temporarily work it
>>around in the code by putting in the touch_softlockup_watchdog() lines -
>>that will also document the places that cause long delays - which is a
>>good thing.
>>
>>It is entirely feasible to put a touch_softlockup_watchdog() call into
>>every PIO OP - even a single-byte PIO related IN/OUT instruction takes a
>>couple of microseconds, so a touch_softlockup_watchdog() wont even show
>>up on the radar.
>>    
>>
>
>OK, I'll just add touch_softlockup_watchdog() if needed but first lets
>wait for results of your patch.
>
>Note that I'll invest my time on this which could be invested into other
>things and I don't see it as top-priority issue if you differ in your opinion
>you should be the person fixing affected drivers.
>
>The conclusion of the rant is:  people making changes at higher layers
>should start paying maintenance costs of their changes.  Over few years
>of maintaining IDE I learned quite a lot about block layer, VFS, VM, ACPI,
>PM, IRQ routing, scheduling, sysfs etc (I'm not talking about interface
>changes but about bugs/changes which are reported by end users
>and driver maintainers are end-point).  This is all good but distracts me
>from my primary task and now it is turn for people hacking on generic
>code to learn few driver specific things... :)
>
>No wonder that nobody wants to hack drivers: less fame, more flames,
>and actually besides knowing hardware you need to know a lot about
>kernel in general to do your job right.  I hope that Andrew is reading this.
>
>End of whining.
>
>  
>
>>>DETECT_SOFTLOCKUP should be an aim in development not a method of
>>>forcing driver maintainers to work on specific issues...
>>>      
>>>
>>well, 10+ seconds delays on a running system are not really acceptable,
>>and can cause other problems. The softlockup-watchdog is optional and
>>can be easily turned off in the .config.
>>    
>>
>
>It is "y" by default so anybody saying "y" to DEBUG_KERNEL will get it as
>added bonus and moreover DEBUG_KERNEL is "y" in x86_64 defconfig.
>
>Bartlomiej
>  
>
Guys,
sorry for having been silent for so long ... since i had the problem, i
upgraded to 2.6.15.6 and i experienced the problem again over the week-end.
My point it that it appears that the BUG: soft lockup detected on CPU#0!
message is legitimate in my case since my machine really freezes
periodically because of ide timeouts.
What i'm trying to figure out is why are the IDE timeouts occuring ?
Both my DVD burner & CD burner usually work fine ( i had been able to
burn a DVD a few hours before the problem occured , i'm able to read CDs
& DVDs fine ).
However, the problem seems to occur only when trying to rip CDs. It
appears that both drives get confused .... ( see logs, both are
reporting timeouts/erros ).
Mar 11 17:41:47 ruault kernel: hdd: ATAPI reset
complete                                                                   
Mar 11 17:41:57 ruault kernel: hdc: irq timeout: status=0x80 { Busy }
Mar 11 17:41:57 ruault kernel: ide: failed opcode was:
unknown                                                             
Mar 11 17:41:57 ruault kernel: hdd: status timeout: status=0x80 { Busy }
Mar 11 17:41:57 ruault kernel: ide: failed opcode was:
unknown                                                             
Mar 11 17:41:57 ruault kernel: hdd: drive not ready for command
Mar 11 17:41:57 ruault kernel: hdd: ATAPI reset
complete                                                                   
Mar 11 17:42:17 ruault kernel: hdc: irq timeout: status=0x80 { Busy }
Mar 11 17:42:17 ruault kernel: ide: failed opcode was:
unknown                                                             
Mar 11 17:42:17 ruault kernel: hdd: status timeout: status=0x80 { Busy }
Mar 11 17:42:17 ruault kernel: ide: failed opcode was:
unknown                                                             
Mar 11 17:42:17 ruault kernel: hdd: drive not ready for command
Mar 11 17:42:17 ruault kernel: BUG: soft lockup detected on
CPU#0!                                                         
Mar 11 17:42:17 ruault kernel:
Mar 11 17:42:17 ruault kernel: Pid: 0, comm:             
swapper                                                          
Mar 11 17:42:17 ruault kernel: EIP: 0060:[<c0272c55>] CPU: 0
Mar 11 17:42:17 ruault kernel: EIP is at
ide_inb+0x5/0x10                                                                  

Mar 11 17:42:17 ruault kernel:  EFLAGS: 00000206    Tainted: P      
(2.6.15.6)
Mar 11 17:42:17 ruault kernel: EAX: 00000180 EBX: 03aba395 ECX: f2b9d721
EDX: 00000177                                     
Mar 11 17:42:17 ruault kernel: ESI: 00000088 EDI: c0419e30 EBP: c0419ec4
DS: 007b ES: 007b
Mar 11 17:42:17 ruault kernel: CR0: 8005003b CR2: b7fe2000 CR3: 32e1b000
CR4: 000006d0

And i can confirm that when this happens the machine is totally
unresponsive, as before i had to reboot the machine with a hard reset
after a while.
So my question is what could be causing this behaviour ? is this a bug
in the IDE driver ? a hardware problem with my drives ( as i said they
appear to be working fine otherwise ), another hardware problem with my
motherboard ?
What could i do to help troubleshoot the problem ?
Thanks in advance.

PS: I do not subscribe to the list so please CC me when replying ....

-- 
Charles-Edouard Ruault
GPG key Id E4D2B80C


      reply	other threads:[~2006-03-13 14:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-12 18:50 [BUG] kernel 2.6.15.4: soft lockup detected on CPU#0! Charles-Edouard Ruault
2006-02-14 11:41 ` Folkert van Heusden
2006-02-14 16:32   ` Lee Revell
2006-02-15 15:07     ` Folkert van Heusden
2006-02-15 18:19       ` Lee Revell
2006-02-16  1:38         ` Alan Cox
2006-02-17 22:56           ` Lee Revell
2006-02-16  2:51 ` Andrew Morton
2006-02-16  9:44   ` Charles-Edouard Ruault
2006-02-16 10:36     ` Folkert van Heusden
2006-02-16 10:46       ` Andrew Morton
2006-02-16 11:18         ` Charles-Edouard Ruault
2006-02-16 12:57     ` Alan Cox
2006-02-16 13:33   ` Bartlomiej Zolnierkiewicz
2006-02-16 20:20     ` Andrew Morton
2006-02-17 11:47       ` Bartlomiej Zolnierkiewicz
2006-02-17 13:08         ` [patch] timer-irq-driven soft-watchdog, cleanups Ingo Molnar
2006-02-17 14:46           ` Bartlomiej Zolnierkiewicz
2006-02-17 19:52             ` Ingo Molnar
2006-02-17 20:11             ` Ingo Molnar
2006-02-17 22:15               ` Bartlomiej Zolnierkiewicz
2006-03-13 14:48                 ` Charles-Edouard Ruault [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4415863E.5060704@ruault.com \
    --to=ce@ruault.com \
    --cc=akpm@osdl.org \
    --cc=bzolnier@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.