linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Waychison <mikew@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: Linux SCSI List <linux-scsi@vger.kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: 2.6.36: Dropped interrupts in ata_piix
Date: Tue, 26 Oct 2010 19:33:57 -0700	[thread overview]
Message-ID: <4CC78F95.4050107@google.com> (raw)
In-Reply-To: <4CC7190A.8080500@google.com>

Mike Waychison wrote:
> Tejun Heo wrote:
>> (cc'ing linux-ide)
>>
>> Hello,
>>
>> On 10/25/2010 08:13 PM, Mike Waychison wrote:
>>> I'm having problems reliably booting 2.6.36 on one of my development
>>> systems whereby it looks like the ata_piix driver isn't
>>> acknowledging interrupts.
>>
>> Why do you think ata_piix isn't ack'ing IRQs?
> 
> Well, it's the only thing in the logs marked up as using IRQ 20.
> 
>>
>>> I went through a bit of the recent history here, and it seems that
>>> things clear up for me if I revert the following two commits in my
>>> tree:
>>>
>>> 1c5afdf7 "libata-sff: separate out BMDMA init"
>>> c3b28894 "libata-sff: separate out BMDMA irq handler"
>>
>> Those commits look scary but they're code refactoring in nature and
>> unless I screwed up (definitely possible) things shouldn't break over
>> them.  Another thing is that they have been in mainline for quite some
>> time and even shipped with ubuntu 10.10 and this is the first report,
>> so I'm a bit skeptical they actually are the culprit.
> 
> Ya, I don't see how they change anything either, but I can't pretend to 
> really understand what is going on in this code either :(
> 
>>
>>> I usually don't get a trace, but I did get this blurted out once on
>>> the console:
>>>
>>> kinit: Mounted root (ext2 filesystem) readonly.
>>> INIT: version 2.78 booting
>>> [    5.419165] irq 20: nobody cared (try booting with the "irqpoll" 
>>> option)
>>> [    5.420140] Pid: 0, comm: kworker/0:1 Not tainted 2.6.36-smp-mikew 
>>> #5gca29cdd
>>> [    5.420140] Call Trace:
>>> [    5.420140]  <IRQ>  [<ffffffff810b207f>] __report_bad_irq+0x3d/0x8c
>>> [    5.420140]  [<ffffffff810b21e6>] note_interrupt+0x118/0x17e
>>> [    5.420140]  [<ffffffff810b29be>] handle_fasteoi_irq+0xa7/0xcc
>>> [    5.420140]  [<ffffffff81032da5>] handle_irq+0x24/0x2f
>>> [    5.420140]  [<ffffffff81453744>] do_IRQ+0x5c/0xc3
>>> [    5.420140]  [<ffffffff8144d853>] ret_from_intr+0x0/0xa
>>> [    5.420140]  <EOI>  [<ffffffff81037a64>] ? mwait_idle+0x93/0x9b
>>> [    5.420140]  [<ffffffff81037a0a>] ? mwait_idle+0x39/0x9b
>>> [    5.420140]  [<ffffffff8102faee>] cpu_idle+0x63/0xd5
>>> [    5.420140]  [<ffffffff8193d340>] start_secondary+0x192/0x196
>>> [    5.420140] handlers:
>>> [    5.420140] [<ffffffff812e26e6>] (ata_bmdma_interrupt+0x0/0x17)
>>> [    5.420140] Disabling IRQ #20
>>> [   34.720103] ata1: lost interrupt (Status 0x51)
>>> [   34.724569] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 
>>> 0x6 frozen
>>> [   34.731612] ata1.00: BMDMA stat 0x26, BMDMA stat 0x0, BMDMA stat 
>>> 0x0, BMDMA stat 0x0, BMDMA stat 0x0
>>> [   34.740750] ata1.00: failed command: READ DMA
>>> [   34.745115] ata1.00: cmd c8/00:a0:f7:78:09/00:00:00:00:00/e0 tag 0 
>>> dma 81920 in
>>> [   34.745116]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 
>>> 0x24 (host bus error)
>>
>> Hmm... this is interesting.  The IRQ storm happened while a read
>> command was in progress.  BMDMA indicates that host bus error occurred
>> (the DMA controller experienced transfer failure on the PCI side while
>> trying to write to main memory).  It's curious why the interrupt
>> handler thought the IRQ wasn't its.  ata_bmdma_port_intr() should have
>> noticed ATA_DMA_INTR and the command should have been completed
>> immediately, weird.
> 
> To be clear, usually all I see on the console when hanging is:
> 
> [   15.411337] Disabling IRQ #20
> 
> I've only ever had the above call trace make it out to console once.
> 
> Sometimes this happens while mounting the root filesystem, mounting the 
> second filesystem (on the same disk) or while loading modules.  I don't 
> think the failure is tied to any particular sector.
> 
> I don't have any trouble with previous kernels (2.6.34 was fine), and 
> backing out the above two commits makes the symptom go away.
> 
>>
>>> [   34.760490] ata1.00: status: { DRDY }
>>> [   34.764180] ata1: soft resetting link
>>> [   35.143059] ata1.00: configured for UDMA/133
>>> [   35.147332] ata1.00: device reported invalid CHS sector 0
>>> [   35.152730] ata1: EH complete
>>>
>>> As you can see above, something looks to be wrong with 
>>> ata_bmdma_interrupt.
>>>
>>> Have you seen this problem before?
>>
>> No, this is the first time and your hardware seems to be developing an
>> interesting issue.  I suggest trying a different PSU if you have one
>> available.  That said, it would still be useful to track down why the
>> error handling isn't working as expected.  How reliably can you
>> reproduce the problem?
> 
> 
> I can reproduce very reliably on this machine in particular.  I have 
> other machines of similar type that seem to be fine (though I need to 
> dig into the test result database to be sure).  I'd be happy to try any 
> suggestions on this machine.


FWIW, with those two commits reverted, things seem usable, though I just 
noticed these in my logs.  As far as I can tell, the error handling is 
working in this case, while with v2.6.36 it doesn't.



Oct 26 19:14:28 lpw36 kernel: [   42.858479] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:14:28 lpw36 kernel: [   42.858483] ata1.00: BMDMA stat 0x26
Oct 26 19:14:28 lpw36 kernel: [   42.858487] ata1.00: failed command: 
READ DMA
Oct 26 19:14:28 lpw36 kernel: [   42.858493] ata1.00: cmd 
c8/00:60:d3:9c:55/00:00:00:00:00/e5 tag 0 dma 49152 in
Oct 26 19:14:28 lpw36 kernel: [   42.858495]          res 
51/84:60:d3:9c:55/84:00:00:00:00/e5 Emask 0x30 (host bus error)
Oct 26 19:14:28 lpw36 kernel: [   42.858498] ata1.00: status: { DRDY ERR }
Oct 26 19:14:28 lpw36 kernel: [   42.858500] ata1.00: error: { ICRC ABRT }
Oct 26 19:14:28 lpw36 kernel: [   42.858512] ata1: soft resetting link
Oct 26 19:14:28 lpw36 kernel: [   43.134269] ata1.00: configured for 
UDMA/133
Oct 26 19:14:28 lpw36 kernel: [   43.134286] ata1: EH complete


Oct 26 19:14:35 lpw36 kernel: [   50.123746] ata1.00: limiting speed to 
UDMA/100:PIO4
Oct 26 19:14:35 lpw36 kernel: [   50.123751] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:14:35 lpw36 kernel: [   50.123754] ata1.00: BMDMA stat 0x26
Oct 26 19:14:35 lpw36 kernel: [   50.123757] ata1.00: failed command: 
READ DMA
Oct 26 19:14:35 lpw36 kernel: [   50.123763] ata1.00: cmd 
c8/00:a0:47:96:14/00:00:00:00:00/e0 tag 0 dma 81920 in
Oct 26 19:14:35 lpw36 kernel: [   50.123764]          res 
51/84:a0:47:96:14/84:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:14:35 lpw36 kernel: [   50.123767] ata1.00: status: { DRDY ERR }
Oct 26 19:14:35 lpw36 kernel: [   50.123768] ata1.00: error: { ICRC ABRT }
Oct 26 19:14:35 lpw36 kernel: [   50.123784] ata1: soft resetting link
Oct 26 19:14:36 lpw36 kernel: [   50.391355] ata1.00: configured for 
UDMA/100
Oct 26 19:14:36 lpw36 kernel: [   50.391364] ata1: EH complete
Oct 26 19:14:41 lpw36 kernel: [   55.600153] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:14:41 lpw36 kernel: [   55.600156] ata1.00: BMDMA stat 0x26
Oct 26 19:14:41 lpw36 kernel: [   55.600160] ata1.00: failed command: 
READ DMA
Oct 26 19:14:41 lpw36 kernel: [   55.600165] ata1.00: cmd 
c8/00:60:ff:88:10/00:00:00:00:00/e0 tag 0 dma 49152 in
Oct 26 19:14:41 lpw36 kernel: [   55.600167]          res 
51/84:60:ff:88:10/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:14:41 lpw36 kernel: [   55.600169] ata1.00: status: { DRDY ERR }
Oct 26 19:14:41 lpw36 kernel: [   55.600171] ata1.00: error: { ICRC ABRT }
Oct 26 19:14:41 lpw36 kernel: [   55.600187] ata1: soft resetting link
Oct 26 19:14:41 lpw36 kernel: [   55.867655] ata1.00: configured for 
UDMA/100
Oct 26 19:14:41 lpw36 kernel: [   55.867668] ata1: EH complete
Oct 26 19:14:45 lpw36 kernel: [   59.808362] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:14:45 lpw36 kernel: [   59.808365] ata1.00: BMDMA stat 0x26
Oct 26 19:14:45 lpw36 kernel: [   59.808369] ata1.00: failed command: 
READ DMA
Oct 26 19:14:45 lpw36 kernel: [   59.808376] ata1.00: cmd 
c8/00:00:5f:90:1c/00:00:00:00:00/e0 tag 0 dma 131072 in
Oct 26 19:14:45 lpw36 kernel: [   59.808377]          res 
51/84:00:5f:90:1c/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:14:45 lpw36 kernel: [   59.808380] ata1.00: status: { DRDY ERR }
Oct 26 19:14:45 lpw36 kernel: [   59.808382] ata1.00: error: { ICRC ABRT }
Oct 26 19:14:45 lpw36 kernel: [   59.808395] ata1: soft resetting link
Oct 26 19:14:45 lpw36 kernel: [   60.084257] ata1.00: configured for 
UDMA/100
Oct 26 19:14:45 lpw36 kernel: [   60.084270] ata1: EH complete
Oct 26 19:14:53 lpw36 kernel: [   67.423809] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:14:53 lpw36 kernel: [   67.423813] ata1.00: BMDMA stat 0x26
Oct 26 19:14:53 lpw36 kernel: [   67.423818] ata1.00: failed command: 
READ DMA
Oct 26 19:14:53 lpw36 kernel: [   67.423824] ata1.00: cmd 
c8/00:60:cf:50:1c/00:00:00:00:00/e0 tag 0 dma 49152 in
Oct 26 19:14:53 lpw36 kernel: [   67.423825]          res 
51/84:60:cf:50:1c/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:14:53 lpw36 kernel: [   67.423828] ata1.00: status: { DRDY ERR }
Oct 26 19:14:53 lpw36 kernel: [   67.423831] ata1.00: error: { ICRC ABRT }
Oct 26 19:14:53 lpw36 kernel: [   67.423843] ata1: soft resetting link
Oct 26 19:14:53 lpw36 kernel: [   67.691312] ata1.00: configured for 
UDMA/100
Oct 26 19:14:53 lpw36 kernel: [   67.691329] ata1: EH complete
Oct 26 19:14:58 lpw36 kernel: [   73.017686] ata1.00: limiting speed to 
UDMA/33:PIO4
Oct 26 19:14:58 lpw36 kernel: [   73.017692] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:14:58 lpw36 kernel: [   73.017694] ata1.00: BMDMA stat 0x26
Oct 26 19:14:58 lpw36 kernel: [   73.017698] ata1.00: failed command: 
READ DMA
Oct 26 19:14:58 lpw36 kernel: [   73.017705] ata1.00: cmd 
c8/00:08:bf:e8:05/00:00:00:00:00/e0 tag 0 dma 4096 in
Oct 26 19:14:58 lpw36 kernel: [   73.017706]          res 
51/84:00:c6:e8:05/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:14:58 lpw36 kernel: [   73.017709] ata1.00: status: { DRDY ERR }
Oct 26 19:14:58 lpw36 kernel: [   73.017711] ata1.00: error: { ICRC ABRT }
Oct 26 19:14:58 lpw36 kernel: [   73.017734] ata1: soft resetting link
Oct 26 19:14:59 lpw36 kernel: [   73.285426] ata1.00: configured for UDMA/33
Oct 26 19:14:59 lpw36 kernel: [   73.285447] ata1: EH complete
Oct 26 19:15:00 lpw36 kernel: [   74.535362] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:15:00 lpw36 kernel: [   74.535366] ata1.00: BMDMA stat 0x26
Oct 26 19:15:00 lpw36 kernel: [   74.535370] ata1.00: failed command: 
READ DMA
Oct 26 19:15:00 lpw36 kernel: [   74.535376] ata1.00: cmd 
c8/00:00:97:53:0f/00:00:00:00:00/e0 tag 0 dma 131072 in
Oct 26 19:15:00 lpw36 kernel: [   74.535378]          res 
51/84:00:97:53:0f/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:15:00 lpw36 kernel: [   74.535381] ata1.00: status: { DRDY ERR }
Oct 26 19:15:00 lpw36 kernel: [   74.535383] ata1.00: error: { ICRC ABRT }
Oct 26 19:15:00 lpw36 kernel: [   74.535395] ata1: soft resetting link
Oct 26 19:15:00 lpw36 kernel: [   74.803082] ata1.00: configured for UDMA/33
Oct 26 19:15:00 lpw36 kernel: [   74.803099] ata1: EH complete
Oct 26 19:15:02 lpw36 kernel: [   76.177935] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:15:02 lpw36 kernel: [   76.177940] ata1.00: BMDMA stat 0x26
Oct 26 19:15:02 lpw36 kernel: [   76.177944] ata1.00: failed command: 
READ DMA
Oct 26 19:15:02 lpw36 kernel: [   76.177950] ata1.00: cmd 
c8/00:00:7f:39:11/00:00:00:00:00/e0 tag 0 dma 131072 in
Oct 26 19:15:02 lpw36 kernel: [   76.177952]          res 
51/84:00:7f:39:11/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:15:02 lpw36 kernel: [   76.177955] ata1.00: status: { DRDY ERR }
Oct 26 19:15:02 lpw36 kernel: [   76.177957] ata1.00: error: { ICRC ABRT }
Oct 26 19:15:02 lpw36 kernel: [   76.177969] ata1: soft resetting link
Oct 26 19:15:02 lpw36 kernel: [   76.453953] ata1.00: configured for UDMA/33
Oct 26 19:15:02 lpw36 kernel: [   76.453970] ata1: EH complete
Oct 26 19:15:04 lpw36 kernel: [   78.453813] ata1.00: exception Emask 
0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 26 19:15:04 lpw36 kernel: [   78.453817] ata1.00: BMDMA stat 0x26
Oct 26 19:15:04 lpw36 kernel: [   78.453821] ata1.00: failed command: 
READ DMA
Oct 26 19:15:04 lpw36 kernel: [   78.453827] ata1.00: cmd 
c8/00:00:8f:0d:09/00:00:00:00:00/e0 tag 0 dma 131072 in
Oct 26 19:15:04 lpw36 kernel: [   78.453828]          res 
51/84:00:8f:0d:09/00:00:00:00:00/e0 Emask 0x30 (host bus error)
Oct 26 19:15:04 lpw36 kernel: [   78.453831] ata1.00: status: { DRDY ERR }
Oct 26 19:15:04 lpw36 kernel: [   78.453833] ata1.00: error: { ICRC ABRT }
Oct 26 19:15:04 lpw36 kernel: [   78.453844] ata1: soft resetting link
Oct 26 19:15:04 lpw36 kernel: [   78.721461] ata1.00: configured for UDMA/33
Oct 26 19:15:04 lpw36 kernel: [   78.721473] ata1: EH complete




      reply	other threads:[~2010-10-27  2:34 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4CC5C8C4.8010602@google.com>
2010-10-26  9:58 ` 2.6.36: Dropped interrupts in ata_piix Tejun Heo
2010-10-26 18:08   ` Mike Waychison
2010-10-27  2:33     ` Mike Waychison [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CC78F95.4050107@google.com \
    --to=mikew@google.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).