linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: elendil@planet.nl, sparclinux@vger.kernel.org, linux-ide@vger.kernel.org
Subject: Re: cmd64x: irq 14: nobody cared - system is dreadfully slow
Date: Sun, 21 Jun 2009 15:15:56 +0200	[thread overview]
Message-ID: <200906211515.56804.bzolnier@gmail.com> (raw)
In-Reply-To: <20090620.171945.140676687.davem@davemloft.net>

On Sunday 21 June 2009 02:19:45 David Miller wrote:
> From: Frans Pop <elendil@planet.nl>
> Date: Sat, 20 Jun 2009 23:52:33 +0200
> 
> > ide0 at 0x1fe02c00000-0x1fe02c00007,0x1fe02c0000a on irq 14 (serialized)
> > irq 14: nobody cared (try booting with the "irqpoll" option)
> > Call Trace:
> 
> Try reverting this patch:
> 
> commit 6b5cde3629701258004b94cde75dd1089b556b02
> Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date:   Mon Dec 29 20:27:32 2008 +0100
> 
>     cmd64x: set IDE_HFLAG_SERIALIZE explictly for CMD646
>     
>     * Set IDE_HFLAG_SERIALIZE explictly for CMD646.
>     
>     * Remove no longer needed ide_cmd646 chipset type (which has
>       a nice side-effect of fixing handling of unexpected IRQs).
>     
>     Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
>     Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Great work Sherlock, Frans has found this already by himself.. ;)

> Unlike the commit log message states, I suspect this change
> "introduces" incorrect handling of unexpected IRQs rather than
> "fixing".  I suspect the problem arises when the controller

Please take a look at ide_intr() at the time of the commit:

1348         if ((handler = hwgroup->handler) == NULL || hwgroup->polling) {
1349                 /*
1350                  * Not expecting an interrupt from this drive.
1351                  * That means this could be:
1352                  *      (1) an interrupt from another PCI device
1353                  *      sharing the same PCI INT# as us.
1354                  * or   (2) a drive just entered sleep or standby mode,
1355                  *      and is interrupting to let us know.
1356                  * or   (3) a spurious interrupt of unknown origin.
1357                  *
1358                  * For PCI, we cannot tell the difference,
1359                  * so in that case we just ignore it and hope it goes away.
1360                  *
1361                  * FIXME: unexpected_intr should be hwif-> then we can
1362                  * remove all the ifdef PCI crap
1363                  */
1364 #ifdef CONFIG_BLK_DEV_IDEPCI
1365                 if (hwif->chipset != ide_pci)
1366 #endif  /* CONFIG_BLK_DEV_IDEPCI */

Before the patch hwif->chipset was set to ide_cmd646
and CONFIG_BLK_DEV_IDEPCI was always 'y'.

1367                 {
1368                         /*
1369                          * Probably not a shared PCI interrupt,
1370                          * so we can safely try to do something about it:
1371                          */
1372                         unexpected_intr(irq, hwgroup);
1373 #ifdef CONFIG_BLK_DEV_IDEPCI
1374                 } else {
1375                         /*
1376                          * Whack the status register, just in case
1377                          * we have a leftover pending IRQ.
1378                          */
1379                         (void)hwif->tp_ops->read_status(hwif);
1380 #endif /* CONFIG_BLK_DEV_IDEPCI */
1381                 }
1382                 goto out;
1383         }

> has a pending interrupt before the kernel boots, and after the
> reset the IDE layer now won't clear the thing properly due to
> the IDE_HFLAG_SERIALIZE now being set.

Because of hwif->chipset == ide_cmd646 IDE core has treated this
chipset as serialized one anyway:

init_irq():

1061                 if (h && h->hwgroup) {  /* scan only initialized ports */
1062                         if (hwif->irq == h->irq) {
1063                                 hwif->sharing_irq = h->sharing_irq = 1;
1064                                 if (hwif->chipset != ide_pci ||
1065                                     h->chipset != ide_pci) {
1066                                         save_match(hwif, h, &match);
1067                                 }
1068                         }

> I suspect the following logic in ide_intr() is being triggered:
> 
> 	if (host->host_flags & IDE_HFLAG_SERIALIZE) {
> 		if (hwif != host->cur_port)
> 			goto out_early;
> 	}
> 
> so the interrupt isn't cleared and it just keeps trying to run the
> interrupt handler over and over until the generic IRQ layer gives up
> and shuts off the interrupt.
> 
> This would make sense if the CMD64X chip reset code triggers the
> interrupt, because I see absolutely nothing the makes sure
> host->cur_port would be setup correctly to ensure that the interrupt
> got services in that case.

At the moment that we call request_irq() the first time both ports have
already been probed.  Since this includes reading device status there should
be no pending IRQs at this point.  IOW I think that this commit is just
a trigger for some other issue (which in turn was uncovered by rework of
handling of serialized interfaces).  Welcome in the wonderful land of broken
devices (we have a separate issue in this system -- ATAPI device returns
buggy ID block and since we added stricter checks we will need to workaround
it now to get DMA working), buggy controllers and no errata documentation.

> I wonder how much testing this commit received...

Too less, any help with improving testing coverage is welcomed
(this was just a tiny part of very large rework which has been tested
and has been sitting in linux-next for weeks before push to Linus).

> Actually... the patch doesn't revert cleanly.  Let me setup a
> patch to test by hand.  It just removes the IDE_HFLAG_SERIALIZE
> flag from the chipset array entry.

It would be worth to try it since IDE_HFLAG_SERIALIZE might be not needed
by CMD646 chipset.  It was suggested by mikpe & Sergei (IIRC) in the past
and pata_cmd64x.c has never used serialization (though this a worthless data
since initially libata didn't support serialization and some host drivers
still lack it when needed) but we couldn't get a definitive answer without
anybody testing the change.

[ The problem is that it can potentially cause a data corruption if we are
  wrong.  OTOH the worst thing that could happened with keeping it around was
  slower operation and (as discovered now) wrong handling of unexpected IRQs
  (so even slower operation). ]

> Alternatively you can try using the pata_cmd64x.c driver instead :-)

I fail to see how this is going to help with cmd64x.c testing coverage.. :(

The good news now -- the current Linus tree contains the patches from Sergei
adding support host-side testing/clearing of IRQs for hosts that support it
(this includes CMD64x ones) that should in the long-term give us a saner and
more reliable handling of shared and/or unexpected IRQs.

Frans, could you try also the current Linus tree + David's patch?

Thanks.
Bart

  parent reply	other threads:[~2009-06-21 13:10 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-20 21:52 cmd64x: irq 14: nobody cared - system is dreadfully slow Frans Pop
2009-06-21  0:19 ` David Miller
2009-06-21  4:47   ` Frans Pop
2009-06-21 12:46   ` Frans Pop
2009-06-21 13:33     ` Bartlomiej Zolnierkiewicz
2009-06-21 20:14       ` Frans Pop
2009-06-22  1:56     ` David Miller
2009-06-22  4:28       ` Frans Pop
2009-06-22  5:45         ` David Miller
2009-06-22  6:43           ` Frans Pop
2009-06-22  6:44             ` David Miller
2009-06-22 11:21             ` Bartlomiej Zolnierkiewicz
2009-06-22 14:04               ` Frans Pop
2009-06-22 14:39                 ` Bartlomiej Zolnierkiewicz
2009-06-22 15:16                   ` Frans Pop
2009-06-22 17:38                     ` Bartlomiej Zolnierkiewicz
2009-06-22 19:01                       ` Frans Pop
2009-06-22 21:35                         ` Bartlomiej Zolnierkiewicz
2009-06-23  7:51                           ` [PATCH] ide-cd: Improve "weird block size" error message Frans Pop
2009-06-23  7:57                             ` Borislav Petkov
2009-06-23  8:02                               ` Borislav Petkov
2009-06-23 23:03                                 ` David Miller
2009-06-23  8:20                               ` Frans Pop
2009-06-23 10:59                             ` David Miller
2009-06-23 11:13                               ` Frans Pop
2009-06-23 11:18                                 ` David Miller
2009-06-23 21:30                               ` Frans Pop
2009-06-23 23:01                                 ` David Miller
2009-06-29 11:19                               ` Jan Engelhardt
2009-06-23 10:15                           ` cmd64x: irq 14: nobody cared - system is dreadfully slow David Miller
2009-06-23 14:58                             ` Frans Pop
2009-06-23 16:13                               ` Bartlomiej Zolnierkiewicz
2009-06-23 23:04                                 ` David Miller
2009-06-23 10:47                         ` David Miller
2009-06-23 10:43                 ` David Miller
2009-07-31 14:08                   ` Frans Pop
2009-08-01  5:46                     ` David Miller
2009-08-05 20:43                     ` [stable] " Greg KH
2009-06-21 13:15   ` Bartlomiej Zolnierkiewicz [this message]
2009-06-21 21:19     ` David Miller
2009-06-21 22:34       ` Bartlomiej Zolnierkiewicz
2009-06-21 22:57         ` David Miller
2009-06-21 23:13           ` New IDE maintainer (was Re: cmd64x: irq 14: nobody cared - system is dreadfully slow) David Miller
2009-06-21 23:45             ` Bartlomiej Zolnierkiewicz
2009-06-21 23:52               ` New IDE maintainer David Miller
2009-06-22  0:53               ` New IDE maintainer (was Re: cmd64x: irq 14: nobody cared - system is dreadfully slow) Matthew Wilcox
2009-06-22  0:00             ` Stephen Rothwell
2009-06-22  0:20               ` New IDE maintainer David Miller
2009-06-22  3:39             ` New IDE maintainer (was Re: cmd64x: irq 14: nobody cared - system is dreadfully slow) Greg Freemyer
2009-06-22 17:03               ` Jeff Garzik
2009-06-22 17:11                 ` Alan Cox
2009-06-22 17:21                   ` Arnd Bergmann
2009-06-22 17:32                     ` Alan Cox
2009-06-22  3:47         ` cmd64x: irq 14: nobody cared - system is dreadfully slow Frans Pop
2009-06-21 15:43   ` Bartlomiej Zolnierkiewicz
2009-06-21 21:21     ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2009-06-20 21:39 Frans Pop

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200906211515.56804.bzolnier@gmail.com \
    --to=bzolnier@gmail.com \
    --cc=davem@davemloft.net \
    --cc=elendil@planet.nl \
    --cc=linux-ide@vger.kernel.org \
    --cc=sparclinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).