netdev.vger.kernel.org archive mirror
* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
       [not found] ` <45F51218.9030907@gmail.com>
@ 2007-03-12 16:31   ` Michal Piotrowski
  2007-03-12 16:37     ` Tejun Heo
  2007-03-12 16:46     ` Thomas Gleixner
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-12 16:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michal Piotrowski, Jeff Garzik, linux-ide, Thomas Gleixner,
	Ingo Molnar, Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev,
	Alan Cox

Hi,

Tejun Heo wrote:
> Michal Piotrowski wrote:
>> Hi Jeff,
>>
>> I've got some problems with my SATA controller on crashdump kernel.
>>
>> Calling initcall 0xc1916081: fc_transport_init+0x0/0x35()
>> Calling initcall 0xc19160b6: init_sd+0x0/0xbc()
>> Calling initcall 0xc19161ec: piix_init+0x0/0x27()
>> ata_piix 0000:00:1f.2: version 2.10
>> ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
>> ACPI: PCI Interrupt 0000:00:1f.2[A] -> Link [LNKC] -> GSI 5 (level,
>> low) -> IRQ 5
>> PCI: Setting latency timer of device 0000:00:1f.2 to 64
>> ata1: SATA max UDMA/133 cmd 0x0001cc00 ctl 0x0001c882 bmdma 0x0001c400
>> irq 5
>> ata2: SATA max UDMA/133 cmd 0x0001c800 ctl 0x0001c482 bmdma 0x0001c408
>> irq 5
>> scsi0 : ata_piix
>> PM: Adding info for No Bus:host0
>> ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
>> ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 0/32)
>> ata1.00: qc timeout (cmd 0xef)
>> ata1.00: failed to set xfermode (err_mask=0x4)
> 
> Does giving 'irqpoll' boot parameter fix the problem?
> 

Hmmm... it works.

Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
Calling initcall 0xc191572e: ide_init+0x0/0x81()
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
irq 5: nobody cared (try booting with the "irqpoll" option)
 [<c1604556>] show_trace_log_lvl+0x1a/0x2f
 [<c1604c2c>] show_trace+0x12/0x14
 [<c1604cde>] dump_stack+0x16/0x18
 [<c164341c>] __report_bad_irq+0x39/0x79
 [<c16435eb>] note_interrupt+0x18f/0x1c8
 [<c1643ec6>] handle_level_irq+0x95/0xcb
 [<c1605dd8>] do_IRQ+0xb4/0xe0
 =======================
handlers:
[<c174f55e>] (skge_intr+0x0/0x3ff)
Disabling IRQ #5
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA

Is this an IDE or skge bug?

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3-git4-kdump/git-config

Thomas, Ingo - this soft lockup with irqpoll seems to be fixed
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#1116
Thanks!

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:31   ` 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel) Michal Piotrowski
@ 2007-03-12 16:37     ` Tejun Heo
  2007-03-12 16:47       ` Thomas Gleixner
  2007-03-12 16:46     ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2007-03-12 16:37 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Jeff Garzik, linux-ide, Thomas Gleixner, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev, Alan Cox

Michal Piotrowski wrote:
> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
> Calling initcall 0xc191572e: ide_init+0x0/0x81()
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> ICH5: IDE controller at PCI slot 0000:00:1f.1
> irq 5: nobody cared (try booting with the "irqpoll" option)
>  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
>  [<c1604c2c>] show_trace+0x12/0x14
>  [<c1604cde>] dump_stack+0x16/0x18
>  [<c164341c>] __report_bad_irq+0x39/0x79
>  [<c16435eb>] note_interrupt+0x18f/0x1c8
>  [<c1643ec6>] handle_level_irq+0x95/0xcb
>  [<c1605dd8>] do_IRQ+0xb4/0xe0
>  =======================
> handlers:
> [<c174f55e>] (skge_intr+0x0/0x3ff)
> Disabling IRQ #5
> ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
> ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
> ICH5: chipset revision 2
> ICH5: not 100% native mode: will probe irqs later
>     ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
>     ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
> 
> Is this an IDE or skge bug?

It seems to be skge's.  skge is screaming and the kernel shuts down IRQ 5.
ata_piix unfortunately shares that IRQ, so its interrupts don't get
serviced and commands time out.
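
The "nobody cared" / "Disabling IRQ #5" sequence in the log comes from the
kernel's accounting of unclaimed interrupts. Below is a minimal userspace
sketch of that accounting. It assumes the thresholds used by
note_interrupt() in kernels of this era (a check every 100000 interrupts,
shutdown if more than 99900 of them went unclaimed). The struct and
function names are hypothetical, and the real code also prints the trace
and masks the line at the interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed thresholds, modeled on note_interrupt() in kernel/irq/spurious.c. */
#define IRQ_WINDOW        100000u
#define IRQ_UNHANDLED_MAX  99900u

/* Hypothetical stand-in for the kernel's per-IRQ bookkeeping. */
struct irq_line {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;               /* set once the line has been shut down */
};

/* Account for one interrupt; shut the line down if, at the end of a
 * window, nearly every interrupt in it went unclaimed. */
static void note_interrupt_model(struct irq_line *line, bool handled)
{
	assert(line != NULL);
	if (line->disabled)
		return;
	if (!handled)
		line->irqs_unhandled++;
	if (++line->irq_count < IRQ_WINDOW)
		return;
	line->irq_count = 0;
	if (line->irqs_unhandled > IRQ_UNHANDLED_MAX)
		line->disabled = true;   /* corresponds to "Disabling IRQ #5" */
	line->irqs_unhandled = 0;
}

/* Deliver n interrupts with the same outcome; report whether the line
 * ended up disabled. */
static bool deliver(struct irq_line *line, unsigned int n, bool handled)
{
	while (n--)
		note_interrupt_model(line, handled);
	return line->disabled;
}
```

Once the line is marked disabled, every device sharing it stops receiving
interrupts, which is how a command on one device can time out because of
unclaimed interrupts on the same line.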

-- 
tejun


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:31   ` 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel) Michal Piotrowski
  2007-03-12 16:37     ` Tejun Heo
@ 2007-03-12 16:46     ` Thomas Gleixner
  2007-03-12 16:56       ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-03-12 16:46 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Tejun Heo, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev, Alan Cox

On Mon, 2007-03-12 at 17:31 +0100, Michal Piotrowski wrote:
> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
> Calling initcall 0xc191572e: ide_init+0x0/0x81()
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> ICH5: IDE controller at PCI slot 0000:00:1f.1
> irq 5: nobody cared (try booting with the "irqpoll" option)
>  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
>  [<c1604c2c>] show_trace+0x12/0x14
>  [<c1604cde>] dump_stack+0x16/0x18
>  [<c164341c>] __report_bad_irq+0x39/0x79
>  [<c16435eb>] note_interrupt+0x18f/0x1c8
>  [<c1643ec6>] handle_level_irq+0x95/0xcb
>  [<c1605dd8>] do_IRQ+0xb4/0xe0
>  =======================
> handlers:
> [<c174f55e>] (skge_intr+0x0/0x3ff)
> Disabling IRQ #5

I know this one :( 

It seems to be related to the BIOS spinning up the CDROM and leaving the
IDE controller in some weird state. When we come back the interrupt is
screaming and nobody cares, so it gets disabled. I have no clue yet, how
to handle this.

Disabling the interrupt across suspend/resume helps, but does not work,
when the interrupt is shared with some other device.

	tglx




* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:47       ` Thomas Gleixner
@ 2007-03-12 16:47         ` Tejun Heo
  2007-03-12 17:36           ` Michal Piotrowski
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2007-03-12 16:47 UTC (permalink / raw)
  To: tglx
  Cc: Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev, Alan Cox

Thomas Gleixner wrote:
> On Tue, 2007-03-13 at 01:37 +0900, Tejun Heo wrote:
>> Michal Piotrowski wrote:
>>> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
>>> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
>>> Calling initcall 0xc191572e: ide_init+0x0/0x81()
>>> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
>>> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>>> ICH5: IDE controller at PCI slot 0000:00:1f.1
>>> irq 5: nobody cared (try booting with the "irqpoll" option)
>>>  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
>>>  [<c1604c2c>] show_trace+0x12/0x14
>>>  [<c1604cde>] dump_stack+0x16/0x18
>>>  [<c164341c>] __report_bad_irq+0x39/0x79
>>>  [<c16435eb>] note_interrupt+0x18f/0x1c8
>>>  [<c1643ec6>] handle_level_irq+0x95/0xcb
>>>  [<c1605dd8>] do_IRQ+0xb4/0xe0
>>>  =======================
>>> handlers:
>>> [<c174f55e>] (skge_intr+0x0/0x3ff)
>>> Disabling IRQ #5
>>> ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
>>> ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
>>> ICH5: chipset revision 2
>>> ICH5: not 100% native mode: will probe irqs later
>>>     ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
>>>     ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
>>>
>>> Is this an IDE or skge bug?
>> It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
>> ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
>> serviced and commands time out.
> 
> I doubt that. On my box the interrupt is solely used by ata_piix.

Ah right.  ata_piix could be screaming when the skge requested IRQ#5,
but ata_piix is in native mode meaning that the PCI device is probably
in disabled state when skge requests IRQ#5.

Michal, can you please test the machine with skge disabled?  If it's an
on board device, you can probably disable it in the BIOS configuration menu.

Thanks.

-- 
tejun


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:37     ` Tejun Heo
@ 2007-03-12 16:47       ` Thomas Gleixner
  2007-03-12 16:47         ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-03-12 16:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev, Alan Cox

On Tue, 2007-03-13 at 01:37 +0900, Tejun Heo wrote:
> Michal Piotrowski wrote:
> > Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
> > Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
> > Calling initcall 0xc191572e: ide_init+0x0/0x81()
> > Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> > ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> > ICH5: IDE controller at PCI slot 0000:00:1f.1
> > irq 5: nobody cared (try booting with the "irqpoll" option)
> >  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
> >  [<c1604c2c>] show_trace+0x12/0x14
> >  [<c1604cde>] dump_stack+0x16/0x18
> >  [<c164341c>] __report_bad_irq+0x39/0x79
> >  [<c16435eb>] note_interrupt+0x18f/0x1c8
> >  [<c1643ec6>] handle_level_irq+0x95/0xcb
> >  [<c1605dd8>] do_IRQ+0xb4/0xe0
> >  =======================
> > handlers:
> > [<c174f55e>] (skge_intr+0x0/0x3ff)
> > Disabling IRQ #5
> > ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
> > ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
> > ICH5: chipset revision 2
> > ICH5: not 100% native mode: will probe irqs later
> >     ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
> >     ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
> > 
> > Is this an IDE or skge bug?
> 
> It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
> ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
> serviced and commands time out.

I doubt that. On my box the interrupt is solely used by ata_piix.

	tglx




* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:46     ` Thomas Gleixner
@ 2007-03-12 16:56       ` Tejun Heo
  2007-03-12 18:50         ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2007-03-12 16:56 UTC (permalink / raw)
  To: tglx
  Cc: Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev, Alan Cox

Thomas Gleixner wrote:
> On Mon, 2007-03-12 at 17:31 +0100, Michal Piotrowski wrote:
>> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
>> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
>> Calling initcall 0xc191572e: ide_init+0x0/0x81()
>> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
>> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>> ICH5: IDE controller at PCI slot 0000:00:1f.1
>> irq 5: nobody cared (try booting with the "irqpoll" option)
>>  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
>>  [<c1604c2c>] show_trace+0x12/0x14
>>  [<c1604cde>] dump_stack+0x16/0x18
>>  [<c164341c>] __report_bad_irq+0x39/0x79
>>  [<c16435eb>] note_interrupt+0x18f/0x1c8
>>  [<c1643ec6>] handle_level_irq+0x95/0xcb
>>  [<c1605dd8>] do_IRQ+0xb4/0xe0
>>  =======================
>> handlers:
>> [<c174f55e>] (skge_intr+0x0/0x3ff)
>> Disabling IRQ #5
> 
> I know this one :( 
> 
> It seems to be related to the BIOS spinning up the CDROM and leaving the
> IDE controller in some weird state. When we come back the interrupt is
> screaming and nobody cares, so it gets disabled. I have no clue yet, how
> to handle this.
> 
> Disabling the interrupt across suspend/resume helps, but does not work,
> when the interrupt is shared with some other device.

A similar thing can happen during initialization.  I haven't actually
instrumented the code, but I think what happens is:

1. the controller has IRQ stuck high (infrequent but possible)
2. the IRQ is already requested by another device
3. the IRQ gets disabled due to screaming interrupts at the moment
ata_piix does pci_enable_device().

I think we can be much more resilient to screaming interrupts if we
enable device with IRQ disabled and enable it after the device is
initialized to some level, possibly when requesting IRQ.
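
The three steps above, and the proposed reordering, can be sketched as a
toy model in userspace C. Everything here is hypothetical (the struct, its
fields, and both probe functions); it only encodes the assumption that a
device with a stuck interrupt drives the shared line from the moment it is
enabled until its driver has initialized it, unless INTx is masked at the
device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device on a shared, level-triggered line. */
struct dev_model {
	bool stuck_high;   /* firmware left the device's interrupt asserted */
	bool enabled;      /* the device has been enabled on the bus */
	bool intx_masked;  /* INTx is gated at the device */
	bool quiesced;     /* driver init has cleared the interrupt source */
	bool has_handler;  /* the driver has registered its handler */
};

/* A device drives the line only if enabled, stuck, not yet quiesced,
 * and not masked. */
static bool dev_asserts(const struct dev_model *d)
{
	return d->enabled && d->stuck_high && !d->quiesced && !d->intx_masked;
}

/* The line screams when some handler is registered on it (so the line is
 * live) while a device without a handler keeps it asserted. */
static bool line_screams(const struct dev_model *devs, size_t n)
{
	bool live = false, unclaimed = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].has_handler)
			live = true;
		if (dev_asserts(&devs[i]) && !devs[i].has_handler)
			unclaimed = true;
	}
	return live && unclaimed;
}

/* Usual order: enable, initialize, register the handler. */
static bool probe_plain(struct dev_model *devs, size_t n, size_t idx)
{
	bool screamed = false;

	assert(idx < n);
	devs[idx].enabled = true;
	if (line_screams(devs, n))
		screamed = true;
	devs[idx].quiesced = true;
	if (line_screams(devs, n))
		screamed = true;
	devs[idx].has_handler = true;
	if (line_screams(devs, n))
		screamed = true;
	return screamed;
}

/* Proposed order: mask first, unmask only after the handler is in place. */
static bool probe_masked(struct dev_model *devs, size_t n, size_t idx)
{
	bool screamed;

	assert(idx < n);
	devs[idx].intx_masked = true;
	screamed = probe_plain(devs, n, idx);
	devs[idx].intx_masked = false;
	if (line_screams(devs, n))
		screamed = true;
	return screamed;
}
```

In this model the plain order screams as soon as the stuck device is
enabled next to an already registered neighbour, while the masked order
never does. Whether real hardware behaves this way is exactly what has not
been instrumented yet.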

-- 
tejun


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:47         ` Tejun Heo
@ 2007-03-12 17:36           ` Michal Piotrowski
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-12 17:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, Stephen Hemminger, netdev, Alan Cox

On 12/03/07, Tejun Heo <htejun@gmail.com> wrote:
> Thomas Gleixner wrote:
> > On Tue, 2007-03-13 at 01:37 +0900, Tejun Heo wrote:
> >> Michal Piotrowski wrote:
> >>> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
> >>> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
> >>> Calling initcall 0xc191572e: ide_init+0x0/0x81()
> >>> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> >>> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> >>> ICH5: IDE controller at PCI slot 0000:00:1f.1
> >>> irq 5: nobody cared (try booting with the "irqpoll" option)
> >>>  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
> >>>  [<c1604c2c>] show_trace+0x12/0x14
> >>>  [<c1604cde>] dump_stack+0x16/0x18
> >>>  [<c164341c>] __report_bad_irq+0x39/0x79
> >>>  [<c16435eb>] note_interrupt+0x18f/0x1c8
> >>>  [<c1643ec6>] handle_level_irq+0x95/0xcb
> >>>  [<c1605dd8>] do_IRQ+0xb4/0xe0
> >>>  =======================
> >>> handlers:
> >>> [<c174f55e>] (skge_intr+0x0/0x3ff)
> >>> Disabling IRQ #5
> >>> ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
> >>> ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
> >>> ICH5: chipset revision 2
> >>> ICH5: not 100% native mode: will probe irqs later
> >>>     ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
> >>>     ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
> >>>
> >>> Is this an IDE or skge bug?
> >> It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
> >> ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
> >> serviced and commands time out.
> >
> > I doubt that. On my box the interrupt is solely used by ata_piix.
>
> Ah right.  ata_piix could be screaming when the skge requested IRQ#5,
> but ata_piix is in native mode meaning that the PCI device is probably
> in disabled state when skge requests IRQ#5.
>
> Michal, can you please test the machine with skge disabled?

It seems to work fine with skge disabled.

>  If it's an
> on board device, you can probably disable it in the BIOS configuration menu.
>
> Thanks.
>
> --
> tejun
>

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 16:56       ` Tejun Heo
@ 2007-03-12 18:50         ` Stephen Hemminger
  2007-03-12 19:03           ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2007-03-12 18:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, netdev, Alan Cox

On Tue, 13 Mar 2007 01:56:36 +0900
Tejun Heo <htejun@gmail.com> wrote:

> Thomas Gleixner wrote:
> > On Mon, 2007-03-12 at 17:31 +0100, Michal Piotrowski wrote:
> >> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
> >> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
> >> Calling initcall 0xc191572e: ide_init+0x0/0x81()
> >> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> >> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> >> ICH5: IDE controller at PCI slot 0000:00:1f.1
> >> irq 5: nobody cared (try booting with the "irqpoll" option)
> >>  [<c1604556>] show_trace_log_lvl+0x1a/0x2f
> >>  [<c1604c2c>] show_trace+0x12/0x14
> >>  [<c1604cde>] dump_stack+0x16/0x18
> >>  [<c164341c>] __report_bad_irq+0x39/0x79
> >>  [<c16435eb>] note_interrupt+0x18f/0x1c8
> >>  [<c1643ec6>] handle_level_irq+0x95/0xcb
> >>  [<c1605dd8>] do_IRQ+0xb4/0xe0
> >>  =======================
> >> handlers:
> >> [<c174f55e>] (skge_intr+0x0/0x3ff)
> >> Disabling IRQ #5
> > 
> > I know this one :( 
> > 
> > It seems to be related to the BIOS spinning up the CDROM and leaving the
> > IDE controller in some weird state. When we come back the interrupt is
> > screaming and nobody cares, so it gets disabled. I have no clue yet, how
> > to handle this.
> > 
> > Disabling the interrupt across suspend/resume helps, but does not work,
> > when the interrupt is shared with some other device.
> 
> Similar thing can happen during initialization.  I haven't actually
> instrumented the code but I think what happens is
> 
> 1. the controller has IRQ stuck high (infrequent but possible)
> 2. the IRQ is already requested by another device
> 3. the IRQ gets disabled due to screaming interrupts at the moment
> ata_piix does pci_enable_device().
> 
> I think we can be much more resilient to screaming interrupts if we
> enable device with IRQ disabled and enable it after the device is
> initialized to some level, possibly when requesting IRQ.

The first thing the skge driver does is do a chip reset, and that should
cause IRQ to be disabled and cleared. The driver has no chance to
fix it if the BIOS left the IRQ screaming...

> 


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 18:50         ` Stephen Hemminger
@ 2007-03-12 19:03           ` Tejun Heo
  2007-03-12 19:30             ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2007-03-12 19:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: tglx, Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, netdev, Alan Cox

Stephen Hemminger wrote:
>> 1. the controller has IRQ stuck high (infrequent but possible)
>> 2. the IRQ is already requested by another device
>> 3. the IRQ gets disabled due to screaming interrupts at the moment
>> ata_piix does pci_enable_device().
>>
>> I think we can be much more resilient to screaming interrupts if we
>> enable device with IRQ disabled and enable it after the device is
>> initialized to some level, possibly when requesting IRQ.
> 
> The first thing the skge driver does is do a chip reset, and that should
> cause IRQ to be disabled and cleared. The driver has no chance to
> fix it if the BIOS left the IRQ screaming...

What if we do something like...

	pci_intx(pdev, 0);
	pci_enable_device(pdev);
	/* initialize */
	request_irq(blah blah...);
	pci_intx(pdev, 1);

Would this work for skge?
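
For reference, pci_intx() works by toggling the INTx Disable bit (bit 10)
in the device's PCI command register. The sketch below models only that
bit manipulation on a plain 16-bit value in userspace. The helper names
are hypothetical, and no config-space access is performed. Note that the
bit was introduced with PCI 2.3, so this assumes the device implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* INTx Disable is bit 10 of the 16-bit PCI command register. */
#define PCI_COMMAND_INTX_DISABLE 0x400u

/* Return the command register value after the equivalent of
 * pci_intx(pdev, enable): enable clears the bit, disable sets it. */
static uint16_t intx_apply(uint16_t command, bool enable)
{
	uint16_t result;

	if (enable)
		result = (uint16_t)(command & ~PCI_COMMAND_INTX_DISABLE);
	else
		result = (uint16_t)(command | PCI_COMMAND_INTX_DISABLE);

	/* Only the INTx Disable bit may differ from the input. */
	assert(((result ^ command) & ~PCI_COMMAND_INTX_DISABLE) == 0);
	return result;
}

/* True if the device is currently prevented from asserting INTx. */
static bool intx_masked(uint16_t command)
{
	return (command & PCI_COMMAND_INTX_DISABLE) != 0;
}
```

With a command value of 0x0007 (I/O, memory and bus mastering enabled),
masking gives 0x0407 and unmasking gives 0x0007 again, so the other
enable bits are left alone.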

-- 
tejun


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 19:03           ` Tejun Heo
@ 2007-03-12 19:30             ` Stephen Hemminger
  2007-03-12 19:40               ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2007-03-12 19:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, netdev, Alan Cox

On Tue, 13 Mar 2007 04:03:00 +0900
Tejun Heo <htejun@gmail.com> wrote:

> Stephen Hemminger wrote:
> >> 1. the controller has IRQ stuck high (infrequent but possible)
> >> 2. the IRQ is already requested by another device
> >> 3. the IRQ gets disabled due to screaming interrupts at the moment
> >> ata_piix does pci_enable_device().
> >>
> >> I think we can be much more resilient to screaming interrupts if we
> >> enable device with IRQ disabled and enable it after the device is
> >> initialized to some level, possibly when requesting IRQ.
> > 
> > The first thing the skge driver does is do a chip reset, and that should
> > cause IRQ to be disabled and cleared. The driver has no chance to
> > fix it if the BIOS left the IRQ screaming...
> 
> What if we do something like...
> 
> 	pci_intx(pdev, 0);
> 	pci_enable_device(pdev);
> 	/* initialize */
> 	request_irq(blah blah...);
> 	pci_intx(pdev, 1);
> 
> Would this work for skge?
> 

Okay for testing, but any change like this should be done in the base
PCI layer, not one off in a particular driver.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 19:30             ` Stephen Hemminger
@ 2007-03-12 19:40               ` Tejun Heo
  2007-03-13 18:26                 ` Michal Piotrowski
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2007-03-12 19:40 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: tglx, Michal Piotrowski, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, netdev, Alan Cox


Stephen Hemminger wrote:
> On Tue, 13 Mar 2007 04:03:00 +0900
> Tejun Heo <htejun@gmail.com> wrote:
> 
>> Stephen Hemminger wrote:
>>>> 1. the controller has IRQ stuck high (infrequent but possible)
>>>> 2. the IRQ is already requested by another device
>>>> 3. the IRQ gets disabled due to screaming interrupts at the moment
>>>> ata_piix does pci_enable_device().
>>>>
>>>> I think we can be much more resilient to screaming interrupts if we
>>>> enable device with IRQ disabled and enable it after the device is
>>>> initialized to some level, possibly when requesting IRQ.
>>> The first thing the skge driver does is do a chip reset, and that should
>>> cause IRQ to be disabled and cleared. The driver has no chance to
>>> fix it if the BIOS left the IRQ screaming...
>> What if we do something like...
>>
>> 	pci_intx(pdev, 0);
>> 	pci_enable_device(pdev);
>> 	/* initialize */
>> 	request_irq(blah blah...);
>> 	pci_intx(pdev, 1);
>>
>> Would this work for skge?
>>
> 
> Okay for testing, but any change like this should be done in the base
> PCI layer, not one off in a particular driver.

Yep, it was proof-of-concept pseudocode.  I attached a patch that does the
above in skge.  Please point out if it is broken (e.g. intx needs to be
enabled earlier).

Michal, can you apply the attached patch and see whether it fixes the
problem?

Thanks.

-- 
tejun

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 633 bytes --]

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index eea75a4..2c990f2 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 	struct skge_hw *hw;
 	int err, using_dac = 0;
 
+	pci_intx(pdev, 0);
 	err = pci_enable_device(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "cannot enable PCI device\n");
@@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 		       dev->name, pdev->irq);
 		goto err_out_unregister;
 	}
+	pci_intx(pdev, 1);
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {


* Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
  2007-03-12 19:40               ` Tejun Heo
@ 2007-03-13 18:26                 ` Michal Piotrowski
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-13 18:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Stephen Hemminger, tglx, Jeff Garzik, linux-ide, Ingo Molnar,
	Bartlomiej Zolnierkiewicz, netdev, Alan Cox

On 12/03/07, Tejun Heo <htejun@gmail.com> wrote:
> Stephen Hemminger wrote:
> > On Tue, 13 Mar 2007 04:03:00 +0900
> > Tejun Heo <htejun@gmail.com> wrote:
> >
> >> Stephen Hemminger wrote:
> >>>> 1. the controller has IRQ stuck high (infrequent but possible)
> >>>> 2. the IRQ is already requested by another device
> >>>> 3. the IRQ gets disabled due to screaming interrupts at the moment
> >>>> ata_piix does pci_enable_device().
> >>>>
> >>>> I think we can be much more resilient to screaming interrupts if we
> >>>> enable device with IRQ disabled and enable it after the device is
> >>>> initialized to some level, possibly when requesting IRQ.
> >>> The first thing the skge driver does is do a chip reset, and that should
> >>> cause IRQ to be disabled and cleared. The driver has no chance to
> >>> fix it if the BIOS left the IRQ screaming...
> >> What if we do something like...
> >>
> >>      pci_intx(pdev, 0);
> >>      pci_enable_device(pdev);
> >>      /* initialize */
> >>      request_irq(blah blah...);
> >>      pci_intx(pdev, 1);
> >>
> >> Would this work for skge?
> >>
> >
> > Okay for testing, but any change like this should be done in the base
> > PCI layer, not one off in a particular driver.
>
> Yeap, it was a proof-of-concept pseudo code.  I attached a patch to do
> above in skge.  Please point out if it is broken (e.g. intx needs to be
> enabled earlier).
>
> Michal, can you apply the attached patch and see whether it fixes the
> problem.

I think the problem is solved.

Thanks.

>
> Thanks.
>
> --
> tejun
>
> diff --git a/drivers/net/skge.c b/drivers/net/skge.c
> index eea75a4..2c990f2 100644
> --- a/drivers/net/skge.c
> +++ b/drivers/net/skge.c
> @@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
>         struct skge_hw *hw;
>         int err, using_dac = 0;
>
> +       pci_intx(pdev, 0);
>         err = pci_enable_device(pdev);
>         if (err) {
>                 dev_err(&pdev->dev, "cannot enable PCI device\n");
> @@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
>                        dev->name, pdev->irq);
>                 goto err_out_unregister;
>         }
> +       pci_intx(pdev, 1);
>         skge_show_addr(dev);
>
>         if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {
>
>

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


Thread overview: 12+ messages
     [not found] <6bffcb0e0703090857r14eda34bj92f3fd1d0008edb8@mail.gmail.com>
     [not found] ` <45F51218.9030907@gmail.com>
2007-03-12 16:31   ` 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel) Michal Piotrowski
2007-03-12 16:37     ` Tejun Heo
2007-03-12 16:47       ` Thomas Gleixner
2007-03-12 16:47         ` Tejun Heo
2007-03-12 17:36           ` Michal Piotrowski
2007-03-12 16:46     ` Thomas Gleixner
2007-03-12 16:56       ` Tejun Heo
2007-03-12 18:50         ` Stephen Hemminger
2007-03-12 19:03           ` Tejun Heo
2007-03-12 19:30             ` Stephen Hemminger
2007-03-12 19:40               ` Tejun Heo
2007-03-13 18:26                 ` Michal Piotrowski
