PATA Sil680 Command Timeout on ARM XScale


* PATA Sil680 Command Timeout on ARM XScale
@ 2007-03-13 18:34 Fajun Chen
  2007-03-13 19:39 ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-03-13 18:34 UTC (permalink / raw)
  To: linux-ide@vger.kernel.org; +Cc: Tejun Heo, Alan

Hi Folks,

We have a command timeout with the Sil680 controller on ARM XScale. The
kernel is 2.6.18-rc2 with libata 2.00 and preemption enabled. A similar
problem is observed with kernel preemption disabled.  ATA pass-through
and sg are used.  A heavy IO test was run on both channels of the
Sil680 and the system was fairly loaded, with a load average
above 1.5.   Two timers are used to track command timeout in our test
software.  The one in the user space is set to 6 seconds using alarm()
call while the one in the kernel (scsi timer) is set to 5 seconds.
These timeout values are probably too low to be realistic,  but the
issue here is not about the timeout itself but to understand why it is
always the user space timer that expires before the kernel timer.  Since
the kernel timer uses jiffies to track time, does this imply a kernel bug
where the timer interrupts were lost or delayed somehow?  Do you know of
any known problems related to command timeout in PATA Sil680?

Thanks,
Fajun
--------------------------------
User space trace:
Cmd 4276 timed out after 7.260137 secs: start time 1173775439.409099
secs, timed out at 1173775446.669236 secs
[Tue Mar 13 08:44:06 2007]:
Test:               Random Write Sectors Extended
LBA Low:            0
LBA High:           100000
...
Num Cmds:           4277
Num_Failed_Cmds:    1
...
Status:             Fail [Error 401: Command timeout]
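
The elapsed time in the trace above can be checked by hand. The following is a minimal sketch in C, assuming the two stamps are gettimeofday()-style seconds and microseconds; the function names are illustrative and not taken from the test software. It suggests the timeout was reported roughly 1.26 seconds later than the 6-second alarm() setting.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Elapsed time between two gettimeofday()-style stamps, in microseconds. */
int64_t elapsed_usec(struct timeval start, struct timeval end)
{
    /* Both stamps are expected to be normalised (0 <= tv_usec < 1000000). */
    assert(start.tv_usec >= 0 && start.tv_usec < 1000000);
    assert(end.tv_usec >= 0 && end.tv_usec < 1000000);

    return ((int64_t)end.tv_sec - (int64_t)start.tv_sec) * 1000000
         + ((int64_t)end.tv_usec - (int64_t)start.tv_usec);
}

/* How far past the requested alarm() interval the timeout was reported. */
int64_t overshoot_usec(struct timeval start, struct timeval end,
                       unsigned int alarm_secs)
{
    return elapsed_usec(start, end) - (int64_t)alarm_secs * 1000000;
}
```

Integer microseconds are used instead of doubles because epoch values of this size lose sub-microsecond precision in floating point.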

Dmesg log

~ $ dmesg
.770000] Calling initcall 0xc001ebb4: inet_diag_init+0x0/0x80()
[42949375.770000] Calling initcall 0xc001ec34: tcp_diag_init+0x0/0x1c()
[42949375.770000] Calling initcall 0xc001ec50: bictcp_register+0x0/0x1c()
[42949375.770000] TCP bic registered
[42949375.770000] Calling initcall 0xc001ee2c: af_unix_init+0x0/0x80()
[42949375.770000] NET: Registered protocol family 1
[42949375.770000] Calling initcall 0xc001eeac: packet_init+0x0/0x70()
[42949375.770000] NET: Registered protocol family 17
[42949375.770000] Calling initcall 0xc0012a88:
clocksource_done_booting+0x0/0x24()
[42949375.770000] Calling initcall 0xc0019ed4: seqgen_init+0x0/0x1c()
[42949375.770000] Calling initcall 0xc001ba44:
early_uart_console_switch+0x0/0x90()
[42949375.770000] Calling initcall 0xc013a150: net_random_reseed+0x0/0x38()
[42949375.770000] RAMDISK: Compressed image found at block 0
[42949378.950000] VFS: Mounted root (ext2 filesystem).
[42949378.960000] Freeing init memory: 104K
[42949549.170000] ata1: soft resetting port
[42949549.250000] ata1.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48
[42949549.250000] ata1.00: configured for UDMA/100
[42949549.250000] ata1: EH complete
[42949549.250000]   Vendor: ATA       Model: ST940813AM        Rev: 3.02
[42949549.250000]   Type:   Direct-Access                      ANSI
SCSI revision: 05
[42949549.260000] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.260000] sda: Write Protect is off
[42949549.260000] sda: Mode Sense: 00 3a 00 00
[42949549.260000] SCSI device sda: drive cache: write back
[42949549.270000] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.270000] sda: Write Protect is off
[42949549.270000] sda: Mode Sense: 00 3a 00 00
[42949549.270000] SCSI device sda: drive cache: write back
[42949549.270000]  sda: unknown partition table
[42949549.290000] sd 0:0:0:0: Attached scsi disk sda
[42949549.290000] sd 0:0:0:0: Attached scsi generic sg0 type 0
[42949549.320000] ata1: soft resetting port
[42949549.380000] ata1.00: configured for UDMA/100
[42949549.380000] ata1: EH complete
[42949549.380000] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.380000] sda: Write Protect is off
[42949549.380000] sda: Mode Sense: 00 3a 00 00
[42949549.390000] SCSI device sda: drive cache: write back
[42949559.280000] ata2: soft resetting port
[42949559.420000] ata2.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48
[42949559.420000] ata2.00: configured for UDMA/100
[42949559.420000] ata2: EH complete
[42949559.420000]   Vendor: ATA       Model: ST94811A          Rev: 3.07
[42949559.420000]   Type:   Direct-Access                      ANSI
SCSI revision: 05
[42949559.430000] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB)
[42949559.430000] sdb: Write Protect is off
[42949559.430000] sdb: Mode Sense: 00 3a 00 00
[42949559.430000] SCSI device sdb: drive cache: write back
[42949559.430000] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB)
[42949559.440000] sdb: Write Protect is off
[42949559.440000] sdb: Mode Sense: 00 3a 00 00
[42949559.440000] SCSI device sdb: drive cache: write back
[42949559.440000]  sdb: unknown partition table
[42949559.460000] sd 1:0:0:0: Attached scsi disk sdb
[42949559.460000] sd 1:0:0:0: Attached scsi generic sg1 type 0
[  643.230000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[  711.230000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[  777.220000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[  841.300000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[  906.270000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[  972.190000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1035.280000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1162.190000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1292.230000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1422.220000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1551.310000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1680.260000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 1936.230000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 2193.200000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 2450.170000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 2707.240000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 3219.170000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 3733.170000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 4245.140000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 4759.130000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 5782.140000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 6808.180000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 7831.150000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 8855.110000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[ 9880.070000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[10906.100000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[11931.030000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[12955.070000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[13978.110000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[15003.060000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[16028.010000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[17051.960000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[18077.950000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[19100.930000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[20126.930000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[21150.920000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[22176.890000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[23199.900000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[24224.920000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[25248.860000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[26271.850000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[27297.860000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[28323.790000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[29346.790000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[30382.830000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[31406.810000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[32429.780000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[33454.720000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[34480.710000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[35503.690000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[36498.280000] sg_cmd_done: sg0, pack_id=349, res=0x0, dur=2240 ms
[36526.680000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[37552.780000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[38577.660000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[39602.630000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308
[40627.630000] NWFPE: ntpd[38] takes exception 00000001 at c002d514
from 0001d308


* Re: PATA Sil680 Command Timeout on ARM XScale
  2007-03-13 18:34 PATA Sil680 Command Timeout on ARM XScale Fajun Chen
@ 2007-03-13 19:39 ` Alan Cox
  2007-03-13 21:17   ` Fajun Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Cox @ 2007-03-13 19:39 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-ide@vger.kernel.org, Tejun Heo

> above 1.5.   Two timers are used to track command timeout in our test
> software.  The one in the user space is set to 6 seconds using alarm()
> call while the one in the kernel (scsi timer) is set to 5 seconds.
> These timeout values are probably too low to be realistic,  but the
> issue here is not about the timeout itself but to understand why it is

A lot of drive commands seem to be set up on a seven second worst case

> always the user space timer that expires before the kernel timer.  Since
> the kernel timer uses jiffies to track time, does this imply a kernel bug
> where the timer interrupts were lost or delayed somehow?  Do you know of
> any known problems related to command timeout in PATA Sil680?

Alarm() is also handled by the same jiffies logic, so I suspect a bug in
your test environment ?
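
The argument above can be put as a toy model: if both deadlines are counted in the same ticks, then lost or delayed ticks stretch both timers by the same factor and should not reorder them. The sketch below is only an illustration under assumptions: the tick rate (HZ = 100) and the 80% tick delivery figure are made up for the example, and the function names are hypothetical, not kernel APIs.

```c
#include <assert.h>

/* Assumed tick rate; the thread does not state HZ for this board. */
#define ASSUMED_HZ 100UL

/* Round a millisecond timeout up to whole ticks. */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999UL) / 1000UL;
}

/*
 * Wall-clock milliseconds until a timer of 'ticks' fires when only
 * 'delivered_pct' percent of the timer interrupts actually arrive.
 */
unsigned long wall_ms(unsigned long ticks, unsigned long hz,
                      unsigned long delivered_pct)
{
    assert(hz > 0 && delivered_pct > 0 && delivered_pct <= 100);
    return (ticks * 1000UL * 100UL) / (hz * delivered_pct);
}
```

With these assumed numbers the 5-second timer fires after 6.25 s and the 6-second timer after 7.5 s of wall time: both are late, but the order is preserved.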



* Re: PATA Sil680 Command Timeout on ARM XScale
  2007-03-13 19:39 ` Alan Cox
@ 2007-03-13 21:17   ` Fajun Chen
  2007-03-14 22:27     ` Fajun Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-03-13 21:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide@vger.kernel.org, Tejun Heo

On 3/13/07, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > above 1.5.   Two timers are used to track command timeout in our test
> > software.  The one in the user space is set to 6 seconds using alarm()
> > call while the one in the kernel (scsi timer) is set to 5 seconds.
> > These timeout values are probably too low to be realistic,  but the
> > issue here is not about the timeout itself but to understand why it is
>
> A lot of drive commands seem to be set up on a seven second worst case
>
> > always the user space timer that expires before the kernel timer.  Since
> > the kernel timer uses jiffies to track time, does this imply a kernel bug
> > where the timer interrupts were lost or delayed somehow?  Do you know of
> > any known problems related to command timeout in PATA Sil680?
>
> Alarm() is also handled by the same jiffies logic, so I suspect a bug in
> your test environment ?
>
>
I enabled ata_irq_trap and did the same test again. The kernel timer
caught the timeout (10 seconds) this time along with the irq trap
traces below.  What's the cause of these idle irqs?

[42949560.150000] SCSI device sdb: drive cache: write back
[   85.570000] ata1: irq trap
[   85.820000] ata2: irq trap
[   92.120000] abnormal status 0xD0
[   92.120000] ata1: irq trap
[   92.920000] ata2: irq trap
[   98.750000] ata1: irq trap
[  100.260000] abnormal status 0xD0
[  100.260000] ata2: irq trap
[  105.540000] ata1: irq trap
[  108.050000] ata1: irq trap
[  110.620000] ata1: irq trap
[  113.130000] ata1: irq trap
[  115.530000] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[  115.530000] ata1.00: (BMDMA stat 0x0)
[  115.530000] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
[  115.530000] ata1: soft resetting port
[  115.570000] ata1.00: configured for UDMA/100
[  115.570000] sg_cmd_done: sg0, pack_id=2706, res=0x8000002, dur=10040 ms
[  115.570000] ata1: EH complete
[  115.580000] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
[  115.580000] sda: Write Protect is off
[  115.580000] sda: Mode Sense: 00 3a 00 00
[  115.580000] SCSI device sda: drive cache: write back
...
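
For reading the status values in the log above ("abnormal status 0xD0", "stat 0x40"), here is a small decoder sketch using the traditional ATA Status register bit names. The function name and output format are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the names of the set ATA Status bits into buf, space separated. */
void decode_ata_status(unsigned char status, char *buf, size_t len)
{
    /* Traditional ATA Status register bit assignments, MSB first. */
    static const struct { unsigned char mask; const char *name; } bits[] = {
        { 0x80, "BSY"  }, { 0x40, "DRDY" }, { 0x20, "DF"  }, { 0x10, "DSC" },
        { 0x08, "DRQ"  }, { 0x04, "CORR" }, { 0x02, "IDX" }, { 0x01, "ERR" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if (!(status & bits[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
}
```

Decoded this way, 0xD0 has BSY set along with DRDY and DSC, while the 0x40 in the exception line is DRDY alone.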

Thanks,
Fajun


* Re: PATA Sil680 Command Timeout on ARM XScale
  2007-03-13 21:17   ` Fajun Chen
@ 2007-03-14 22:27     ` Fajun Chen
  2007-03-14 22:36       ` Jeff Garzik
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-03-14 22:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide@vger.kernel.org, Tejun Heo

Since primary channel and secondary channel share the same IRQ,  the
ISR could be called to service one or both channels. So I would think
it's normal to see "irq trap" traces when both channels are in IO
operation, correct?

I have another question regarding the ata_host_intr() function in
libata-core.c. For PIO read/write, the status of the interrupt pin is not
checked before moving the host state machine.  The Sil680 spec recommends
checking the IDE channel interrupt (bit 11 in the IDEx Task File Timing and
Config + Status register), though.  Could someone explain why the interrupt
status does not need to be checked for PIO?
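
To make the question concrete, this is a sketch of the per-channel check being asked about. Only the bit number (11) comes from the text above; the register width, the way the value is read, and all names are assumptions for illustration, and this is not how libata or any Sil680 driver is actually written.

```c
#include <stdint.h>

/* Bit number as quoted from the Sil680 spec in the message above. */
#define CHAN_IRQ_BIT 11

/* Returns 1 if the channel's interrupt bit is set in the register value. */
int channel_irq_pending(uint32_t tf_timing_cfg_status)
{
    return (int)((tf_timing_cfg_status >> CHAN_IRQ_BIT) & 1u);
}

/*
 * For a shared IRQ: build a mask of channels that actually raised the
 * interrupt (bit 0 = IDE0, bit 1 = IDE1), given each channel's register.
 */
unsigned int channels_to_service(uint32_t ide0_reg, uint32_t ide1_reg)
{
    return (unsigned int)channel_irq_pending(ide0_reg)
         | ((unsigned int)channel_irq_pending(ide1_reg) << 1);
}
```

A shared handler written this way would skip a channel whose bit is clear instead of inferring the interrupt source from the taskfile Status register.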

I'm still troubleshooting the command timeout issue on my test
hardware, and I will repeat the same test on i386 hardware. In the
meantime, I would appreciate any suggestions or known information to
isolate the issue.

Thanks,
Fajun



* Re: PATA Sil680 Command Timeout on ARM XScale
  2007-03-14 22:27     ` Fajun Chen
@ 2007-03-14 22:36       ` Jeff Garzik
  2007-03-14 23:46         ` Fajun Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2007-03-14 22:36 UTC (permalink / raw)
  To: Fajun Chen; +Cc: Alan Cox, linux-ide@vger.kernel.org, Tejun Heo

Fajun Chen wrote:
> Since primary channel and secondary channel share the same IRQ,  the
> ISR could be called to service one or both channels. So I would think
> it's normal to see "irq trap" traces when both channels are in IO
> operation, correct?

The irq trap code only occurs after a certain number of unhandled 
interrupts.
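
A rough model of that behaviour, as a sketch only: a per-port counter of unhandled ("idle") interrupts, with the trap firing each time the count reaches a multiple of a threshold. The struct, function name and threshold parameter are hypothetical stand-ins and are not copied from libata.

```c
#include <assert.h>

/* Per-port statistics; only the idle interrupt counter is modelled here. */
struct port_stats {
    unsigned long idle_irq;
};

/*
 * Record one unhandled interrupt for a port.
 * Returns 1 when the trap should fire (every 'threshold' idle interrupts),
 * 0 otherwise.
 */
int note_idle_irq(struct port_stats *st, unsigned long threshold)
{
    assert(st != 0 && threshold > 0);
    st->idle_irq++;
    return (st->idle_irq % threshold) == 0;
}
```

Under this model an occasional interrupt meant for the other channel only advances the counter; the "irq trap" message appears once enough of them have accumulated.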


> I have another question regarding the ata_host_intr() function in
> libata-core.c. For PIO read/write, the status of the interrupt pin is not
> checked before moving the host state machine.  The Sil680 spec recommends
> checking the IDE channel interrupt (bit 11 in the IDEx Task File Timing and
> Config + Status register), though.  Could someone explain why the interrupt
> status does not need to be checked for PIO?

Reading the Status register (as opposed to AltStatus) should clear the 
interrupt condition, on standard hardware.

	Jeff




* Re: PATA Sil680 Command Timeout on ARM XScale
  2007-03-14 22:36       ` Jeff Garzik
@ 2007-03-14 23:46         ` Fajun Chen
  2007-03-15  2:32           ` Albert Lee
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-03-14 23:46 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alan Cox, linux-ide@vger.kernel.org, Tejun Heo

On 3/14/07, Jeff Garzik <jeff@garzik.org> wrote:
> Fajun Chen wrote:
> > Since primary channel and secondary channel share the same IRQ,  the
> > ISR could be called to service one or both channels. So I would think
> > it's normal to see "irq trap" traces when both channels are in IO
> > operation, correct?
>
> The irq trap code only occurs after a certain number of unhandled
> interrupts.
>

If an interrupt fires for IDE1 while IDE0 is busy handling commands,
the irq trap code will count one unhandled interrupt on IDE0, but the
interrupt was not targeted at IDE0 to start with.  So this irq trap
code, if enabled, could generate false alarms even in a perfectly working
system, and the ata_irq_ack() function should not be called based on a
false alarm.   Please correct me if I misunderstand the intention of the
code.

>
> > I have another question regarding the ata_host_intr() function in
> > libata-core.c. For PIO read/write, the status of the interrupt pin is not
> > checked before moving the host state machine.  The Sil680 spec recommends
> > checking the IDE channel interrupt (bit 11 in the IDEx Task File Timing and
> > Config + Status register), though.  Could someone explain why the interrupt
> > status does not need to be checked for PIO?
>
> Reading the Status register (as opposed to AltStatus) should clear the
> interrupt condition, on standard hardware.
>

Could this piece of code handle the sequence below well?
1.  An interrupt fires for IDE1 to indicate command completion.
2.  At the same time, IDE0 has just started a PIO read and its status
register has not been updated to busy.
3. As part of the interrupt handling, ata_host_intr() will be called
for IDE0 as well. Since this code doesn't check interrupt validity on
IDE0 and counts on the status register to make its decision,  it will be
misled into reading the data register, which has not been populated by the
target device yet.
...
I'm new to the code, so I want to double check with you guys to see
if this is a valid case and if the code can handle it.

Thanks,
Fajun


* Re: PATA Sil680 Command Timeout on ARM XScale
  2007-03-14 23:46         ` Fajun Chen
@ 2007-03-15  2:32           ` Albert Lee
  0 siblings, 0 replies; 7+ messages in thread
From: Albert Lee @ 2007-03-15  2:32 UTC (permalink / raw)
  To: Fajun Chen; +Cc: Jeff Garzik, Alan Cox, linux-ide@vger.kernel.org, Tejun Heo

Fajun Chen wrote:
> On 3/14/07, Jeff Garzik <jeff@garzik.org> wrote:
> 
>> Fajun Chen wrote:
>> > Since primary channel and secondary channel share the same IRQ,  the
>> > ISR could be called to service one or both channels. So I would think
>> > it's normal to see "irq trap" traces when both channels are in IO
>> > operation, correct?
>>
>> The irq trap code only occurs after a certain number of unhandled
>> interrupts.
>>
> 
> If an interrupt fires for IDE1 while IDE0 is busy handling commands,
> the irq trap code will count one unhandled interrupt on IDE0, but the
> interrupt was not targeted at IDE0 to start with.  So this irq trap
> code, if enabled, could generate false alarms even in a perfectly working
> system, and the ata_irq_ack() function should not be called based on a
> false alarm.   Please correct me if I misunderstand the intention of the
> code.
> 
>>
>> > I have another question regarding the ata_host_intr() function in
>> > libata-core.c. For PIO read/write, the status of the interrupt pin is not
>> > checked before moving the host state machine.  The Sil680 spec recommends
>> > checking the IDE channel interrupt (bit 11 in the IDEx Task File Timing and
>> > Config + Status register), though.  Could someone explain why the interrupt
>> > status does not need to be checked for PIO?
>>
>> Reading the Status register (as opposed to AltStatus) should clear the
>> interrupt condition, on standard hardware.
>>
> 
> Could this piece of code handle the sequence below well?
> 1.  An interrupt fires for IDE1 to indicate command completion.
> 2.  At the same time, IDE0 has just started a PIO read and its status
> register has not been updated to busy.
> 3. As part of the interrupt handling, ata_host_intr() will be called
> for IDE0 as well. Since this code doesn't check interrupt validity on
> IDE0 and counts on the status register to make its decision,  it will be
> misled into reading the data register, which has not been populated by the
> target device yet.

The host->lock is acquired when a command is being issued to the device.
After writing the command register, libata reads Alt Status and
waits for a short period of time (ndelay(400)) to ensure that the device
is BSY before releasing the host lock. (Please see ata_exec_command().)

So, the above scenario won't happen unless the device is faulty and
doesn't set BSY after 400+ ns. Even in that situation, libata won't
read the data register blindly as feared. It will be detected as an
HSM violation and EH will be activated to handle it.
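
As a simplified illustration of the last point (not the actual libata code; the enum and function names are invented, and device error bits are ignored), the decision before a PIO data transfer can be pictured like this:

```c
/* Standard ATA Status register bits used by this sketch. */
#define ST_BSY 0x80
#define ST_DRQ 0x08

/* Possible outcomes when the handler looks at a port expecting PIO data. */
enum hsm_result {
    HSM_NOT_READY,  /* device still busy: leave it alone, treat as idle irq */
    HSM_VIOLATION,  /* not busy but no DRQ: hand over to error handling */
    HSM_TRANSFER    /* DRQ asserted: safe to move data */
};

/* Decide what to do based on the Status value read from the device. */
enum hsm_result pio_data_check(unsigned char status)
{
    if (status & ST_BSY)
        return HSM_NOT_READY;
    if (!(status & ST_DRQ))
        return HSM_VIOLATION;
    return HSM_TRANSFER;
}
```

The data register is touched only in the HSM_TRANSFER case, so a device that has neither BSY nor DRQ set ends up in error handling rather than in a blind read.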

--
albert



