linux-ide.vger.kernel.org archive mirror
* 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
@ 2008-11-14 23:55 Justin Piszcz
  2008-11-15  0:12 ` Justin Piszcz
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Justin Piszcz @ 2008-11-14 23:55 UTC (permalink / raw)
  To: linux-kernel, Jeff Garzik; +Cc: linux-ide, linux-raid

I am trying to find out what the root cause of this error/problem is:
https://bugzilla.redhat.com/show_bug.cgi?id=462425

It seems like every week or every other week someone else reports this on
the linux-raid mailing list.

I have enabled debugging in the libata.h file and I find that ata15 is
constantly being accessed for some reason. Why is this?

The first two disks are md/RAID1; the other volume is not being written to
or read:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000HLFS-0 Rev: 04.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000HLFS-0 Rev: 04.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi6 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi8 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi10 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi11 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi12 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi13 Channel: 00 Id: 00 Lun: 00
   Vendor: ATA      Model: WDC WD3000GLFS-0 Rev: 03.0
   Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi14 Channel: 00 Id: 00 Lun: 00
   Vendor: _NEC     Model: DVD_RW ND-3520A  Rev: 1.04
   Type:   CD-ROM                           ANSI  SCSI revision: 05

What is the purpose of this? What is it doing?
Also, in relation to the "ATAPI" message: is that for ATA or ATAPI? That is,
if I disable CDROM support, will those messages stop?

Nov 14 18:50:41 p34 kernel: [ 1205.558414] ata_sff_host_intr: ata15: protocol 5 task_state 3
Nov 14 18:50:41 p34 kernel: [ 1205.558420] ata_sff_hsm_move: ata15: protocol 5 task_state 3 (dev_stat 0x51)
Nov 14 18:50:41 p34 kernel: [ 1205.558422] ata_sff_hsm_move: ata15: protocol 5 task_state 4 (dev_stat 0x51)
Nov 14 18:50:41 p34 kernel: [ 1205.558433] ata_scsi_timed_out: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558434] ata_scsi_timed_out: EXIT, ret=0
Nov 14 18:50:41 p34 kernel: [ 1205.558438] ata_scsi_error: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558440] ata_port_flush_task: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558443] ata15: ata_port_flush_task: EXIT
Nov 14 18:50:41 p34 kernel: [ 1205.558450] ata_eh_link_autopsy: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558452] atapi_eh_request_sense: ATAPI request sense
Nov 14 18:50:41 p34 kernel: [ 1205.558454] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 18:50:41 p34 kernel: [ 1205.558494] ata_sff_tf_load: feat 0x0 nsect 0x0 lba 0x0 0x60 0x0
Nov 14 18:50:41 p34 kernel: [ 1205.558496] ata_sff_tf_load: device 0xA0
Nov 14 18:50:41 p34 kernel: [ 1205.558509] ata_sff_exec_command: ata15: cmd 0xA0
Nov 14 18:50:41 p34 kernel: [ 1205.558526] ata_sff_hsm_move: ata15: protocol 6 task_state 1 (dev_stat 0x58)
Nov 14 18:50:41 p34 kernel: [ 1205.558527] atapi_send_cdb: send cdb
Nov 14 18:50:41 p34 kernel: [ 1205.558697] ata_sff_host_intr: ata15: protocol 6 task_state 2
Nov 14 18:50:41 p34 kernel: [ 1205.558703] ata_sff_hsm_move: ata15: protocol 6 task_state 2 (dev_stat 0x58)
Nov 14 18:50:41 p34 kernel: [ 1205.558713] atapi_pio_bytes: ata15: xfering 18 bytes
Nov 14 18:50:41 p34 kernel: [ 1205.558715] __atapi_pio_bytes: data read
Nov 14 18:50:41 p34 kernel: [ 1205.558825] ata_sff_host_intr: ata15: protocol 6 task_state 2
Nov 14 18:50:41 p34 kernel: [ 1205.558831] ata_sff_hsm_move: ata15: protocol 6 task_state 2 (dev_stat 0x50)
Nov 14 18:50:41 p34 kernel: [ 1205.558833] ata_sff_hsm_move: ata15: protocol 6 task_state 3 (dev_stat 0x50)
Nov 14 18:50:41 p34 kernel: [ 1205.558834] ata_sff_hsm_move: ata15: dev 0 command complete, drv_stat 0x50
Nov 14 18:50:41 p34 kernel: [ 1205.558858] ata_port_flush_task: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558860] ata15: ata_port_flush_task: EXIT
Nov 14 18:50:41 p34 kernel: [ 1205.558865] ata_eh_link_autopsy: EXIT
Nov 14 18:50:41 p34 kernel: [ 1205.558866] ata_eh_recover: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558868] ata_eh_revalidate_and_attach: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558869] ata_eh_recover: EXIT, rc=0
Nov 14 18:50:41 p34 kernel: [ 1205.558870] atapi_qc_complete: ENTER, err_mask 0x0
Nov 14 18:50:41 p34 kernel: [ 1205.558874] ata_scsi_error: EXIT
Nov 14 18:50:41 p34 kernel: [ 1205.558882] ata_scsi_dump_cdb: CDB (15:0,0,0) 4a 01 00 00 10 00 00 00 08
Nov 14 18:50:41 p34 kernel: [ 1205.558883] ata_scsi_translate: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.558885] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 18:50:41 p34 kernel: [ 1205.558917] ata_sff_tf_load: feat 0x0 nsect 0x0 lba 0x0 0x8 0x0
Nov 14 18:50:41 p34 kernel: [ 1205.558920] ata_sff_tf_load: device 0xA0
Nov 14 18:50:41 p34 kernel: [ 1205.558932] ata_sff_exec_command: ata15: cmd 0xA0
Nov 14 18:50:41 p34 kernel: [ 1205.558937] ata_scsi_translate: EXIT
Nov 14 18:50:41 p34 kernel: [ 1205.558951] ata_sff_hsm_move: ata15: protocol 6 task_state 1 (dev_stat 0x58)
Nov 14 18:50:41 p34 kernel: [ 1205.558952] atapi_send_cdb: send cdb
Nov 14 18:50:41 p34 kernel: [ 1205.559120] ata_sff_host_intr: ata15: protocol 6 task_state 2
Nov 14 18:50:41 p34 kernel: [ 1205.559126] ata_sff_hsm_move: ata15: protocol 6 task_state 2 (dev_stat 0x58)
Nov 14 18:50:41 p34 kernel: [ 1205.559126] atapi_pio_bytes: ata15: xfering 8 bytes
Nov 14 18:50:41 p34 kernel: [ 1205.559126] __atapi_pio_bytes: data read
Nov 14 18:50:41 p34 kernel: [ 1205.559258] ata_sff_host_intr: ata15: protocol 6 task_state 2
Nov 14 18:50:41 p34 kernel: [ 1205.559264] ata_sff_hsm_move: ata15: protocol 6 task_state 2 (dev_stat 0x50)
Nov 14 18:50:41 p34 kernel: [ 1205.559266] ata_sff_hsm_move: ata15: protocol 6 task_state 3 (dev_stat 0x50)
Nov 14 18:50:41 p34 kernel: [ 1205.559268] ata_sff_hsm_move: ata15: dev 0 command complete, drv_stat 0x50
Nov 14 18:50:41 p34 kernel: [ 1205.559269] atapi_qc_complete: ENTER, err_mask 0x0
Nov 14 18:50:41 p34 kernel: [ 1205.559297] ata_scsi_dump_cdb: CDB (15:0,0,0) 1e 00 00 00 00 00 00 00 00
Nov 14 18:50:41 p34 kernel: [ 1205.559299] ata_scsi_translate: ENTER
Nov 14 18:50:41 p34 kernel: [ 1205.559301] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 18:50:41 p34 kernel: [ 1205.559333] ata_sff_tf_load: feat 0x0 nsect 0x0 lba 0x0 0x0 0x0
Nov 14 18:50:41 p34 kernel: [ 1205.559335] ata_sff_tf_load: device 0xA0
Nov 14 18:50:41 p34 kernel: [ 1205.559348] ata_sff_exec_command: ata15: cmd 0xA0
Nov 14 18:50:41 p34 kernel: [ 1205.559352] ata_scsi_translate: EXIT
Nov 14 18:50:41 p34 kernel: [ 1205.559369] ata_sff_hsm_move: ata15: protocol 5 task_state 1 (dev_stat 0x58)
Nov 14 18:50:41 p34 kernel: [ 1205.559371] atapi_send_cdb: send cdb
Nov 14 18:50:41 p34 kernel: [ 1205.559517] ata_sff_host_intr: ata15: protocol 5 task_state 3
Nov 14 18:50:41 p34 kernel: [ 1205.559523] ata_sff_hsm_move: ata15: protocol 5 task_state 3 (dev_stat 0x50)
Nov 14 18:50:41 p34 kernel: [ 1205.559525] ata_sff_hsm_move: ata15: dev 0 command complete, drv_stat 0x50
Nov 14 18:50:41 p34 kernel: [ 1205.559526] atapi_qc_complete: ENTER, err_mask 0x0
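
The dev_stat and protocol values in the trace above can be decoded with a
short script. This is a minimal sketch, not part of the original message:
the status bit names follow the standard ATA status register layout, and
the protocol numbers assume the ordering of enum ata_tf_protocols in
2.6.27's include/linux/ata.h (5 = ATAPI_PROT_NODATA, 6 = ATAPI_PROT_PIO).
The helper name decode_status is made up for illustration.

```python
# ATA status register bits, most significant first (standard ATA layout).
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),   # bit 4: seek complete or service, depending on command set
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

# Assumption: enum ata_tf_protocols ordering as in 2.6.27's ata.h.
PROTOCOLS = {5: "ATAPI_PROT_NODATA", 6: "ATAPI_PROT_PIO"}


def decode_status(value):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


# The three dev_stat values that appear in the trace.
for stat in (0x51, 0x58, 0x50):
    print(hex(stat), decode_status(stat))

# The two protocol numbers that appear in the trace.
for proto in (5, 6):
    print(proto, PROTOCOLS[proto])
```

Under those assumptions, 0x51 has ERR set (which would explain the request
sense that follows), 0x58 has DRQ set (the device wants to transfer data),
and 0x50 is the idle/ready state. Both protocol values are ATAPI ones, so
the traffic would belong to the packet device rather than to the disks.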



* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-14 23:55 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)? Justin Piszcz
@ 2008-11-15  0:12 ` Justin Piszcz
  2008-11-15  3:22 ` Robert Hancock
  2008-11-15 17:39 ` Mario 'BitKoenig' Holbe
  2 siblings, 0 replies; 13+ messages in thread
From: Justin Piszcz @ 2008-11-15  0:12 UTC (permalink / raw)
  To: linux-kernel, Jeff Garzik; +Cc: linux-ide, linux-raid


On Fri, 14 Nov 2008, Justin Piszcz wrote:

> I am trying to find out what the root cause of this error/problem is:
> https://bugzilla.redhat.com/show_bug.cgi?id=462425
>
> It seems like every week or other week someone else reports this on the 
> linux-raid mailing list.
>
> I have enabled debugging in the libata.h file and I find that ata15 is 
> constantly being accessed for some reason, why is this?
>
> The first two disks are md/RAID1, the other volume is not being written to or 
> read:
>

In messages:

Nov 14 19:12:01 p34 kernel: [ 2485.558898] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:01 p34 kernel: [ 2485.559318] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.554449] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.554725] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.555399] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.555746] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.556194] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.556449] ata15: ata_dev_select: ENTER, device 0, wait 1
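
To see how often the port is touched, the kernel timestamps can be grouped
into bursts. The following is a rough sketch, not from the original
message: the function name burst_starts and the 0.5 second gap threshold
are arbitrary choices for illustration, and the sample lines are copied
from the excerpt above.

```python
import re

# Matches the kernel timestamp and port name on ata_dev_select debug lines.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+): ata_dev_select")

LOG = """\
Nov 14 19:12:01 p34 kernel: [ 2485.558898] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:01 p34 kernel: [ 2485.559318] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.554449] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.554725] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.555399] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.555746] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.556194] ata15: ata_dev_select: ENTER, device 0, wait 1
Nov 14 19:12:03 p34 kernel: [ 2487.556449] ata15: ata_dev_select: ENTER, device 0, wait 1
""".splitlines()


def burst_starts(lines, port="ata15", gap=0.5):
    """Return the timestamp of the first select in each burst for one port.

    A new burst begins when more than `gap` seconds pass between selects.
    """
    starts = []
    last = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group(2) != port:
            continue
        stamp = float(match.group(1))
        if last is None or stamp - last > gap:
            starts.append(stamp)
        last = stamp
    return starts


starts = burst_starts(LOG)
intervals = [b - a for a, b in zip(starts, starts[1:])]
print("bursts:", len(starts), "intervals (s):", intervals)
```

On the eight lines quoted here this finds two bursts roughly two seconds
apart, each containing several commands, which looks like periodic polling
rather than random access.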



* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-14 23:55 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)? Justin Piszcz
  2008-11-15  0:12 ` Justin Piszcz
@ 2008-11-15  3:22 ` Robert Hancock
  2008-11-15 10:34   ` Alan Cox
  2008-11-15 17:39 ` Mario 'BitKoenig' Holbe
  2 siblings, 1 reply; 13+ messages in thread
From: Robert Hancock @ 2008-11-15  3:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ide, linux-raid

Justin Piszcz wrote:
> I am trying to find out what the root cause of this error/problem is:
> https://bugzilla.redhat.com/show_bug.cgi?id=462425

The problem is that people assume that timeouts with DRDY, like the one that
bug refers to, must all be the same problem when often they are not, and so
bug reports like this end up conflated with all manner of different issues.
It's one of the most generic errors imaginable. Also, as one of the
posters mentions, "I think this problem tends to get ignored because
there are so many things that can cause it (bad drives, cables, power
supplies, or any combination thereof).."  The fact that older kernels
worked doesn't necessarily prove anything: if they didn't do NCQ (which, in
the Marvell case, I think older kernels didn't), or if newer kernels are
pushing the disk subsystem harder, this may expose hardware issues that
didn't show up on older kernels.

One really needs to isolate individual problems and try some hardware
debugging before looking into the kernel for such problems.

> 
> It seems like every week or other week someone else reports this on the 
> linux-raid mailing list.
> 
> I have enabled debugging in the libata.h file and I find that ata15 is 
> constantly being accessed for some reason, why is this?

Presumably ata15 is the port the DVD-RW is connected to; likely HAL is
polling for media or something similar.
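
The CDBs dumped by ata_scsi_dump_cdb in the original trace can be checked
against this guess. Below is a small sketch, not from the thread itself:
the opcode names and field offsets are taken from the SCSI/MMC command
definitions as I understand them, and decode_cdb is a hypothetical helper
name.

```python
# A few SCSI/MMC opcodes relevant to removable-media polling.
OPCODES = {
    0x00: "TEST UNIT READY",
    0x03: "REQUEST SENSE",
    0x1E: "PREVENT ALLOW MEDIUM REMOVAL",
    0x4A: "GET EVENT STATUS NOTIFICATION",
}


def decode_cdb(hex_bytes):
    """Decode a space-separated hex CDB dump into a small dict."""
    cdb = bytes.fromhex(hex_bytes)
    info = {"opcode": cdb[0], "name": OPCODES.get(cdb[0], "unknown")}
    if cdb[0] == 0x4A:
        # Byte 1 bit 0: polled mode; byte 4: notification class request
        # (bit 4 is the media class); bytes 7-8: allocation length.
        info["polled"] = bool(cdb[1] & 0x01)
        info["media_class_requested"] = bool(cdb[4] & 0x10)
        info["allocation_length"] = (cdb[7] << 8) | cdb[8]
    elif cdb[0] == 0x1E:
        # Byte 4 bits 0-1: prevent field (0 means removal is allowed).
        info["prevent"] = cdb[4] & 0x03
    return info


# The two CDBs from the trace in the first message.
print(decode_cdb("4a 01 00 00 10 00 00 00 08"))
print(decode_cdb("1e 00 00 00 00 00 00 00 00"))
```

If the decoding is right, the first command is a polled media-event query
with an 8-byte allocation length (matching the "xfering 8 bytes" line), and
the second allows medium removal. Both are what a media-change poller would
be expected to send.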


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-15  3:22 ` Robert Hancock
@ 2008-11-15 10:34   ` Alan Cox
  2008-11-16  6:14     ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2008-11-15 10:34 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, linux-ide, linux-raid

On Fri, 14 Nov 2008 21:22:58 -0600
Robert Hancock <hancockr@shaw.ca> wrote:

> Justin Piszcz wrote:
> > I am trying to find out what the root cause of this error/problem is:
> > https://bugzilla.redhat.com/show_bug.cgi?id=462425
> 
> The problem is that people assume that timeouts with DRDY like that bug 
> refers to must be the same problem when it is often not

The first stopping point is to apply the DRQ drain patch I sent to the
list some time ago, which is hopefully lined up for 2.6.29. After that point
you can begin to look at the remaining cases; until then it's hardly worth
it.

Alan



* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-14 23:55 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)? Justin Piszcz
  2008-11-15  0:12 ` Justin Piszcz
  2008-11-15  3:22 ` Robert Hancock
@ 2008-11-15 17:39 ` Mario 'BitKoenig' Holbe
  2008-11-15 22:00   ` Justin Piszcz
  2 siblings, 1 reply; 13+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2008-11-15 17:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ide, linux-raid

Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I have enabled debugging in the libata.h file and I find that ata15 is 
> constantly being accessed for some reason, why is this?
...
> Host: scsi14 Channel: 00 Id: 00 Lun: 00
>    Vendor: _NEC     Model: DVD_RW ND-3520A  Rev: 1.04
>    Type:   CD-ROM                           ANSI  SCSI revision: 05

Well, if ata15 really is the CD-ROM, hal is likely to poll it :/


regards
   Mario
-- 
Evidently men are more intelligent than women. Every woman on earth
believes that men should be able to read minds. Every man knows this
is impossible. Ergo, we are more intelligent.


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-15 17:39 ` Mario 'BitKoenig' Holbe
@ 2008-11-15 22:00   ` Justin Piszcz
  0 siblings, 0 replies; 13+ messages in thread
From: Justin Piszcz @ 2008-11-15 22:00 UTC (permalink / raw)
  To: Mario 'BitKoenig' Holbe; +Cc: linux-raid, linux-ide, linux-kernel



On Sat, 15 Nov 2008, Mario 'BitKoenig' Holbe wrote:

> Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>> I have enabled debugging in the libata.h file and I find that ata15 is
>> constantly being accessed for some reason, why is this?
> ...
>> Host: scsi14 Channel: 00 Id: 00 Lun: 00
>>    Vendor: _NEC     Model: DVD_RW ND-3520A  Rev: 1.04
>>    Type:   CD-ROM                           ANSI  SCSI revision: 05
>
> Well, if ata15 really is the CD-ROM, hal is likely to poll it :/
>
>
> regards
>   Mario
> -- 
> Evidently men are more intelligent than women. Every woman on earth
> believes that men should be able to read minds. Every man knows this
> is impossible. Ergo, we are more intelligent.

Yes, for this -specific- issue, it was HAL polling the DVD drive multiple
times a second. This fixed that:

# hal-disable-polling --device /dev/sr0 
Following symlink from /dev/sr0 to /dev/scd0.
Polling for drive /dev/sr0 have been disabled. The fdi file written was
   /etc/hal/fdi/information/media-check-disable-storage_model_DVD_RW_ND_3520A.fdi

#

Justin.


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-15 10:34   ` Alan Cox
@ 2008-11-16  6:14     ` Tejun Heo
  2008-11-16  9:11       ` Justin Piszcz
  2008-11-16 11:25       ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2008-11-16  6:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: Robert Hancock, linux-kernel, linux-ide, linux-raid

Alan Cox wrote:
> On Fri, 14 Nov 2008 21:22:58 -0600
> Robert Hancock <hancockr@shaw.ca> wrote:
> 
>> Justin Piszcz wrote:
>>> I am trying to find out what the root cause of this error/problem is:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=462425
>> The problem is that people assume that timeouts with DRDY like that bug 
>> refers to must be the same problem when it is often not
> 
> The first stopping point is to apply the DRQ drain patch I sent to the
> list some time ago and is hopefully lined up for 2.6.29. After that point
> you can begin to look at the remaining cases, until then its hardly worth
> it.

Is it really?  For many SATA controllers, DRQ draining isn't really
necessary.  PATA might be a completely different story tho.

-- 
tejun


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-16  6:14     ` Tejun Heo
@ 2008-11-16  9:11       ` Justin Piszcz
  2008-11-16  9:14         ` Tejun Heo
  2008-11-16 11:25       ` Alan Cox
  1 sibling, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2008-11-16  9:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Cox, Robert Hancock, linux-kernel, linux-ide, linux-raid



On Sun, 16 Nov 2008, Tejun Heo wrote:

> Alan Cox wrote:
>> On Fri, 14 Nov 2008 21:22:58 -0600
>> Robert Hancock <hancockr@shaw.ca> wrote:
>>
>>> Justin Piszcz wrote:
>>>> I am trying to find out what the root cause of this error/problem is:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=462425
>>> The problem is that people assume that timeouts with DRDY like that bug
>>> refers to must be the same problem when it is often not
>>
>> The first stopping point is to apply the DRQ drain patch I sent to the
>> list some time ago and is hopefully lined up for 2.6.29. After that point
>> you can begin to look at the remaining cases, until then its hardly worth
>> it.
>
> Is it really?  For many SATA controllers, DRQ draining isn't really
> necessary.  PATA might be a completely different story tho.

I have been running with the patch for almost 24 hours while running
extensive disk tests. So far nothing, but sometimes it takes a few days
or more to repeat, so I am continuing to test to see if the problem recurs.

Justin.


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-16  9:11       ` Justin Piszcz
@ 2008-11-16  9:14         ` Tejun Heo
  0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2008-11-16  9:14 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alan Cox, Robert Hancock, linux-kernel, linux-ide, linux-raid

Justin Piszcz wrote:
> 
> 
> On Sun, 16 Nov 2008, Tejun Heo wrote:
> 
>> Alan Cox wrote:
>>> On Fri, 14 Nov 2008 21:22:58 -0600
>>> Robert Hancock <hancockr@shaw.ca> wrote:
>>>
>>>> Justin Piszcz wrote:
>>>>> I am trying to find out what the root cause of this error/problem is:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=462425
>>>> The problem is that people assume that timeouts with DRDY like that bug
>>>> refers to must be the same problem when it is often not
>>>
>>> The first stopping point is to apply the DRQ drain patch I sent to the
>>> list some time ago and is hopefully lined up for 2.6.29. After that
>>> point
>>> you can begin to look at the remaining cases, until then its hardly
>>> worth
>>> it.
>>
>> Is it really?  For many SATA controllers, DRQ draining isn't really
>> necessary.  PATA might be a completely different story tho.
> 
> I have been running with the patch for almost 24 hours and running
> extensive disk tests, so far nothing but sometimes it takes a few days
> or more to repeat, I am continuing to test to see if the problem recurs..

The patch only kicks in when EH kicks in, so you wouldn't know
whether it's effective or not until an exception occurs.


-- 
tejun


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-16  6:14     ` Tejun Heo
  2008-11-16  9:11       ` Justin Piszcz
@ 2008-11-16 11:25       ` Alan Cox
  2008-11-24 18:20         ` Mark Lord
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2008-11-16 11:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Robert Hancock, linux-kernel, linux-ide, linux-raid

On Sun, 16 Nov 2008 15:14:01 +0900
Tejun Heo <tj@kernel.org> wrote:

> Alan Cox wrote:
> > On Fri, 14 Nov 2008 21:22:58 -0600
> > Robert Hancock <hancockr@shaw.ca> wrote:
> > 
> >> Justin Piszcz wrote:
> >>> I am trying to find out what the root cause of this error/problem is:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=462425
> >> The problem is that people assume that timeouts with DRDY like that bug 
> >> refers to must be the same problem when it is often not
> > 
> > The first stopping point is to apply the DRQ drain patch I sent to the
> > list some time ago and is hopefully lined up for 2.6.29. After that point
> > you can begin to look at the remaining cases, until then its hardly worth
> > it.
> 
> Is it really?  For many SATA controllers, DRQ draining isn't really
> necessary.  PATA might be a completely different story tho.

It seems to be needed for various devices and some controllers. Given we
don't know which, it seems to be the sensible starting point for almost any
failure involving a DRQ being left on. It won't fix them all, but it is
the one case that can easily be eliminated.

Alan
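
For readers unfamiliar with the idea, DRQ draining can be illustrated with
a toy simulation. This is only a sketch under stated assumptions and is not
the patch discussed in this thread: FakePort, drain_drq and the 65536-byte
cap are all hypothetical, chosen to show the shape of the loop (keep
reading the data register while the device still asserts DRQ, up to a
limit).

```python
ATA_DRDY = 0x40  # device ready
ATA_DRQ = 0x08   # data request: device still wants to transfer data


class FakePort:
    """Hypothetical stand-in for a port whose device has stale data queued."""

    def __init__(self, stale_words):
        self.stale_words = stale_words

    def check_status(self):
        # DRQ stays asserted for as long as unread 16-bit words remain.
        return ATA_DRDY | (ATA_DRQ if self.stale_words else 0)

    def read_data_word(self):
        # Reading the data register consumes one queued word.
        self.stale_words -= 1
        return 0


def drain_drq(port, max_bytes=65536):
    """Discard queued data until DRQ clears or the cap is reached.

    Returns the number of bytes thrown away.
    """
    drained = 0
    while (port.check_status() & ATA_DRQ) and drained < max_bytes:
        port.read_data_word()
        drained += 2
    return drained


port = FakePort(stale_words=9)
print("drained bytes:", drain_drq(port))
print("DRQ still set:", bool(port.check_status() & ATA_DRQ))
```

The cap matters because a misbehaving device could hold DRQ forever; a
bounded loop lets error handling move on to a reset instead of spinning.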


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-16 11:25       ` Alan Cox
@ 2008-11-24 18:20         ` Mark Lord
  2008-11-24 18:32           ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Lord @ 2008-11-24 18:20 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Robert Hancock, linux-kernel, linux-ide, linux-raid

Alan Cox wrote:
> On Sun, 16 Nov 2008 15:14:01 +0900
> Tejun Heo <tj@kernel.org> wrote:
> 
>> Alan Cox wrote:
>>> On Fri, 14 Nov 2008 21:22:58 -0600
>>> Robert Hancock <hancockr@shaw.ca> wrote:
>>>
>>>> Justin Piszcz wrote:
>>>>> I am trying to find out what the root cause of this error/problem is:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=462425
>>>> The problem is that people assume that timeouts with DRDY like that bug 
>>>> refers to must be the same problem when it is often not
>>> The first stopping point is to apply the DRQ drain patch I sent to the
>>> list some time ago and is hopefully lined up for 2.6.29. After that point
>>> you can begin to look at the remaining cases, until then its hardly worth
>>> it.
>> Is it really?  For many SATA controllers, DRQ draining isn't really
>> necessary.  PATA might be a completely different story tho.
> 
> It seems to be needed for various devices and some controllers. Given we
> don't know which it seems to be the sensible starting point for almost any
> failure involving a DRQ being left on. It won't fix them all but it is
> the one case that can easily be eliminated.
..

Well ata_piix for starters.  Verified here by me.

Cheers


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-24 18:20         ` Mark Lord
@ 2008-11-24 18:32           ` Alan Cox
  2008-11-25 16:35             ` Mark Lord
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2008-11-24 18:32 UTC (permalink / raw)
  To: Mark Lord; +Cc: Tejun Heo, Robert Hancock, linux-kernel, linux-ide, linux-raid

> Well ata_piix for starters.  Verified here by me.

Some piix are fine (mine are), so maybe it's a device property in part.
Certainly it is needed for some combinations, including current stuff like
jmicron PATA ports.


* Re: 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)?
  2008-11-24 18:32           ` Alan Cox
@ 2008-11-25 16:35             ` Mark Lord
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Lord @ 2008-11-25 16:35 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Robert Hancock, linux-kernel, linux-ide, linux-raid

Alan Cox wrote:
>> Well ata_piix for starters.  Verified here by me.
> 
> Some piix are fine (mine are) so maybe its a device property in part.
> Certainly it is needed for some combinations including current stuff like
> jmicron PATA ports.
..

Agreed -- here it was a notebook (Dell Inspiron 9300) with ata_piix
and a PATA drive.  So the mainboard in the notebook had an onboard bridge
chip to adapt the PATA drive to the internal SATA port.

The subsequent model notebook (Dell Inspiron 9400) has very similar innards,
except it now uses SATA drives, so no bridge chip.

The former needs DRQ draining, the latter does not.
So, yeah, it's probably the bridge chip in this case.

Cheers


Thread overview: 13+ messages
2008-11-14 23:55 2.6.27.6 question: ata_sff_hsm_move: ata15 (why always ata15)? Justin Piszcz
2008-11-15  0:12 ` Justin Piszcz
2008-11-15  3:22 ` Robert Hancock
2008-11-15 10:34   ` Alan Cox
2008-11-16  6:14     ` Tejun Heo
2008-11-16  9:11       ` Justin Piszcz
2008-11-16  9:14         ` Tejun Heo
2008-11-16 11:25       ` Alan Cox
2008-11-24 18:20         ` Mark Lord
2008-11-24 18:32           ` Alan Cox
2008-11-25 16:35             ` Mark Lord
2008-11-15 17:39 ` Mario 'BitKoenig' Holbe
2008-11-15 22:00   ` Justin Piszcz
