linux-ide.vger.kernel.org archive mirror
* Re: 2.6.25: sata_sil freezes, hard resets port.
@ 2008-06-04 19:58 Andrew Henry
  2008-06-10  3:18 ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Henry @ 2008-06-04 19:58 UTC
  To: linux-ide

Any news with this?  I start to understand that this really is a missing
function in the sata_sil driver: driver does not work well with power
saving modes/standby on eSATA drives.

I have created a cron script to 'ping' the drives using sdparm -C start
/dev/sdX every 3 minutes and the drives have been operating with RAID1
and dmcrypt for 4 days non-stop without error.
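
The keep-alive workaround described above could look roughly like the following. This is a minimal sketch, not the script that was actually used: the device names, the function names, the cron path and the choice of Python are all assumptions. Only the `sdparm -C start` command and the 3-minute interval come from the message.

```python
#!/usr/bin/env python3
"""Keep-alive sketch: send START UNIT to each drive so it never reaches standby.

Meant to be run from cron every 3 minutes, e.g. (hypothetical path):
    */3 * * * * /usr/local/bin/keepalive.py
"""
import subprocess

# Hypothetical device names; substitute the drives behind the eSATA controller.
DEVICES = ["/dev/sdb", "/dev/sdc"]


def build_command(device):
    """Return the argv for 'sdparm -C start <device>'."""
    return ["sdparm", "-C", "start", device]


def ping_drives(devices, runner=subprocess.run):
    """Run the start command on every device; return the devices that failed."""
    failed = []
    for device in devices:
        try:
            result = runner(build_command(device), capture_output=True)
            if result.returncode != 0:
                failed.append(device)
        except OSError:
            # sdparm is not installed or could not be executed.
            failed.append(device)
    return failed


if __name__ == "__main__":
    for device in ping_drives(DEVICES):
        print(f"keep-alive failed for {device}")
```

Note that this only hides the symptom: the drives never enter standby, so the wake-up path in the driver is never exercised.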

It does seem that the fault I am experiencing is caused purely by an
incomplete driver.  Can someone please acknowledge this?


--andrew

* Re: 2.6.25: sata_sil freezes, hard resets port.
@ 2009-01-13  9:50 Henry, Andrew
  0 siblings, 0 replies; 10+ messages in thread
From: Henry, Andrew @ 2009-01-13  9:50 UTC
  To: linux-ide@vger.kernel.org; +Cc: Tejun Heo

Hi Tejun,

I know I said that I would try new kernels etc with this issue but I gave up.  I bought a new tower server to replace my aging laptop 'server', that now runs on an ASUS P5Q3 M/B with an Intel ICH10 AHCI SATA controller and bought 2x1TB Western Digital RE3 drives to replace the e-SATA WD MyBook 500GB drives I was having issues with on the sata_sil 3512 controller.

As you can guess, I now have no issues whatsoever and the RE3 discs work wonderfully.  I ended up selling the MyBook drives.  Takes just over 3.5 hours to resync the RE3 drives compared with over 26 hours for the e-SATA 500GB drives.  However I do not know how much the system performance affects the resync.  I was running an AMD Turion 64 mobile chip 1.6GHz and now running Quad Core Q9550.

Regards,
Andrew



Hello, Henry.

Sorry about the late reply.  I was traveling for quite some time.

Henry, Andrew wrote:
> I tried the patch wd.debug.  I am sorry that it took so long.  I
> assume that this was to help the wake-up problem with the MyBook
> drives ( I have many problems).  I added this patch to 2.6.25.9 and
> ran the kernel and then started up my raid-1 array and did not use
> any cron scripts to keep the drives awake.  I left the drives
> inactive for 30 minutes and then tried to access them.  It seems ok.
> If this is what the patch was meant to fix, then it does seem to
> work.  I have only tested this one time though.

Can you please post kernel log with and without the patch?

> I have so many other issues that I have gone back over to USB2
> instead of using the sata_sil 3512 CardBus controller, because it
> seems to stink badly.
>
> Other issues with 3512 controller:
>
> 1. When I write to the array, the activity lights show burst
> activity, i.e. the LEDs are not on constantly, but burst visibly
> with gaps in between where there is no activity.  They burst many
> times a second, but it's enough to see that it's not going at max
> throughput.  If I read from the disk, then *sometimes* they light up
> constantly without going off.  Does this indicate in some way, that
> the controller is not being utilized to its maximum capability?
> Seems like the controller is capable of higher throughput, but
> something in the way IO happens is hindering it.

Well, the only way you can tell is by actually measuring the
throughput.  How fast does it actually transfer?
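
One way to put a number on the transfer rate, in the spirit of the `dd` to /dev/null test mentioned in this thread, is a timed sequential read. The following is a minimal sketch under stated assumptions: the function name, the 256 MB default read size and the 1 MB block size are all hypothetical choices, and reading a raw device normally requires root.

```python
import time


def read_throughput(path, total_bytes=256 * 1024 * 1024, block_size=1024 * 1024):
    """Read up to total_bytes sequentially from path; return (bytes_read, MB/s)."""
    bytes_read = 0
    start = time.monotonic()
    # buffering=0 disables Python-level buffering; the kernel page cache
    # still applies, so a repeated run may report an inflated figure.
    with open(path, "rb", buffering=0) as dev:
        while bytes_read < total_bytes:
            chunk = dev.read(min(block_size, total_bytes - bytes_read))
            if not chunk:  # end of device or file
                break
            bytes_read += len(chunk)
    elapsed = time.monotonic() - start
    mb_per_s = bytes_read / 1e6 / elapsed if elapsed > 0 else float("inf")
    return bytes_read, mb_per_s
```

Calling `read_throughput("/dev/sdb")` (device name assumed) returns the byte count and the rate in MB/s, with MB meaning 10^6 bytes. Because of the page cache, only the first run after the data has left the cache is meaningful.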

> 2. A shutdown/startup or a reboot can kill the array, and then when
> it comes back up, only one disk is in the array and I have to re-add
> the other one causing a re-sync.  This never happens on USB2, only
> when using 3512 and sata_sil on CardBus controller.  One thing I
> notice is that when shutting down, after the final KILL
> processes, I see a message saying something like "md is still
> active" right before the PC shuts down.  Is this why I am losing a
> disk, because md is not being shut down properly before a
> reset/poweroff?

Hmmm... I can't tell without looking at the log.  Can you please
attach log for the case where it loses one disk?

> 3. When the drives have gone to sleep and I try to access the
> mounted filesystem, or if I type "fdisk -l" after 10 minutes of not
> accessing the array, then the activity light on the CardBus
> controller lights up and does not go out for 1 minute.  Fdisk -l
> does not report back until after the activity light has gone off.
> The same thing happens when udev starts when booting the system.

Does the kernel complain about anything during that 1 minute?  Sounds
like there are command timeouts there.

> 4. If I eject the CardBus controller, after having unmounted the raid
> filesystem and having stopped md, then I get a console message
> saying something along the lines of: **DANGER** power could not be
> stopped [when ejecting the controller card].  Sounds serious.
> Possible that I am damaging the controller when I do this.

I don't have much idea about that.  Can you please report this to
linux-pcmcia@lists.infradead.org and cc linux-ide and me?

> 5. IO times are very poor using eSATA CardBus 3512.  I get 15MB/s vs
> 35MB/s on USB2 (reads using /usr/bin/time and dd where if is the
> device and of is /dev/null).  Seems like either the 3512 controller
> card I have is just crap, and/or there are serious problems with the
> sata_sil driver when used in combination with a CardBus controller
> (anyone else out there with a CardBus?)  Maybe the issues do not
> manifest themselves as much on PCI controllers?  I am trying to
> source a CardBus controller at the moment that uses sata_sil24
> instead.

Hmmm...  I also have a 3512 cardbus controller and it works just fine.
Again, does the kernel complain about anything?  15MB/s is way too
slow.  What does "hdparm -t" on the drive report?

Thanks.

--
tejun

* Re: 2.6.25: sata_sil freezes, hard resets port.
@ 2008-09-10 13:23 Henry, Andrew
  2008-09-29  2:59 ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Henry, Andrew @ 2008-09-10 13:23 UTC
  To: linux-ide@vger.kernel.org

Hi Tejun.

I tried the patch wd.debug.  I am sorry that it took so long.  I assume that this was to help the wake-up problem with the MyBook drives ( I have many problems).  I added this patch to 2.6.25.9 and ran the kernel and then started up my raid-1 array and did not use any cron scripts to keep the drives awake.  I left the drives inactive for 30 minutes and then tried to access them.  It seems ok.  If this is what the patch was meant to fix, then it does seem to work.  I have only tested this one time though.

I have so many other issues that I have gone back over to USB2 instead of using the sata_sil 3512 CardBus controller, because it seems to stink badly.

Other issues with 3512 controller:

1. When I write to the array, the activity lights show burst activity, i.e. the LEDs are not on constantly, but burst visibly with gaps in between where there is no activity.  They burst many times a second, but it's enough to see that it's not going at max throughput.  If I read from the disk, then *sometimes* they light up constantly without going off.  Does this indicate in some way, that the controller is not being utilized to its maximum capability?  Seems like the controller is capable of higher throughput, but something in the way IO happens is hindering it.
2. A shutdown/startup or a reboot can kill the array, and then when it comes back up, only one disk is in the array and I have to re-add the other one causing a re-sync.  This never happens on USB2, only when using 3512 and sata_sil on CardBus controller.  One thing I notice is that when shutting down, after the final KILL processes, I see a message saying something like "md is still active" right before the PC shuts down.  Is this why I am losing a disk, because md is not being shut down properly before a reset/poweroff?
3. When the drives have gone to sleep and I try to access the mounted filesystem, or if I type "fdisk -l" after 10 minutes of not accessing the array, then the activity light on the CardBus controller lights up and does not go out for 1 minute.  Fdisk -l does not report back until after the activity light has gone off.  The same thing happens when udev starts when booting the system.
4. If I eject the CardBus controller, after having unmounted the raid filesystem and having stopped md, then I get a console message saying something along the lines of: **DANGER** power could not be stopped [when ejecting the controller card].  Sounds serious.  Possible that I am damaging the controller when I do this.
5. IO times are very poor using eSATA CardBus 3512.  I get 15MB/s vs 35MB/s on USB2 (reads using /usr/bin/time and dd where if is the device and of is /dev/null).

Seems like either the 3512 controller card I have is just crap, and/or there are serious problems with the sata_sil driver when used in combination with a CardBus controller (anyone else out there with a CardBus?)  Maybe the issues do not manifest themselves as much on PCI controllers?  I am trying to source a CardBus controller at the moment that uses sata_sil24 instead.

 --andrew

* Re: 2.6.25: sata_sil freezes, hard resets port.
@ 2008-05-30 18:25 Andrew Henry
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Henry @ 2008-05-30 18:25 UTC
  To: linux-ide

Sorry if this is confusing, I posted a message on linux-kernel and got a
reply from Tejun Heo below who said he had cc'd linux-ide, but it seems
this may not have made it to linux-ide.  I then posted a reply to Tejun
this morning, so the  original is coming after the reply.  Anyway, hope
someone can help, as I cannot use my disks at the moment with eSATA.  My
reply has the same subject and was sent earlier today so it shouldn't be
too hard to find.

Just to add: I have now tried with USB 2.0 and IEEE-1394a simultaneously
with one drive on each interface and there does not seem to be any issue
waking the drives from sleep.  Running fdisk -l takes 3s to respond with
drives in standby mode.  Much better than 120s on eSATA.

From Tejun:
--------------------------------

Hello, cc'ing linux-ide@vger.kernel.org

Henry, Andrew wrote:
> I'm not on the list.  Please cc me if you reply.
> 
> I run 2.6.18-53 kernel on CentOS5.1 x86_64.  I recently bought 2 x
> WD 500GB triple interface drives and an ST Labs/Sunway eSATA CardBus
> (sil_3512?) controller with 2 ports.
> 
> Note that I compiled 2.6.25 and still get errors.  All output below
> is from 2.6.25.
> 
> I can hotplug the card and drives and run badblocks for 48hrs
> without any verification errors, RAID1 them with mdadm, run dmcrypt
> and create ext3 fs and mount it and it works perfectly.
>
> Then the drives spin down/go to sleep *or* I cold boot the
> computer, and the problems begin...
>
> As long as the discs are always in use, they seem to work, and
> maybe a workaround is a cronjob with sdparm -C start /dev/sdx, but
> the lockups/hangs on the port during boot cannot be overcome so
> easily.  At boot one of the 2 ports can hang and the activity LED
> stays lit and then I cannot access that disc until I cold boot, and
> disconnect all power from the drive and unplug the eSATA cable.  It
> does not work even on cold boot and pressing power off/power on
> button on drive: I need to actually disconnect the cables!
>
> Error 1.
> 
> (system is booted, I hotplug card here)
> 
> pccard: CardBus card inserted into slot 0
> sata_sil 0000:07:00.0: version 2.3
> PCI: Enabling device 0000:07:00.0 (0000 -> 0003)
> ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 20 (level, low) -> IRQ 20
> sata_sil 0000:07:00.0: cache line size not set.  Driver may not function
> sata_sil 0000:07:00.0: Applying R_ERR on DMA activate FIS errata fix
> 
> Error 2.
> 
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: configured for UDMA/66
> ata1: EH complete
> 
> Error 3.
> 
> (this can happen when disc has spun down and I try to access with 'fdisk -l')
> 
> ata2: port is slow to respond, please be patient (Status 0xd8)
> ata2: device not ready (errno=-16), forcing hardreset
> ata2: hard resetting port
> ata2: port is slow to respond, please be patient (Status 0xff)
> ata2: COMRESET failed (errno=-16)
> ata2: hard resetting port
> ata2: port is slow to respond, please be patient (Status 0xff)
> ata2: COMRESET failed (errno=-16)
> ata2: hard resetting port
> ata2: port is slow to respond, please be patient (Status 0xff)
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:08:80:5f:38/00:00:3a:00:00/e0 tag 0 cdb 0x0 data 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: configured for UDMA/33
> ata1: EH complete
> ata2: COMRESET failed (errno=-16)
> ata2: hard resetting port
> ata2: COMRESET failed (errno=-16)
> ata2: reset failed, giving up
> ata2.00: disabled
> ata2: EH complete
> sd 1:0:0:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdb, sector 0
> 
> 
> Error 4.
> 
> ( I get these after the hard resets)
> 
> May 29 07:50:25 k2 kernel: end_request: I/O error, dev sdb, sector 0
> May 29 07:50:25 k2 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
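
The `cmd .../.../.../..` lines in the log above can be decoded to see which command timed out and where on the disk it was aimed. The following is a minimal sketch, assuming the taskfile is printed as command/feature:count:LBA-low:LBA-mid:LBA-high/(the same five fields, high-order bytes)/device. The function name and the small opcode table are hypothetical helpers, and the special meaning of a zero sector count is not handled.

```python
import re

# A few ATA opcodes relevant to this log (not an exhaustive table).
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0xC8: "READ DMA",
    0x35: "WRITE DMA EXT",
    0xCA: "WRITE DMA",
}
LBA48_OPCODES = {0x25, 0x35}

FIVE = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/" + FIVE + "/" + FIVE + r"/([0-9a-f]{2})")


def decode_cmd(line):
    """Decode a libata 'cmd' log line; return None if the line does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    # Low-order bytes: feature, sector count, LBA low/mid/high.
    _feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    # High-order bytes, only meaningful for 48-bit (EXT) commands.
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in m.group(3).split(":"))
    device = int(m.group(4), 16)

    lba = lbah << 16 | lbam << 8 | lbal
    if opcode in LBA48_OPCODES:
        lba |= hlbah << 40 | hlbam << 32 | hlbal << 24
        sectors = hnsect << 8 | nsect
    else:
        # 28-bit commands keep LBA bits 24..27 in the low nibble of device.
        lba |= (device & 0x0F) << 24
        sectors = nsect
    return {
        "opcode": opcode,
        "name": ATA_OPCODES.get(opcode, "unknown"),
        "lba": lba,
        "sectors": sectors,
    }
```

Under that assumption, the `cmd 25/00:08:80:5f:38/00:00:3a:00:00/e0` line from Error 3 decodes to a READ DMA EXT of 8 sectors at LBA 976772992, and 8 x 512 bytes agrees with the `data 4096 in` printed on the same line.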

ATA drives are supposed to wake up from standby on command issue and
from sleep on reset.  Does the drive spin up while sata_sil is trying
to reset the port?  Also, please post the result of 'hdparm -I
/dev/sdX' where sdX is the offending drive.

Thanks.

-- 
tejun


* Re: 2.6.25: sata_sil freezes, hard resets port.
@ 2008-05-30  8:26 Andrew Henry
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Henry @ 2008-05-30  8:26 UTC
  To: linux-kernel, linux-ide

>>ATA drives are supposed to wake up from standby on command issue and
>>from sleep on reset.  Does the drive spin up while sata_sil is trying
>>to reset the port?  Also, please post the result of 'hdparm -I
>>/dev/sdX' where sdX is the offending drive.
>>-- 
>>tejun


Here is the output from hdparm -I /dev/sdb.  The output is the same for both drives.

I just want to re-state that it's not just when drives spindown.  Happens on hotplug or cold boot also.


As for what happens when I try to access the drives when they have spun down:

- drives are asleep
- I run fdisk -l
- drive on port 1 spins up and LED for that port lights up
- it waits 60s then the LED *should* turn off, but many times, at this point the port will hang: LED is always on
- LED on port 2 lights up, drive spins up, after 60s fdisk reports full output for drives and returns to prompt
- port 1 is still hung.  I remove cable and plug it in again, no effect
- fdisk -l makes port 2 LED flash briefly and reports one  of the eSATA drives connected. port 1 LED does not flash
- unplug port 1 cable and disconnect power to drive; power drive and connect cable: drive is redetected and fdisk reports 2 drives

It is not always the same port that hangs; it seems random.

/dev/sdb:

ATA device, with non-removable media
	Model Number:       WD My Book                              
	Serial Number:      WD-WCASU0206873
	Firmware Revision:  01.01B01
Standards:
	Used: ATA/ATAPI-6 T13 1410D revision 1 
	Supported: 6 5 4 
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
Capabilities:
	LBA, IORDY(cannot be disabled)
	Queue depth: 1
	Standby timer values: spec'd by Vendor, with device specific minimum
	R/W multiple sector transfer: Max = 1	Current = 0
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	   *	Power Management feature set
	   *	Write cache
	   *	48-bit Address feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART self-test
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	SATA-II signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
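
The size figures in the Configuration section of the output above follow directly from the sector counts. The following is a minimal sketch of that arithmetic, assuming 512-byte logical sectors (the output does not state the sector size); the function and constant names are hypothetical.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def device_size(lba_sectors, sector_bytes=SECTOR_BYTES):
    """Return the capacity in bytes, binary megabytes and decimal megabytes."""
    total = lba_sectors * sector_bytes
    return {
        "bytes": total,
        "MiB": total // (1024 * 1024),  # "M = 1024*1024" line
        "MB": total // (1000 * 1000),   # "M = 1000*1000" line
    }


def chs_sectors(cylinders, heads, sectors_per_track):
    """Sectors addressable through cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track


# Values copied from the hdparm output above.
size = device_size(976773168)            # LBA48 user addressable sectors
print(size["MiB"], size["MB"])           # 476940 500107
print(chs_sectors(16383, 16, 63))        # 16514064
print(2 ** 28 - 1)                       # 268435455, the 28-bit LBA ceiling
```

The 28-bit LBA count is pinned at its maximum of 2^28 - 1, which is why the full 500 GB capacity is only reachable through the 48-bit address feature set listed as enabled.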

--andrew

[parent not found: <3ECBDC05781B3D48ABD520A01ABF2F9B12C589D4FA@SE-EX008.groupinfra.com>]

end of thread, other threads:[~2009-01-13 10:42 UTC | newest]

Thread overview: 10+ messages
2008-06-04 19:58 2.6.25: sata_sil freezes, hard resets port Andrew Henry
2008-06-10  3:18 ` Tejun Heo
2008-06-11 18:42   ` Andrew Henry
2008-06-12  1:46     ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2009-01-13  9:50 Henry, Andrew
2008-09-10 13:23 Henry, Andrew
2008-09-29  2:59 ` Tejun Heo
2008-05-30 18:25 Andrew Henry
2008-05-30  8:26 Andrew Henry
     [not found] <3ECBDC05781B3D48ABD520A01ABF2F9B12C589D4FA@SE-EX008.groupinfra.com>
2008-05-30  5:15 ` Tejun Heo
