linux-ide.vger.kernel.org archive mirror
* Hotplug with sata_nv
@ 2006-07-19 20:51 Philipp Wagner
  2006-07-30 20:35 ` Tejun Heo
  0 siblings, 1 reply; 3+ messages in thread
From: Philipp Wagner @ 2006-07-19 20:51 UTC (permalink / raw)
  To: htejun; +Cc: linux-ide

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey,
I've just tested the new hotplug features of libata and unfortunately,
it didn't work out as expected.

My environment here is a Tyan Tomcat K8E (S2865) mainboard with nVIDIA
nForce4 Ultra chipset, which includes my SATA II controller. It is built
into a server chassis featuring two SATA II (Seagate ST3500630AS, 500GB)
hard drives with hotplug backplanes.  Operating system is Fedora Core 5.
I set up RAID 1; both drives contain several partitions, which are
mirrored.
I tried two kernel versions, both with the same result:
2.6.18-rc2 and 2.6.17.5 with the libata-tj-2.6.17.4-20060710 patch.

When I remove one drive, I get the following messages in
/var/log/messages:

Jul 19 21:59:08 srv1 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen
Jul 19 21:59:08 srv1 kernel: ata2: soft resetting port
Jul 19 21:59:08 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Jul 19 21:59:08 srv1 kernel: ata2: failed to recover some devices, retrying in 5 secs
Jul 19 21:59:13 srv1 kernel: ata2: hard resetting port
Jul 19 21:59:14 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Jul 19 21:59:14 srv1 kernel: ata2: failed to recover some devices, retrying in 5 secs
Jul 19 21:59:19 srv1 kernel: ata2: hard resetting port
Jul 19 21:59:20 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Jul 19 21:59:20 srv1 kernel: ata2.00: disabled
Jul 19 21:59:20 srv1 kernel: ata2: EH pending after completion, repeating EH (cnt=4)
Jul 19 21:59:20 srv1 kernel: ata2: EH complete

The files /dev/sdb* are removed and the RAID says it has been degraded,
which is perfectly ok.
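
For readers following the log: the SErr value in the exception line is a
bit mask. The sketch below decodes its upper (DIAG) half, assuming the
bit layout given in the SATA specification. The function name and the
label strings are made up for illustration and are not part of libata.

```python
# Bit positions of the SError DIAG field (bits 16 and up), assuming the
# layout from the SATA specification. Labels are this sketch's own wording.
SERR_DIAG_BITS = {
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "Comm wake",
    19: "10b-to-8b decode error",
    20: "Disparity error",
    21: "CRC error",
    22: "Handshake error",
    23: "Link sequence error",
    24: "Transport state transition error",
    25: "Unrecognized FIS type",
    26: "Device exchanged",
}


def decode_serr_diag(serr):
    """Return the labels of all DIAG bits set in an SError value."""
    return [
        name
        for bit, name in sorted(SERR_DIAG_BITS.items())
        if serr & (1 << bit)
    ]


if __name__ == "__main__":
    # The two SErr values that appear in this thread's logs.
    for value in (0x1810000, 0x50000):
        print(hex(value), decode_serr_diag(value))
```

Under that layout, the 0x1810000 reported on removal has the PhyRdy
change, link sequence error and transport state transition error bits
set, and the 0x50000 reported on re-insertion has PhyRdy change and
Comm wake set.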

Now I re-insert the drive and get the following messages in
/var/log/messages:

Jul 19 22:00:11 srv1 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x2 frozen
Jul 19 22:00:19 srv1 kernel: ata2: port is slow to respond, please be patient
Jul 19 22:00:42 srv1 kernel: ata2: port failed to respond (30 secs)
Jul 19 22:00:42 srv1 kernel: ata2: soft resetting port
Jul 19 22:00:49 srv1 kernel: ata2: port is slow to respond, please be patient
Jul 19 22:01:12 srv1 kernel: ata2: port failed to respond (30 secs)
Jul 19 22:01:12 srv1 kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 19 22:01:12 srv1 kernel: ata2: EH pending after completion, repeating EH (cnt=4)
Jul 19 22:01:12 srv1 kernel: ata2: EH complete
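
The SStatus and SControl values in these messages are printed in
hexadecimal without a 0x prefix. A small sketch for reading them,
assuming the field layout from the SATA specification (DET in bits 3:0,
SPD in bits 7:4, IPM in bits 11:8). All names below are illustrative.

```python
# Field meanings for the SStatus register, assuming the SATA
# specification's layout. Values not listed are treated as reserved.
DET_MEANING = {
    0x0: "no device detected, PHY communication not established",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD_MEANING = {
    0x0: "no negotiated speed",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
}
IPM_MEANING = {
    0x0: "device not present or communication not established",
    0x1: "active",
    0x2: "partial power management state",
    0x6: "slumber power management state",
}


def decode_sstatus(text):
    """Decode an SStatus value given as the hex string shown in the log."""
    value = int(text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": (det, DET_MEANING.get(det, "reserved")),
        "spd": (spd, SPD_MEANING.get(spd, "reserved")),
        "ipm": (ipm, IPM_MEANING.get(ipm, "reserved")),
    }


if __name__ == "__main__":
    for raw in ("0", "113"):
        print(raw, decode_sstatus(raw))
```

Read this way, "SStatus 113" means a device is detected with PHY
communication established at 1.5 Gbps and the interface active, which
agrees with the "link up 1.5 Gbps" text, while "SStatus 0" in the
removal log means no device is detected.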

To me this looks like everything went ok and the drive should be
available again. But unfortunately, this is not the case.
The /dev/sdb* files are not created again, nor did I find a way to
create them (udevstart, for example, didn't do anything). Only a reboot
gave me the device files back. I also disabled SELinux, but the problem
still persisted.
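
One thing that can be tried in a situation like this is asking the SCSI
layer to rescan a host by writing "- - -" (wildcards for channel, target
and LUN) to its sysfs scan file. The sketch below does that; the host
number and the function name are assumptions, and it is unclear whether
a rescan would have helped here, since the log suggests that libata's
error handling never attached a device after the reset.

```python
from pathlib import Path


def rescan_scsi_host(host_number, sysfs_root="/sys"):
    """Ask the SCSI layer to rescan one host for devices.

    host_number is an assumption: the caller has to know which SCSI
    host the ATA port maps to. sysfs_root exists only so the function
    can be pointed at a scratch directory for testing.
    """
    scan_file = (
        Path(sysfs_root) / "class" / "scsi_host" / f"host{host_number}" / "scan"
    )
    # "- - -" means: all channels, all targets, all LUNs.
    scan_file.write_text("- - -\n")
    return scan_file
```

On a real system this needs root, e.g. `rescan_scsi_host(1)` for a port
that maps to host1.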

I do not know where the problem comes from: udev, libata, or something
else entirely? Also, for the RAID, shouldn't the md module notice
automatically that the drive has been added again and begin to
reconstruct the data?

I don't know which information may be most important to you, so I put a
copy of three files online:

The kernel configuration I used:
http://philipp-wagner.com/temp/sata/kernel-config

The relevant part of /var/log/messages:
http://philipp-wagner.com/temp/sata/messages

The output of the `dmesg` command:
http://philipp-wagner.com/temp/sata/dmesg

Thanks in advance for your help,

Philipp
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFEvptmn9gADIbesF4RAiZjAJ47qIp/y8R9qQXuorGiEU7Dm2+GngCgsDdw
6sT0FaN4DhoCKW35PhO2PvI=
=mRx2
-----END PGP SIGNATURE-----


* Re: Hotplug with sata_nv
  2006-07-19 20:51 Hotplug with sata_nv Philipp Wagner
@ 2006-07-30 20:35 ` Tejun Heo
  2006-08-13  7:58   ` Philipp Wagner
  0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2006-07-30 20:35 UTC (permalink / raw)
  To: Philipp Wagner; +Cc: linux-ide


Hello, Philipp,

Sorry about the late reply.  There was OLS and then some personal things
to attend to.

Philipp Wagner wrote:
> When I remove one drive, I get the following messages into
> /var/log/messages:
> 
> Jul 19 21:59:08 srv1 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr
> 0x1810000 action 0x2 frozen
> Jul 19 21:59:08 srv1 kernel: ata2: soft resetting port
> Jul 19 21:59:08 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
> Jul 19 21:59:08 srv1 kernel: ata2: failed to recover some devices,
> retrying in 5 secs
> Jul 19 21:59:13 srv1 kernel: ata2: hard resetting port
> Jul 19 21:59:14 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
> Jul 19 21:59:14 srv1 kernel: ata2: failed to recover some devices,
> retrying in 5 secs
> Jul 19 21:59:19 srv1 kernel: ata2: hard resetting port
> Jul 19 21:59:20 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
> Jul 19 21:59:20 srv1 kernel: ata2.00: disabled
> Jul 19 21:59:20 srv1 kernel: ata2: EH pending after completion,
> repeating EH (cnt=4)
> Jul 19 21:59:20 srv1 kernel: ata2: EH complete
> 
> The files /dev/sdb* are removed and the RAID says it has been degraded,
> which is perfectly ok.

This looks good.

> Now I re-insert the drive and get the following messages into
> /var/log/messages:
> 
> Jul 19 22:00:11 srv1 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr
> 0x50000 action 0x2 frozen
> Jul 19 22:00:19 srv1 kernel: ata2: port is slow to respond, please be
> patient
> Jul 19 22:00:42 srv1 kernel: ata2: port failed to respond (30 secs)
> Jul 19 22:00:42 srv1 kernel: ata2: soft resetting port
> Jul 19 22:00:49 srv1 kernel: ata2: port is slow to respond, please be
> patient
> Jul 19 22:01:12 srv1 kernel: ata2: port failed to respond (30 secs)
> Jul 19 22:01:12 srv1 kernel: ata2: SATA link up 1.5 Gbps (SStatus 113
> SControl 300)
> Jul 19 22:01:12 srv1 kernel: ata2: EH pending after completion,
> repeating EH (cnt=4)
> Jul 19 22:01:12 srv1 kernel: ata2: EH complete

Hmmm... Softreset fails but libata doesn't notice it has failed.

Can you try the attached patch?

-- 
tejun

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 1232 bytes --]

diff --git a/drivers/scsi/sata_nv.c b/drivers/scsi/sata_nv.c
index 56da255..0ff682b 100644
--- a/drivers/scsi/sata_nv.c
+++ b/drivers/scsi/sata_nv.c
@@ -257,7 +257,9 @@ static struct ata_port_info nv_port_info
 	/* generic */
 	{
 		.sht		= &nv_sht,
-		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY,
+		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
+				  ATA_FLAG_HRST_TO_RESUME |
+				  ATA_FLAG_SKIP_D2H_BSY,
 		.pio_mask	= NV_PIO_MASK,
 		.mwdma_mask	= NV_MWDMA_MASK,
 		.udma_mask	= NV_UDMA_MASK,
@@ -266,7 +268,9 @@ static struct ata_port_info nv_port_info
 	/* nforce2/3 */
 	{
 		.sht		= &nv_sht,
-		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY,
+		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
+				  ATA_FLAG_HRST_TO_RESUME |
+				  ATA_FLAG_SKIP_D2H_BSY,
 		.pio_mask	= NV_PIO_MASK,
 		.mwdma_mask	= NV_MWDMA_MASK,
 		.udma_mask	= NV_UDMA_MASK,
@@ -275,7 +279,9 @@ static struct ata_port_info nv_port_info
 	/* ck804 */
 	{
 		.sht		= &nv_sht,
-		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY,
+		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
+				  ATA_FLAG_HRST_TO_RESUME |
+				  ATA_FLAG_SKIP_D2H_BSY,
 		.pio_mask	= NV_PIO_MASK,
 		.mwdma_mask	= NV_MWDMA_MASK,
 		.udma_mask	= NV_UDMA_MASK,


* Re: Hotplug with sata_nv
  2006-07-30 20:35 ` Tejun Heo
@ 2006-08-13  7:58   ` Philipp Wagner
  0 siblings, 0 replies; 3+ messages in thread
From: Philipp Wagner @ 2006-08-13  7:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide


Hello,
Thank you for your reply. I am only home on weekends, so it took some
time to try your patch.

I applied the patch to 2.6.18-rc4.
First the good news: hotplug is working, and the device is properly
removed and re-added.
Now the bad news: the device names change. When I remove /dev/sdb and
add it again, it is named /dev/sdc. When I then remove and re-add
/dev/sda, it comes back as /dev/sdd.
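
Kernel names such as sdb or sdc are handed out in attach order, so one
way to cope with the renaming is to address disks through the
persistent symlinks that udev maintains. A minimal sketch, assuming the
udev setup on the machine populates /dev/disk/by-id (not confirmed in
this thread); the function name is illustrative.

```python
import os


def persistent_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the device node it points at.

    by_id_dir is a parameter so that the function can be exercised
    against a scratch directory; the default path is an assumption
    about the udev configuration.
    """
    mapping = {}
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        if os.path.islink(link):
            # realpath follows the symlink to e.g. /dev/sdc
            mapping[entry] = os.path.realpath(link)
    return mapping
```

A script can then look up the entry for a given drive (its name
typically contains the model and serial number) and use whatever kernel
name the drive currently has.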

Furthermore I am still looking for a solution which automatically starts
the RAID recovery process when the device is re-added.
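
As a starting point for such automation, a hotplug hook could check
/proc/mdstat for arrays with a missing member and then run mdadm to add
the partition back. The sketch below covers only those two pieces.
Which partition belongs to which array is not stated in this thread, so
the array and partition names in the usage note are hypothetical, as
are the function names.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status field (e.g. [U_]) has a gap."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status field lists one U per working member, _ per missing one.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
            current = None
    return degraded


def add_command(array, partition):
    """Build the mdadm command line that adds a partition to an array."""
    return ["mdadm", f"/dev/{array}", "--add", partition]
```

For example, `degraded_arrays(open("/proc/mdstat").read())` lists the
arrays that need attention, and `add_command("md0", "/dev/sdc1")` gives
an argument list that can be passed to `subprocess.run` as root. Once
the member is added, md starts the rebuild on its own.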

I attached the relevant parts of /var/log/messages (removal of /dev/sdb
and its reappearance as /dev/sdc).

Philipp


[-- Attachment #2: messages-sata --]
[-- Type: text/plain, Size: 2396 bytes --]

Aug 12 17:38:17 srv1 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen
Aug 12 17:38:18 srv1 kernel: ata2: hard resetting port
Aug 12 17:38:19 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 12 17:38:19 srv1 kernel: ata2: failed to recover some devices, retrying in 5 secs
Aug 12 17:38:24 srv1 kernel: ata2: hard resetting port
Aug 12 17:38:25 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 12 17:38:25 srv1 kernel: ata2: failed to recover some devices, retrying in 5 secs
Aug 12 17:38:30 srv1 kernel: ata2: hard resetting port
Aug 12 17:38:31 srv1 kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 12 17:38:31 srv1 kernel: ata2.00: disabled
Aug 12 17:38:32 srv1 kernel: ata2: EH pending after completion, repeating EH (cnt=4)
Aug 12 17:38:32 srv1 kernel: ata2: EH complete
Aug 12 17:38:32 srv1 kernel: ata2.00: detaching (SCSI 1:0:0:0)
Aug 12 17:38:52 srv1 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x2 frozen
Aug 12 17:38:52 srv1 kernel: ata2: waiting for device to spin up (8 secs)
Aug 12 17:38:52 srv1 kernel: ata2: hard resetting port
Aug 12 17:39:00 srv1 kernel: ata2: port is slow to respond, please be patient
Aug 12 17:39:02 srv1 kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Aug 12 17:39:02 srv1 kernel: ata2.00: ATA-7, max UDMA/133, 976773168 sectors: LBA48 NCQ (depth 0/32)
Aug 12 17:39:02 srv1 kernel: ata2.00: configured for UDMA/133
Aug 12 17:39:02 srv1 kernel: ata2: EH pending after completion, repeating EH (cnt=4)
Aug 12 17:39:02 srv1 kernel: ata2: EH complete
Aug 12 17:39:02 srv1 kernel:   Vendor: ATA       Model: ST3500630AS       Rev: 3.AA
Aug 12 17:39:02 srv1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Aug 12 17:39:02 srv1 kernel: SCSI device sdc: 976773168 512-byte hdwr sectors (500108 MB)
Aug 12 17:39:02 srv1 kernel: sdc: Write Protect is off
Aug 12 17:39:02 srv1 kernel: SCSI device sdc: drive cache: write back
Aug 12 17:39:02 srv1 kernel: SCSI device sdc: 976773168 512-byte hdwr sectors (500108 MB)
Aug 12 17:39:02 srv1 kernel: sdc: Write Protect is off
Aug 12 17:39:02 srv1 kernel: SCSI device sdc: drive cache: write back
Aug 12 17:39:02 srv1 kernel:  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 >
Aug 12 17:39:02 srv1 kernel: sd 1:0:0:0: Attached scsi disk sdc
Aug 12 17:39:02 srv1 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0

