linux-kernel.vger.kernel.org archive mirror
* Disk failure behavior
From: Chaitanya Lala @ 2009-09-04 17:23 UTC (permalink / raw)
  To: tj; +Cc: clala, rbecker, linux-kernel

Hi,

I am using a back-port of libata from ~2.6.20 on a 2.6.9
Red Hat kernel. The system has hot-pluggable SATA disks
(using AHCI). The problem I am facing is that certain disk
failures bring the system into a weird state: the system
tries to reset the disk but fails, and finally prints the
message "reset failed, giving up".

At this point the port is left in a frozen state and its
interrupts are masked. If the failed disk is now pulled out
and a healthy disk is inserted, the insertion does not raise
any event, notification or interrupt. In fact, the only way
to get the disk to work at this point is to reboot.

Below is the code I am referring to, from v2.6.20
(file drivers/ata/libata-eh.c, function ata_eh_recover):
 
	/* reset */
	if (ehc->i.action & ATA_EH_RESET_MASK) {
		ata_eh_freeze_port(ap);

		rc = ata_eh_reset(ap, ata_port_nr_vacant(ap), prereset,
				  softreset, hardreset, postreset);
		if (rc) {
			ata_port_printk(ap, KERN_ERR,
					"reset failed, giving up\n");
			goto out; 
		}    

		ata_eh_thaw_port(ap);
	} 

A possible work-around is to thaw the port before going to "out",
which re-enables the port's interrupts. I understand that this would
also allow future interrupts from the old disk, but I am willing to
live with that if it helps to detect the new device.

	/* reset */
	if (ehc->i.action & ATA_EH_RESET_MASK) {
		ata_eh_freeze_port(ap);

		rc = ata_eh_reset(ap, ata_port_nr_vacant(ap), prereset,
				  softreset, hardreset, postreset);
		if (rc) {
			ata_port_printk(ap, KERN_ERR,
					"reset failed, giving up\n");
+			ata_eh_thaw_port(ap);
			goto out; 
		}    

		ata_eh_thaw_port(ap);
	} 

I have tested this successfully, but I would like to ask whether it could
"break" some other functionality. I am new to the kernel ATA code
and want to be sure before I use this.
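
For illustration only, the difference between the two variants above can be
modeled in userspace. This is a toy sketch, not kernel code: struct toy_port
and the toy_* functions are hypothetical names invented here, and the model
assumes nothing about the port beyond "frozen means interrupts are masked".

```c
#include <stdbool.h>

/* Toy userspace model; every name below is hypothetical, not a libata API. */
struct toy_port {
	bool frozen;       /* true while the port's interrupts are masked */
	int hotplug_seen;  /* hotplug events that actually reached the handler */
};

void toy_freeze(struct toy_port *p) { p->frozen = true; }
void toy_thaw(struct toy_port *p)   { p->frozen = false; }

/* Simulate a device insertion; returns true if the event was delivered. */
bool toy_hotplug_irq(struct toy_port *p)
{
	if (p->frozen)
		return false;  /* masked: the insertion goes unnoticed */
	p->hotplug_seen++;
	return true;
}

/*
 * Model of the reset step: freeze, attempt the reset, thaw on success.
 * With thaw_on_failure set, the port is also thawed when the reset
 * fails, which corresponds to the proposed work-around.
 */
int toy_recover(struct toy_port *p, bool reset_ok, bool thaw_on_failure)
{
	toy_freeze(p);
	if (!reset_ok) {
		if (thaw_on_failure)
			toy_thaw(p);
		return -1;  /* "reset failed, giving up" */
	}
	toy_thaw(p);
	return 0;
}
```

In this model, a failed toy_recover() with thaw_on_failure unset leaves the
port frozen, so a later toy_hotplug_irq() returns false; with the flag set,
the same call delivers the event.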

Thanks,
Chaitanya



* Re: Disk failure behavior
From: Tejun Heo @ 2009-09-05  0:31 UTC (permalink / raw)
  To: Chaitanya Lala; +Cc: rbecker, linux-kernel

Hello,

Chaitanya Lala wrote:
> I am using a back-port of libata from ~2.6.20 on a 2.6.9
> Red Hat kernel. The system has hot-pluggable SATA disks
> (using AHCI). The problem I am facing is that certain disk
> failures bring the system into a weird state: the system
> tries to reset the disk but fails, and finally prints the
> message "reset failed, giving up".
>
> At this point the port is left in a frozen state and its
> interrupts are masked. If the failed disk is now pulled out
> and a healthy disk is inserted, the insertion does not raise
> any event, notification or interrupt. In fact, the only way
> to get the disk to work at this point is to reboot.

# echo - - - > /sys/class/scsi_host/hostX/scan

should revive it too.
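
As a rough sketch, the same rescan request could be issued from a small C
helper instead of the shell. The function names request_rescan() and
scan_path_for_host() are hypothetical and introduced only for this example;
the host number has to be looked up on the actual system, and writing to the
scan file normally requires root.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the sysfs scan path for SCSI host host_no.
 * Returns 0 on success, -1 if buf is too small.
 */
int scan_path_for_host(char *buf, size_t len, unsigned int host_no)
{
	int n = snprintf(buf, len, "/sys/class/scsi_host/host%u/scan", host_no);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write the wildcard "- - -" (channel, target, LUN)
 * to the given scan file. Returns 0 on success, -1 on any failure.
 */
int request_rescan(const char *scan_path)
{
	FILE *f = fopen(scan_path, "w");

	if (!f)
		return -1;
	if (fputs("- - -\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a rejected write is reported here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would first fill a buffer with scan_path_for_host() and then pass
it to request_rescan(), checking both return values.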

> Below is the code I am referring to, from v2.6.20
> (file drivers/ata/libata-eh.c, function ata_eh_recover):
>  
> 	/* reset */
> 	if (ehc->i.action & ATA_EH_RESET_MASK) {
> 		ata_eh_freeze_port(ap);
> 
> 		rc = ata_eh_reset(ap, ata_port_nr_vacant(ap), prereset,
> 				  softreset, hardreset, postreset);
> 		if (rc) {
> 			ata_port_printk(ap, KERN_ERR,
> 					"reset failed, giving up\n");
> 			goto out; 
> 		}    
> 
> 		ata_eh_thaw_port(ap);
> 	} 
> 
> A possible work-around is to thaw the port before going to "out",
> which re-enables the port's interrupts. I understand that this would
> also allow future interrupts from the old disk, but I am willing to
> live with that if it helps to detect the new device.
> 
> 	/* reset */
> 	if (ehc->i.action & ATA_EH_RESET_MASK) {
> 		ata_eh_freeze_port(ap);
> 
> 		rc = ata_eh_reset(ap, ata_port_nr_vacant(ap), prereset,
> 				  softreset, hardreset, postreset);
> 		if (rc) {
> 			ata_port_printk(ap, KERN_ERR,
> 					"reset failed, giving up\n");
> +			ata_eh_thaw_port(ap);
> 			goto out; 
> 		}    
> 
> 		ata_eh_thaw_port(ap);
> 	} 
> 
> I have tested this successfully, but I would like to ask whether it could
> "break" some other functionality. I am new to the kernel ATA code
> and want to be sure before I use this.

Unless your controller generates an IRQ storm that brings down the
controller, the above change shouldn't be dangerous.

-- 
tejun
