public inbox for linux-scsi@vger.kernel.org
* [RFC] libsas: the trouble with ata resets
@ 2011-10-23 23:48 Dan Williams
  2011-10-26  3:04 ` Jack Wang
  2011-11-11 16:14 ` [RFC] enclosure & ses: modernize and add target power management Mark Salyzyn
  0 siblings, 2 replies; 14+ messages in thread
From: Dan Williams @ 2011-10-23 23:48 UTC (permalink / raw)
  To: linux-scsi, IDE/ATA development list
  Cc: Skirvin, Jeffrey D, Jacek Danecki, edmund.nadolski, Luben Tuikov,
	Mark Salyzyn, Jack Wang, Hannes Reinecke, David Milburn

Currently libsas has a problem with prematurely dropping sata devices
during recovery.  Libata knows that some devices can take quite a
while to recover from a reset and re-establish the link.  The fact
that sas_ata_hard_reset() ignores its 'deadline' parameter is
evidence that it ignores the link management aspects of what libata
wants from a ->hardreset() handler.

item1: teach sas_ata_hard_reset() to check that the link came back up.
For direct-attached devices the lldd will need the deadline
parameter; for expander-attached devices, perform smp polling to wait
for the link to come back.
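
As a rough illustration of item1, the deadline-bounded wait could look
like the userspace sketch below.  Everything in it (struct sim_phy,
sim_now, sim_link_is_up(), sim_wait_for_link()) is a hypothetical
stand-in and not an existing libsas or libata interface; a real
version would presumably sleep between smp queries and compare against
the deadline that libata passes to ->hardreset().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in libsas or libata. */
struct sim_phy {
	unsigned long link_up_at;	/* simulated tick at which the link returns */
};

/* Simulated clock in ticks; a kernel version would compare against jiffies. */
static unsigned long sim_now;

static bool sim_link_is_up(const struct sim_phy *phy)
{
	return sim_now >= phy->link_up_at;
}

/*
 * Poll until the link is up or the deadline passes.
 * Returns 0 if the link came back in time, -1 on timeout.
 */
static int sim_wait_for_link(const struct sim_phy *phy,
			     unsigned long deadline, unsigned long interval)
{
	assert(interval > 0);

	for (;;) {
		if (sim_link_is_up(phy))
			return 0;
		if (sim_now >= deadline)
			return -1;

		/* Stand-in for sleeping between polls; never overshoot the deadline. */
		if (deadline - sim_now < interval)
			sim_now = deadline;
		else
			sim_now += interval;
	}
}
```

The link state is checked before the deadline on each pass, so a
device that comes back exactly at the deadline is still kept.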

Now, during this time that libata is trying to recover the connection
in the host-eh context libsas will start receiving BCNs in the
host-workqueue context.  In unfortunate cases libsas may take
removal action on a device that would come back given a bit more time.
While libata-eh is in progress libsas should not take any action on
the ata phys in question.

item2: flush eh before trying to determine what action to take on a phy.
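
A minimal sketch of the ordering item2 asks for, written as a
userspace model.  The types and helpers (struct sim_ata_phy,
sim_flush_eh(), sim_handle_bcn()) are invented for illustration and do
not correspond to existing libsas code; the point is only that the
link state is sampled after eh has finished, not while it is running.

```c
#include <stdbool.h>

/* Hypothetical model of an ata phy; not a libsas structure. */
enum sim_phy_action { SIM_PHY_KEEP, SIM_PHY_REMOVE };

struct sim_ata_phy {
	bool eh_in_progress;	/* libata-eh is still recovering the device */
	bool link_up;		/* link state as last sampled */
	bool link_up_after_eh;	/* link state once recovery has completed */
};

/* Stand-in for waiting until error handling on this phy is done. */
static void sim_flush_eh(struct sim_ata_phy *phy)
{
	if (phy->eh_in_progress) {
		phy->link_up = phy->link_up_after_eh;
		phy->eh_in_progress = false;
	}
}

/* BCN handling: flush eh first, then decide based on the settled state. */
static enum sim_phy_action sim_handle_bcn(struct sim_ata_phy *phy)
{
	sim_flush_eh(phy);
	return phy->link_up ? SIM_PHY_KEEP : SIM_PHY_REMOVE;
}
```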

In the case of libsas not all resets are initiated by the eh process
(the sas transport class can reset a phy directly).  It seems libata
takes care to arrange for user requested resets to occur under the
control of eh, and libsas should do the same.

item3: teach all reset entry points to kick and flush eh for ata devices.
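
One way to picture item3 is the userspace model below, in which a
user-requested reset of an ata device is routed through eh and is not
performed directly.  All names here (struct sim_port,
sim_eh_schedule_reset(), sim_eh_kick_and_flush(),
sim_user_phy_reset()) are assumptions made for the example, not
existing entry points.

```c
#include <stdbool.h>

/* Hypothetical port model; not a libsas or libata structure. */
struct sim_port {
	bool reset_pending;	/* a reset has been scheduled for eh */
	int eh_resets;		/* resets performed under eh control */
	int direct_resets;	/* resets performed outside of eh */
};

static void sim_eh_schedule_reset(struct sim_port *port)
{
	port->reset_pending = true;
}

/* Stand-in for kicking eh and waiting for it to finish. */
static void sim_eh_kick_and_flush(struct sim_port *port)
{
	if (port->reset_pending) {
		port->eh_resets++;
		port->reset_pending = false;
	}
}

/* Reset entry point, e.g. a request arriving via the transport class. */
static int sim_user_phy_reset(struct sim_port *port, bool is_ata)
{
	if (!is_ata) {
		/* Non-ata devices keep the direct path in this model. */
		port->direct_resets++;
		return 0;
	}

	/* ata devices: schedule the reset and let eh carry it out. */
	sim_eh_schedule_reset(port);
	sim_eh_kick_and_flush(port);
	return 0;
}
```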

A corollary of items 1 and 3 is that there is a difference between
scheduling the reset and performing the reset.
->lldd_I_T_nexus_reset() is currently called twice, once by sas-eh to
manage sas_tasks and again by ata-eh to recover the device.  Likely we
need a new ->lldd_ata_hard_reset() handler that is called by ata-eh,
while ->lldd_I_T_nexus_reset() cleans up the sas_tasks and just
schedules reset on the ata_port.

item4: allow lldds to provide a direct ->lldd_ata_hard_reset() handler,
which can be assumed to be called only from ata-eh context.
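
The split between scheduling and performing the reset might be
modelled as below.  This is a userspace sketch under stated
assumptions: struct sim_lldd_ops, struct sim_dev and the sim_*()
helpers are hypothetical, the ata_hard_reset member merely mirrors the
->lldd_ata_hard_reset() hook proposed above, and the fallback to the
I_T nexus reset when the hook is absent is one possible policy, not a
settled design.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; not a libsas structure. */
struct sim_dev {
	int outstanding_tasks;	/* sas_tasks still owned by the lldd */
	bool reset_scheduled;	/* reset requested on the ata_port */
	int hard_resets;	/* resets actually performed */
};

/* Hypothetical ops table; ata_hard_reset is optional. */
struct sim_lldd_ops {
	int (*I_T_nexus_reset)(struct sim_dev *dev);
	int (*ata_hard_reset)(struct sim_dev *dev, unsigned long deadline);
};

/* sas-eh side: clean up tasks and only schedule the reset. */
static int sim_I_T_nexus_reset(struct sim_dev *dev)
{
	dev->outstanding_tasks = 0;
	dev->reset_scheduled = true;
	return 0;
}

/* ata-eh side: perform the reset; deadline is unused in this model. */
static int sim_ata_hard_reset(struct sim_dev *dev, unsigned long deadline)
{
	(void)deadline;
	dev->hard_resets++;
	dev->reset_scheduled = false;
	return 0;
}

/* Called from ata-eh: prefer the dedicated hook when the lldd has one. */
static int sim_ata_eh_hardreset(const struct sim_lldd_ops *ops,
				struct sim_dev *dev, unsigned long deadline)
{
	if (ops->ata_hard_reset != NULL)
		return ops->ata_hard_reset(dev, deadline);
	return ops->I_T_nexus_reset(dev);
}
```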

Any other pain points in reset handling?

--
Dan


Thread overview: 14+ messages
2011-10-23 23:48 [RFC] libsas: the trouble with ata resets Dan Williams
2011-10-26  3:04 ` Jack Wang
2011-11-11 16:14 ` [RFC] enclosure & ses: modernize and add target power management Mark Salyzyn
2011-11-11 21:33   ` Douglas Gilbert
2011-11-14 16:21     ` Mark Salyzyn
2012-01-12 17:03       ` [PATCH][SCSI] " Mark Salyzyn
2012-02-19 16:02         ` James Bottomley
2012-05-10 20:48           ` [PATCH][SCSI] panic within ses.ko during insmod Mark Salyzyn
2012-05-10 21:02             ` [PATCH][SCSI] panic within ses.ko during insmod (take 2) Mark Salyzyn
2012-05-11 19:08             ` [PATCH][SCSI] panic within ses.ko during insmod Dan Williams
2012-05-15  0:11               ` Mark Salyzyn
2012-02-22 19:09         ` [PATCH][SCSI] enclosure & ses: modernize and add target power management (take 2) Mark Salyzyn
2012-02-22 20:12           ` Douglas Gilbert
2012-02-22 20:40             ` Mark Salyzyn
