public inbox for linux-scsi@vger.kernel.org
* Bad emulex/linux FC behavior
@ 2009-05-27 21:10 Jeremy Linton
  2009-05-27 22:00 ` James Smart
  0 siblings, 1 reply; 2+ messages in thread
From: Jeremy Linton @ 2009-05-27 21:10 UTC
  To: Linux Scsi, james.smart

While hunting for a data integrity problem, I discovered that the Emulex lpfc driver sends target resets to all attached
devices on a SAN in the lpfc_reset_bus_handler() routine. That routine is registered as the eh_bus_reset_handler in the
SCSI host template and is indirectly called by scsi_eh_ready_devs() when any device attached to the current HBA fails
(more on this later).
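To make the scope of the problem concrete, here is a minimal user-space sketch in C of the behavior described above. It
is not the lpfc code: struct sim_hba, struct sim_target, sim_send_target_reset() and sim_bus_reset_all() are hypothetical
names invented for illustration. What it models is that the handler is entered because one device failed, yet every
target behind the HBA receives a reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in lpfc. */
#define MAX_TARGETS 8

struct sim_target {
    int id;
    int resets_received;   /* how many target resets this device has seen */
};

struct sim_hba {
    struct sim_target targets[MAX_TARGETS];
    size_t ntargets;       /* number of targets visible through this HBA */
};

/* Stand-in for issuing a target reset task management function. */
static void sim_send_target_reset(struct sim_target *t)
{
    t->resets_received++;
}

/*
 * Models the reported behavior: the bus reset handler runs because ONE
 * device failed, but it walks every target behind the HBA and resets each.
 * Returns the number of resets sent.
 */
static int sim_bus_reset_all(struct sim_hba *hba, int failed_id)
{
    assert(hba != NULL && hba->ntargets <= MAX_TARGETS);
    (void)failed_id;       /* the failed device does not limit the scope */
    int sent = 0;
    for (size_t i = 0; i < hba->ntargets; i++) {
        sim_send_target_reset(&hba->targets[i]);
        sent++;
    }
    return sent;
}
```

With three targets and only target 1 failing, all three (including the two healthy, possibly shared, devices) see a reset.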

I believe this is fundamentally the wrong thing to do. It makes sense on a parallel SCSI bus, but not on a SAN.
SAN-attached devices are completely independent, and a single target failure should not cause every device zoned
together with the failed device to be reset.

This is not an unusual configuration. Our product runs in an environment where customers regularly have a pool of
tape drives shared among a mix of server platforms and applications. Having our machine spuriously send resets to
devices currently in use by other servers is a serious problem.

LUN resets are defined by the T10 specifications to drop reservations, clear soft write protect flags, clear medium
removal preventions, etc. I can't find anything that describes how write-back buffers behave during a reset, but I
believe I've found one device that discards its write-back buffer without flushing it to the media when it receives a
reset. Either way, any application that relies on reservations, directly or through system drivers, to arbitrate access
to a device between machines or application instances is going to have serious problems. Should an attached Linux
machine clear those reservations, the result will be data integrity problems.
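A minimal sketch of why that breaks arbitration, again as a hypothetical user-space model rather than real driver or
device code. struct sim_drive, sim_reserve() and sim_reset() are invented for illustration, and the model simply assumes,
as stated above, that a reset clears whatever reservation the drive holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a shared tape drive; not a real driver API. */
#define NO_OWNER (-1)

struct sim_drive {
    int reserved_by;   /* initiator holding the reservation, or NO_OWNER */
};

/* Reserve succeeds if the drive is free or already held by the caller. */
static bool sim_reserve(struct sim_drive *d, int initiator)
{
    assert(d != NULL);
    if (d->reserved_by != NO_OWNER && d->reserved_by != initiator)
        return false;  /* models a reservation conflict */
    d->reserved_by = initiator;
    return true;
}

/* Assumed reset semantics from the text: the reservation is dropped. */
static void sim_reset(struct sim_drive *d)
{
    assert(d != NULL);
    d->reserved_by = NO_OWNER;
}
```

In the intended sequence, server 1 reserves the drive and server 2 is correctly refused; after an unrelated Linux host
resets the target, server 2's reserve succeeds while server 1 still believes it owns the drive.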

As such, I suggest that the Emulex driver simply return from the eh_bus_reset_handler without resetting anything. I've
been running like that for a few days and haven't seen any adverse effects. I've also looked at the QLogic driver: it
has a similar piece of code in qla2x00_loop_reset(), but there the target resets can be enabled or disabled by the
enable_target_reset flag (read from NVRAM). Frankly, I can't imagine a scenario where it's valid to reset an attached
device on a SAN because a completely different device failed. Personally, I would remove that code too.
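The proposed change, and the QLogic-style alternative of gating the resets behind a flag, can be sketched the same way.
This is a hypothetical model, not a patch: SIM_SUCCESS is an invented stand-in for whatever the SCSI midlayer expects a
successful eh_bus_reset_handler to return, and the enable_target_reset field only borrows the name of the qla2xxx NVRAM
flag mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TARGETS 8
#define SIM_SUCCESS 0  /* hypothetical stand-in for the midlayer's success code */

struct sim_target {
    int resets_received;
};

struct sim_hba {
    struct sim_target targets[MAX_TARGETS];
    size_t ntargets;
    bool enable_target_reset;  /* modeled on the qla2xxx NVRAM flag */
};

/* Hypothetical bus reset handler with the reset fan-out made optional. */
static int sim_bus_reset_handler(struct sim_hba *hba)
{
    assert(hba != NULL && hba->ntargets <= MAX_TARGETS);
    if (!hba->enable_target_reset)
        return SIM_SUCCESS;    /* proposed behavior: leave other SAN devices alone */
    for (size_t i = 0; i < hba->ntargets; i++)
        hba->targets[i].resets_received++;  /* old behavior: reset every target */
    return SIM_SUCCESS;
}
```

With the flag cleared, the handler returns without touching any device, which is the behavior suggested for lpfc; with
it set, it falls back to resetting every target behind the HBA.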

Our machines are running fairly old kernels, but the latest RC kernels appear to still have the same code.
