Re: Bad emulex/linux FC error handing behavior


From: Jeremy Linton @ 2009-05-28  9:36 UTC
  To: Linux Scsi, james.smart

James Smart wrote:
 > However, the question in my mind is - why did you get to bus reset ?
	Because the device is having intermittent problems? The whole error handler sequence fails (TUR failures, etc.), and it
ends up marking the device off-line. In the process it shoots everything else in the head. This is the behavior I'm
having a problem with. I don't really care about the state of the failing device; it has a physical problem. My
problem is the remainder of the shared devices, which have their activity interrupted. In many cases, those other
machines/devices may not even have visibility to the failing device. It becomes a serious error isolation problem. From
the perspective of the other hosts, the only way to track the error down is to attach an analyzer to the
interrupted devices. Assuming the problem reproduces, the analyzer can then detect the reset and identify the source port
it originated from. That machine may then be removed from the SAN. This whole process can be nearly impossible to perform
at a customer's site.


 > The reason for the behavior is to replicate the parallel scsi behavior,
 > which is expected/required by many people.

	I'm confused by this. For parallel SCSI, there were device dependencies due to the physical bus. The bus reset was
standard error handling because a bad/failing SCSI device often put the bus in an unrecoverable state for the remainder of
the devices. SPI also rarely had multiple initiators sharing devices.
	I was unaware of how big a "hammer" lpfc tends to use against the SAN when a device fails. I suspect that I'm not the
only one. Is there a way to simulate the SPI behavior, short of actually resetting all attached devices? For that
matter, I'm a little confused about what exactly the intended behavior is. Can you enlighten me? I could understand if it
was just resetting all LUNs on a particular device, but it's resetting all attached devices.


 > We can certainly discuss adding a parameter that
 > controls the behavior, but this should be on a transport basis, not on
 > an adapter-specific manner.

	That's a great plan. To me it makes sense that this behavior should be transport dependent: I would want it for SPI, but
not for FC or iSCSI. How likely is that to be accepted? The SCSI error handler seems to be completely transport
independent. Initially, I targeted the emulex driver because the qlogic driver already has a way to disable the behavior,
and the LSI driver doesn't appear to support this behavior at all.
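
	To make the idea concrete, here is a rough sketch of a transport-level cap on how far error recovery may escalate. It
is only an illustration: every name in it (eh_step, transport, max_step_for, next_step) is made up for this mail and is
not a kernel identifier, the escalation ladder is a simplified assumption, and the per-transport limits are just the
policy I would want, not anything the midlayer implements.

```c
#include <stdio.h>

/* Recovery steps, mildest first. Simplified, assumed ladder; not kernel names. */
enum eh_step {
    EH_ABORT,
    EH_LUN_RESET,
    EH_TARGET_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET,
    EH_OFFLINE      /* give up on the failing device only */
};

enum transport { TRANSPORT_SPI, TRANSPORT_FC, TRANSPORT_ISCSI };

/* Hypothetical policy: the widest reset each transport is allowed to issue. */
static enum eh_step max_step_for(enum transport t)
{
    switch (t) {
    case TRANSPORT_SPI:
        return EH_HOST_RESET;    /* shared physical bus: keep the full ladder */
    case TRANSPORT_FC:
    case TRANSPORT_ISCSI:
        return EH_TARGET_RESET;  /* never disturb other devices on the fabric */
    }
    return EH_HOST_RESET;
}

/* Step to try after 'cur' has failed on transport 't'. */
static enum eh_step next_step(enum transport t, enum eh_step cur)
{
    if (cur >= max_step_for(t))
        return EH_OFFLINE;       /* cap reached: offline the device, skip wider resets */
    return (enum eh_step)(cur + 1);
}

/* Human-readable step name, for logging in a test harness. */
static const char *step_name(enum eh_step s)
{
    static const char *const names[] = {
        "abort", "lun reset", "target reset", "bus reset", "host reset", "offline"
    };
    return names[s];
}
```

	With this policy, a failed target reset on FC goes straight to taking the one device off-line, while on SPI it still
escalates to a bus reset. Calling step_name(next_step(TRANSPORT_FC, EH_TARGET_RESET)) gives "offline".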


