* Re: Bad emulex/linux FC error handing behavior
@ 2009-05-28 9:36 Jeremy Linton
From: Jeremy Linton @ 2009-05-28 9:36 UTC
To: Linux Scsi, james.smart
James Smart wrote:
> However, the question in my mind is - why did you get to bus reset ?
Because the device is having intermittent problems? The whole error handler sequence fails (TUR failures, etc.), and it
ends up marking the device offline. In the process it shoots everything else in the head. This is the behavior I'm
having a problem with. I don't really care about the state of the failing device; it is having a physical problem. My
problem is the remainder of the shared devices, which are having their activities interrupted. In many cases, those other
machines/devices may not even have visibility to the failing device. It becomes a serious error isolation problem. From
the perspective of other hosts, the only way to track the error down is to actually have an analyzer attached to the
interrupted devices. Assuming it reproduces, the analyzer can then detect the reset and identify the source port it
originated from. That machine may then be removed from the SAN. This whole process can be nearly impossible to perform
at a customer's site.
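
To make the isolation problem concrete, here is a rough toy model (not the actual SCSI midlayer or lpfc code) of how each
escalation step widens the set of devices whose I/O gets interrupted. The topology, the device names, and the functions
`disturbed` and `escalate` are all made up for illustration; the assumption that the bus reset step hits every attached
device follows the behavior described above.

```python
from enum import IntEnum


class Step(IntEnum):
    """Simplified escalation ladder; the real error handler has more steps."""
    ABORT = 1
    LUN_RESET = 2
    TARGET_RESET = 3
    BUS_RESET = 4


# Hypothetical topology: (target, lun) pairs visible to one initiator.
DEVICES = [("tape0", 0), ("tape0", 1), ("disk0", 0), ("disk1", 0)]


def disturbed(step, failing, devices=DEVICES):
    """Return the devices whose I/O is interrupted when `step` runs for `failing`."""
    target, _lun = failing
    if step in (Step.ABORT, Step.LUN_RESET):
        return {failing}
    if step == Step.TARGET_RESET:
        return {d for d in devices if d[0] == target}
    # Assumed bus reset behavior: every attached device is reset.
    return set(devices)


def escalate(failing, recovers_at=None, devices=DEVICES):
    """Walk the ladder until the device recovers.

    Returns (collateral devices hit, True if the failing device ends up offline).
    """
    hit = set()
    for step in Step:
        hit |= disturbed(step, failing, devices)
        if recovers_at is not None and step >= recovers_at:
            return hit - {failing}, False
    # Nothing worked: the failing device is marked offline.
    return hit - {failing}, True


if __name__ == "__main__":
    collateral, offline = escalate(("tape0", 0))
    print("offline:", offline, "collateral:", sorted(collateral))
```

In this model a device that never recovers drags every other attached device through a reset before it is finally
marked offline, while one that recovers at the LUN reset step disturbs nothing else.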
> The reason for the behavior is to replicate the parallel scsi behavior,
> which is expected/required by many people.
I'm confused by this. For parallel SCSI, there were device dependencies due to the physical bus. The bus reset was
standard error handling because a bad/failing SCSI device often put the bus in an unrecoverable state for the remainder of
the devices. SPI also rarely had multiple initiators sharing devices.
I was unaware of how big a "hammer" lpfc tends to use against the SAN when a device fails. I suspect that I'm not the
only one. Is there a way to simulate the SPI behavior short of actually resetting all attached devices? For that
matter, I'm a little confused about what exactly the intended behavior is. Can you enlighten me? I could understand if it
was just resetting all LUNs on a particular device, but it's resetting all attached devices.
> We can certainly discuss adding a parameter that
> controls the behavior, but this should be on a transport basis, not on
> an adapter-specific manner.
That's a great plan. To me it makes sense that this behavior should be transport dependent: I would want it for SPI, but
not for FC or iSCSI. How likely is that to be accepted? The SCSI error handler seems to be completely transport
independent. Initially, I targeted the Emulex driver because the QLogic driver already has a way to disable the behavior,
and the LSI driver doesn't appear to support this behavior at all.
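
As a strawman for what a transport-level control might look like, here is a minimal sketch of a per-transport ceiling on
escalation. Everything in it is an assumption for discussion: `LADDER`, `MAX_STEP`, and `allowed_steps` are hypothetical
names, not existing kernel parameters or interfaces, and the step names are a simplification.

```python
# Simplified escalation ladder, mildest step first.
LADDER = ["abort", "lun_reset", "target_reset", "bus_reset", "host_reset"]

# Hypothetical per-transport ceiling: the most drastic step the error
# handler may take before giving up and marking the device offline.
MAX_STEP = {
    "spi": "host_reset",      # shared physical bus: keep the full ladder
    "fc": "target_reset",     # fabric: stop before touching other targets
    "iscsi": "target_reset",
}


def allowed_steps(transport):
    """Return the steps the error handler may try on this transport, in order."""
    # Unknown transports fall back to the full ladder (today's behavior).
    ceiling = MAX_STEP.get(transport, LADDER[-1])
    return LADDER[: LADDER.index(ceiling) + 1]


if __name__ == "__main__":
    for name in ("spi", "fc", "iscsi"):
        print(name, allowed_steps(name))
```

With a ceiling like this, an FC or iSCSI device that fails every step up to target reset would simply go offline,
without the bus reset ever reaching the other attached devices.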