From: Jeremy Linton <jlinton@greshamstorage.com>
To: Linux Scsi <linux-scsi@vger.kernel.org>, james.smart@emulex.com
Subject: Re: Bad emulex/linux FC error handing behavior
Date: Thu, 28 May 2009 04:36:58 -0500
Message-ID: <4A1E5B3A.1070202@greshamstorage.com>

James Smart wrote:
 > However, the question in my mind is - why did you get to bus reset ?
	Because the device is having intermittent problems? The whole error handler sequence fails (TUR failures, etc.), and it 
ends up marking the device off-line. In the process it shoots everything else in the head. This is the behavior I'm 
having a problem with. I don't really care about the state of the failing device; it is having a physical problem. My 
problem is the remainder of the shared devices, which are having their activities interrupted. In many cases, those other 
machines/devices may not even have visibility to the failing device. It becomes a serious error isolation problem. From 
the perspective of other hosts, the only way to track the error down is to actually have an analyzer attached to the 
interrupted devices. Assuming it reproduces, the analyzer can then detect the reset and identify the source port it 
originated from. That machine may then be removed from the SAN. This whole process can be nearly impossible to perform 
at a customer's site.


 >The reason for the behavior is to replicate the parallel scsi behavior,
 >which is expected/required by many people.

	I'm confused by this. For parallel SCSI, there were device dependencies due to the physical bus. The bus reset was 
standard error handling because a bad/failing SCSI device often put the bus in an unrecoverable state for the remainder of 
the devices. SPI also rarely had multiple initiators sharing devices.
	I was unaware of how big a "hammer" lpfc tends to use against the SAN when a device fails. I suspect that I'm not the 
only one. Is there a way to simulate the SPI behavior, short of actually resetting all attached devices? For that 
matter, I'm a little confused about what exactly the intended behavior is. Can you enlighten me? I could understand if it was 
just resetting all LUNs on a particular device, but it's resetting all attached devices.


 > We can certainly discuss adding a parameter that
 > controls the behavior, but this should be on a transport basis, not on
 > an adapter-specific manner.

	That's a great plan. To me it makes sense that this behavior should be transport dependent; I would want it for SPI, but 
not for FC or iSCSI. How likely is that to be accepted? The SCSI error handler seems to be completely transport 
independent. Initially, I targeted the Emulex driver because the QLogic driver already has a way to disable the behavior, and 
the LSI driver doesn't appear to support this behavior at all.
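
	To make the transport-dependent idea concrete, here is a rough sketch of the policy being discussed. It is only a 
model in Python, not kernel code: the names (Transport, ALLOW_BUS_RESET, recover) and the three-step escalation list 
are assumptions made up for illustration, and the real error handler has more steps than this.

```python
from enum import Enum


class Transport(Enum):
    SPI = "spi"
    FC = "fc"
    ISCSI = "iscsi"


# Hypothetical per-transport policy: only SPI keeps the bus reset step.
ALLOW_BUS_RESET = {
    Transport.SPI: True,
    Transport.FC: False,
    Transport.ISCSI: False,
}

# Simplified escalation order (an assumption for this sketch).
ESCALATION = ["abort", "device_reset", "bus_reset"]


def recovery_steps(transport):
    """Return the recovery steps that would be tried, in order."""
    return [
        step
        for step in ESCALATION
        if step != "bus_reset" or ALLOW_BUS_RESET[transport]
    ]


def recover(transport, step_succeeds):
    """Try each allowed step; mark the device offline if all of them fail.

    step_succeeds is a callable taking a step name and returning True
    when that step brings the device back.
    """
    attempted = []
    for step in recovery_steps(transport):
        attempted.append(step)
        if step_succeeds(step):
            return attempted, "online"
    return attempted, "offline"
```

	With this policy a failing FC device still ends up off-line when nothing works, but the escalation stops before the 
step that interrupts every other device on the SAN.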


