Re: Bad emulex/linux FC behavior


From: James Smart <James.Smart@Emulex.Com>
To: Jeremy Linton <jli@greshamstorage.com>
Cc: Linux Scsi <linux-scsi@vger.kernel.org>
Subject: Re: Bad emulex/linux FC behavior
Date: Wed, 27 May 2009 18:00:13 -0400
Message-ID: <4A1DB7ED.5080702@emulex.com>
In-Reply-To: <4A1DAC5E.2070709@greshamstorage.com>

(bad by your interpretation, not bad by others....)

The reason for the behavior is to replicate the parallel scsi behavior, 
which is expected/required by many people.  I certainly disagree with 
unconditionally nooping the bus reset handler - and it certainly defeats 
the intended purpose of the reset (for errors other than the one you are 
encountering).   We can certainly discuss adding a parameter that 
controls the behavior, but this should be done on a transport basis, 
not in an adapter-specific manner.

However, the question in my mind is - why did you get to bus reset?  
The upstream kernel should be going through its successively larger 
hammer reset policy. There should be device (lun) resets, followed by 
target resets, then the bus reset. If the error was target-centric, it 
should not have hit the bus reset.
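
To illustrate the escalation order described above, here is a minimal
userspace sketch. It is not the midlayer's code: every name in it
(sketch_step, sketch_fault, sketch_try_step, sketch_escalate) is
hypothetical, and it assumes a reset succeeds as soon as its scope
covers the fault.

```c
/* Recovery steps ordered from narrowest to widest scope (hypothetical names). */
enum sketch_step {
    SKETCH_LUN_RESET,
    SKETCH_TARGET_RESET,
    SKETCH_BUS_RESET,
    SKETCH_GIVE_UP
};

/* Where the fault lives; used only for this illustration. */
enum sketch_fault {
    SKETCH_FAULT_LUN,
    SKETCH_FAULT_TARGET,
    SKETCH_FAULT_BUS
};

/* Assumption: a reset clears the fault once its scope is wide enough. */
int sketch_try_step(enum sketch_step step, enum sketch_fault fault)
{
    return (int)step >= (int)fault;
}

/* Walk the ladder and return the first step that clears the fault. */
enum sketch_step sketch_escalate(enum sketch_fault fault)
{
    int s;

    for (s = SKETCH_LUN_RESET; s <= SKETCH_BUS_RESET; s++) {
        if (sketch_try_step((enum sketch_step)s, fault))
            return (enum sketch_step)s;
    }
    return SKETCH_GIVE_UP;
}
```

Under those assumptions, a target-centric fault stops at the target
reset step and never reaches the bus reset.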

Looking at the 8.3.1 driver - it looks like somewhere along the line, we 
dropped the patches that added the target-reset support. This really 
concerns me. I'll repost them, and that should resolve your issue - 
unless the target is completely gone. However, for older kernels, the 
midlayer didn't have target reset support, so there has to be a 
different answer.
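
A rough sketch of why a missing target-reset handler matters, assuming
the escalation simply skips any step the driver does not supply. The
struct and function names here are hypothetical and only loosely
modeled on a host template; this is not the lpfc or midlayer code.

```c
#include <stddef.h>

enum { SKETCH_SUCCESS = 0, SKETCH_FAILED = 1 };

typedef int (*sketch_reset_fn)(void);

/* Hypothetical stand-in for the reset handlers a driver registers. */
struct sketch_host_template {
    sketch_reset_fn lun_reset;
    sketch_reset_fn target_reset;   /* NULL when the driver supplies none */
    sketch_reset_fn bus_reset;
};

int sketch_failing_reset(void) { return SKETCH_FAILED; }
int sketch_working_reset(void) { return SKETCH_SUCCESS; }

/*
 * Try each handler in order of increasing scope.
 * Returns 0 (lun), 1 (target) or 2 (bus) for the step that succeeded,
 * or -1 if nothing worked. A NULL handler is treated as a failed step.
 */
int sketch_recover(const struct sketch_host_template *t)
{
    sketch_reset_fn ladder[3];
    int i;

    ladder[0] = t->lun_reset;
    ladder[1] = t->target_reset;
    ladder[2] = t->bus_reset;

    for (i = 0; i < 3; i++) {
        if (ladder[i] != NULL && ladder[i]() == SKETCH_SUCCESS)
            return i;
    }
    return -1;
}
```

With target_reset left NULL and the lun reset failing, the sketch falls
straight through to the bus reset, which matches the behavior being
reported; with a working target_reset it stops one step earlier.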

-- james s


Jeremy Linton wrote:
> While hunting for a data integrity problem, I discovered that the emulex lpfc driver sends target resets to all attached 
> devices on a SAN in the lpfc_reset_bus_handler() routine. That routine is exported in the host template and is 
> indirectly called by scsi_eh_ready_devs() when any device attached to the current HBA fails (more on this later).
>
> I believe this is fundamentally the wrong thing to be doing. It makes sense for a parallel SCSI bus but not on a SAN. 
> SAN attached devices are completely independent, and a single target failure should not result in anyone zoned together 
> with the failed device being reset.
>
> This is not an unusual configuration. Our product runs in an environment where the customers regularly have a pool of 
> tape drives shared among a mix of server platforms/applications. Having our machine spuriously sending resets to devices 
> currently in use by other servers is a serious problem.
>
> Lun resets are defined by the T10 specifications to drop reservations, clear soft write protect flags, clear medium 
> removal preventions, etc. I can't find anywhere that describes the behaviors of write back buffers during resets, but I 
> believe I've found one device which dumps the write back buffer without flushing it to the media when it receives a 
> reset. Either way, any application that depends on system drivers using reservations as a method of 
> device arbitration between machines/application instances is going to have serious problems. Should an attached Linux 
> machine clear those reservations, it will result in data integrity problems.
>
> As such, I suggest that the emulex driver simply return from the eh_bus_reset_handler. I've been running like that for a 
> few days and it doesn't adversely affect anything. I've also looked at the qlogic driver and it has a similar piece of 
> code but the functionality can be enabled/disabled by the enable_target_reset flag (from nvram) in qla2x00_loop_reset(). 
> Frankly, I can't really imagine a scenario where it's valid to reset an attached device on a SAN if a completely 
> different device fails. Personally, I would remove the code there too.
>
> Our machines are running fairly old kernels, but the latest RC kernels appear to still have the same code.
