Re: lpfc SAN/SCSI issue

public inbox for linux-scsi@vger.kernel.org

From: James Smart <james.smart@emulex.com>
To: brem belguebli <brem.belguebli@gmail.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: lpfc SAN/SCSI issue
Date: Fri, 23 Apr 2010 09:28:17 -0400
Message-ID: <4BD1A071.2010202@emulex.com>
In-Reply-To: <1271964275.2480.1.camel@localhost>

Brem,

We're checking whether this matches any lpfc driver issue we are already
aware of.

Please send me the system console log covering this time frame. Seeing no
messages whatsoever would be very odd. The output of the shost, rport, and
sdev sysfs parameters, as well as your DM configuration values, would also
help. The I/O timers won't necessarily be the ones that fire, but other
timers should.
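
One way to gather the shost, rport, and sdev parameters in a single pass is
the minimal Python sketch below. It assumes the standard sysfs class
directories /sys/class/scsi_host, /sys/class/fc_remote_ports, and
/sys/class/scsi_device; the helper names `read_attrs` and `collect` are
hypothetical, and the attributes found will vary with kernel and driver
version. It does not cover the DM side, which `multipath -ll` and
`dmsetup table` would show.

```python
from pathlib import Path

# Standard Linux sysfs class directories for the three object types
# requested. Which attributes appear under each object depends on the
# kernel and driver version (lpfc adds its lpfc_* attributes to shosts).
SYSFS_CLASSES = {
    "shost": "class/scsi_host",        # host0, host1, ...
    "rport": "class/fc_remote_ports",  # rport-0:0-1, ... (port_state, dev_loss_tmo, ...)
    "sdev": "class/scsi_device",       # H:C:T:L (state, timeout, ... under device/)
}


def read_attrs(obj_dir):
    """Read every readable regular file directly under obj_dir."""
    attrs = {}
    for entry in sorted(obj_dir.iterdir()):
        # Skip subdirectories and links such as device, subsystem, power/.
        if entry.is_symlink() or not entry.is_file():
            continue
        try:
            attrs[entry.name] = entry.read_bytes().decode(errors="replace").strip()
        except OSError:
            continue  # write-only or failing attribute
    return attrs


def collect(sysfs_root="/sys"):
    """Return {kind: {object_name: {attribute: value}}} for shost/rport/sdev."""
    result = {}
    for kind, rel in SYSFS_CLASSES.items():
        class_dir = Path(sysfs_root) / rel
        objs = {}
        if class_dir.is_dir():
            for obj in sorted(class_dir.iterdir()):
                # sdev attributes live in the linked SCSI device directory.
                target = obj / "device" if kind == "sdev" else obj
                if target.is_dir():
                    objs[obj.name] = read_attrs(target)
        result[kind] = objs
    return result
```

Run on the affected server, `collect()` returns plain dictionaries that can be
printed or saved as-is and attached to a reply.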

-- james s


brem belguebli wrote:
> I have a server (RHEL 5.3) connected to 2 extended SAN fabrics (spanning
> 2 sites 1 ms apart; the ISLs use long-distance buffer credits sized for
> 100 km) via 2 lpfc HBAs (LPe1105-HP FC, running the lpfc driver
> 8.2.0.33.3p shipped with RHEL 5.3).
>  
> A SAN fabric reconfiguration (DWDM ring failover from the working to the
> protection path) occurred yesterday after an intersite telco link switch
> that lasted less than 0.3 ms.
>  
> Only one fabric, FABRIC2, was impacted.
>  
> Our server is connected to the fabrics through 2 edge switches, so it is
> not directly connected to the core switches on which the link failure
> occurred.
>  
> From then on, the load average on our server (which accesses the LUNs
> of both sites through the 2 fabrics) kept climbing (up to 250 on a
> dual-processor quad-core machine!), with a high percentage of iowait
> (up to 50%).
>  
> We did some testing, bypassing DM-MP by issuing dd commands directly to
> the physical /dev/sdX devices (more than 30 LUNs are presented to the
> server, each seen through 4 paths, giving more than 120 /dev/sd devices).
> Half of our dd processes went into D state, as did some individual
> scsi_id commands that we ran manually against the same physical devices.
>  
> Multipathd itself was also in D state. 
>  
> The only way to restore everything was to reset the server HBA connected
> to FABRIC2, which we did after 2 hours of investigation.
>  
> No SCSI log messages, or anything else, appeared during the outage (~2
> hours), even though the SCSI timeout on the physical devices is set to
> 60 s and the HBA timeout to 14 s.
>  
> The /sys/block/sdX/device/state attributes kept showing "running" even
> though the devices (well, half of them) were actually inaccessible.
>  
> This leads me to:
>  
> 1) Assumption: following this SAN event, the lpfc driver seems to go into
> a black-hole mode, returning no I/O error or anything else to the upper
> SCSI layer.
>  
> 2) Question: why don't the SCSI timers fire and declare the devices
> faulty? (The answer may lie in the assumption above.)
>  
> Any idea or tip on what could cause this? An FC SCN message not handled
> properly, or something else?
> 
> Regards
> 
> Brem
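
The dd test described above can be scripted so that a hung path is reported
instead of blocking the shell. A minimal Python sketch, assuming GNU dd (for
`iflag=direct`); `dd_read_cmd` and `probe` are hypothetical helper names and
the device names in the usage line are placeholders. Because a task in
uninterruptible sleep ignores signals until its I/O completes or is aborted,
the sketch never waits on a probe that misses its deadline; it returns it as
hung instead.

```python
import subprocess
import time


def dd_read_cmd(dev):
    # One 4 KiB O_DIRECT read from the start of the device. iflag=direct
    # bypasses the page cache so the request actually reaches the HBA
    # (GNU coreutils dd syntax).
    return ["dd", f"if={dev}", "of=/dev/null", "bs=4096", "count=1", "iflag=direct"]


def probe(devices, timeout_s=60.0, make_cmd=dd_read_cmd):
    """Start one read per device at once and classify the outcome.

    Returns (ok, failed, hung): ok and failed are sorted device lists; hung
    maps each device whose read missed the deadline to its running Popen.
    Hung processes are deliberately neither killed nor waited for, since a
    task stuck in D state would make wait() block.
    """
    pending = {
        dev: subprocess.Popen(make_cmd(dev),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        for dev in devices
    }
    ok, failed = [], []
    deadline = time.monotonic() + timeout_s
    while True:
        # Reap every probe that has finished since the last pass.
        for dev, proc in list(pending.items()):
            rc = proc.poll()
            if rc is not None:
                (ok if rc == 0 else failed).append(dev)
                del pending[dev]
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(0.1)
    return sorted(ok), sorted(failed), pending
```

For example, `ok, failed, hung = probe(["/dev/sdb", "/dev/sdc"])` uses a 60 s
deadline, matching the SCSI timeout mentioned above, so any device left in
`hung` has outlived the point where the SCSI layer would be expected to act.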

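To list every task stuck in D state (dd, scsi_id, and multipathd in the report
above), the state field of /proc/<pid>/stat can be scanned. A minimal Python
sketch; `parse_stat` and `d_state_tasks` are hypothetical helper names. The
command name in that file is wrapped in parentheses and may contain spaces,
so the line is split on its last ')'.

```python
from pathlib import Path


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def d_state_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks in uninterruptible sleep."""
    stuck = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/sys, /proc/meminfo, ...
        try:
            raw = (entry / "stat").read_bytes()
        except OSError:
            continue  # the process exited between listing and reading
        pid, comm, state = parse_stat(raw.decode(errors="replace"))
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)
```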

Thread overview: 9+ messages
2010-04-22 16:47 [PATCH] mpt2sas: DIF Type 2 Protection Support Eric Moore
2010-04-22 19:24 ` lpfc SAN/SCSI issue brem belguebli
2010-04-23 13:28   ` James Smart [this message]
     [not found]     ` <j2o29ae894c1004230922le8baf635y563e50e3edc53bc3@mail.gmail.com>
     [not found]       ` <4BD226F4.6070908@emulex.com>
     [not found]         ` <1272109999.2983.30.camel@localhost>
     [not found]           ` <4BD5D258.8030309@emulex.com>
2010-04-26 21:52             ` brem belguebli
2010-04-27 17:37               ` brem belguebli
2010-05-03 16:39                 ` brem belguebli
2010-05-05 14:01                   ` James Smart
2010-05-06 11:06                     ` brem belguebli
2010-05-06 13:39                       ` James Smart
