From: Michael Clark <michael@metaparadigm.com>
To: Lincoln Dale <ltd@cisco.com>
Cc: Jurjen Oskam <jurjen@quadpro.stupendous.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Booting from Qlogic qla2300 fibre channel card
Date: Wed, 16 Apr 2003 23:32:21 +0800
Message-ID: <3E9D7785.5020205@metaparadigm.com>
In-Reply-To: <5.1.0.14.2.20030416162813.03b1b6e8@mira-sjcm-3.cisco.com>

Hi,

On 04/16/03 14:56, Lincoln Dale wrote:
> Hi,
> 
> At 08:18 AM 16/04/2003 +0200, Jurjen Oskam wrote:
> 
>> At work, we are looking to deploy several Linux boxes on our SAN. The
>> machines will be IBM eServer xSeries 345 with Qlogic qla2340 Fibre
>> Channel cards, and no internal disks.
>>
>> The storage array is an EMC Symmetrix model 8530. EMC created a document
>> where they explain how to make such a configuration work. When they
>> mention booting from a Symmetrix-provided volume, they mention the
>> following:
>>
>> "If Linux loses connectivity long enough, the disks disappear from the
>> system. [...] For [this reason], EMC recommends that you do not boot a
>> Linux host from the EMC storage array."
> 
> 
> in general, all OSes get rather upset if disks disappear under them.  
> particularly if those disks contain swap -- exactly how is the machine 
> meant to recover from that?
> 
> some recommendations:
>  - run with Matthew Jacob's "feral" driver rather than QLogic's driver;
>    it has much better error recovery

That is certainly a matter of opinion, though. When I tried the feral
driver a month ago, unplugging the fibre (and getting loop down) made
the SCSI layer start spewing I/O errors, and the files copied during
this test (on ext3) had invalid checksums. The QLogic driver, however,
handled this test fine (it coped with multiple fibre unplugs while
copying a multi-gigabyte file). The QLogic driver certainly has its
fair share of recovery problems too, such as an abort function that
tries to re-init the hardware but always fails.
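
For anyone wanting to repeat this kind of test, here is a minimal sketch
of the copy-and-compare step. It is not the script used in the test
above; the function names and the choice of SHA-256 are assumptions for
illustration. Note that reading the destination back straight after the
copy may be served from the page cache, so a real test would want to
remount or drop caches before verifying.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then report whether the two checksums match.

    Hypothetical helper: pull the fibre while the copy is running and
    check the return value afterwards.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    return expected == sha256_of(dst)
```

A False return (or an I/O error raised during the copy) is the failure
mode described above.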

I'm currently looking for alternatives to QLogic HBAs after a year of
not being able to find a stable driver combo (one that can stand up
for more than a few weeks). Does anyone out there have experience
with the LSI HBAs and Fusion MPT drivers, or perhaps Emulex?

We get the following with the latest 6.1 QLogic driver and our 2300s
about every 2 weeks (we are about to file a bug report with QLogic).

Apr  2 10:54:13 prodapp3 kernel: qla2x00: Status Entry invalid handle.
Apr  2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:14 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:14 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:15 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:15 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:16 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:16 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:17 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:17 prodapp3 kernel: qla2x00(2): ISP error recovery failed - board disabled
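
Since this only happens every couple of weeks, it can help to watch the
log for it. Below is a rough sketch that counts the failed aborts and
flags the "board disabled" message. The patterns are taken from the log
lines above; treating the number in parentheses as the adapter instance
is an assumption, and the function name is made up for illustration.

```python
import re

# Patterns copied from the messages in the log excerpt above.
ABORT_FAILED = re.compile(r"qla2x00_abort_isp\((\d+)\): \*\*\*\* FAILED \*\*\*\*")
BOARD_DISABLED = re.compile(
    r"qla2x00\((\d+)\): ISP error recovery failed - board disabled"
)


def summarize(lines):
    """Return (failed_aborts_per_instance, set_of_disabled_instances).

    Assumption: the number in parentheses identifies the adapter instance.
    """
    failed = {}
    disabled = set()
    for line in lines:
        m = ABORT_FAILED.search(line)
        if m:
            instance = int(m.group(1))
            failed[instance] = failed.get(instance, 0) + 1
            continue
        m = BOARD_DISABLED.search(line)
        if m:
            disabled.add(int(m.group(1)))
    return failed, disabled
```

It could be fed from an open log file, e.g.
summarize(open("/var/log/messages")), where the path is whatever your
syslog setup writes kernel messages to.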

>  - you may want to increase the delay of SCSI_TIMEOUT in
>    drivers/scsi/scsi.h
> 
> in my lab here, i do a ton of work on Fibre Channel & iSCSI.
> the best setup i've found is that i end up using ramfs as my root and
> having lots of things in there.  sure, it burns a bit of ram, but i can
> be sure if i'm doing anything that could impact the i/o path, it's on
> less system-critical stuff.  since it's a lab and the things running on
> the hosts aren't RAM hogs, i don't have swap either.  you probably
> can't get away with that, so i'd recommend doing some extensive testing
> pulling cables out and seeing what happens and tuning timers to cope
> with it accordingly.
> 
>> When making an online configuration change on the Symmetrix (such as
>> remapping volumes), it is possible for the attached hosts to experience
>> a temporary error while accessing a storage array volume. For example,
> 
> 
> are you sure this tech note will still apply with the DMX?
> i'd imagine that there are still bin file changes that can cause this
> kind of thing, but it's something i believe EMC was addressing with the DMX.
> 
>> when changing the Symmetrix configuration, it is not uncommon for the
>> RS/6000s (also attached to the SAN) to log one or two temporary
>> SCSI-errors. They don't cause any problems at all, the AIX volume manager
>> never notices a problem.
> 
> 
> on RS/6000's, the rules were somewhat different.  the HBAs that IBM had 
> for RS6Ks typically only tried to issue FLOGIs once every 30 seconds - 
> so you would be more likely to see timeout errors if you impacted the 
> flow of traffic temporarily.
> 
> 
> cheers,
> 
> lincoln.
> 

