From: Brian De Wolf <bldewolf@csupomona.edu>
To: device-mapper development <dm-devel@redhat.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [dm-devel] dm-mpath-rdac.patch problem
Date: Fri, 13 Jul 2007 12:33:03 -0700
Message-ID: <4697D36F.5030201@csupomona.edu>
In-Reply-To: <20070713161253.GA20901@plap.qlogic.org>

Andrew Vasquez wrote:
> On Thu, 12 Jul 2007, Mike Anderson wrote:
> 
>> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
>> provide input on the Qlogic behavior.
>>
>> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>>> On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
>>>> Hello All,
>>>>
>>>> I'm not sure if this is the right place for this, but it seems to be the only
>>>> mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
>>>> been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
>>>> gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
>>>> ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
>>>> multipath-tools that I got with git a few days ago.
>>>>
>>>> I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
>>>> rdac]), and the volume is mountable, etc.  It also shows one link as active, the
>>>> other as ghost.  However, once the active link dies, the volume becomes read
>>>> only, and both connections are listed as failed.  Most importantly, something
>>>> like this shows up in the logs:
>>>>
>>>> Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
>>>> MODE_SELECT command on 8:32
>>> It does look like the rdac hardware handler is doing the right thing and
>>> the qlogic is dying for some reason.
>>>
>>> I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
>>> and it works fine. Can you try in one of those and see if it is any
>>> different?
>>>
>>> Just an FYI w.r.t multipath tools: please remove the patch
>>> http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
>>> from your tree for the tools to calculate the path priorities properly.
>>>
>>>
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
>>>> mbx2=8012h mbx3=8002h.
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
>>>> dumped (ffffc2000171d000) -- ignoring request...
>>>> Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
>>>> recovery - ha= ffff81007e85c530.
> 
> Hmm yes, there are some real problems going on within the firmware which
> we need to triage.  From the snippet above, the driver was able to
> capture a firmware-dump of a failure (not sure of the timing and how
> it relates to the window in which you recognized a 'problem'), but
> I'll need you to 'capture' the firmware trace and forward it along to
> us to inspect.
> 
> 1) download the following shell script:
> 
> 	ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh
> 
> 2) copy the script to the host (/tmp) which is experiencing the
>    problems.
> 
> 3) reboot and load the driver with the ql2xextended_error_logging
>    module parameter set to 1. e.g.:
> 
> 	$ insmod qla2xxx.ko ql2xextended_error_logging=1
> 
> 4) rerun your test and monitor the kernel-messages file for a message
>    similar to:
> 
>         Firmware dump saved to temp buffer (1/adcdabcd)
> 
> 5) To retrieve the dump, go to a console and type the following:
> 
>         # cd /tmp/
>         # ./qla_dmp.sh 1
> 
>    The value passed to qla_dmp.sh should be the same as the first integer
>    in the 'saved to temp buffer' string (in this example, 1).  If the
>    operation was successful, a message like the following should be
>    displayed:
> 
>         Firmware dumped to file fw_dump_1_20041217_023222.txt.gz
> 
>    Forward over the file.
> 
> 6) forward over the /var/log/messages file of the driver load and
>    failure snippet.
> 
> 
> Not sure which firmware version you are running, but an additional
> datapoint which may be useful after you send the firmware-dump is to
> download the latest 24xx firmware file from QLogic.com:
> 
> 	ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin
> 
> and retry the test.  If you still see problems and a similar
> 'Firmware dump saved...' message, follow the steps above again and
> forward the same datapoints.
> 

I have tried both the ql2400_fw.bin.4.00.18 and ql2400_fw.bin.4.00.27 firmwares
and the HBA had the same error.  The attached datapoints were done using
ql2400_fw.bin.4.00.27.
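
For anyone repeating steps 4 and 5 above, here is a minimal sketch of
pulling the buffer number out of the log instead of reading it off by
eye.  It assumes the message has exactly the form quoted in step 4 and
that kernel messages land in /var/log/messages; the function name
dump_index is made up for illustration and is not part of qla_dmp.sh.

```shell
#!/bin/sh
# Sketch only: dump_index is a hypothetical helper, not part of the
# QLogic script.  It prints the first integer from the most recent
# "Firmware dump saved to temp buffer (N/xxxxxxxx)" line.

dump_index() {
    # $1: log file to scan (assumed to be /var/log/messages if omitted)
    sed -n 's/.*Firmware dump saved to temp buffer (\([0-9][0-9]*\)\/.*/\1/p' \
        "${1:-/var/log/messages}" | tail -n 1
}
```

With that defined, step 5 would become something like
cd /tmp && ./qla_dmp.sh "$(dump_index)".  If the driver never logged
the message, dump_index prints nothing, so check for an empty result
before running the script.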

Note:  This is a resend to the mailing list without attachments.

>>>> While this may be something for the maintainer of the qla2xxx module (I can't
>>>> figure out where I'd send it, in that case...) I think it may be of interest
>>>> that the dm_rdac module tries to push something over the HBA that causes it to
>>>> bail completely and start from scratch (it starts init processes and loading
>>>> firmware again).
>>>>
>>>> Not to say that I'm not interested in any help getting this working, that is.
>>>> If you have any suggestions on how to get this working, I'd love to hear them.
>>>> I'm also willing to guinea pig some testing if you need it (This box still has a
>>>> bit before it will have to be put in use).  I may use redhat to ensure that it's
>>>> not just a broken HBA, but for the long run we would like it to join our gentoo
>>>> environment.
>>>>
>>>> Thanks!
>>>> Brian De Wolf
>>>>
>>>> PS- If the subject misled you because you feel that this is just a qla2xxx
>>>> problem, I'm sorry for wasting your time.
> 
> Regards,
> Andrew Vasquez
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


