* dm-mpath-rdac.patch problem
@ 2007-07-13  1:35 Brian De Wolf
  2007-07-13  2:06 ` Chandra Seetharaman
  0 siblings, 1 reply; 11+ messages in thread
From: Brian De Wolf @ 2007-07-13  1:35 UTC (permalink / raw)
  To: dm-devel

Hello All,

I'm not sure if this is the right place for this, but it seems to be the only
mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
multipath-tools that I got with git a few days ago.

I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
rdac]), and the volume is mountable, etc.  It also shows one link as active, the
other as ghost.  However, once the active link dies, the volume becomes read
only, and both connections are listed as failed.  Most importantly, something
like this shows up in the logs:

Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
MODE_SELECT command on 8:32
Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
mbx2=8012h mbx3=8002h.
Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
dumped (ffffc2000171d000) -- ignoring request...
Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
recovery - ha= ffff81007e85c530.
Jul 12 17:11:16 jimbo kernel: device-mapper: multipath: Failing path 8:32.
Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 0
Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 1
Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 2
Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 3
Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 0
Jul 12 17:11:16 jimbo multipathd: 8:32: mark as failed
Jul 12 17:11:16 jimbo multipathd: test: remaining active paths: 0

While this may be something for the maintainer of the qla2xxx module (I can't
figure out where I'd send it, in that case...) I think it may be of interest
that the dm_rdac module tries to push something over the HBA that causes it to
bail completely and start from scratch (it starts init processes and loading
firmware again).
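
The ordering in the log excerpt above can be checked mechanically.  What
follows is only a minimal sketch in Python: the function name is
hypothetical, and the three log lines are copied (unwrapped) from the
excerpt.  It shows that the ISP System Error is logged after the MODE_SELECT
is queued; it does not show that one causes the other.

```python
# Three lines copied from the log excerpt above, with the wrapped lines joined.
LOG = [
    "Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing MODE_SELECT command on 8:32",
    "Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h mbx2=8012h mbx3=8002h.",
    "Jul 12 17:11:16 jimbo kernel: device-mapper: multipath: Failing path 8:32.",
]


def error_follows_mode_select(lines):
    """Return True if an 'ISP System Error' line appears after a MODE_SELECT line.

    Hypothetical helper: it only looks at the order of the lines, nothing else.
    """
    seen_mode_select = False
    for line in lines:
        if "queueing MODE_SELECT command" in line:
            seen_mode_select = True
        elif seen_mode_select and "ISP System Error" in line:
            return True
    return False


print(error_follows_mode_select(LOG))
```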

Not to say that I'm not interested in any help getting this working, that is.
If you have any suggestions on how to get this working, I'd love to hear them.
I'm also willing to guinea pig some testing if you need it (This box still has a
bit before it will have to be put in use).  I may use redhat to ensure that it's
not just a broken HBA, but for the long run we would like it to join our gentoo
environment.

Thanks!
Brian De Wolf

PS- If the subject misled you because you feel that this is just a qla2xxx
problem, I'm sorry for wasting your time.


* Re: dm-mpath-rdac.patch problem
  2007-07-13  1:35 dm-mpath-rdac.patch problem Brian De Wolf
@ 2007-07-13  2:06 ` Chandra Seetharaman
  2007-07-13  2:37   ` Mike Anderson
  0 siblings, 1 reply; 11+ messages in thread
From: Chandra Seetharaman @ 2007-07-13  2:06 UTC (permalink / raw)
  To: device-mapper development

On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
> Hello All,
> 
> I'm not sure if this is the right place for this, but it seems to be the only
> mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
> been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
> gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
> ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
> multipath-tools that I got with git a few days ago.
> 
> I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
> rdac]), and the volume is mountable, etc.  It also shows one link as active, the
> other as ghost.  However, once the active link dies, the volume becomes read
> only, and both connections are listed as failed.  Most importantly, something
> like this shows up in the logs:
> 
> Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
> MODE_SELECT command on 8:32

It does look like the rdac hardware handler is doing the right thing and
the qlogic is dying for some reason.

I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
and it works fine there. Can you try one of those and see whether it
behaves any differently?

Just an FYI w.r.t. multipath-tools: please remove the patch
http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
from your tree for the tools to calculate the path priorities properly.


> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> mbx2=8012h mbx3=8002h.
> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> dumped (ffffc2000171d000) -- ignoring request...
> Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> recovery - ha= ffff81007e85c530.
> Jul 12 17:11:16 jimbo kernel: device-mapper: multipath: Failing path 8:32.
> Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 0
> Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 1
> Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 2
> Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 3
> Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 0
> Jul 12 17:11:16 jimbo multipathd: 8:32: mark as failed
> Jul 12 17:11:16 jimbo multipathd: test: remaining active paths: 0
> 
> While this may be something for the maintainer of the qla2xxx module (I can't
> figure out where I'd send it, in that case...) I think it may be of interest
> that the dm_rdac module tries to push something over the HBA that causes it to
> bail completely and start from scratch (it starts init processes and loading
> firmware again).
> 
> Not to say that I'm not interested in any help getting this working, that is.
> If you have any suggestions on how to get this working, I'd love to hear them.
> I'm also willing to guinea pig some testing if you need it (This box still has a
> bit before it will have to be put in use).  I may use redhat to ensure that it's
> not just a broken HBA, but for the long run we would like it to join our gentoo
> environment.
> 
> Thanks!
> Brian De Wolf
> 
> PS- If the subject misled you because you feel that this is just a qla2xxx
> problem, I'm sorry for wasting your time.
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------


* Re: dm-mpath-rdac.patch problem
  2007-07-13  2:06 ` Chandra Seetharaman
@ 2007-07-13  2:37   ` Mike Anderson
  2007-07-13 16:12     ` [dm-devel] " Andrew Vasquez
  2007-07-17 21:07     ` [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit Andrew Vasquez
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Anderson @ 2007-07-13  2:37 UTC (permalink / raw)
  To: linux-scsi, device-mapper development

Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
provide input on the Qlogic behavior.

Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
> > Hello All,
> > 
> > I'm not sure if this is the right place for this, but it seems to be the only
> > mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
> > been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
> > gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
> > ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
> > multipath-tools that I got with git a few days ago.
> > 
> > I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
> > rdac]), and the volume is mountable, etc.  It also shows one link as active, the
> > other as ghost.  However, once the active link dies, the volume becomes read
> > only, and both connections are listed as failed.  Most importantly, something
> > like this shows up in the logs:
> > 
> > Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
> > MODE_SELECT command on 8:32
> 
> It does look like the rdac hardware handler is doing the right thing and
> the qlogic is dying for some reason.
> 
> I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
> and they work fine. Can you try in one of those and see if it is any
> different.
> 
> Just an FYI w.r.t. multipath-tools: please remove the patch
> http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
> from your tree for the tools to calculate the path priorities properly.
> 
> 
> > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> > mbx2=8012h mbx3=8002h.
> > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> > dumped (ffffc2000171d000) -- ignoring request...
> > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> > recovery - ha= ffff81007e85c530.
> > Jul 12 17:11:16 jimbo kernel: device-mapper: multipath: Failing path 8:32.
> > Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 0
> > Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 1
> > Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 2
> > Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 3
> > Jul 12 17:11:16 jimbo kernel: Buffer I/O error on device dm-6, logical block 0
> > Jul 12 17:11:16 jimbo multipathd: 8:32: mark as failed
> > Jul 12 17:11:16 jimbo multipathd: test: remaining active paths: 0
> > 
> > While this may be something for the maintainer of the qla2xxx module (I can't
> > figure out where I'd send it, in that case...) I think it may be of interest
> > that the dm_rdac module tries to push something over the HBA that causes it to
> > bail completely and start from scratch (it starts init processes and loading
> > firmware again).
> > 
> > Not to say that I'm not interested in any help getting this working, that is.
> > If you have any suggestions on how to get this working, I'd love to hear them.
> > I'm also willing to guinea pig some testing if you need it (This box still has a
> > bit before it will have to be put in use).  I may use redhat to ensure that it's
> > not just a broken HBA, but for the long run we would like it to join our gentoo
> > environment.
> > 
> > Thanks!
> > Brian De Wolf
> > 
> > > PS- If the subject misled you because you feel that this is just a qla2xxx
> > problem, I'm sorry for wasting your time.
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> -- 
> 
> ----------------------------------------------------------------------
>     Chandra Seetharaman               | Be careful what you choose....
>               - sekharan@us.ibm.com   |      .......you may get it.
> ----------------------------------------------------------------------
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

-andmike
--
Michael Anderson
andmike@us.ibm.com


* Re: [dm-devel] dm-mpath-rdac.patch problem
  2007-07-13  2:37   ` Mike Anderson
@ 2007-07-13 16:12     ` Andrew Vasquez
  2007-07-13 19:13       ` Brian De Wolf
  2007-07-13 19:33       ` Brian De Wolf
  2007-07-17 21:07     ` [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit Andrew Vasquez
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Vasquez @ 2007-07-13 16:12 UTC (permalink / raw)
  To: sekharan; +Cc: linux-scsi, device-mapper development, Mike Anderson

On Thu, 12 Jul 2007, Mike Anderson wrote:

> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
> provide input on the Qlogic behavior.
> 
> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
> > On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
> > > Hello All,
> > > 
> > > I'm not sure if this is the right place for this, but it seems to be the only
> > > mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
> > > been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
> > > gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
> > > ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
> > > multipath-tools that I got with git a few days ago.
> > > 
> > > I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
> > > rdac]), and the volume is mountable, etc.  It also shows one link as active, the
> > > other as ghost.  However, once the active link dies, the volume becomes read
> > > only, and both connections are listed as failed.  Most importantly, something
> > > like this shows up in the logs:
> > > 
> > > Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
> > > MODE_SELECT command on 8:32
> > 
> > It does look like the rdac hardware handler is doing the right thing and
> > the qlogic is dying for some reason.
> > 
> > I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
> > and they work fine. Can you try in one of those and see if it is any
> > different.
> > 
> > Just an FYI w.r.t multipath tools: please remove the patch
> > > http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
> > > from your tree for the tools to calculate the path priorities properly.
> > > 
> > 
> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> > > mbx2=8012h mbx3=8002h.
> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> > > dumped (ffffc2000171d000) -- ignoring request...
> > > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> > > recovery - ha= ffff81007e85c530.

Hmm yes, there are some real problems going on within the firmware which
we need to triage.  From the snippet above, the driver was able to
capture a firmware dump of a failure (I'm not sure of the timing or how
it relates to the window in which you noticed the problem), but
I'll need you to capture the firmware trace and forward it along to
us to inspect.

1) download the following shell script:

	ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh

2) copy the script to the host (/tmp) which is experiencing the
   problems.

3) reboot and load the driver with the ql2xextended_error_logging
   module parameter set to 1. e.g.:

	# insmod qla2xxx.ko ql2xextended_error_logging=1

4) rerun your test and monitor the kernel-messages file for a message
   similar to:

        Firmware dump saved to temp buffer (1/adcdabcd)

5) To retrieve the dump, go to a console and type the following:

        # cd /tmp/
        # ./qla_dmp.sh 1

   The value passed to qla_dmp.sh should be the same as the first integer
   in the 'saved to temp buffer' string (in this example, 1).  If the
   operation was successful, a message like the following should be
   displayed:

        Firmware dumped to file fw_dump_1_20041217_023222.txt.gz

   Forward the file over.

6) forward over the portion of the /var/log/messages file that covers
   the driver load and the failure.
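
Picking the value for step 5 out of the kernel messages (step 4) can be
scripted.  This is a minimal sketch in Python, assuming the message has
exactly the form shown in step 4; the function name and the sample text are
hypothetical, and only the message wording and the qla_dmp.sh argument come
from the steps above.

```python
import re

# Pattern for the message quoted in step 4.  The wording is taken from that
# example; other driver versions may print it differently (assumption).
DUMP_SAVED = re.compile(
    r"Firmware dump saved to temp buffer \((\d+)/([0-9a-fA-F]+)\)"
)


def find_dump_indexes(log_text):
    """Return the first integer of every 'saved to temp buffer' message, in order."""
    return [int(match.group(1)) for match in DUMP_SAVED.finditer(log_text)]


# Hypothetical input: just the example message from step 4.
sample = "Firmware dump saved to temp buffer (1/adcdabcd)\n"

for index in find_dump_indexes(sample):
    # The integer is the argument qla_dmp.sh expects in step 5.
    print(f"./qla_dmp.sh {index}")
```

In practice the text would be read from the kernel-messages file rather
than from a literal string.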


Not sure which firmware version you are running, but an additional
datapoint which may be useful after you send the firmware-dump is to
download the latest 24xx firmware file from QLogic.com:

	ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin

and retry the test.  If you still see problems and see a similar
'Firmware dump saved...' message, follow the steps above again and
forward the same datapoints.

> > > While this may be something for the maintainer of the qla2xxx module (I can't
> > > figure out where I'd send it, in that case...) I think it may be of interest
> > > that the dm_rdac module tries to push something over the HBA that causes it to
> > > bail completely and start from scratch (it starts init processes and loading
> > > firmware again).
> > > 
> > > Not to say that I'm not interested in any help getting this working, that is.
> > > If you have any suggestions on how to get this working, I'd love to hear them.
> > > I'm also willing to guinea pig some testing if you need it (This box still has a
> > > bit before it will have to be put in use).  I may use redhat to ensure that it's
> > > not just a broken HBA, but for the long run we would like it to join our gentoo
> > > environment.
> > > 
> > > Thanks!
> > > Brian De Wolf
> > > 
> > > > PS- If the subject misled you because you feel that this is just a qla2xxx
> > > problem, I'm sorry for wasting your time.

Regards,
Andrew Vasquez


* Re: [dm-devel] dm-mpath-rdac.patch problem
  2007-07-13 16:12     ` [dm-devel] " Andrew Vasquez
@ 2007-07-13 19:13       ` Brian De Wolf
  2007-07-13 19:33       ` Brian De Wolf
  1 sibling, 0 replies; 11+ messages in thread
From: Brian De Wolf @ 2007-07-13 19:13 UTC (permalink / raw)
  To: device-mapper development; +Cc: sekharan, Mike Anderson, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 5378 bytes --]

Andrew Vasquez wrote:
> On Thu, 12 Jul 2007, Mike Anderson wrote:
> 
>> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
>> provide input on the Qlogic behavior.
>>
>> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>>> On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
>>>> Hello All,
>>>>
>>>> I'm not sure if this is the right place for this, but it seems to be the only
>>>> mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
>>>> been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
>>>> gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
>>>> ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
>>>> multipath-tools that I got with git a few days ago.
>>>>
>>>> I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
>>>> rdac]), and the volume is mountable, etc.  It also shows one link as active, the
>>>> other as ghost.  However, once the active link dies, the volume becomes read
>>>> only, and both connections are listed as failed.  Most importantly, something
>>>> like this shows up in the logs:
>>>>
>>>> Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
>>>> MODE_SELECT command on 8:32
>>> It does look like the rdac hardware handler is doing the right thing and
>>> the qlogic is dying for some reason.
>>>
>>> I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
>>> and they work fine. Can you try in one of those and see if it is any
>>> different.
>>>
>>> Just an FYI w.r.t multipath tools: please remove the patch
>>> http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
>>> from your tree for the tools to calculate the path priorities properly.
>>>
>>>
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
>>>> mbx2=8012h mbx3=8002h.
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
>>>> dumped (ffffc2000171d000) -- ignoring request...
>>>> Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
>>>> recovery - ha= ffff81007e85c530.
> 
> Hmm yes, there are some real problems going on within the firmware which
> we need to triage.  From the snippet above, the driver was able to
> capture a firmware dump of a failure (I'm not sure of the timing or how
> it relates to the window in which you noticed the problem), but
> I'll need you to capture the firmware trace and forward it along to
> us to inspect.
> 
> 1) download the following shell script:
> 
> 	ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh
> 
> 2) copy the script to the host (/tmp) which is experiencing the
>    problems.
> 
> 3) reboot and load the driver with the ql2xextended_error_logging
>    module parameter set to 1. e.g.:
> 
> 	$ insmod qla2xxx.ko ql2xextended_error_logging=1
> 
> 4) rerun your test and monitor the kernel-messages file for a message
>    similar to:
> 
>         Firmware dump saved to temp buffer (1/adcdabcd)
> 
> 5) To retrieve the dump, go to a console and type the following:
> 
>         # cd /tmp/
>         # ./qla_dmp.sh 1
> 
>    The value passed to qla_dmp.sh should be the same as the first integer
>    in the 'saved to temp buffer' string (in this example, 1).  If the
>    operation was successful, a message like the following should be
>    displayed:
> 
>         Firmware dumped to file fw_dump_1_20041217_023222.txt.gz
> 
>    Forward the file over.
> 
> 6) forward over the /var/log/messages file of the driver load and
>    failure snippet.
> 
> 
> Not sure which firmware version you are running, but an additional
> datapoint which may be useful after you send the firmware-dump is to
> download the latest 24xx firmware file from QLogic.com:
> 
> 	ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin
> 
> and retry the test.  If you still see problems and see a similar
> 'Firmware dump saved...' message, follow the steps above again and
> forward the same datapoints.
> 

I have tried both the ql2400_fw.bin.4.00.18 and ql2400_fw.bin.4.00.27 firmware
files, and the HBA hit the same error with each.  The attached datapoints were
captured using ql2400_fw.bin.4.00.27.


>>>> While this may be something for the maintainer of the qla2xxx module (I can't
>>>> figure out where I'd send it, in that case...) I think it may be of interest
>>>> that the dm_rdac module tries to push something over the HBA that causes it to
>>>> bail completely and start from scratch (it starts init processes and loading
>>>> firmware again).
>>>>
>>>> Not to say that I'm not interested in any help getting this working, that is.
>>>> If you have any suggestions on how to get this working, I'd love to hear them.
>>>> I'm also willing to guinea pig some testing if you need it (This box still has a
>>>> bit before it will have to be put in use).  I may use redhat to ensure that it's
>>>> not just a broken HBA, but for the long run we would like it to join our gentoo
>>>> environment.
>>>>
>>>> Thanks!
>>>> Brian De Wolf
>>>>
>>>> PS- If the subject misled you because you feel that this is just a qla2xxx
>>>> problem, I'm sorry for wasting your time.
> 
> Regards,
> Andrew Vasquez
> 

[-- Attachment #2: fw_dump_6_20070713_112706.txt.gz --]
[-- Type: application/gzip, Size: 173148 bytes --]

[-- Attachment #3: fw_dump_6_20070713_112706.syslog.txt --]
[-- Type: text/plain, Size: 16017 bytes --]

Jul 13 11:24:10 jimbo kernel: QLogic Fibre Channel HBA Driver
Jul 13 11:24:10 jimbo kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 28 (level, low) -> IRQ 28
Jul 13 11:24:10 jimbo kernel: qla2xxx 0000:02:01.0: Found an ISP2422, irq 28, iobase 0xffffc20001428000
Jul 13 11:24:10 jimbo kernel: qla2xxx 0000:02:01.0: Configuring PCI space...
Jul 13 11:24:10 jimbo kernel: qla2xxx 0000:02:01.0: Configure NVRAM parameters...
Jul 13 11:24:10 jimbo kernel: qla2xxx 0000:02:01.0: Verifying loaded RISC code...
Jul 13 11:24:10 jimbo kernel: scsi(5): **** Load RISC code ****
Jul 13 11:24:10 jimbo kernel: scsi(5): Verifying Checksum of loaded RISC code.
Jul 13 11:24:10 jimbo kernel: scsi(5): Checksum OK, start firmware.
Jul 13 11:24:10 jimbo kernel: qla2xxx 0000:02:01.0: Allocated (64 KB) for EFT...
Jul 13 11:24:10 jimbo kernel: qla2xxx 0000:02:01.0: Allocated (1413 KB) for firmware dump...
Jul 13 11:24:10 jimbo kernel: scsi(5): Issue init firmware.
Jul 13 11:24:10 jimbo kernel: DEBUG: detect hba 5 at address = ffff81007dd84530
Jul 13 11:24:10 jimbo kernel: scsi5 : qla2xxx
Jul 13 11:24:12 jimbo kernel: scsi(5): qla2x00_loop_resync()
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.0: 
Jul 13 11:24:15 jimbo kernel: QLogic Fibre Channel HBA Driver: 8.01.07-k7-debug
Jul 13 11:24:15 jimbo kernel: QLogic QLA2462 - Sun PCI-X 2.0 to 4Gb FC, Dual Channel
Jul 13 11:24:15 jimbo kernel: ISP2422: PCI-X Mode 1 (100 MHz) @ 0000:02:01.0 hdma+, host#=5, fw=4.00.27 [IP] 
Jul 13 11:24:15 jimbo kernel: ACPI: PCI Interrupt 0000:02:01.1[B] -> GSI 29 (level, low) -> IRQ 29
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.1: Found an ISP2422, irq 29, iobase 0xffffc2000142a000
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.1: Configuring PCI space...
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.1: Configure NVRAM parameters...
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.1: Verifying loaded RISC code...
Jul 13 11:24:15 jimbo kernel: scsi(6): **** Load RISC code ****
Jul 13 11:24:15 jimbo kernel: scsi(6): Verifying Checksum of loaded RISC code.
Jul 13 11:24:15 jimbo kernel: scsi(6): Checksum OK, start firmware.
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.1: Allocated (64 KB) for EFT...
Jul 13 11:24:15 jimbo kernel: qla2xxx 0000:02:01.1: Allocated (1413 KB) for firmware dump...
Jul 13 11:24:15 jimbo kernel: scsi(6): Issue init firmware.
Jul 13 11:24:15 jimbo kernel: DEBUG: detect hba 6 at address = ffff81007de68530
Jul 13 11:24:15 jimbo kernel: scsi6 : qla2xxx
Jul 13 11:24:16 jimbo kernel: scsi(6): Asynchronous LIP RESET (f700).
Jul 13 11:24:16 jimbo kernel: qla2xxx 0000:02:01.1: LIP reset occured (f700).
Jul 13 11:24:16 jimbo kernel: scsi(6): LIP occured (f700).
Jul 13 11:24:16 jimbo kernel: qla2xxx 0000:02:01.1: LIP occured (f700).
Jul 13 11:24:16 jimbo kernel: scsi(6): Asynchronous LIP RESET (f7f7).
Jul 13 11:24:16 jimbo kernel: qla2xxx 0000:02:01.1: LIP reset occured (f7f7).
Jul 13 11:24:16 jimbo kernel: scsi(6): Asynchronous P2P MODE received.
Jul 13 11:24:17 jimbo kernel: scsi(6): Asynchronous LOOP UP (4 Gbps).
Jul 13 11:24:17 jimbo kernel: qla2xxx 0000:02:01.1: LOOP UP detected (4 Gbps).
Jul 13 11:24:17 jimbo kernel: scsi(6): Asynchronous PORT UPDATE.
Jul 13 11:24:17 jimbo kernel: scsi(6): Port database changed ffff 0006 0000.
Jul 13 11:24:17 jimbo kernel: scsi(6): qla2x00_reset_marker()
Jul 13 11:24:17 jimbo kernel: scsi(6): qla2x00_loop_resync()
Jul 13 11:24:17 jimbo kernel: scsi(6): F/W Ready - OK 
Jul 13 11:24:17 jimbo kernel: scsi(6): fw_state=3 curr time=100032111.
Jul 13 11:24:17 jimbo kernel: scsi(6): Configure loop -- dpc flags =0x40800e0
Jul 13 11:24:17 jimbo kernel: scsi(6): RSCN queue entry[0] = [00/000000].
Jul 13 11:24:17 jimbo kernel: scsi(6): device_resync: rscn overflow.
Jul 13 11:24:17 jimbo kernel: scsi(6): RFT_ID exiting normally.
Jul 13 11:24:17 jimbo kernel: scsi(6): RFF_ID exiting normally.
Jul 13 11:24:17 jimbo kernel: scsi(6): RNN_ID exiting normally.
Jul 13 11:24:17 jimbo kernel: scsi(6): RSNN_NN exiting normally.
Jul 13 11:24:17 jimbo kernel: scsi(6): GID_PT entry - nn 200600a0b8119910 pn 202700a0b8119910 portid=050100.
Jul 13 11:24:17 jimbo kernel: scsi(6): GID_PT entry - nn 200100e08bb191cd pn 210100e08bb191cd portid=050200.
Jul 13 11:24:17 jimbo kernel: scsi(6): GPSC failed, rejected request:
Jul 13 11:24:17 jimbo kernel: 0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
Jul 13 11:24:17 jimbo kernel: --------------------------------------------------------------
Jul 13 11:24:17 jimbo kernel: 01  00  00  00  fa  01  00  00  80  01  00  00  00  0b  00  00
Jul 13 11:24:17 jimbo kernel: scsi(6): GPSC failed, rejected request:
Jul 13 11:24:17 jimbo kernel: 0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
Jul 13 11:24:17 jimbo kernel: --------------------------------------------------------------
Jul 13 11:24:17 jimbo kernel: 01  00  00  00  fa  01  00  00  80  01  00  00  00  0b  00  00
Jul 13 11:24:17 jimbo kernel: scsi(6): device wrap (050200)
Jul 13 11:24:17 jimbo kernel: scsi(6): Trying Fabric Login w/loop id 0x0081 for port 050100.
Jul 13 11:24:17 jimbo kernel: scsi(6): 202700a0b8119910 -- unsupported FM port operating speed (0000).
Jul 13 11:24:17 jimbo kernel: scsi(6): LOOP READY
Jul 13 11:24:17 jimbo kernel: scsi(6): qla2x00_loop_resync - end
Jul 13 11:24:17 jimbo kernel: qla2xxx 0000:02:01.1: scsi(6:0:0:0): Queue depth adjusted-up to 4.
Jul 13 11:24:17 jimbo kernel: scsi 6:0:0:0: Direct-Access     IBM      1815      FAStT  0914 PQ: 0 ANSI: 3
Jul 13 11:24:17 jimbo kernel: qla2xxx 0000:02:01.1: 
Jul 13 11:24:17 jimbo kernel: QLogic Fibre Channel HBA Driver: 8.01.07-k7-debug
Jul 13 11:24:17 jimbo kernel: QLogic QLA2462 - Sun PCI-X 2.0 to 4Gb FC, Dual Channel
Jul 13 11:24:17 jimbo kernel: ISP2422: PCI-X Mode 1 (100 MHz) @ 0000:02:01.1 hdma+, host#=6, fw=4.00.27 [IP] 
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] 6291456 512-byte hardware sectors (3221 MB)
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] Write Protect is off
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] Mode Sense: 77 00 10 08
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] 6291456 512-byte hardware sectors (3221 MB)
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] Write Protect is off
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] Mode Sense: 77 00 10 08
Jul 13 11:24:17 jimbo kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 13 11:24:17 jimbo kernel: sdc:scsi(6): Asynchronous PORT UPDATE ignored 0000/0004/0600.
Jul 13 11:24:17 jimbo kernel: scsi(6): Asynchronous PORT UPDATE ignored 0000/0007/0b00.
Jul 13 11:24:17 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:17 jimbo kernel: Buffer I/O error on device sdc, logical block 0
Jul 13 11:24:18 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:18 jimbo kernel: Buffer I/O error on device sdc, logical block 0
Jul 13 11:24:18 jimbo kernel: unable to read partition table
Jul 13 11:24:18 jimbo kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
Jul 13 11:24:18 jimbo kernel: scsi 6:0:0:31: Direct-Access     IBM      Universal Xport  0914 PQ: 0 ANSI: 3
Jul 13 11:24:18 jimbo kernel: end_request: I/O error, dev sdc, sector 6291328
Jul 13 11:24:18 jimbo kernel: Buffer I/O error on device sdc, logical block 786416
Jul 13 11:24:19 jimbo kernel: end_request: I/O error, dev sdc, sector 6291328
Jul 13 11:24:19 jimbo kernel: Buffer I/O error on device sdc, logical block 786416
Jul 13 11:24:19 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:19 jimbo kernel: Buffer I/O error on device sdc, logical block 786431
Jul 13 11:24:20 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:20 jimbo kernel: Buffer I/O error on device sdc, logical block 786431
Jul 13 11:24:20 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:20 jimbo kernel: Buffer I/O error on device sdc, logical block 786431
Jul 13 11:24:21 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:21 jimbo kernel: Buffer I/O error on device sdc, logical block 786431
Jul 13 11:24:21 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:21 jimbo kernel: Buffer I/O error on device sdc, logical block 786431
Jul 13 11:24:22 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:22 jimbo kernel: Buffer I/O error on device sdc, logical block 786431
Jul 13 11:24:22 jimbo kernel: end_request: I/O error, dev sdc, sector 6291392
Jul 13 11:24:22 jimbo kernel: Buffer I/O error on device sdc, logical block 786424
Jul 13 11:24:23 jimbo kernel: end_request: I/O error, dev sdc, sector 6291440
Jul 13 11:24:23 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:24 jimbo kernel: end_request: I/O error, dev sdc, sector 6291448
Jul 13 11:24:24 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:25 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:26 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:26 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:27 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:27 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:27 jimbo kernel: printk: 8 messages suppressed.
Jul 13 11:24:27 jimbo kernel: Buffer I/O error on device sdc, logical block 0
Jul 13 11:24:28 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:28 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:29 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:29 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:30 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:30 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:31 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:31 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:32 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:32 jimbo kernel: qla2xxx 0000:02:01.0: Cable is unplugged...
Jul 13 11:24:32 jimbo kernel: scsi(5): fw_state=4 curr time=100033044.
Jul 13 11:24:32 jimbo kernel: scsi(5): Firmware ready **** FAILED ****.
Jul 13 11:24:32 jimbo kernel: qla2x00_loop_resync(): **** FAILED ****
Jul 13 11:24:32 jimbo kernel: scsi(5): qla2x00_loop_resync - end
Jul 13 11:24:32 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:32 jimbo kernel: printk: 9 messages suppressed.
Jul 13 11:24:32 jimbo kernel: Buffer I/O error on device sdc, logical block 0
Jul 13 11:24:33 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:34 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:34 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:35 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:35 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:36 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:36 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:37 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:37 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:37 jimbo kernel: printk: 8 messages suppressed.
Jul 13 11:24:37 jimbo kernel: Buffer I/O error on device sdc, logical block 0
Jul 13 11:24:38 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:38 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:39 jimbo kernel: end_request: I/O error, dev sdc, sector 0
Jul 13 11:24:39 jimbo kernel: device-mapper: multipath rdac: using RDAC command with timeout 15000
Jul 13 11:24:39 jimbo kernel: device-mapper: multipath rdac: queueing MODE_SELECT command on 8:32
Jul 13 11:24:41 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h mbx2=8012h mbx3=8002h.
Jul 13 11:24:41 jimbo kernel: qla2xxx 0000:02:01.1: Firmware dump saved to temp buffer (6/ffffc2000191f000).
Jul 13 11:24:41 jimbo kernel: scsi(5): Loop Down - aborting the queues before time expire
Jul 13 11:24:41 jimbo kernel: scsi(6): dpc: sched qla2x00_abort_isp ha = ffff81007de68530
Jul 13 11:24:41 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error recovery - ha= ffff81007de68530.
Jul 13 11:24:41 jimbo kernel: device-mapper: multipath: Failing path 8:32.
Jul 13 11:24:41 jimbo kernel: scsi(6): **** Load RISC code ****
Jul 13 11:24:41 jimbo kernel: scsi(6): Verifying Checksum of loaded RISC code.
Jul 13 11:24:41 jimbo kernel: scsi(6): Checksum OK, start firmware.
Jul 13 11:24:41 jimbo kernel: scsi(6): Issue init firmware.
Jul 13 11:24:42 jimbo kernel: scsi(6): Asynchronous LIP RESET (f700).
Jul 13 11:24:42 jimbo kernel: qla2xxx 0000:02:01.1: LIP reset occured (f700).
Jul 13 11:24:42 jimbo kernel: scsi(6): LIP occured (f700).
Jul 13 11:24:42 jimbo kernel: qla2xxx 0000:02:01.1: LIP occured (f700).
Jul 13 11:24:42 jimbo kernel: scsi(6): Asynchronous LIP RESET (f7f7).
Jul 13 11:24:42 jimbo kernel: qla2xxx 0000:02:01.1: LIP reset occured (f7f7).
Jul 13 11:24:42 jimbo kernel: scsi(6): Asynchronous P2P MODE received.
Jul 13 11:24:42 jimbo kernel: scsi(6): fcport-0 - port retry count: 0 remaining
Jul 13 11:24:43 jimbo kernel: scsi(6): Asynchronous LOOP UP (4 Gbps).
Jul 13 11:24:43 jimbo kernel: qla2xxx 0000:02:01.1: LOOP UP detected (4 Gbps).
Jul 13 11:24:43 jimbo kernel: scsi(6): Asynchronous PORT UPDATE.
Jul 13 11:24:43 jimbo kernel: scsi(6): Port database changed ffff 0006 0000.
Jul 13 11:24:43 jimbo kernel: scsi(6): Asynchronous PORT UPDATE ignored 0000/0004/0600.
Jul 13 11:24:43 jimbo kernel: scsi(6): Asynchronous PORT UPDATE ignored 0000/0007/0b00.
Jul 13 11:24:43 jimbo kernel: scsi(6): F/W Ready - OK 
Jul 13 11:24:43 jimbo kernel: scsi(6): fw_state=3 curr time=100033b09.
Jul 13 11:24:43 jimbo kernel: qla2x00_restart_isp(): Start configure loop, status = 0
Jul 13 11:24:43 jimbo kernel: scsi(6): Configure loop -- dpc flags =0x4080049
Jul 13 11:24:43 jimbo kernel: scsi(6): RSCN queue entry[0] = [00/000000].
Jul 13 11:24:43 jimbo kernel: scsi(6): device_resync: rscn overflow.
Jul 13 11:24:43 jimbo kernel: scsi(6): RFT_ID exiting normally.
Jul 13 11:24:43 jimbo kernel: scsi(6): RFF_ID exiting normally.
Jul 13 11:24:43 jimbo kernel: scsi(6): RNN_ID exiting normally.
Jul 13 11:24:43 jimbo kernel: scsi(6): RSNN_NN exiting normally.
Jul 13 11:24:43 jimbo kernel: scsi(6): GID_PT entry - nn 200600a0b8119910 pn 202700a0b8119910 portid=050100.
Jul 13 11:24:43 jimbo kernel: scsi(6): GID_PT entry - nn 200100e08bb191cd pn 210100e08bb191cd portid=050200.
Jul 13 11:24:43 jimbo kernel: scsi(6): GPSC failed, rejected request:
Jul 13 11:24:43 jimbo kernel: 0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
Jul 13 11:24:43 jimbo kernel: --------------------------------------------------------------
Jul 13 11:24:43 jimbo kernel: 01  00  00  00  fa  01  00  00  80  01  00  00  00  0b  00  00
Jul 13 11:24:43 jimbo kernel: scsi(6): GPSC failed, rejected request:
Jul 13 11:24:43 jimbo kernel: 0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
Jul 13 11:24:43 jimbo kernel: --------------------------------------------------------------
Jul 13 11:24:43 jimbo kernel: 01  00  00  00  fa  01  00  00  80  01  00  00  00  0b  00  00
Jul 13 11:24:43 jimbo kernel: qla24xx_fabric_logout(6): failed to complete IOCB -- completion status (31)  ioparam=a/0.
Jul 13 11:24:43 jimbo kernel: scsi(6): device wrap (050200)
Jul 13 11:24:43 jimbo kernel: scsi(6): Trying Fabric Login w/loop id 0x0081 for port 050100.
Jul 13 11:24:44 jimbo kernel: scsi(6): 202700a0b8119910 -- unsupported FM port operating speed (0000).
Jul 13 11:24:44 jimbo kernel: scsi(6): LOOP READY
Jul 13 11:24:44 jimbo kernel: qla2x00_restart_isp(): Configure loop done, status = 0x0
Jul 13 11:24:44 jimbo kernel: qla2x00_abort_isp(6): exiting.
Jul 13 11:24:44 jimbo kernel: scsi(6): dpc: qla2x00_abort_isp end

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] dm-mpath-rdac.patch problem
  2007-07-13 16:12     ` [dm-devel] " Andrew Vasquez
  2007-07-13 19:13       ` Brian De Wolf
@ 2007-07-13 19:33       ` Brian De Wolf
  1 sibling, 0 replies; 11+ messages in thread
From: Brian De Wolf @ 2007-07-13 19:33 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi

Andrew Vasquez wrote:
> On Thu, 12 Jul 2007, Mike Anderson wrote:
> 
>> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
>> provide input on the Qlogic behavior.
>>
>> Chandra Seetharaman <sekharan@us.ibm.com> wrote:
>>> On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
>>>> Hello All,
>>>>
>>>> I'm not sure if this is the right place for this, but it seems to be the only
>>>> mailing list related to dm, multipath, and rdac, as far as I can tell.  I've
>>>> been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
>>>> gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
>>>> ql2400_fw.bin.4.00.27) to an IBM DS4000.  I am using a version of
>>>> multipath-tools that I got with git a few days ago.
>>>>
>>>> I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
>>>> rdac]), and the volume is mountable, etc.  It also shows one link as active, the
>>>> other as ghost.  However, once the active link dies, the volume becomes read
>>>> only, and both connections are listed as failed.  Most importantly, something
>>>> like this shows up in the logs:
>>>>
>>>> Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
>>>> MODE_SELECT command on 8:32
>>> It does look like the rdac hardware handler is doing the right thing and
>>> the qlogic is dying for some reason.
>>>
>>> I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
>>> and they work fine. Can you try in one of those and see if it is any
>>> different.
>>>
>>> Just an FYI w.r.t multipath tools: please remove the patch
>>> http://git.kernel.org/?p=linux/storage/multipath-tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db
>>> from your tree for the tools to calculate the path priorities properly.
>>>
>>>
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
>>>> mbx2=8012h mbx3=8002h.
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
>>>> dumped (ffffc2000171d000) -- ignoring request...
>>>> Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
>>>> recovery - ha= ffff81007e85c530.
> 
> Hmm yes, there are some real problems going on within the firmware which
> we need to triage.  From the snippet above, the driver was able to
> capture a firmware-dump of a failure (not sure of the timing and how
> it relates to the window in which you recognized a 'problem'), but
> I'll need you to 'capture' the firmware trace and forward it along to
> us to inspect.
> 
> 1) download the following shell script:
> 
> 	ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh
> 
> 2) copy the script to the host (/tmp) which is experiencing the
>    problems.
> 
> 3) reboot and load the driver with the ql2xextended_error_logging
>    module parameter set to 1. e.g.:
> 
> 	$ insmod qla2xxx.ko ql2xextended_error_logging=1
> 
> 4) rerun your test and monitor the kernel-messages file for a message
>    similar to:
> 
>         Firmware dump saved to temp buffer (1/adcdabcd)
> 
> 5) To retrieve the dump, go to a console and type the following:
> 
>         # cd /tmp/
>         # ./qla_dmp.sh 1
> 
>    The value passed to qla_dmp.sh should be the same as the first integer
>    in the 'saved to temp buffer' string (in this example, 1).  If the
>    operation was successful, a message like the following should be
>    displayed:
> 
>         Firmware dumped to file fw_dump_1_20041217_023222.txt.gz
> 
>    Forward over the file.
> 
> 6) forward over the /var/log/messages file of the driver load and
>    failure snippet.
> 
> 
> Not sure which firmware version you are running, but an additional
> datapoint which may be useful after you send the firmware-dump is to
> download the latest 24xx firmware file from QLogic.com:
> 
> 	ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin
> 
> and retry the test.  If you still see problems and a similar
> 'Firmware dump saved...' message, follow the steps above again and
> forward the same datapoints.
> 

I have tried both the ql2400_fw.bin.4.00.18 and ql2400_fw.bin.4.00.27 firmwares
and the HBA had the same error.  The attached datapoints were done using
ql2400_fw.bin.4.00.27.

Note:  This is a resend to the mailing list without attachments.

>>>> While this may be something for the maintainer of the qla2xxx module (I can't
>>>> figure out where I'd send it, in that case...) I think it may be of interest
>>>> that the dm_rdac module tries to push something over the HBA that causes it to
>>>> bail completely and start from scratch (it starts init processes and loading
>>>> firmware again).
>>>>
>>>> Not to say that I'm not interested in any help getting this working, that is.
>>>> If you have any suggestions on how to get this working, I'd love to hear them.
>>>> I'm also willing to guinea pig some testing if you need it (This box still has a
>>>> bit before it will have to be put in use).  I may use redhat to ensure that it's
>>>> not just a broken HBA, but for the long run we would like it to join our gentoo
>>>> environment.
>>>>
>>>> Thanks!
>>>> Brian De Wolf
>>>>
>>>> PS- If the subject misled you because you feel that this is just a qla2xxx
>>>> problem, I'm sorry for wasting your time.
> 
> Regards,
> Andrew Vasquez
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit.
  2007-07-13  2:37   ` Mike Anderson
  2007-07-13 16:12     ` [dm-devel] " Andrew Vasquez
@ 2007-07-17 21:07     ` Andrew Vasquez
  2007-07-20 23:05       ` Brian De Wolf
  2007-07-21  5:56       ` Chandra Seetharaman
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Vasquez @ 2007-07-17 21:07 UTC (permalink / raw)
  To: Brian De Wolf
  Cc: linux-scsi, device-mapper development, sekharan, Mike Anderson

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
---

	On Thu, 12 Jul 2007, Mike Anderson wrote:

	> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
	> provide input on the Qlogic behavior.
	...

	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
	> > > mbx2=8012h mbx3=8002h.
	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
	> > > dumped (ffffc2000171d000) -- ignoring request...
	> > > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
	> > > recovery - ha= ffff81007e85c530.

	So what's happening here is the firmware is detecting a Xfer-ready
	from the storage when in fact the data-direction for a mode-select
	should be a write (DATA_OUT).

	The following patch fixes the problem (typo).  Verified by Brian, as
	well.

diff --git a/drivers/md/dm-mpath-rdac.c b/drivers/md/dm-mpath-rdac.c
index 8b776b8..16b1613 100644
--- a/drivers/md/dm-mpath-rdac.c
+++ b/drivers/md/dm-mpath-rdac.c
@@ -292,7 +292,7 @@ static struct request *get_rdac_req(struct rdac_handler *h,
 	rq->end_io_data = h;
 	rq->timeout = h->timeout;
 	rq->cmd_type = REQ_TYPE_BLOCK_PC;
-	rq->cmd_flags = REQ_FAILFAST | REQ_NOMERGE;
+	rq->cmd_flags |= REQ_FAILFAST | REQ_NOMERGE;
 	return rq;
 }
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit.
  2007-07-17 21:07     ` [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit Andrew Vasquez
@ 2007-07-20 23:05       ` Brian De Wolf
  2007-07-21  1:25         ` Chandra Seetharaman
  2007-07-21 16:45         ` Alasdair G Kergon
  2007-07-21  5:56       ` Chandra Seetharaman
  1 sibling, 2 replies; 11+ messages in thread
From: Brian De Wolf @ 2007-07-20 23:05 UTC (permalink / raw)
  To: device-mapper development

Andrew Vasquez wrote:
> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
> ---
> 
> 	On Thu, 12 Jul 2007, Mike Anderson wrote:
> 
> 	> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
> 	> provide input on the Qlogic behavior.
> 	...
> 
> 	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> 	> > > mbx2=8012h mbx3=8002h.
> 	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> 	> > > dumped (ffffc2000171d000) -- ignoring request...
> 	> > > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> 	> > > recovery - ha= ffff81007e85c530.
> 
> 	So what's happening here is the firmware is detecting a Xfer-ready
> 	from the storage when in fact the data-direction for a mode-select
> 	should be a write (DATA_OUT).
> 
> 	The following patch fixes the problem (typo).  Verified by Brian, as
> 	well.
> 
> diff --git a/drivers/md/dm-mpath-rdac.c b/drivers/md/dm-mpath-rdac.c
> index 8b776b8..16b1613 100644
> --- a/drivers/md/dm-mpath-rdac.c
> +++ b/drivers/md/dm-mpath-rdac.c
> @@ -292,7 +292,7 @@ static struct request *get_rdac_req(struct rdac_handler *h,
>  	rq->end_io_data = h;
>  	rq->timeout = h->timeout;
>  	rq->cmd_type = REQ_TYPE_BLOCK_PC;
> -	rq->cmd_flags = REQ_FAILFAST | REQ_NOMERGE;
> +	rq->cmd_flags |= REQ_FAILFAST | REQ_NOMERGE;
>  	return rq;
>  }
>  
> 

Is this patch going to be adopted into the official linux kernel?  I only see
the big dm-mpath-rdac patch, but I don't think that one works properly without
this one.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit.
  2007-07-20 23:05       ` Brian De Wolf
@ 2007-07-21  1:25         ` Chandra Seetharaman
  2007-07-21 16:45         ` Alasdair G Kergon
  1 sibling, 0 replies; 11+ messages in thread
From: Chandra Seetharaman @ 2007-07-21  1:25 UTC (permalink / raw)
  To: device-mapper development

Hi Brian,

I'm trying to test it out now before I send an ack. Will do so as soon as I
verify it.

regards,

chandra
On Fri, 2007-07-20 at 16:05 -0700, Brian De Wolf wrote:
> Andrew Vasquez wrote:
> > Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
> > ---
> > 
> > 	On Thu, 12 Jul 2007, Mike Anderson wrote:
> > 
> > 	> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
> > 	> provide input on the Qlogic behavior.
> > 	...
> > 
> > 	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> > 	> > > mbx2=8012h mbx3=8002h.
> > 	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> > 	> > > dumped (ffffc2000171d000) -- ignoring request...
> > 	> > > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> > 	> > > recovery - ha= ffff81007e85c530.
> > 
> > 	So what's happening here is the firmware is detecting a Xfer-ready
> > 	from the storage when in fact the data-direction for a mode-select
> > 	should be a write (DATA_OUT).
> > 
> > 	The following patch fixes the problem (typo).  Verified by Brian, as
> > 	well.
> > 
> > diff --git a/drivers/md/dm-mpath-rdac.c b/drivers/md/dm-mpath-rdac.c
> > index 8b776b8..16b1613 100644
> > --- a/drivers/md/dm-mpath-rdac.c
> > +++ b/drivers/md/dm-mpath-rdac.c
> > @@ -292,7 +292,7 @@ static struct request *get_rdac_req(struct rdac_handler *h,
> >  	rq->end_io_data = h;
> >  	rq->timeout = h->timeout;
> >  	rq->cmd_type = REQ_TYPE_BLOCK_PC;
> > -	rq->cmd_flags = REQ_FAILFAST | REQ_NOMERGE;
> > +	rq->cmd_flags |= REQ_FAILFAST | REQ_NOMERGE;
> >  	return rq;
> >  }
> >  
> > 
> 
> Is this patch going to be adopted into the official linux kernel?  I only see
> the big dm-mpath-rdac patch, but I don't think that one works properly without
> this one.
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit.
  2007-07-17 21:07     ` [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit Andrew Vasquez
  2007-07-20 23:05       ` Brian De Wolf
@ 2007-07-21  5:56       ` Chandra Seetharaman
  1 sibling, 0 replies; 11+ messages in thread
From: Chandra Seetharaman @ 2007-07-21  5:56 UTC (permalink / raw)
  To: Alasdair G Kergon, torvalds
  Cc: Brian De Wolf, linux-scsi, device-mapper development,
	Mike Anderson

ACK'd. This patch is needed for rdac to work properly.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

On Tue, 2007-07-17 at 14:07 -0700, Andrew Vasquez wrote:
> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
> ---
> 
> 	On Thu, 12 Jul 2007, Mike Anderson wrote:
> 
> 	> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
> 	> provide input on the Qlogic behavior.
> 	...
> 
> 	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
> 	> > > mbx2=8012h mbx3=8002h.
> 	> > > Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
> 	> > > dumped (ffffc2000171d000) -- ignoring request...
> 	> > > Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
> 	> > > recovery - ha= ffff81007e85c530.
> 
> 	So what's happening here is the firmware is detecting a Xfer-ready
> 	from the storage when in fact the data-direction for a mode-select
> 	should be a write (DATA_OUT).
> 
> 	The following patch fixes the problem (typo).  Verified by Brian, as
> 	well.
> 
> diff --git a/drivers/md/dm-mpath-rdac.c b/drivers/md/dm-mpath-rdac.c
> index 8b776b8..16b1613 100644
> --- a/drivers/md/dm-mpath-rdac.c
> +++ b/drivers/md/dm-mpath-rdac.c
> @@ -292,7 +292,7 @@ static struct request *get_rdac_req(struct rdac_handler *h,
>  	rq->end_io_data = h;
>  	rq->timeout = h->timeout;
>  	rq->cmd_type = REQ_TYPE_BLOCK_PC;
> -	rq->cmd_flags = REQ_FAILFAST | REQ_NOMERGE;
> +	rq->cmd_flags |= REQ_FAILFAST | REQ_NOMERGE;
>  	return rq;
>  }
> 
> 
-- 

----------------------------------------------------------------------
    Chandra Seetharaman               | Be careful what you choose....
              - sekharan@us.ibm.com   |      .......you may get it.
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit.
  2007-07-20 23:05       ` Brian De Wolf
  2007-07-21  1:25         ` Chandra Seetharaman
@ 2007-07-21 16:45         ` Alasdair G Kergon
  1 sibling, 0 replies; 11+ messages in thread
From: Alasdair G Kergon @ 2007-07-21 16:45 UTC (permalink / raw)
  To: device-mapper development

On Fri, Jul 20, 2007 at 04:05:23PM -0700, Brian De Wolf wrote:
> Is this patch going to be adopted into the official linux kernel?  

Yes.

Alasdair
-- 
agk@redhat.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-07-21 16:45 UTC | newest]

Thread overview: 11+ messages
2007-07-13  1:35 dm-mpath-rdac.patch problem Brian De Wolf
2007-07-13  2:06 ` Chandra Seetharaman
2007-07-13  2:37   ` Mike Anderson
2007-07-13 16:12     ` [dm-devel] " Andrew Vasquez
2007-07-13 19:13       ` Brian De Wolf
2007-07-13 19:33       ` Brian De Wolf
2007-07-17 21:07     ` [PATCH] dm-mpath-rdac: don't stomp on a request's transfer bit Andrew Vasquez
2007-07-20 23:05       ` Brian De Wolf
2007-07-21  1:25         ` Chandra Seetharaman
2007-07-21 16:45         ` Alasdair G Kergon
2007-07-21  5:56       ` Chandra Seetharaman
