From: "Roger Håkansson" <hson@ludd.luth.se>
To: device-mapper development <dm-devel@redhat.com>
Subject: Re: Problems with multipathing
Date: Mon, 17 Apr 2006 16:17:52 +0200
Message-ID: <4443A390.5010005@ludd.luth.se>
In-Reply-To: <4443635F.5000401@free.fr>

Christophe Varoqui wrote:
>
> Do failover device nodes get reassigned during the rescan ?
> Like, for example, a configured path sda gets removed and a new path sdb
> appears ?

No, since I don't rescan the bus, only the target itself.
When the controller was in non-hubbed mode and I ran
"echo 1 > /sys/class/scsi_host/host[1-2]/scan" after a controller had
failed, I got two new devices, sde and sdf (I normally have sda, sdb,
sdc and sdd). But if I instead ran
"echo 1 > /sys/class/scsi_device/[1-2]:0:0:0/device/rescan", I didn't
get any new devices; the old ones simply started working again.
Now that the box is in hubbed mode, I can't get new devices even with a
SCSI host scan, but just as before, a SCSI target rescan brings my
devices back in order.
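The difference described above, revalidating existing devices through their per-device "rescan" attribute instead of scanning the whole host, can be sketched as a small shell function. This is a hypothetical helper, not part of multipath-tools: the function name `rescan_devices` and the overridable `SYSFS` variable are assumptions added so the sketch can be tried without hardware. Only the sysfs path pattern comes from the message.

```shell
#!/bin/sh
# Hypothetical helper: revalidate existing SCSI devices (per-device rescan)
# rather than scanning the whole host. In the report above, a host scan
# produced new sd nodes (sde, sdf), while the per-device rescan reused the
# existing ones.
# SYSFS is an assumption: overridable so the function can be exercised
# against a scratch directory instead of the real /sys.
SYSFS=${SYSFS:-/sys}

# Usage: rescan_devices H:C:T:L [H:C:T:L ...]
# Returns non-zero if any device has no writable rescan attribute.
rescan_devices() {
    status=0
    for hctl in "$@"; do
        attr="$SYSFS/class/scsi_device/$hctl/device/rescan"
        if [ -w "$attr" ]; then
            echo 1 > "$attr"
        else
            echo "rescan_devices: no writable $attr" >&2
            status=1
        fi
    done
    return $status
}
```

With the two paths from this thread, the call would be `rescan_devices 1:0:0:0 2:0:0:0`. Looping over explicit H:C:T:L names keeps the existing device nodes in place, which matters here because multipath's map refers to sdb and sdc.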

Also, I've noticed that this doesn't only happen when a controller
fails; the same thing can happen when a failed controller is "revived".

As far as I've been able to tell, the more I/O going on at the time of
the failure, the more likely the (SCSI) device is to be marked "dead".
If I run "while /bin/true; do dd if=/dev/zero of=/mnt/test count=20000;
sleep 1; done" and fail (or revive) a controller, it works in about 50%
of the cases; with a 2-second sleep there is rarely any problem, but
with no sleep at all it fails nearly 100% of the time.
And in all types of tests, if I do a SCSI target (path) rescan before
multipath decides both paths are dead, both paths work again and the
multipath device never fails.
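The reproduction loop above can be parameterized so the pause between writes (1 s, 2 s, or none) is easy to vary between runs. This is a minimal sketch: the function name `stress_writes` and the optional iteration limit are assumptions added for convenience; the `dd` invocation is the one from the message. With dd's default 512-byte block size, `count=20000` writes 10,240,000 bytes per iteration.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction loop described above.
# Usage: stress_writes TARGET PAUSE_SECONDS [ITERATIONS]
# ITERATIONS defaults to 0, meaning "loop forever" like the original
# "while /bin/true" loop.
stress_writes() {
    target=$1
    pause=$2
    iterations=${3:-0}
    i=0
    while [ "$iterations" -eq 0 ] || [ "$i" -lt "$iterations" ]; do
        # 20000 blocks of dd's default 512 bytes = 10,240,000 bytes
        if ! dd if=/dev/zero of="$target" count=20000 2>/dev/null; then
            echo "stress_writes: write failed on iteration $i" >&2
        fi
        i=$((i + 1))
        if [ "$pause" -gt 0 ]; then
            sleep "$pause"
        fi
    done
}
```

For example, `stress_writes /mnt/test 0` corresponds to the "no sleep at all" case that failed nearly every time, and `stress_writes /mnt/test 2` to the 2-second case. Logging failed writes per iteration makes it easier to line up the moment of the controller failure with the first I/O error.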

> If so, the FC transport class is in charge of the timeout triggering the
> dead devices removal.
> A hardware handler wouldn't help here.
> 
> Can you paste a before/after scsi rescan "multipath -l" output ?
> 

They are identical:

[root@asl005 ~]# multipath -l
mpath1 (3600d0230000000000b01910b4d313400)
[size=97 GB][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:0 sdb 8:16  [active][undef]
 \_ 2:0:0:0 sdc 8:32  [active][undef]
[root@asl005 ~]# dmesg |tail -20
SCSI error : <1 0 0 0> return code = 0x20008
end_request: I/O error, dev sdb, sector 21247352
end_request: I/O error, dev sdb, sector 21247360
SCSI error : <1 0 0 0> return code = 0x20008
end_request: I/O error, dev sdb, sector 21036576
end_request: I/O error, dev sdb, sector 21036584
Aborting journal on device dm-5.
ext3_abort called.
EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-5) in start_transaction: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
printk: 254766 messages suppressed.
Buffer I/O error on device dm-5, logical block 2092209
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-5, logical block 2093234
lost page write due to I/O error on dm-5
printk: 485 messages suppressed.
Buffer I/O error on device dm-5, logical block 1
lost page write due to I/O error on dm-5
[root@asl005 ~]# multipath -l
mpath1 (3600d0230000000000b01910b4d313400)
[size=97 GB][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdb 8:16  [failed][undef]
 \_ 2:0:0:0 sdc 8:32  [failed][undef]
[root@asl005 ~]# echo 1 > /sys/class/fc_transport/target1\:0\:0/device/1\:0\:0\:0/rescan
[root@asl005 ~]# multipath -l
mpath1 (3600d0230000000000b01910b4d313400)
[size=97 GB][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0 sdb 8:16  [failed][undef]
 \_ 2:0:0:0 sdc 8:32  [failed][undef]
[root@asl005 ~]# multipath
[root@asl005 ~]# multipath -ll
mpath1 (3600d0230000000000b01910b4d313400)
[size=97 GB][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:0 sdb 8:16  [active][undef]
 \_ 2:0:0:0 sdc 8:32  [active][undef]
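The manual sequence in the session above (find the failed paths in `multipath -l`, rescan them, re-run `multipath`, check with `multipath -ll`) can be sketched as a script. Assumptions, all hypothetical: the function names, the overridable `SYSFS` and `MULTIPATH` variables (added so the logic can be tested with a stub), and parsing that relies on the path-line layout shown in this thread (` \_ 1:0:0:0 sdb 8:16  [failed][undef]`), which other multipath-tools versions may not share. It also uses the `/sys/class/scsi_device/.../device/rescan` path from earlier in the message, assumed to address the same device as the fc_transport path used in the session. Whether the rescan, the `multipath` reload, or the path checks done by `-ll` is what actually restored the paths is not established here.

```shell
#!/bin/sh
# Hypothetical automation of the recovery sequence shown in the session.
SYSFS=${SYSFS:-/sys}            # assumption: overridable for testing
MULTIPATH=${MULTIPATH:-multipath}  # assumption: overridable for testing

# Read `multipath -l` output on stdin and print the H:C:T:L of every path
# marked [failed]. Field 2 of a path line is H:C:T:L in the format above.
failed_paths() {
    awk '/\[failed\]/ { print $2 }'
}

recover_failed_paths() {
    for hctl in $("$MULTIPATH" -l | failed_paths); do
        echo "rescanning $hctl" >&2
        echo 1 > "$SYSFS/class/scsi_device/$hctl/device/rescan"
    done
    # In the session, the paths showed as active only after multipath
    # itself was re-run; -ll then displays the resulting state.
    "$MULTIPATH"
    "$MULTIPATH" -ll
}
```

Running the rescan before both paths are declared dead, as described earlier in the message, is the case where this kept the multipath device from failing; after the failure it only mirrors the manual steps above.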
