From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Gerald Nowitzky"
Subject: Re: multibus / failover and EMC CX600
Date: Wed, 17 Oct 2007 21:38:56 +0200
Message-ID: <06b201c810f5$5b7782d0$0a00a8c0@ALDI2>
References: <061401c810a7$cac685d0$0a00a8c0@ALDI2> <4715E6A4.7060308@linpro.no> <4715ED28.9020102@suse.de> <471600F7.5090607@linpro.no> <066001c810cc$cd9f6f90$0a00a8c0@ALDI2> <471631E5.9050603@linpro.no>
Reply-To: device-mapper development
Sender: dm-devel-bounces@redhat.com
To: device-mapper development
List-Id: dm-devel.ids

Not much difference with 2.6.23.1:

Oct 17 22:33:56 SANfile_m kernel: kobject_add failed for 1:0:1:0 with -EEXIST, don't try to register things with the same name in the same directory.
Oct 17 22:33:56 SANfile_m kernel:  [<c03074d5>] kobject_shadow_add+0x115/0x1b0
Oct 17 22:33:56 SANfile_m kernel:  [<c03a8f58>] device_add+0xa8/0x5a0
Oct 17 22:33:56 SANfile_m kernel:  [<c02fc582>] __blk_queue_init_tags+0x32/0x70
Oct 17 22:33:56 SANfile_m kernel:  [<c03fb8af>] scsi_sysfs_add_sdev+0x4f/0x220
Oct 17 22:33:56 SANfile_m kernel:  [<f99498b7>] qla2xxx_slave_configure+0x77/0x110 [qla2xxx]
Oct 17 22:33:56 SANfile_m kernel:  [<c03f986a>] scsi_probe_and_add_lun+0x92a/0x950
Oct 17 22:33:56 SANfile_m kernel:  [<c03fa26d>] __scsi_scan_target+0x4fd/0x5b0
Oct 17 22:33:56 SANfile_m kernel:  [<c03fa954>] scsi_scan_target+0x94/0xc0
Oct 17 22:33:56 SANfile_m kernel:  [<c04014f0>] fc_scsi_scan_rport+0x0/0x90
Oct 17 22:33:56 SANfile_m kernel:  [<c0401568>] fc_scsi_scan_rport+0x78/0x90
Oct 17 22:33:56 SANfile_m kernel:  [<c013e963>] run_workqueue+0x73/0x100
Oct 17 22:33:56 SANfile_m kernel:  [<c0142350>] autoremove_wake_function+0x0/0x50
Oct 17 22:33:56 SANfile_m kernel:  [<c013f3dc>] worker_thread+0x9c/0x100
Oct 17 22:33:56 SANfile_m kernel:  [<c0142350>] autoremove_wake_function+0x0/0x50
Oct 17 22:33:56 SANfile_m kernel:  [<c013f340>] worker_thread+0x0/0x100
Oct 17 22:33:56 SANfile_m kernel:  [<c0142082>] kthread+0x42/0x70
Oct 17 22:33:56 SANfile_m kernel:  [<c0142040>] kthread+0x0/0x70
Oct 17 22:33:56 SANfile_m kernel:  [<c0105e6f>] kernel_thread_helper+0x7/0x18
Oct 17 22:33:56 SANfile_m kernel:  =======================
Oct 17 22:33:56 SANfile_m kernel: error 1

----- Original Message -----
From: Tore Anderson
To: device-mapper development
Sent: Wednesday, October 17, 2007 6:01 PM
Subject: Re: [dm-devel] multibus / failover and EMC CX600

* Gerald Nowitzky

> The mpath_prio_emc with group_by_prio did the trick. Thanks!
>
> But I am still losing the paths to the failed devices. I increased
> dev_loss_tmo, but the maximum seems to be about 600 - thus, after 10
> minutes, the paths fail:

The maximum is indeed 600 seconds in 2.6.23.

> SANfile_m linux # multipath -l
> hcfshare (360060160c820080063502869e459dc11) dm-0 ,
> [size=3.4T][features=1 queue_if_no_path][hwhandler=1 emc]
> \_ round-robin 0 [prio=0][enabled]
>  \_ #:#:#:# -   #:#   [failed][undef]
>  \_ #:#:#:# -   #:#   [failed][undef]
> \_ round-robin 0 [prio=0][active]
>  \_ 2:0:0:0 sdd 8:48  [active][undef]
>  \_ 1:0:0:0 sdb 8:16  [active][undef]
> If I put them online again, I run into the -EEXIST prob. Async SCSI
> scanning *is* off in my kernel, so the only thing I could do from
> here is to try the patch, is it?

Matthew Wilcox's patch solved this particular problem for me, yes.  I
still had some problems with -EEXIST when unloading and re-inserting the
HBA driver module, but that's a corner case I rarely run into
(as well as being easily worked around by trying again).

Come to think of it, you never said which kernel version you're running...?

> Oct 17 17:26:36 SANfile_m kernel: kobject_add failed for 1:0:1:0 with
> -EEXIST, don't try to register things with the same name in the same
> directory.

One suggestion...  If the sysfs object is still around, you might be
able to delete it manually by running «echo 1 >
/sys/class/scsi_device/1:0:1:0/device/delete».  If that works, you can
try to rescan again by doing «echo 0 1 0 >
/sys/class/scsi_host/host1/scan».  With some luck it'll work...

If it does, most of the time udev will notice and alert multipath to
check out the new device.  Sometimes udev doesn't notice, though - simply
run the «multipath» command manually in that case.

By the way - the «1» in «host1» maps to the first digit in «1:0:1:0»,
while the «0 1 0» in the echo command corresponds to the last three.

Regards
-- 
Tore Anderson

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
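
For reference, the delete-and-rescan suggestion quoted above can be written as a small shell helper. This is only a sketch of the two echo commands from the thread: the function names and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT exists so the path logic can be tried against a scratch directory), and writing to the real /sys needs root. It assumes the address is given in the host:channel:target:lun form that the kernel log prints, e.g. 1:0:1:0.

```shell
#!/bin/sh
# Sketch only: derive the sysfs paths for a SCSI H:C:T:L address and
# repeat the "echo 1 > delete" / "echo C T L > scan" steps from the thread.
# SYSFS_ROOT and all function names are illustrative assumptions.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Path of the "delete" attribute for the (possibly stale) SCSI device.
scsi_delete_path() {
    printf '%s/class/scsi_device/%s/device/delete\n' "$SYSFS_ROOT" "$1"
}

# Path of the "scan" attribute; the host number is the first field.
scsi_scan_path() {
    printf '%s/class/scsi_host/host%s/scan\n' "$SYSFS_ROOT" "${1%%:*}"
}

# The "channel target lun" triple, i.e. the last three fields.
scsi_scan_args() {
    rest="${1#*:}"                      # drop the host field
    printf '%s\n' "$rest" | tr ':' ' '
}

# Delete the device if its sysfs entry is still there, then rescan it.
scsi_readd() {
    hctl="$1"
    del="$(scsi_delete_path "$hctl")"
    if [ -w "$del" ]; then
        echo 1 > "$del"
    fi
    scsi_scan_args "$hctl" > "$(scsi_scan_path "$hctl")"
}
```

Usage would be «scsi_readd 1:0:1:0» as root. As noted in the quoted mail, if udev does not pick up the re-added device, running «multipath» by hand afterwards is still needed.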