From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Gerald Nowitzky"
Subject: Re: multibus / failover and EMC CX600
Date: Wed, 17 Oct 2007 20:04:12 +0200
Message-ID: <068301c810e8$1f4e6e70$0a00a8c0@ALDI2>
References: <061401c810a7$cac685d0$0a00a8c0@ALDI2><4715E6A4.7060308@linpro.no> <4715ED28.9020102@suse.de><471600F7.5090607@linpro.no> <066001c810cc$cd9f6f90$0a00a8c0@ALDI2> <471631E5.9050603@linpro.no>
Reply-To: device-mapper development
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0378461437=="
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
To: device-mapper development
List-Id: dm-devel.ids

This is a multi-part message in MIME format.
--===============0378461437==
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0680_01C810F8.E2A04FF0"

This is a multi-part message in MIME format.
------=_NextPart_000_0680_01C810F8.E2A04FF0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

I'm afraid the patch did not work for me. It's still the same.

I am using kernel 2.6.22.2 at the moment. Should I upgrade to 2.6.23?

Does anybody have any ideas?
The system is not in production at the moment, so we could do some testing.
 
(Gerald)
 
Oct 17 20:57:09 SANfile_m kernel: kobject_add failed for 1:0:1:0 with -EEXIST, don't try to register things with the same name in the same directory.
Oct 17 20:57:09 SANfile_m kernel:  [number+85/816] kobject_shadow_add+0x115/0x1b0
Oct 17 20:57:09 SANfile_m kernel:  [<c02f95f5>] kobject_shadow_add+0x115/0x1b0
Oct 17 20:57:09 SANfile_m kernel:  [lo_ioctl+1125/2528] device_add+0xc5/0x570
Oct 17 20:57:09 SANfile_m kernel:  [<c03aefd5>] device_add+0xc5/0x570
Oct 17 20:57:09 SANfile_m kernel:  [fc_remote_port_rolechg+127/320] scsi_adjust_queue_depth+0x9f/0xf0
Oct 17 20:57:09 SANfile_m kernel:  [<c03f9d7f>] scsi_adjust_queue_depth+0x9f/0xf0
Oct 17 20:57:09 SANfile_m kernel:  [blk_register_region+18/64] __blk_queue_init_tags+0x32/0x70
Oct 17 20:57:09 SANfile_m kernel:  [<c02eeb72>] __blk_queue_init_tags+0x32/0x70
Oct 17 20:57:09 SANfile_m kernel:  [sr_get_mcn+50/240] scsi_sysfs_add_sdev+0x32/0x230
Oct 17 20:57:09 SANfile_m kernel:  [<c04028b2>] scsi_sysfs_add_sdev+0x32/0x230
Oct 17 20:57:09 SANfile_m kernel:  [<f99445b7>] qla2xxx_slave_configure+0x77/0x110 [qla2xxx]
Oct 17 20:57:09 SANfile_m kernel:  [sd_init_command+313/1088] scsi_probe_and_add_lun+0x8c9/0x940
Oct 17 20:57:09 SANfile_m kernel:  [<c0400859>] scsi_probe_and_add_lun+0x8c9/0x940
Oct 17 20:57:09 SANfile_m kernel:  [sr_probe+72/1472] __scsi_scan_target+0x518/0x5c0
Oct 17 20:57:09 SANfile_m kernel:  [<c04012c8>] __scsi_scan_target+0x518/0x5c0
Oct 17 20:57:09 SANfile_m kernel:  [kallsyms_addresses+36323/130252] schedule+0x2df/0x940
Oct 17 20:57:09 SANfile_m kernel:  [<c053699f>] schedule+0x2df/0x940
Oct 17 20:57:09 SANfile_m kernel:  [sr_init_command+128/944] scsi_scan_target+0xd0/0xe0
Oct 17 20:57:09 SANfile_m kernel:  [<c0401a40>] scsi_scan_target+0xd0/0xe0
Oct 17 20:57:09 SANfile_m kernel:  [SendIocInit+272/784] fc_scsi_scan_rport+0x0/0x90
Oct 17 20:57:09 SANfile_m kernel:  [<c04084e0>] fc_scsi_scan_rport+0x0/0x90
Oct 17 20:57:09 SANfile_m kernel:  [SendIocInit+392/784] fc_scsi_scan_rport+0x78/0x90
Oct 17 20:57:09 SANfile_m kernel:  [<c0408558>] fc_scsi_scan_rport+0x78/0x90
Oct 17 20:57:09 SANfile_m kernel:  [run_workqueue+131/256] run_workqueue+0x73/0x100
Oct 17 20:57:09 SANfile_m kernel:  [<c0131dc3>] run_workqueue+0x73/0x100
Oct 17 20:57:09 SANfile_m kernel:  [autoremove_wake_function+16/80] autoremove_wake_function+0x0/0x50
Oct 17 20:57:09 SANfile_m kernel:  [<c01354e0>] autoremove_wake_function+0x0/0x50
Oct 17 20:57:09 SANfile_m kernel:  [worker_thread+172/256] worker_thread+0x9c/0x100
Oct 17 20:57:09 SANfile_m kernel:  [<c01326dc>] worker_thread+0x9c/0x100
Oct 17 20:57:09 SANfile_m kernel:  [autoremove_wake_function+16/80] autoremove_wake_function+0x0/0x50
Oct 17 20:57:09 SANfile_m kernel:  [<c01354e0>] autoremove_wake_function+0x0/0x50
Oct 17 20:57:09 SANfile_m kernel:  [worker_thread+16/256] worker_thread+0x0/0x100
Oct 17 20:57:09 SANfile_m kernel:  [<c0132640>] worker_thread+0x0/0x100
Oct 17 20:57:09 SANfile_m kernel:  [kthread+82/112] kthread+0x42/0x70
Oct 17 20:57:09 SANfile_m kernel:  [<c0135212>] kthread+0x42/0x70
Oct 17 20:57:09 SANfile_m kernel:  [kthread+16/112] kthread+0x0/0x70
Oct 17 20:57:09 SANfile_m kernel:  [<c01351d0>] kthread+0x0/0x70
Oct 17 20:57:09 SANfile_m kernel:  [print_trace_stack+3/16] kernel_thread_helper+0x7/0x14
Oct 17 20:57:09 SANfile_m kernel:  [<c0104763>] kernel_thread_helper+0x7/0x14
Oct 17 20:57:09 SANfile_m kernel:  =======================
Oct 17 20:57:09 SANfile_m kernel: error 1
----- Original Message -----
From: Tore Anderson
To: device-mapper development
Sent: Wednesday, October 17, 2007 6:01 PM
Subject: Re: [dm-devel] multibus / failover and EMC CX600

* Gerald Nowitzky

> The mpath_prio_emc with group_by_prio did the trick. Thanks!
>
> But I am still losing the paths to the failed devices. I increased
> dev_loss_tmo, but the maximum seems to be about 600 - thus, after 10
> minutes, the paths fail:

The maximum is indeed 600 seconds in 2.6.23.
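
For reference, a minimal sketch of how dev_loss_tmo could be raised on every FC remote port while capping the request at the 600-second limit mentioned above. It assumes the FC transport class exposes the attribute as /sys/class/fc_remote_ports/rport-*/dev_loss_tmo. The names set_dev_loss_tmo, SYSFS_ROOT and MAX_TMO are made up for this example and are not part of any tool.

```shell
#!/bin/sh
# Sketch only: set dev_loss_tmo on all FC remote ports, capped at 600 s.
# SYSFS_ROOT, MAX_TMO and set_dev_loss_tmo are illustrative names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
MAX_TMO=600   # upper limit reported in this thread for 2.6.23

set_dev_loss_tmo() {
    tmo="$1"
    # Clamp the requested value to the reported maximum.
    if [ "$tmo" -gt "$MAX_TMO" ]; then
        tmo="$MAX_TMO"
    fi
    for attr in "$SYSFS_ROOT"/class/fc_remote_ports/rport-*/dev_loss_tmo; do
        # Skip the unexpanded glob when no remote ports exist.
        [ -e "$attr" ] || continue
        echo "$tmo" > "$attr"
    done
    # Report the value that was actually written.
    echo "$tmo"
}
```

Called as «set_dev_loss_tmo 900», this writes and prints 600. Pointing SYSFS_ROOT at a scratch directory allows a dry run without touching the real sysfs tree.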

> SANfile_m linux # multipath -l
> hcfshare (360060160c820080063502869e459dc11) dm-0 ,
> [size=3.4T][features=1 queue_if_no_path][hwhandler=1 emc]
> \_ round-robin 0 [prio=0][enabled]
>  \_ #:#:#:# -   #:#   [failed][undef]
>  \_ #:#:#:# -   #:#   [failed][undef]
> \_ round-robin 0 [prio=0][active]
>  \_ 2:0:0:0 sdd 8:48  [active][undef]
>  \_ 1:0:0:0 sdb 8:16  [active][undef]
> If I put them online again, I run into the -EEXIST prob. Async SCSI
> scanning *is* off in my kernel, so the only thing I could do from
> here is to try the patch, is it?

Matthew Wilcox' patch solved this particular problem for me, yes.  I
still had some problems with -EEXIST when unloading and re-inserting the
HBA driver module, though, but that's a corner case I rarely run into
(as well as being easily worked around by trying again).

Come to think of it, you never said which kernel version you're running...?

> Oct 17 17:26:36 SANfile_m kernel: kobject_add failed for 1:0:1:0 with
> -EEXIST, don't try to register things with the same name in the same
> directory.

One suggestion...  If the sysfs object is still around, you might be
able to delete it manually by running «echo 1 >
/sys/class/scsi_device/1:0:1:0/device/delete».  If that works, you can
try to rescan again by doing «echo 0 1 0 >
/sys/class/scsi_host/host1/scan».  With some luck it'll work...

If it does, most of the time udev will notice and alert multipath to
check out the new device.  Sometimes it doesn't work, though - simply
run the «multipath» command manually in that case.

By the way - the «1» in «host1» maps to the first digit in «1:0:1:0»,
while the «0 1 0» in the echo command to the last three.
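
The delete-and-rescan steps and the address mapping described above could be wrapped up roughly as follows. This is only a sketch: split_hctl, rescan_scsi_device and the SYSFS_ROOT variable are hypothetical names invented for the example, and the sysfs paths are the ones quoted in this message.

```shell
#!/bin/sh
# Sketch only: remove a stale SCSI device and rescan it, given H:C:T:L.
# SYSFS_ROOT and the function names are illustrative, not part of any tool.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "<host> <channel> <target> <lun>" for an address such as 1:0:1:0.
split_hctl() {
    echo "$1" | tr ':' ' '
}

rescan_scsi_device() {
    hctl="$1"
    # Re-use the positional parameters to hold the four address fields.
    set -- $(split_hctl "$hctl")
    host="$1"; channel="$2"; target="$3"; lun="$4"

    delete_attr="$SYSFS_ROOT/class/scsi_device/$hctl/device/delete"
    scan_attr="$SYSFS_ROOT/class/scsi_host/host$host/scan"

    # Delete the old sysfs object only if it is still around.
    if [ -e "$delete_attr" ]; then
        echo 1 > "$delete_attr"
    fi
    # The first field selects hostN; the last three go into the scan file.
    echo "$channel $target $lun" > "$scan_attr"
}
```

For the device in the log above this would be «rescan_scsi_device 1:0:1:0», run as root. If udev does not pick up the new device afterwards, run «multipath» by hand as described.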

Regards
--
Tore Anderson

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
------=_NextPart_000_0680_01C810F8.E2A04FF0--
--===============0378461437==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

--===============0378461437==--