From: "Gerald Nowitzky" <Nowitzky@igne.de>
To: device-mapper development <dm-devel@redhat.com>
Subject: Re: multibus / failover and EMC CX600
Date: Wed, 17 Oct 2007 16:48:38 +0200
Message-ID: <066001c810cc$cd9f6f90$0a00a8c0@ALDI2>
In-Reply-To: 471600F7.5090607@linpro.no
The mpath_prio_emc with group_by_prio did the trick. Thanks!
But I am still losing the paths to the failed devices. I increased dev_loss_tmo, but the maximum seems to be about 600; thus, after 10 minutes, the paths fail:
SANfile_m linux # multipath -l
hcfshare (360060160c820080063502869e459dc11) dm-0 ,
[size=3.4T][features=1 queue_if_no_path][hwhandler=1 emc]
\_ round-robin 0 [prio=0][enabled]
\_ #:#:#:# - #:# [failed][undef]
\_ #:#:#:# - #:# [failed][undef]
\_ round-robin 0 [prio=0][active]
\_ 2:0:0:0 sdd 8:48 [active][undef]
\_ 1:0:0:0 sdb 8:16 [active][undef]
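For reference, a minimal sketch of the dev_loss_tmo adjustment described above. It assumes the FC transport class exposes one dev_loss_tmo attribute per remote port under /sys/class/fc_remote_ports; the function name set_dev_loss_tmo and the rport_dir parameter are made up for this illustration. It reads each value back after writing, since the value the kernel actually keeps is what decides when the paths are dropped.

```python
from pathlib import Path

# sysfs directory of the FC transport class (assumed layout: rport-H:B-R/dev_loss_tmo)
FC_RPORT_DIR = Path("/sys/class/fc_remote_ports")


def set_dev_loss_tmo(seconds, rport_dir=FC_RPORT_DIR):
    """Write `seconds` to every rport's dev_loss_tmo and return the value each one now holds."""
    results = {}
    for attr in sorted(Path(rport_dir).glob("rport-*/dev_loss_tmo")):
        rport = attr.parent.name
        try:
            attr.write_text(f"{seconds}\n")
        except OSError as exc:
            # The kernel may refuse a value it considers out of range.
            results[rport] = f"rejected: {exc.strerror}"
            continue
        # Read back: report the value in effect, not the value requested.
        results[rport] = attr.read_text().strip()
    return results
```

Run as root, set_dev_loss_tmo(600) would return a dict mapping each rport name to the timeout now in effect, or to the reason the write was refused.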
If I put them online again, I run into the -EEXIST problem. Async SCSI scanning *is* off in my kernel, so the only thing I can do from here is to try the patch, isn't it?
Oct 17 17:26:34 SANfile_m kernel: scsi 1:0:1:0: rejecting I/O to dead device
Oct 17 17:26:34 SANfile_m multipathd: sdc: emc_clariion_checker: query command indicates error
Oct 17 17:26:35 SANfile_m kernel: scsi 2:0:1:0: rejecting I/O to dead device
Oct 17 17:26:35 SANfile_m multipathd: sde: emc_clariion_checker: query command indicates error
Oct 17 17:26:36 SANfile_m kernel: scsi 1:0:1:0: Direct-Access DGC RAID 5 0219 PQ: 0 ANSI: 4
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Very big device. Trying to use READ CAPACITY(16).
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] 7263453184 512-byte hardware sectors (3718888 MB)
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Test WP failed, assume Write Enabled
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Asking for cache data failed
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Assuming drive cache: write through
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Very big device. Trying to use READ CAPACITY(16).
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] 7263453184 512-byte hardware sectors (3718888 MB)
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Test WP failed, assume Write Enabled
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Asking for cache data failed
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Assuming drive cache: write through
Oct 17 17:26:36 SANfile_m kernel: sdf:<6>sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
Oct 17 17:26:36 SANfile_m kernel: end_request: I/O error, dev sdf, sector 0
Oct 17 17:26:36 SANfile_m kernel: printk: 40 messages suppressed.
Oct 17 17:26:36 SANfile_m kernel: Buffer I/O error on device sdf, logical block 0
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
Oct 17 17:26:36 SANfile_m kernel: end_request: I/O error, dev sdf, sector 0
Oct 17 17:26:36 SANfile_m kernel: Buffer I/O error on device sdf, logical block 0
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
Oct 17 17:26:36 SANfile_m kernel: end_request: I/O error, dev sdf, sector 0
Oct 17 17:26:36 SANfile_m kernel: Buffer I/O error on device sdf, logical block 0
Oct 17 17:26:36 SANfile_m kernel: ldm_validate_partition_table(): Disk read failed.
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
Oct 17 17:26:36 SANfile_m kernel: end_request: I/O error, dev sdf, sector 0
Oct 17 17:26:36 SANfile_m kernel: Buffer I/O error on device sdf, logical block 0
Oct 17 17:26:36 SANfile_m kernel: unable to read partition table
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Attached SCSI disk
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: Attached scsi generic sg2 type 0
Oct 17 17:26:36 SANfile_m kernel: scsi 1:0:1:0: Direct-Access DGC RAID 5 0219 PQ: 0 ANSI: 4
Oct 17 17:26:36 SANfile_m kernel: kobject_add failed for 1:0:1:0 with -EEXIST, don't try to register things with the same name in the same directory.
Oct 17 17:26:36 SANfile_m kernel: [number+85/816] kobject_shadow_add+0x115/0x1b0
Oct 17 17:26:36 SANfile_m kernel: [<c02f95f5>] kobject_shadow_add+0x115/0x1b0
Oct 17 17:26:36 SANfile_m kernel: [lo_ioctl+1125/2528] device_add+0xc5/0x570
Oct 17 17:26:36 SANfile_m kernel: [<c03aefd5>] device_add+0xc5/0x570
Oct 17 17:26:36 SANfile_m kernel: [fc_remote_port_rolechg+127/320] scsi_adjust_queue_depth+0x9f/0xf0
Oct 17 17:26:36 SANfile_m kernel: [<c03f9d7f>] scsi_adjust_queue_depth+0x9f/0xf0
Oct 17 17:26:36 SANfile_m kernel: [blk_register_region+18/64] __blk_queue_init_tags+0x32/0x70
Oct 17 17:26:36 SANfile_m kernel: [<c02eeb72>] __blk_queue_init_tags+0x32/0x70
Oct 17 17:26:36 SANfile_m kernel: [sr_get_mcn+2/240] scsi_sysfs_add_sdev+0x32/0x230
Oct 17 17:26:36 SANfile_m kernel: [<c0402882>] scsi_sysfs_add_sdev+0x32/0x230
Oct 17 17:26:36 SANfile_m kernel: [<f99445b7>] qla2xxx_slave_configure+0x77/0x110 [qla2xxx]
Oct 17 17:26:36 SANfile_m kernel: [sd_init_command+313/1088] scsi_probe_and_add_lun+0x8c9/0x940
Oct 17 17:26:36 SANfile_m kernel: [<c0400859>] scsi_probe_and_add_lun+0x8c9/0x940
Oct 17 17:26:36 SANfile_m kernel: [sr_probe+72/1472] __scsi_scan_target+0x518/0x5c0
Oct 17 17:26:36 SANfile_m kernel: [<c04012c8>] __scsi_scan_target+0x518/0x5c0
Oct 17 17:26:36 SANfile_m kernel: [kallsyms_addresses+36259/130252] schedule+0x2df/0x940
Oct 17 17:26:36 SANfile_m kernel: [<c053695f>] schedule+0x2df/0x940
Oct 17 17:26:36 SANfile_m kernel: [sr_init_command+54/944] scsi_scan_target+0xb6/0xe0
Oct 17 17:26:36 SANfile_m kernel: [<c04019f6>] scsi_scan_target+0xb6/0xe0
Oct 17 17:26:36 SANfile_m kernel: [SendIocInit+224/784] fc_scsi_scan_rport+0x0/0x90
Oct 17 17:26:36 SANfile_m kernel: [<c04084b0>] fc_scsi_scan_rport+0x0/0x90
Oct 17 17:26:36 SANfile_m kernel: [SendIocInit+344/784] fc_scsi_scan_rport+0x78/0x90
Oct 17 17:26:36 SANfile_m kernel: [<c0408528>] fc_scsi_scan_rport+0x78/0x90
Oct 17 17:26:36 SANfile_m kernel: [run_workqueue+131/256] run_workqueue+0x73/0x100
Oct 17 17:26:36 SANfile_m kernel: [<c0131dc3>] run_workqueue+0x73/0x100
Oct 17 17:26:36 SANfile_m kernel: [autoremove_wake_function+16/80] autoremove_wake_function+0x0/0x50
Oct 17 17:26:36 SANfile_m kernel: [<c01354e0>] autoremove_wake_function+0x0/0x50
Oct 17 17:26:36 SANfile_m kernel: [worker_thread+172/256] worker_thread+0x9c/0x100
Oct 17 17:26:36 SANfile_m kernel: [<c01326dc>] worker_thread+0x9c/0x100
Oct 17 17:26:36 SANfile_m kernel: [autoremove_wake_function+16/80] autoremove_wake_function+0x0/0x50
Oct 17 17:26:36 SANfile_m kernel: [<c01354e0>] autoremove_wake_function+0x0/0x50
Oct 17 17:26:36 SANfile_m kernel: [worker_thread+16/256] worker_thread+0x0/0x100
Oct 17 17:26:36 SANfile_m kernel: [<c0132640>] worker_thread+0x0/0x100
Oct 17 17:26:36 SANfile_m kernel: [kthread+82/112] kthread+0x42/0x70
Oct 17 17:26:36 SANfile_m kernel: [<c0135212>] kthread+0x42/0x70
Oct 17 17:26:36 SANfile_m kernel: [kthread+16/112] kthread+0x0/0x70
Oct 17 17:26:36 SANfile_m kernel: [<c01351d0>] kthread+0x0/0x70
Oct 17 17:26:36 SANfile_m kernel: [print_trace_stack+3/16] kernel_thread_helper+0x7/0x14
Oct 17 17:26:36 SANfile_m kernel: [<c0104763>] kernel_thread_helper+0x7/0x14
Oct 17 17:26:36 SANfile_m kernel: =======================
Oct 17 17:26:36 SANfile_m kernel: error 1
Oct 17 17:26:36 SANfile_m kernel: scsi 1:0:1:0: Unexpected response from lun 0 while scanning, scan aborted
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
Oct 17 17:26:36 SANfile_m kernel: end_request: I/O error, dev sdf, sector 7263453056
Oct 17 17:26:36 SANfile_m kernel: Buffer I/O error on device sdf, logical block 907931632
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
Oct 17 17:26:36 SANfile_m kernel: end_request: I/O error, dev sdf, sector 7263453056
Oct 17 17:26:36 SANfile_m kernel: Buffer I/O error on device sdf, logical block 907931632
Oct 17 17:26:36 SANfile_m kernel: sd 1:0:1:0: [sdf] Device not ready: <6>: Sense Key : 0x2 [current]
Oct 17 17:26:36 SANfile_m kernel: : ASC=0x4 ASCQ=0x3
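To make the repeated sense data above easier to read, here is a small sketch that tallies the Sense Key / ASC / ASCQ triples per device from syslog lines in the format shown. The function names and the two lookup tables are made up for this illustration, and the tables cover only the codes that occur in this log. In the SCSI assignments, sense key 0x2 is NOT READY and ASC/ASCQ 0x04/0x03 is "logical unit not ready, manual intervention required". That would be consistent with a path to a storage processor that does not currently own the LUN, but this reading is an assumption, not something the log states.

```python
import re
from collections import Counter

# Deliberately tiny lookup tables: only the codes seen in the log above.
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED"}

SENSE_RE = re.compile(r"\[(?P<dev>sd[a-z]+)\].*Sense Key : (?P<key>0x[0-9a-fA-F]+)")
ASC_RE = re.compile(r"ASC=(?P<asc>0x[0-9a-fA-F]+) ASCQ=(?P<ascq>0x[0-9a-fA-F]+)")


def summarize_sense(lines):
    """Count (device, sense key, ASC, ASCQ) tuples; the ASC line follows its Sense Key line."""
    counts = Counter()
    pending = None  # (device, sense key) waiting for its ASC/ASCQ line
    for line in lines:
        m = SENSE_RE.search(line)
        if m:
            pending = (m["dev"], int(m["key"], 16))
            continue
        m = ASC_RE.search(line)
        if m and pending:
            counts[pending + (int(m["asc"], 16), int(m["ascq"], 16))] += 1
            pending = None
    return counts


def describe(key, asc, ascq):
    """Return a readable label, falling back to the raw hex codes when unknown."""
    key_name = SENSE_KEYS.get(key, f"sense key {key:#x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:#04x}/{ascq:#04x}")
    return f"{key_name}: {detail}"
```

Feeding the kernel lines above through summarize_sense and passing each key's last three fields to describe should collapse the repeats into a single entry for sdf.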
Is that what you refer as
----- Original Message -----
From: Tore Anderson
To: device-mapper development
Sent: Wednesday, October 17, 2007 2:32 PM
Subject: Re: [dm-devel] multibus / failover and EMC CX600
* Hannes Reinecke
> That's the dev_loss_tmo setting. Just increase it to something to
> your liking.
Oh, sweet. This knob won't affect how long the layer will hold I/O
before failing it (like lpfc_nodev_tmo), I assume? (I'm worried about
it taking longer for dm-multipath to detect failed paths).
I wish it could've been set to unlimited, though. Seems like there's
always some kind of trouble with re-adding the devices, either I run
into that -EEXIST bug, or udev doesn't do its job properly and the
revived device isn't added back into the dm-multipath map. In addition
it sometimes breaks queue_if_no_path with earlier multipath-tools versions
that don't use no_flush on suspend. Those versions are of course included
in most server distributions... Sigh.
Regards
--
Tore Anderson