* dm-rdac not working?
@ 2007-08-27 14:41 Tore Anderson
2007-08-27 17:33 ` Chandra Seetharaman
0 siblings, 1 reply; 6+ messages in thread
From: Tore Anderson @ 2007-08-27 14:41 UTC
To: dm-devel
I've got a Sun StorageTek 6140 (an Engenio 3994 with a Sun sticker),
which is an RDAC active/passive array. When I configure it to use RDAC
mode and set up my host (2.6.23-rc3, x86_64) to use the rdac hardware
handler, it doesn't really seem to work. I yank the two paths to the
active controller using "echo 1 > /sys/block/$dev/device/delete" while
generating heavy I/O and wait for a pg switch to happen, but all I get
in the logs is the following:
Aug 27 16:17:45 atalanta multipathd: sdj: remove path (uevent)
Aug 27 16:17:45 atalanta kernel: [264107.296729] sd 4:0:1:1: [sdj] Synchronizing SCSI cache
Aug 27 16:17:45 atalanta kernel: [264107.296894] device-mapper: multipath rdac: using RDAC command with timeout 6000
Aug 27 16:17:45 atalanta kernel: [264107.297271] device-mapper: multipath: Failing path 8:144.
Aug 27 16:17:47 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 2 1 8:64 1000 8:112 1000]
Aug 27 16:18:10 atalanta multipathd: sdg: remove path (uevent)
Aug 27 16:18:10 atalanta kernel: [264131.733301] sd 3:0:1:1: [sdg] Synchronizing SCSI cache
Aug 27 16:18:10 atalanta kernel: [264131.733435] device-mapper: multipath rdac: using RDAC command with timeout 6000
Aug 27 16:18:10 atalanta kernel: [264131.759010] device-mapper: multipath: Failing path 8:96.
Aug 27 16:18:10 atalanta kernel: [264131.829770] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:19:12 atalanta kernel: [264193.753980] device-mapper: multipath: Failing path 8:64.
Aug 27 16:19:12 atalanta kernel: [264193.754944] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:20:14 atalanta kernel: [264255.632988] device-mapper: multipath: Failing path 8:112.
Aug 27 16:20:14 atalanta kernel: [264255.633032] printk: 12 messages suppressed.
Aug 27 16:20:14 atalanta kernel: [264255.633035] Buffer I/O error on device dm-4, logical block 1821127
Aug 27 16:20:14 atalanta kernel: [264255.670535] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.670545] Buffer I/O error on device dm-4, logical block 1821128
Aug 27 16:20:14 atalanta kernel: [264255.708064] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.708070] Buffer I/O error on device dm-4, logical block 1821129
Aug 27 16:20:14 atalanta kernel: [264255.745599] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.745608] Buffer I/O error on device dm-4, logical block 1821130
Aug 27 16:20:14 atalanta kernel: [264255.783080] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.783089] Buffer I/O error on device dm-4, logical block 1821131
Aug 27 16:20:14 atalanta kernel: [264255.820614] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.820622] Buffer I/O error on device dm-4, logical block 1821132
Aug 27 16:20:14 atalanta kernel: [264255.858158] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.858165] Buffer I/O error on device dm-4, logical block 1821133
Aug 27 16:20:14 atalanta kernel: [264255.895633] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.895639] Buffer I/O error on device dm-4, logical block 1821134
Aug 27 16:20:14 atalanta kernel: [264255.933099] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.933105] Buffer I/O error on device dm-4, logical block 1821135
Aug 27 16:20:14 atalanta kernel: [264255.970556] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.970562] Buffer I/O error on device dm-4, logical block 1821136
Aug 27 16:20:14 atalanta kernel: [264256.008010] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264256.061324] Aborting journal on device dm-4.
Aug 27 16:20:14 atalanta kernel: [264256.088223] journal commit I/O error
Aug 27 16:20:14 atalanta kernel: [264256.113352] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113383] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113401] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113405] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113409] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113413] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113418] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113422] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113425] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113477] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113541] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113547] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113557] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113565] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113583] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113771] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.114328] WARNING: at fs/buffer.c:1154 mark_buffer_dirty()
Aug 27 16:20:14 atalanta kernel: [264256.114329]
Aug 27 16:20:14 atalanta kernel: [264256.114330] Call Trace:
Aug 27 16:20:14 atalanta kernel: [264256.114347] [mark_buffer_dirty+117/144] mark_buffer_dirty+0x75/0x90
Aug 27 16:20:14 atalanta kernel: [264256.114359] [_end+128966711/2129771920] :ext3:ext3_commit_super+0x57/0xa0
Aug 27 16:20:14 atalanta kernel: [264256.114364] [freeze_bdev+138/176] freeze_bdev+0x8a/0xb0
Aug 27 16:20:14 atalanta kernel: [264256.114374] [_end+128731225/2129771920] :dm_mod:dm_suspend+0x119/0x450
Aug 27 16:20:14 atalanta kernel: [264256.114379] [default_wake_function+0/16] default_wake_function+0x0/0x10
Aug 27 16:20:14 atalanta kernel: [264256.114383] [__up_write+49/368] __up_write+0x31/0x170
Aug 27 16:20:14 atalanta kernel: [264256.114390] [_end+128744272/2129771920] :dm_mod:dev_suspend+0x0/0x210
Aug 27 16:20:14 atalanta kernel: [264256.114396] [_end+128744602/2129771920] :dm_mod:dev_suspend+0x14a/0x210
Aug 27 16:20:14 atalanta kernel: [264256.114402] [_end+128742063/2129771920] :dm_mod:ctl_ioctl+0x1df/0x2e0
Aug 27 16:20:14 atalanta kernel: [264256.114406] [do_wp_page+964/1280] do_wp_page+0x3c4/0x500
Aug 27 16:20:14 atalanta kernel: [264256.114416] [do_ioctl+125/192] do_ioctl+0x7d/0xc0
Aug 27 16:20:14 atalanta kernel: [264256.114418] [vfs_ioctl+116/720] vfs_ioctl+0x74/0x2d0
Aug 27 16:20:14 atalanta kernel: [264256.114421] [sys_ioctl+149/176] sys_ioctl+0x95/0xb0
Aug 27 16:20:14 atalanta kernel: [264256.114425] [system_call+126/131] system_call+0x7e/0x83
Aug 27 16:20:14 atalanta kernel: [264256.114428]
Aug 27 16:20:14 atalanta kernel: [264256.128587] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:21:16 atalanta kernel: [264318.014646] device-mapper: multipath: Failing path 8:64.
Aug 27 16:21:16 atalanta kernel: [264318.014987] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:22:18 atalanta kernel: [264379.906125] device-mapper: multipath: Failing path 8:112.
Aug 27 16:22:18 atalanta kernel: [264379.906159] printk: 22947 messages suppressed.
Aug 27 16:22:18 atalanta kernel: [264379.906163] Buffer I/O error on device dm-4, logical block 0
Aug 27 16:22:18 atalanta kernel: [264379.940530] lost page write due to I/O error on dm-4
Aug 27 16:22:18 atalanta kernel: [264379.940562] ext3_abort called.
Aug 27 16:22:18 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 1 1 round-robin 0 2 1 8:64 1000 8:112 1000]
Aug 27 16:22:18 atalanta kernel: [264379.959340] EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
Aug 27 16:22:18 atalanta kernel: [264380.008425] Remounting filesystem read-only
Aug 27 16:22:18 atalanta kernel: [264380.035388] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:23:20 atalanta kernel: [264441.923876] device-mapper: multipath: Failing path 8:112.
Aug 27 16:23:20 atalanta multipathd: 8:112: mark as failed
Aug 27 16:23:20 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:23:20 atalanta kernel: [264441.924715] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:23:30 atalanta multipathd: 8:112: tur checker reports path is up
Aug 27 16:23:30 atalanta multipathd: 8:112: reinstated
Aug 27 16:23:30 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:24:22 atalanta kernel: [264503.821431] device-mapper: multipath: Failing path 8:64.
Aug 27 16:24:22 atalanta multipathd: 8:64: mark as failed
Aug 27 16:24:22 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:24:22 atalanta kernel: [264503.822237] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:24:32 atalanta multipathd: 8:64: tur checker reports path is up
Aug 27 16:24:32 atalanta multipathd: 8:64: reinstated
Aug 27 16:24:32 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:25:24 atalanta kernel: [264565.701064] device-mapper: multipath: Failing path 8:112.
Aug 27 16:25:24 atalanta kernel: [264565.701395] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:25:24 atalanta multipathd: 8:112: mark as failed
Aug 27 16:25:24 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:25:35 atalanta multipathd: 8:112: tur checker reports path is up
Aug 27 16:25:35 atalanta multipathd: 8:112: reinstated
Aug 27 16:25:35 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:26:26 atalanta kernel: [264627.593359] device-mapper: multipath: Failing path 8:64.
Aug 27 16:26:26 atalanta kernel: [264627.594446] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:26:26 atalanta multipathd: 8:64: mark as failed
Aug 27 16:26:26 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:26:37 atalanta multipathd: 8:64: tur checker reports path is up
Aug 27 16:26:37 atalanta multipathd: 8:64: reinstated
Aug 27 16:26:37 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:27:28 atalanta kernel: [264689.478050] device-mapper: multipath: Failing path 8:112.
Aug 27 16:27:28 atalanta kernel: [264689.478771] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:27:28 atalanta multipathd: 8:112: mark as failed
Aug 27 16:27:28 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:27:39 atalanta multipathd: 8:112: tur checker reports path is up
Aug 27 16:27:39 atalanta multipathd: 8:112: reinstated
Aug 27 16:27:39 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:28:30 atalanta kernel: [264751.365417] device-mapper: multipath: Failing path 8:64.
Aug 27 16:28:30 atalanta kernel: [264751.366649] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:28:30 atalanta multipathd: 8:64: mark as failed
Aug 27 16:28:30 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:28:41 atalanta multipathd: 8:64: tur checker reports path is up
Aug 27 16:28:41 atalanta multipathd: 8:64: reinstated
Aug 27 16:28:41 atalanta multipathd: mysql: remaining active paths: 2
The last messages appear to be looping forever. The passive
controller never goes active, and there's nothing in the logs on the
array that indicates any attempt to move the volume. It seems like the
hardware handler is simply broken...
Another quite nasty thing is that even though I'm using
queue_if_no_path, I/O errors still made it up to the file system layer,
making it read-only. Isn't that exactly what queue_if_no_path is
supposed to prevent?
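One way to see what was actually handed to the kernel is to take apart the table line that multipathd logged. The sketch below is only an illustration. It assumes the usual dm-multipath table layout (feature count and features, hardware handler count and arguments, number of path groups, initial group, then each group's selector and paths). parse_multipath_table is a made-up helper name, not part of multipath-tools.

```python
def parse_multipath_table(table):
    """Split a dm-multipath table line into features, hardware handler and path groups."""
    tok = table.split()
    if tok[2] != "multipath":
        raise ValueError("not a multipath target")

    def take_counted(pos):
        # A counted list has the form "<n> <arg1> ... <argn>".
        n = int(tok[pos])
        return tok[pos + 1:pos + 1 + n], pos + 1 + n

    features, pos = take_counted(3)
    hwhandler, pos = take_counted(pos)
    num_groups, init_group = int(tok[pos]), int(tok[pos + 1])
    pos += 2

    groups = []
    for _ in range(num_groups):
        selector = tok[pos]
        selector_args, pos = take_counted(pos + 1)
        num_paths, args_per_path = int(tok[pos]), int(tok[pos + 1])
        pos += 2
        paths = []
        for _ in range(num_paths):
            paths.append((tok[pos], tok[pos + 1:pos + 1 + args_per_path]))
            pos += 1 + args_per_path
        groups.append({"selector": selector,
                       "selector_args": selector_args,
                       "paths": paths})
    return {"features": features, "hwhandler": hwhandler,
            "init_group": init_group, "groups": groups}


# Table line copied from the 16:17:47 multipathd log entry above.
logged = ("0 41943040 multipath 0 1 rdac 2 1 "
          "round-robin 0 1 1 8:96 1000 "
          "round-robin 0 2 1 8:64 1000 8:112 1000")
info = parse_multipath_table(logged)
print("features:", info["features"])
print("hwhandler:", info["hwhandler"])
for number, group in enumerate(info["groups"], 1):
    print("group", number, [dev for dev, _ in group["paths"]])
```

For the table logged at 16:17:47 the feature list comes back empty, so queue_if_no_path is not part of that table line. That alone is not conclusive, since queueing can also be switched on afterwards with a device-mapper message, but it may be worth comparing against "dmsetup table" while the problem is being reproduced.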
Regards
--
Tore Anderson
* Re: dm-rdac not working?
2007-08-27 14:41 dm-rdac not working? Tore Anderson
@ 2007-08-27 17:33 ` Chandra Seetharaman
2007-08-27 18:31 ` Tore Anderson
0 siblings, 1 reply; 6+ messages in thread
From: Chandra Seetharaman @ 2007-08-27 17:33 UTC
To: device-mapper development
On Mon, 2007-08-27 at 16:41 +0200, Tore Anderson wrote:
It looks like the MODE_SELECT command sent to the device (to make the
passive path active) is failing for some reason, and hence that path is
not getting activated.
What version of multipath-tools are you using?
Can you attach your multipath.conf file? You should be using the rdac
path checker instead of the tur path checker.
Can you attach the output of "multipath -ll"?
Look below for more comments.
> I've got a Sun StorageTek 6140 (an Engenio 3994 with a Sun sticker),
> which is an RDAC active/passive array. When I configure it to use RDAC
> mode and set up my host (2.6.23-rc3, x86_64) to use the rdac hardware
> handler, it doesn't really seem to work. I yank the two paths to the
> active controller using "echo 1 > /sys/block/$dev/device/delete" while
> generating heavy I/O and wait for a pg switch to happen, but all I get
> in the logs is the following:
>
> Aug 27 16:17:45 atalanta multipathd: sdj: remove path (uevent)
> Aug 27 16:17:45 atalanta kernel: [264107.296729] sd 4:0:1:1: [sdj] Synchronizing SCSI cache
> Aug 27 16:17:45 atalanta kernel: [264107.296894] device-mapper: multipath rdac: using RDAC command with timeout 6000
> Aug 27 16:17:45 atalanta kernel: [264107.297271] device-mapper: multipath: Failing path 8:144.
> Aug 27 16:17:47 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 2 1 8:64 1000 8:112 1000]
> Aug 27 16:18:10 atalanta multipathd: sdg: remove path (uevent)
> Aug 27 16:18:10 atalanta kernel: [264131.733301] sd 3:0:1:1: [sdg] Synchronizing SCSI cache
> Aug 27 16:18:10 atalanta kernel: [264131.733435] device-mapper: multipath rdac: using RDAC command with timeout 6000
> Aug 27 16:18:10 atalanta kernel: [264131.759010] device-mapper: multipath: Failing path 8:96.
> Aug 27 16:18:10 atalanta kernel: [264131.829770] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:19:12 atalanta kernel: [264193.753980] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:19:12 atalanta kernel: [264193.754944] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:20:14 atalanta kernel: [264255.632988] device-mapper: multipath: Failing path 8:112.
I presume 8:64 and 8:112 are the devices corresponding to other paths of
the device.
You can see that the MODE_SELECT command is sent and immediately the
path is failed (which means the MODE_SELECT command has failed).
And this is the same thing that repeats below.
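To put numbers on that pattern, the kernel timestamps can be used to pair each "queueing MODE_SELECT" message with the next "Failing path" message for the same device. This is a rough sketch, not a tool from multipath-tools. The function name mode_select_delays is made up for illustration, and the regular expressions only assume the message wording seen in the log above.

```python
import re

QUEUE_RE = re.compile(r"\[\s*(\d+\.\d+)\] device-mapper: multipath rdac: "
                      r"queueing MODE_SELECT command on (\d+:\d+)")
FAIL_RE = re.compile(r"\[\s*(\d+\.\d+)\] device-mapper: multipath: "
                     r"Failing path (\d+:\d+)\.")


def mode_select_delays(lines):
    """Return (device, seconds) pairs: time from MODE_SELECT queueing to path failure."""
    pending = {}  # device -> kernel timestamp of the last queued MODE_SELECT
    delays = []
    for line in lines:
        m = QUEUE_RE.search(line)
        if m:
            pending[m.group(2)] = float(m.group(1))
            continue
        m = FAIL_RE.search(line)
        if m and m.group(2) in pending:
            queued_at = pending.pop(m.group(2))
            delays.append((m.group(2), float(m.group(1)) - queued_at))
    return delays


# Four consecutive kernel lines copied from the log above.
sample = [
    "Aug 27 16:18:10 atalanta kernel: [264131.829770] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64",
    "Aug 27 16:19:12 atalanta kernel: [264193.753980] device-mapper: multipath: Failing path 8:64.",
    "Aug 27 16:19:12 atalanta kernel: [264193.754944] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112",
    "Aug 27 16:20:14 atalanta kernel: [264255.632988] device-mapper: multipath: Failing path 8:112.",
]
delays = mode_select_delays(sample)
for dev, secs in delays:
    print(f"{dev}: path failed {secs:.2f} s after MODE_SELECT was queued")
```

For these four lines both gaps come out a little under 62 seconds, which may help when deciding whether the command is being rejected or is timing out.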
> Aug 27 16:20:14 atalanta kernel: [264255.633032] printk: 12 messages suppressed.
> Aug 27 16:20:14 atalanta kernel: [264255.633035] Buffer I/O error on device dm-4, logical block 1821127
> Aug 27 16:20:14 atalanta kernel: [264255.670535] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.670545] Buffer I/O error on device dm-4, logical block 1821128
> Aug 27 16:20:14 atalanta kernel: [264255.708064] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.708070] Buffer I/O error on device dm-4, logical block 1821129
> Aug 27 16:20:14 atalanta kernel: [264255.745599] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.745608] Buffer I/O error on device dm-4, logical block 1821130
> Aug 27 16:20:14 atalanta kernel: [264255.783080] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.783089] Buffer I/O error on device dm-4, logical block 1821131
> Aug 27 16:20:14 atalanta kernel: [264255.820614] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.820622] Buffer I/O error on device dm-4, logical block 1821132
> Aug 27 16:20:14 atalanta kernel: [264255.858158] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.858165] Buffer I/O error on device dm-4, logical block 1821133
> Aug 27 16:20:14 atalanta kernel: [264255.895633] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.895639] Buffer I/O error on device dm-4, logical block 1821134
> Aug 27 16:20:14 atalanta kernel: [264255.933099] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.933105] Buffer I/O error on device dm-4, logical block 1821135
> Aug 27 16:20:14 atalanta kernel: [264255.970556] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.970562] Buffer I/O error on device dm-4, logical block 1821136
> Aug 27 16:20:14 atalanta kernel: [264256.008010] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264256.061324] Aborting journal on device dm-4.
> Aug 27 16:20:14 atalanta kernel: [264256.088223] journal commit I/O error
> Aug 27 16:20:14 atalanta kernel: [264256.113352] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113383] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113401] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113405] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113409] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113413] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113418] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113422] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113425] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113477] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113541] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113547] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113557] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113565] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113583] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113771] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.114328] WARNING: at fs/buffer.c:1154 mark_buffer_dirty()
> Aug 27 16:20:14 atalanta kernel: [264256.114329]
> Aug 27 16:20:14 atalanta kernel: [264256.114330] Call Trace:
> Aug 27 16:20:14 atalanta kernel: [264256.114347] [mark_buffer_dirty+117/144] mark_buffer_dirty+0x75/0x90
> Aug 27 16:20:14 atalanta kernel: [264256.114359] [_end+128966711/2129771920] :ext3:ext3_commit_super+0x57/0xa0
> Aug 27 16:20:14 atalanta kernel: [264256.114364] [freeze_bdev+138/176] freeze_bdev+0x8a/0xb0
> Aug 27 16:20:14 atalanta kernel: [264256.114374] [_end+128731225/2129771920] :dm_mod:dm_suspend+0x119/0x450
> Aug 27 16:20:14 atalanta kernel: [264256.114379] [default_wake_function+0/16] default_wake_function+0x0/0x10
> Aug 27 16:20:14 atalanta kernel: [264256.114383] [__up_write+49/368] __up_write+0x31/0x170
> Aug 27 16:20:14 atalanta kernel: [264256.114390] [_end+128744272/2129771920] :dm_mod:dev_suspend+0x0/0x210
> Aug 27 16:20:14 atalanta kernel: [264256.114396] [_end+128744602/2129771920] :dm_mod:dev_suspend+0x14a/0x210
> Aug 27 16:20:14 atalanta kernel: [264256.114402] [_end+128742063/2129771920] :dm_mod:ctl_ioctl+0x1df/0x2e0
> Aug 27 16:20:14 atalanta kernel: [264256.114406] [do_wp_page+964/1280] do_wp_page+0x3c4/0x500
> Aug 27 16:20:14 atalanta kernel: [264256.114416] [do_ioctl+125/192] do_ioctl+0x7d/0xc0
> Aug 27 16:20:14 atalanta kernel: [264256.114418] [vfs_ioctl+116/720] vfs_ioctl+0x74/0x2d0
> Aug 27 16:20:14 atalanta kernel: [264256.114421] [sys_ioctl+149/176] sys_ioctl+0x95/0xb0
> Aug 27 16:20:14 atalanta kernel: [264256.114425] [system_call+126/131] system_call+0x7e/0x83
> Aug 27 16:20:14 atalanta kernel: [264256.114428]
> Aug 27 16:20:14 atalanta kernel: [264256.128587] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:21:16 atalanta kernel: [264318.014646] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:21:16 atalanta kernel: [264318.014987] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:22:18 atalanta kernel: [264379.906125] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:22:18 atalanta kernel: [264379.906159] printk: 22947 messages suppressed.
> Aug 27 16:22:18 atalanta kernel: [264379.906163] Buffer I/O error on device dm-4, logical block 0
> Aug 27 16:22:18 atalanta kernel: [264379.940530] lost page write due to I/O error on dm-4
> Aug 27 16:22:18 atalanta kernel: [264379.940562] ext3_abort called.
> Aug 27 16:22:18 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 1 1 round-robin 0 2 1 8:64 1000 8:112 1000]
> Aug 27 16:22:18 atalanta kernel: [264379.959340] EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
> Aug 27 16:22:18 atalanta kernel: [264380.008425] Remounting filesystem read-only
> Aug 27 16:22:18 atalanta kernel: [264380.035388] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:23:20 atalanta kernel: [264441.923876] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:23:20 atalanta multipathd: 8:112: mark as failed
> Aug 27 16:23:20 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:23:20 atalanta kernel: [264441.924715] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:23:30 atalanta multipathd: 8:112: tur checker reports path is up
> Aug 27 16:23:30 atalanta multipathd: 8:112: reinstated
> Aug 27 16:23:30 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:24:22 atalanta kernel: [264503.821431] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:24:22 atalanta multipathd: 8:64: mark as failed
> Aug 27 16:24:22 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:24:22 atalanta kernel: [264503.822237] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:24:32 atalanta multipathd: 8:64: tur checker reports path is up
> Aug 27 16:24:32 atalanta multipathd: 8:64: reinstated
> Aug 27 16:24:32 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:25:24 atalanta kernel: [264565.701064] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:25:24 atalanta kernel: [264565.701395] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:25:24 atalanta multipathd: 8:112: mark as failed
> Aug 27 16:25:24 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:25:35 atalanta multipathd: 8:112: tur checker reports path is up
> Aug 27 16:25:35 atalanta multipathd: 8:112: reinstated
> Aug 27 16:25:35 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:26:26 atalanta kernel: [264627.593359] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:26:26 atalanta kernel: [264627.594446] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:26:26 atalanta multipathd: 8:64: mark as failed
> Aug 27 16:26:26 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:26:37 atalanta multipathd: 8:64: tur checker reports path is up
> Aug 27 16:26:37 atalanta multipathd: 8:64: reinstated
> Aug 27 16:26:37 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:27:28 atalanta kernel: [264689.478050] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:27:28 atalanta kernel: [264689.478771] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:27:28 atalanta multipathd: 8:112: mark as failed
> Aug 27 16:27:28 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:27:39 atalanta multipathd: 8:112: tur checker reports path is up
> Aug 27 16:27:39 atalanta multipathd: 8:112: reinstated
> Aug 27 16:27:39 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:28:30 atalanta kernel: [264751.365417] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:28:30 atalanta kernel: [264751.366649] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:28:30 atalanta multipathd: 8:64: mark as failed
> Aug 27 16:28:30 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:28:41 atalanta multipathd: 8:64: tur checker reports path is up
> Aug 27 16:28:41 atalanta multipathd: 8:64: reinstated
> Aug 27 16:28:41 atalanta multipathd: mysql: remaining active paths: 2
>
> The last messages appear to be looping forever. The passive
> controller never goes active, and there's nothing in the logs on the
> array that indicates any attempt to move the volume. It seems like the
> hardware handler is simply broken...
>
> Another quite nasty thing is that even though I'm using
> queue_if_no_path, I/O errors still made it up to the file system layer,
> making it read-only. Isn't that exactly what queue_if_no_path is
> supposed to prevent?
>
> Regards
> --
> Tore Anderson
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
* Re: dm-rdac not working?
2007-08-27 17:33 ` Chandra Seetharaman
@ 2007-08-27 18:31 ` Tore Anderson
2007-08-27 20:08 ` Chandra Seetharaman
0 siblings, 1 reply; 6+ messages in thread
From: Tore Anderson @ 2007-08-27 18:31 UTC
To: sekharan, device-mapper development
* Chandra Seetharaman
> What version of multipath tools are you using ?
0.4.7.
> Can you attach your multipath.conf file. You should be using the rdac
> path checker instead of the tur path checker.
Hmm, this was added in 0.4.8... What's the difference between the
rdac path checker and the tur checker? Anyway, it appears to me that
the problem here is with the kernel hardware handler, not in the
userspace path checker, wouldn't you agree? The hardware handler is
invoked upon pg init as expected, but fails to do its job.
> Can you attach the o/p of "multipath -ll"
mysql (3600a0b80002984ae0000179c46a68843)
[size=20 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:1 sdd 8:48  [active][ready]
 \_ 4:0:0:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:1:1 sdg 8:96  [active][ready]
 \_ 4:0:1:1 sdj 8:144 [active][ready]
www (3600a0b80002984ae0000179b46a687fb)
[size=45 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:0 sdc 8:32  [active][ready]
 \_ 4:0:0:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:1:0 sdf 8:80  [active][ready]
 \_ 4:0:1:0 sdi 8:128 [active][ready]
This is after the machine has booted, so things have changed around a
little: 3:0:0:1 and 4:0:0:0 have switched device nodes, but otherwise
it looks like it did when I triggered the problem.
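Since the kernel messages only name paths by major:minor number, a small lookup built from the "multipath -ll" output makes it easier to match them to maps and sd devices. The sketch below is an assumption-laden illustration: it only handles the 0.4.7-style output layout shown above, and paths_by_devnum is a hypothetical helper name.

```python
import re

MAP_RE = re.compile(r"^(\S+) \((\w+)\)")
PATH_RE = re.compile(r"\\_ (\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+\[(\w+)\]\[(\w+)\]")


def paths_by_devnum(output):
    """Map each path's major:minor number to its map alias, SCSI address and states."""
    result = {}
    alias = None
    for line in output.splitlines():
        line = line.strip()
        m = MAP_RE.match(line)
        if m:
            alias = m.group(1)  # start of a new multipath map
            continue
        m = PATH_RE.search(line)
        if m:
            hcil, name, devnum, dm_state, checker_state = m.groups()
            result[devnum] = {"map": alias, "hcil": hcil, "name": name,
                              "dm_state": dm_state,
                              "checker_state": checker_state}
    return result


# Output copied from above (taken after the reboot).
output = r"""
mysql (3600a0b80002984ae0000179c46a68843)
[size=20 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:1 sdd 8:48 [active][ready]
 \_ 4:0:0:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:1:1 sdg 8:96 [active][ready]
 \_ 4:0:1:1 sdj 8:144 [active][ready]
www (3600a0b80002984ae0000179b46a687fb)
[size=45 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:0 sdc 8:32 [active][ready]
 \_ 4:0:0:0 sde 8:64 [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:1:0 sdf 8:80 [active][ready]
 \_ 4:0:1:0 sdi 8:128 [active][ready]
"""
paths = paths_by_devnum(output)
for devnum in ("8:64", "8:112"):
    print(devnum, paths[devnum])
```

Note that the numbers only line up with the kernel log if the output is captured in the same boot as the log; here 8:64 shows up under "www" because of the reshuffle after booting.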
> I presume 8:64 and 8:112 are the devices corresponding to other paths of
> the device.
That's right.
> You can see that the MODE_SELECT command is sent and immediately the
> path is failed (which means the MODE_SELECT command has failed).
>
> And this the same thing that repeats below.
Yes. It appears to me it's sending that MODE_SELECT (which I guess is
like dm-emc's "trespass"?) command when it's trying to activate the
pg, but it simply doesn't work, so it's retrying over and over again,
but in vain. And after a while ext3 gets remounted r/o and it's game
over.
Regards
--
Tore Anderson
[-- Attachment #2: multipath.conf --]
[-- Type: text/plain, Size: 1024 bytes --]
# This file is managed by puppet, any local changes will be lost
defaults {
	user_friendly_names yes
}
defaults {
	multipath_tool "/sbin/multipath -v0"
	udev_dir /dev
	polling_interval 10
	rr_wmin_io 100
}
blacklist {
	devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
	devnode "^hd.*"
	devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
	device {
		vendor IBM-ESXS
	}
}
multipaths {
	multipath {
		wwid 3600a0b80002984ae0000179b46a687fb
		alias www
	}
	multipath {
		wwid 3600a0b80002984ae0000179c46a68843
		alias mysql
	}
}
devices {
	device {
		vendor DGC
		product RAID.*
		hardware_handler "1 emc"
		prio_callout "/sbin/mpath_prio_emc /dev/%n"
		path_checker emc_clariion
		path_grouping_policy group_by_prio
		failback immediate
		no_path_retry queue
	}
	device {
		vendor SUN
		product CSM200_R
		hardware_handler "1 rdac"
		prio_callout "/usr/local/sbin/mpath_prio_rdac /dev/%n"
		path_checker tur
		path_grouping_policy group_by_serial
		failback immediate
		no_path_retry queue
	}
}
* Re: dm-rdac not working?
2007-08-27 18:31 ` Tore Anderson
@ 2007-08-27 20:08 ` Chandra Seetharaman
[not found] ` <46D33BAA.8090807@linpro.no>
0 siblings, 1 reply; 6+ messages in thread
From: Chandra Seetharaman @ 2007-08-27 20:08 UTC
To: Tore Anderson; +Cc: device-mapper development
On Mon, 2007-08-27 at 20:31 +0200, Tore Anderson wrote:
> * Chandra Seetharaman
>
> > What version of multipath tools are you using ?
>
> 0.4.7.
>
> > Can you attach your multipath.conf file. You should be using the rdac
> > path checker instead of the tur path checker.
>
> Hmm, this was added in 0.4.8... What's the difference between the
> rdac path checker and the tur checker? Anyway, it appears to me that
The tur checker just sends a TEST UNIT READY to see if the path is good,
whereas the rdac checker sends an INQUIRY for page C9 and uses the reply
to determine the state of the path.
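As a rough illustration of how different the two probes are on the wire, the sketch below builds the two SCSI command descriptor blocks. It does not send anything to a device, and it does not decode the C9 page, whose layout is vendor-specific and not covered here. The helper names tur_cdb and inquiry_vpd_cdb are made up for this example; the byte layout follows the standard 6-byte TEST UNIT READY and INQUIRY commands.

```python
def tur_cdb():
    """TEST UNIT READY: opcode 0x00 followed by five zero bytes."""
    return bytes(6)


def inquiry_vpd_cdb(page_code, alloc_len=255):
    """INQUIRY with the EVPD bit set, asking for one vital product data page."""
    if not 0 <= page_code <= 0xFF:
        raise ValueError("page code must fit in one byte")
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in two bytes")
    return bytes([
        0x12,              # INQUIRY opcode
        0x01,              # EVPD = 1: return a VPD page, not standard data
        page_code,         # requested page, 0xC9 for the rdac checker
        alloc_len >> 8,    # allocation length, high byte
        alloc_len & 0xFF,  # allocation length, low byte
        0x00,              # control byte
    ])


print("TUR:       ", tur_cdb().hex())
print("INQUIRY C9:", inquiry_vpd_cdb(0xC9).hex())
```

TEST UNIT READY carries no payload and only reports whether the LUN answers on that path, while the page C9 reply gives the checker data from which it can tell an owning controller from a passive one.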
> the problem here is with the kernel hardware handler, not in the
> userspace path checker, wouldn't you agree? The hardware handler is
> invoked unpon pg init as expected, but fails to do its job.
That is true. But I am wondering if what you have is based on the LSI
Engenio controller (and hence the controller does not understand
the MODE_SELECT command).
What mode do you have your storage device configured in: RDAC or AVT?
BTW, the rdac path checker would show [ghost] instead of [ready] for the
passive path.
>
> > Can you attach the o/p of "multipath -ll"
>
> mysql (3600a0b80002984ae0000179c46a68843)
> [size=20 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
> \_ round-robin 0 [prio=0][enabled]
> \_ 3:0:0:1 sdd 8:48 [active][ready]
> \_ 4:0:0:1 sdh 8:112 [active][ready]
> \_ round-robin 0 [prio=6][active]
> \_ 3:0:1:1 sdg 8:96 [active][ready]
> \_ 4:0:1:1 sdj 8:144 [active][ready]
Where did you get mpath_prio_rdac from?
In 0.4.7 you could use mpath_prio_tpc instead (it behaves exactly like
mpath_prio_rdac in 0.4.8).
Can you apply the attached debug patch, repeat your test, and send me
the log?
Thanks,
chandra
<snip>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
* Re: dm-rdac not working?
From: Chandra Seetharaman @ 2007-08-27 21:26 UTC (permalink / raw)
To: Tore Anderson; +Cc: device-mapper development
On Mon, 2007-08-27 at 23:01 +0200, Tore Anderson wrote:
> * Chandra Seetharaman
>
> > tur checker just sends a test unit ready to see if the path is good.
> > Whereas rdac sends a c9 page inquiry and determines the state of the
> > path.
>
> Okay. I'm not really sure if I understand what practical difference
> there is between the two, though... The TUR check seems to fail like
> it's supposed to if the path goes bad.
In RDAC mode, tur will always fail on the passive path. Do you not see
that?
>
> > That is true. But, I am wondering if what you have is based on the
> > LSI Engenio based controller (and hence the controller does not
> > understand the MODE_SELECT command).
>
> It's an Engenio 3994 that came with a sticker on it that says Sun
> StorageTek 6140.
>
> > What mode do you have your storage device configured in ? rdac or AVT
> > ?
>
> Host-type "Linux", which means RDAC (AVT is mostly unusable for
> clusters because of all the unwanted volume transfers it causes). It's
> supposed to be used with LSI's RDAC 09.01.B2.xx driver, but that one
> doesn't work with recent kernels unfortunately.
>
> > Where did you get mpath_prio_rdac from ?
> >
> > In 0.4.7 you could use mpath_prio_tpc instead (it behaves exactly as
> > mpath_prio_rdac in 0.4.8).
>
> I got it from the 0.4.8 sources - mpath_prio_tpc complained a lot
> about AVT mode being disabled, which screwed up my multipath -ll output
> so I just got the new version instead.
You mean mpath_prio_tpc from 0.4.7 ? I use it all the time in both RDAC
and AVT mode with no issues.
>
> > Can you apply the attached debug patch, repeat your test and send me
> > the log.
>
> Sure, but I think you forgot to actually attach the patch... (Happens
> to me too all the time!)
oops :)... attached
>
> Regards
[-- Attachment #2: debug --]
[-- Type: text/x-patch, Size: 617 bytes --]
Index: linux-2.6.22/drivers/md/dm-mpath-rdac.c
===================================================================
--- linux-2.6.22.orig/drivers/md/dm-mpath-rdac.c
+++ linux-2.6.22/drivers/md/dm-mpath-rdac.c
@@ -229,6 +229,13 @@ static void mode_select_endio(struct req
 	struct scsi_sense_hdr sense_hdr;
 	int sense = 0, fail = 0;
 
+	DMINFO("MODE_SELECT of %s returned error %d; host_byte 0x%x, "
+	       "msg_byte 0x%x, status_byte 0x%x\n",
+	       h->path->dev->name, error,
+	       host_byte(req->errors),
+	       msg_byte(req->errors),
+	       status_byte(req->errors));
+
 	if (had_failures(req, error)) {
 		fail = 1;
 		goto failed;
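
The three fields the debug patch prints are slices of the 32-bit SCSI
result word. A small userspace sketch of that decoding follows. The macros
mirror the accessors in the 2.6-era include/scsi/scsi.h as best recalled,
so treat the shifts and masks as an assumption to check against your tree.
format_scsi_result is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * ASSUMPTION: userspace copies of the SCSI result-word accessors,
 * reproduced for illustration only.
 */
#define status_byte(result) (((result) >> 1) & 0x7f)
#define msg_byte(result)    (((result) >> 8) & 0xff)
#define host_byte(result)   (((result) >> 16) & 0xff)
#define driver_byte(result) (((result) >> 24) & 0xff)

/*
 * Hypothetical helper: format the same three fields the debug patch logs.
 * Returns the snprintf() result, i.e. the length the text needs.
 */
static int format_scsi_result(unsigned int result, char *buf, size_t len)
{
	assert(buf != NULL);

	return snprintf(buf, len,
			"host_byte 0x%x, msg_byte 0x%x, status_byte 0x%x",
			host_byte(result), msg_byte(result),
			status_byte(result));
}
```

For example, a result word of 0x00070002 decodes to host_byte 0x7,
msg_byte 0x0 and status_byte 0x1 with these definitions.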
* Re: dm-rdac not working?
From: Tore Anderson @ 2007-08-29 7:10 UTC (permalink / raw)
To: sekharan; +Cc: device-mapper development
* Chandra Seetharaman
> Do you see multiple failures like shown below during the device probe
> time ?
Yes, on all paths to the passive controller I get numerous I/O errors,
which happen when the kernel attempts to read in the partition table,
when LVM wants to look for PV signatures, and so on. This is the
behaviour I expect when using RDAC mode.
> Hmm. In my storage it is enabled. That is why I do not see that issue.
> Maybe multipath tools should read and ignore the stderr.
If AVT is enabled in your storage, you shouldn't need the RDAC
hardware handler. When dm-multipath switches pg and starts sending I/O
to the passive paths they should go live automatically, at least they
do so for me...
The bad thing about AVT is, of course, that when node X in a cluster
boots, the partition table scanning will transfer the volume, making the
active controller that e.g. nodes Y and Z were using heavily go passive,
disrupting I/O and making everything wobbly until node X has finished
its bootup procedure.
Regards
--
Tore Anderson