* dm-rdac not working?
@ 2007-08-27 14:41 Tore Anderson
2007-08-27 17:33 ` Chandra Seetharaman
0 siblings, 1 reply; 6+ messages in thread
From: Tore Anderson @ 2007-08-27 14:41 UTC
To: dm-devel
I've got a Sun StorageTek 6140 (an Engenio 3994 with a Sun sticker),
which is an RDAC active/passive array. When I configure it to use RDAC
mode and set up my host (2.6.23-rc3, x86_64) to use the rdac hardware
handler, it doesn't really seem to work. I yank the two paths to the
active controller using "echo 1 > /sys/block/$dev/device/delete" while
generating heavy I/O and wait for a pg switch to happen, but all I get
in the logs is the following:
Aug 27 16:17:45 atalanta multipathd: sdj: remove path (uevent)
Aug 27 16:17:45 atalanta kernel: [264107.296729] sd 4:0:1:1: [sdj] Synchronizing SCSI cache
Aug 27 16:17:45 atalanta kernel: [264107.296894] device-mapper: multipath rdac: using RDAC command with timeout 6000
Aug 27 16:17:45 atalanta kernel: [264107.297271] device-mapper: multipath: Failing path 8:144.
Aug 27 16:17:47 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 2 1 8:64 1000 8:112 1000]
Aug 27 16:18:10 atalanta multipathd: sdg: remove path (uevent)
Aug 27 16:18:10 atalanta kernel: [264131.733301] sd 3:0:1:1: [sdg] Synchronizing SCSI cache
Aug 27 16:18:10 atalanta kernel: [264131.733435] device-mapper: multipath rdac: using RDAC command with timeout 6000
Aug 27 16:18:10 atalanta kernel: [264131.759010] device-mapper: multipath: Failing path 8:96.
Aug 27 16:18:10 atalanta kernel: [264131.829770] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:19:12 atalanta kernel: [264193.753980] device-mapper: multipath: Failing path 8:64.
Aug 27 16:19:12 atalanta kernel: [264193.754944] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:20:14 atalanta kernel: [264255.632988] device-mapper: multipath: Failing path 8:112.
Aug 27 16:20:14 atalanta kernel: [264255.633032] printk: 12 messages suppressed.
Aug 27 16:20:14 atalanta kernel: [264255.633035] Buffer I/O error on device dm-4, logical block 1821127
Aug 27 16:20:14 atalanta kernel: [264255.670535] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.670545] Buffer I/O error on device dm-4, logical block 1821128
Aug 27 16:20:14 atalanta kernel: [264255.708064] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.708070] Buffer I/O error on device dm-4, logical block 1821129
Aug 27 16:20:14 atalanta kernel: [264255.745599] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.745608] Buffer I/O error on device dm-4, logical block 1821130
Aug 27 16:20:14 atalanta kernel: [264255.783080] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.783089] Buffer I/O error on device dm-4, logical block 1821131
Aug 27 16:20:14 atalanta kernel: [264255.820614] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.820622] Buffer I/O error on device dm-4, logical block 1821132
Aug 27 16:20:14 atalanta kernel: [264255.858158] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.858165] Buffer I/O error on device dm-4, logical block 1821133
Aug 27 16:20:14 atalanta kernel: [264255.895633] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.895639] Buffer I/O error on device dm-4, logical block 1821134
Aug 27 16:20:14 atalanta kernel: [264255.933099] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.933105] Buffer I/O error on device dm-4, logical block 1821135
Aug 27 16:20:14 atalanta kernel: [264255.970556] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264255.970562] Buffer I/O error on device dm-4, logical block 1821136
Aug 27 16:20:14 atalanta kernel: [264256.008010] lost page write due to I/O error on dm-4
Aug 27 16:20:14 atalanta kernel: [264256.061324] Aborting journal on device dm-4.
Aug 27 16:20:14 atalanta kernel: [264256.088223] journal commit I/O error
Aug 27 16:20:14 atalanta kernel: [264256.113352] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113383] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113401] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113405] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113409] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113413] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113418] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113422] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113425] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113477] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113541] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113547] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113557] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113565] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113583] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.113771] __journal_remove_journal_head: freeing b_committed_data
Aug 27 16:20:14 atalanta kernel: [264256.114328] WARNING: at fs/buffer.c:1154 mark_buffer_dirty()
Aug 27 16:20:14 atalanta kernel: [264256.114329]
Aug 27 16:20:14 atalanta kernel: [264256.114330] Call Trace:
Aug 27 16:20:14 atalanta kernel: [264256.114347] [mark_buffer_dirty+117/144] mark_buffer_dirty+0x75/0x90
Aug 27 16:20:14 atalanta kernel: [264256.114359] [_end+128966711/2129771920] :ext3:ext3_commit_super+0x57/0xa0
Aug 27 16:20:14 atalanta kernel: [264256.114364] [freeze_bdev+138/176] freeze_bdev+0x8a/0xb0
Aug 27 16:20:14 atalanta kernel: [264256.114374] [_end+128731225/2129771920] :dm_mod:dm_suspend+0x119/0x450
Aug 27 16:20:14 atalanta kernel: [264256.114379] [default_wake_function+0/16] default_wake_function+0x0/0x10
Aug 27 16:20:14 atalanta kernel: [264256.114383] [__up_write+49/368] __up_write+0x31/0x170
Aug 27 16:20:14 atalanta kernel: [264256.114390] [_end+128744272/2129771920] :dm_mod:dev_suspend+0x0/0x210
Aug 27 16:20:14 atalanta kernel: [264256.114396] [_end+128744602/2129771920] :dm_mod:dev_suspend+0x14a/0x210
Aug 27 16:20:14 atalanta kernel: [264256.114402] [_end+128742063/2129771920] :dm_mod:ctl_ioctl+0x1df/0x2e0
Aug 27 16:20:14 atalanta kernel: [264256.114406] [do_wp_page+964/1280] do_wp_page+0x3c4/0x500
Aug 27 16:20:14 atalanta kernel: [264256.114416] [do_ioctl+125/192] do_ioctl+0x7d/0xc0
Aug 27 16:20:14 atalanta kernel: [264256.114418] [vfs_ioctl+116/720] vfs_ioctl+0x74/0x2d0
Aug 27 16:20:14 atalanta kernel: [264256.114421] [sys_ioctl+149/176] sys_ioctl+0x95/0xb0
Aug 27 16:20:14 atalanta kernel: [264256.114425] [system_call+126/131] system_call+0x7e/0x83
Aug 27 16:20:14 atalanta kernel: [264256.114428]
Aug 27 16:20:14 atalanta kernel: [264256.128587] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:21:16 atalanta kernel: [264318.014646] device-mapper: multipath: Failing path 8:64.
Aug 27 16:21:16 atalanta kernel: [264318.014987] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:22:18 atalanta kernel: [264379.906125] device-mapper: multipath: Failing path 8:112.
Aug 27 16:22:18 atalanta kernel: [264379.906159] printk: 22947 messages suppressed.
Aug 27 16:22:18 atalanta kernel: [264379.906163] Buffer I/O error on device dm-4, logical block 0
Aug 27 16:22:18 atalanta kernel: [264379.940530] lost page write due to I/O error on dm-4
Aug 27 16:22:18 atalanta kernel: [264379.940562] ext3_abort called.
Aug 27 16:22:18 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 1 1 round-robin 0 2 1 8:64 1000 8:112 1000]
Aug 27 16:22:18 atalanta kernel: [264379.959340] EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
Aug 27 16:22:18 atalanta kernel: [264380.008425] Remounting filesystem read-only
Aug 27 16:22:18 atalanta kernel: [264380.035388] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:23:20 atalanta kernel: [264441.923876] device-mapper: multipath: Failing path 8:112.
Aug 27 16:23:20 atalanta multipathd: 8:112: mark as failed
Aug 27 16:23:20 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:23:20 atalanta kernel: [264441.924715] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:23:30 atalanta multipathd: 8:112: tur checker reports path is up
Aug 27 16:23:30 atalanta multipathd: 8:112: reinstated
Aug 27 16:23:30 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:24:22 atalanta kernel: [264503.821431] device-mapper: multipath: Failing path 8:64.
Aug 27 16:24:22 atalanta multipathd: 8:64: mark as failed
Aug 27 16:24:22 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:24:22 atalanta kernel: [264503.822237] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:24:32 atalanta multipathd: 8:64: tur checker reports path is up
Aug 27 16:24:32 atalanta multipathd: 8:64: reinstated
Aug 27 16:24:32 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:25:24 atalanta kernel: [264565.701064] device-mapper: multipath: Failing path 8:112.
Aug 27 16:25:24 atalanta kernel: [264565.701395] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:25:24 atalanta multipathd: 8:112: mark as failed
Aug 27 16:25:24 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:25:35 atalanta multipathd: 8:112: tur checker reports path is up
Aug 27 16:25:35 atalanta multipathd: 8:112: reinstated
Aug 27 16:25:35 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:26:26 atalanta kernel: [264627.593359] device-mapper: multipath: Failing path 8:64.
Aug 27 16:26:26 atalanta kernel: [264627.594446] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:26:26 atalanta multipathd: 8:64: mark as failed
Aug 27 16:26:26 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:26:37 atalanta multipathd: 8:64: tur checker reports path is up
Aug 27 16:26:37 atalanta multipathd: 8:64: reinstated
Aug 27 16:26:37 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:27:28 atalanta kernel: [264689.478050] device-mapper: multipath: Failing path 8:112.
Aug 27 16:27:28 atalanta kernel: [264689.478771] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
Aug 27 16:27:28 atalanta multipathd: 8:112: mark as failed
Aug 27 16:27:28 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:27:39 atalanta multipathd: 8:112: tur checker reports path is up
Aug 27 16:27:39 atalanta multipathd: 8:112: reinstated
Aug 27 16:27:39 atalanta multipathd: mysql: remaining active paths: 2
Aug 27 16:28:30 atalanta kernel: [264751.365417] device-mapper: multipath: Failing path 8:64.
Aug 27 16:28:30 atalanta kernel: [264751.366649] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
Aug 27 16:28:30 atalanta multipathd: 8:64: mark as failed
Aug 27 16:28:30 atalanta multipathd: mysql: remaining active paths: 1
Aug 27 16:28:41 atalanta multipathd: 8:64: tur checker reports path is up
Aug 27 16:28:41 atalanta multipathd: 8:64: reinstated
Aug 27 16:28:41 atalanta multipathd: mysql: remaining active paths: 2
The last messages appear to be looping forever. The passive
controller never goes active, and there's nothing in the logs on the
array that indicates any attempt to move the volume. It seems like the
hardware handler is simply broken...
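As a rough way to quantify the loop, here is a minimal Python sketch that pairs each "queueing MODE_SELECT command" line with the next "Failing path" line for the same device and reports the gap between their kernel timestamps. The regular expressions assume the exact message wording seen in the log above, and the function name mode_select_intervals is made up for illustration.

```python
import re

# Patterns match the kernel messages as they appear in the log above;
# other kernel versions may word these messages differently.
QUEUE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] device-mapper: multipath rdac: "
    r"queueing MODE_SELECT command on (\d+:\d+)"
)
FAIL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] device-mapper: multipath: Failing path (\d+:\d+)\."
)


def mode_select_intervals(lines):
    """Return (path, seconds) for each MODE_SELECT followed by a failure of that path."""
    pending = {}   # path -> kernel timestamp at which MODE_SELECT was queued
    result = []
    for line in lines:
        m = QUEUE_RE.search(line)
        if m:
            pending[m.group(2)] = float(m.group(1))
            continue
        m = FAIL_RE.search(line)
        if m and m.group(2) in pending:
            started = pending.pop(m.group(2))
            result.append((m.group(2), round(float(m.group(1)) - started, 3)))
    return result


# Four lines copied from the log above.
sample = [
    "Aug 27 16:18:10 atalanta kernel: [264131.829770] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64",
    "Aug 27 16:19:12 atalanta kernel: [264193.753980] device-mapper: multipath: Failing path 8:64.",
    "Aug 27 16:19:12 atalanta kernel: [264193.754944] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112",
    "Aug 27 16:20:14 atalanta kernel: [264255.632988] device-mapper: multipath: Failing path 8:112.",
]

for path, seconds in mode_select_intervals(sample):
    print(f"{path}: failed {seconds} s after MODE_SELECT was queued")
```

On these four lines the gap comes out at roughly 62 seconds for both 8:64 and 8:112, so each MODE_SELECT seems to sit for about a minute before the path is failed.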
Another quite nasty thing is that even though I'm using
queue_if_no_path, I/O errors still made it up to the file system layer,
making it read-only. Isn't that exactly what queue_if_no_path is
supposed to prevent?
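One thing that might be worth checking is what multipathd actually loaded into the kernel. Below is a minimal Python sketch that splits one of the "load table" lines from the log, assuming the usual dm-multipath table layout (feature count and features, hardware handler count and arguments, number of priority groups, next group, then per group a selector with its arguments followed by the paths). The function name parse_multipath_table is hypothetical.

```python
def parse_multipath_table(table):
    """Split a dm-multipath table line into its fields (layout is an assumption)."""
    tok = table.split()
    start, length, target = int(tok[0]), int(tok[1]), tok[2]
    if target != "multipath":
        raise ValueError("not a multipath table")
    pos = 3

    def take_counted():
        # Reads "<n> <arg1> ... <argn>" at the current position and returns the n args.
        nonlocal pos
        n = int(tok[pos])
        args = tok[pos + 1:pos + 1 + n]
        pos += 1 + n
        return args

    features = take_counted()
    hw_handler = take_counted()
    num_groups, next_group = int(tok[pos]), int(tok[pos + 1])
    pos += 2

    groups = []
    for _ in range(num_groups):
        selector = tok[pos]
        pos += 1
        selector_args = take_counted()
        num_paths, num_path_args = int(tok[pos]), int(tok[pos + 1])
        pos += 2
        paths = []
        for _ in range(num_paths):
            # Each path is a major:minor device followed by its selector arguments.
            paths.append((tok[pos], tok[pos + 1:pos + 1 + num_path_args]))
            pos += 1 + num_path_args
        groups.append(
            {"selector": selector, "selector_args": selector_args, "paths": paths}
        )

    return {
        "start": start,
        "length": length,
        "features": features,
        "hw_handler": hw_handler,
        "next_group": next_group,
        "groups": groups,
    }


# Table line copied from the 16:17:47 "load table" message in the log above.
table = ("0 41943040 multipath 0 1 rdac 2 1 round-robin 0 1 1 8:96 1000 "
         "round-robin 0 2 1 8:64 1000 8:112 1000")
parsed = parse_multipath_table(table)
print("features:", parsed["features"])
print("hw handler:", parsed["hw_handler"])
for group in parsed["groups"]:
    print(group["selector"], [dev for dev, _ in group["paths"]])
```

If the layout assumption holds, the feature list in that logged table is empty, so queue_if_no_path does not appear among the features of the table that was loaded. That may or may not be related to the I/O errors; it is only a guess.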
Regards
--
Tore Anderson
* Re: dm-rdac not working?
2007-08-27 14:41 dm-rdac not working? Tore Anderson
@ 2007-08-27 17:33 ` Chandra Seetharaman
2007-08-27 18:31 ` Tore Anderson
0 siblings, 1 reply; 6+ messages in thread
From: Chandra Seetharaman @ 2007-08-27 17:33 UTC
To: device-mapper development
On Mon, 2007-08-27 at 16:41 +0200, Tore Anderson wrote:
Looks like the MODE_SELECT command (to change the passive path to be
active) that is sent to the device is failing (for whatever reason) and
hence that path is not getting activated.
What version of multipath tools are you using ?
Can you attach your multipath.conf file. You should be using the rdac
path checker instead of the tur path checker.
Can you attach the o/p of "multipath -ll"
Look below for more comments.
> I've got a Sun StorageTek 6140 (an Engenio 3994 with a Sun sticker),
> which is an RDAC active/passive array. When I configure it to use RDAC
> mode and set up my host (2.6.23-rc3, x86_64) to use the rdac hardware
> handler, it doesn't really seem to work. I yank the two paths to the
> active controller using "echo 1 > /sys/block/$dev/device/delete" while
> generating heavy I/O and wait for a pg switch to happen, but all I get
> in the logs is the following:
>
> Aug 27 16:17:45 atalanta multipathd: sdj: remove path (uevent)
> Aug 27 16:17:45 atalanta kernel: [264107.296729] sd 4:0:1:1: [sdj] Synchronizing SCSI cache
> Aug 27 16:17:45 atalanta kernel: [264107.296894] device-mapper: multipath rdac: using RDAC command with timeout 6000
> Aug 27 16:17:45 atalanta kernel: [264107.297271] device-mapper: multipath: Failing path 8:144.
> Aug 27 16:17:47 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 2 1 8:64 1000 8:112 1000]
> Aug 27 16:18:10 atalanta multipathd: sdg: remove path (uevent)
> Aug 27 16:18:10 atalanta kernel: [264131.733301] sd 3:0:1:1: [sdg] Synchronizing SCSI cache
> Aug 27 16:18:10 atalanta kernel: [264131.733435] device-mapper: multipath rdac: using RDAC command with timeout 6000
> Aug 27 16:18:10 atalanta kernel: [264131.759010] device-mapper: multipath: Failing path 8:96.
> Aug 27 16:18:10 atalanta kernel: [264131.829770] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:19:12 atalanta kernel: [264193.753980] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:19:12 atalanta kernel: [264193.754944] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:20:14 atalanta kernel: [264255.632988] device-mapper: multipath: Failing path 8:112.
I presume 8:64 and 8:112 are the devices corresponding to other paths of
the device.
You can see that the MODE_SELECT command is sent and immediately the
path is failed (which means the MODE_SELECT command has failed).
And this is the same thing that repeats below.
> Aug 27 16:20:14 atalanta kernel: [264255.633032] printk: 12 messages suppressed.
> Aug 27 16:20:14 atalanta kernel: [264255.633035] Buffer I/O error on device dm-4, logical block 1821127
> Aug 27 16:20:14 atalanta kernel: [264255.670535] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.670545] Buffer I/O error on device dm-4, logical block 1821128
> Aug 27 16:20:14 atalanta kernel: [264255.708064] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.708070] Buffer I/O error on device dm-4, logical block 1821129
> Aug 27 16:20:14 atalanta kernel: [264255.745599] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.745608] Buffer I/O error on device dm-4, logical block 1821130
> Aug 27 16:20:14 atalanta kernel: [264255.783080] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.783089] Buffer I/O error on device dm-4, logical block 1821131
> Aug 27 16:20:14 atalanta kernel: [264255.820614] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.820622] Buffer I/O error on device dm-4, logical block 1821132
> Aug 27 16:20:14 atalanta kernel: [264255.858158] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.858165] Buffer I/O error on device dm-4, logical block 1821133
> Aug 27 16:20:14 atalanta kernel: [264255.895633] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.895639] Buffer I/O error on device dm-4, logical block 1821134
> Aug 27 16:20:14 atalanta kernel: [264255.933099] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.933105] Buffer I/O error on device dm-4, logical block 1821135
> Aug 27 16:20:14 atalanta kernel: [264255.970556] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264255.970562] Buffer I/O error on device dm-4, logical block 1821136
> Aug 27 16:20:14 atalanta kernel: [264256.008010] lost page write due to I/O error on dm-4
> Aug 27 16:20:14 atalanta kernel: [264256.061324] Aborting journal on device dm-4.
> Aug 27 16:20:14 atalanta kernel: [264256.088223] journal commit I/O error
> Aug 27 16:20:14 atalanta kernel: [264256.113352] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113383] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113401] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113405] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113409] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113413] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113418] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113422] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113425] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113477] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113541] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113547] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113557] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113565] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113583] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.113771] __journal_remove_journal_head: freeing b_committed_data
> Aug 27 16:20:14 atalanta kernel: [264256.114328] WARNING: at fs/buffer.c:1154 mark_buffer_dirty()
> Aug 27 16:20:14 atalanta kernel: [264256.114329]
> Aug 27 16:20:14 atalanta kernel: [264256.114330] Call Trace:
> Aug 27 16:20:14 atalanta kernel: [264256.114347] [mark_buffer_dirty+117/144] mark_buffer_dirty+0x75/0x90
> Aug 27 16:20:14 atalanta kernel: [264256.114359] [_end+128966711/2129771920] :ext3:ext3_commit_super+0x57/0xa0
> Aug 27 16:20:14 atalanta kernel: [264256.114364] [freeze_bdev+138/176] freeze_bdev+0x8a/0xb0
> Aug 27 16:20:14 atalanta kernel: [264256.114374] [_end+128731225/2129771920] :dm_mod:dm_suspend+0x119/0x450
> Aug 27 16:20:14 atalanta kernel: [264256.114379] [default_wake_function+0/16] default_wake_function+0x0/0x10
> Aug 27 16:20:14 atalanta kernel: [264256.114383] [__up_write+49/368] __up_write+0x31/0x170
> Aug 27 16:20:14 atalanta kernel: [264256.114390] [_end+128744272/2129771920] :dm_mod:dev_suspend+0x0/0x210
> Aug 27 16:20:14 atalanta kernel: [264256.114396] [_end+128744602/2129771920] :dm_mod:dev_suspend+0x14a/0x210
> Aug 27 16:20:14 atalanta kernel: [264256.114402] [_end+128742063/2129771920] :dm_mod:ctl_ioctl+0x1df/0x2e0
> Aug 27 16:20:14 atalanta kernel: [264256.114406] [do_wp_page+964/1280] do_wp_page+0x3c4/0x500
> Aug 27 16:20:14 atalanta kernel: [264256.114416] [do_ioctl+125/192] do_ioctl+0x7d/0xc0
> Aug 27 16:20:14 atalanta kernel: [264256.114418] [vfs_ioctl+116/720] vfs_ioctl+0x74/0x2d0
> Aug 27 16:20:14 atalanta kernel: [264256.114421] [sys_ioctl+149/176] sys_ioctl+0x95/0xb0
> Aug 27 16:20:14 atalanta kernel: [264256.114425] [system_call+126/131] system_call+0x7e/0x83
> Aug 27 16:20:14 atalanta kernel: [264256.114428]
> Aug 27 16:20:14 atalanta kernel: [264256.128587] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:21:16 atalanta kernel: [264318.014646] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:21:16 atalanta kernel: [264318.014987] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:22:18 atalanta kernel: [264379.906125] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:22:18 atalanta kernel: [264379.906159] printk: 22947 messages suppressed.
> Aug 27 16:22:18 atalanta kernel: [264379.906163] Buffer I/O error on device dm-4, logical block 0
> Aug 27 16:22:18 atalanta kernel: [264379.940530] lost page write due to I/O error on dm-4
> Aug 27 16:22:18 atalanta kernel: [264379.940562] ext3_abort called.
> Aug 27 16:22:18 atalanta multipathd: mysql: load table [0 41943040 multipath 0 1 rdac 1 1 round-robin 0 2 1 8:64 1000 8:112 1000]
> Aug 27 16:22:18 atalanta kernel: [264379.959340] EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
> Aug 27 16:22:18 atalanta kernel: [264380.008425] Remounting filesystem read-only
> Aug 27 16:22:18 atalanta kernel: [264380.035388] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:23:20 atalanta kernel: [264441.923876] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:23:20 atalanta multipathd: 8:112: mark as failed
> Aug 27 16:23:20 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:23:20 atalanta kernel: [264441.924715] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:23:30 atalanta multipathd: 8:112: tur checker reports path is up
> Aug 27 16:23:30 atalanta multipathd: 8:112: reinstated
> Aug 27 16:23:30 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:24:22 atalanta kernel: [264503.821431] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:24:22 atalanta multipathd: 8:64: mark as failed
> Aug 27 16:24:22 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:24:22 atalanta kernel: [264503.822237] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:24:32 atalanta multipathd: 8:64: tur checker reports path is up
> Aug 27 16:24:32 atalanta multipathd: 8:64: reinstated
> Aug 27 16:24:32 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:25:24 atalanta kernel: [264565.701064] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:25:24 atalanta kernel: [264565.701395] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:25:24 atalanta multipathd: 8:112: mark as failed
> Aug 27 16:25:24 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:25:35 atalanta multipathd: 8:112: tur checker reports path is up
> Aug 27 16:25:35 atalanta multipathd: 8:112: reinstated
> Aug 27 16:25:35 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:26:26 atalanta kernel: [264627.593359] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:26:26 atalanta kernel: [264627.594446] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:26:26 atalanta multipathd: 8:64: mark as failed
> Aug 27 16:26:26 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:26:37 atalanta multipathd: 8:64: tur checker reports path is up
> Aug 27 16:26:37 atalanta multipathd: 8:64: reinstated
> Aug 27 16:26:37 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:27:28 atalanta kernel: [264689.478050] device-mapper: multipath: Failing path 8:112.
> Aug 27 16:27:28 atalanta kernel: [264689.478771] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:64
> Aug 27 16:27:28 atalanta multipathd: 8:112: mark as failed
> Aug 27 16:27:28 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:27:39 atalanta multipathd: 8:112: tur checker reports path is up
> Aug 27 16:27:39 atalanta multipathd: 8:112: reinstated
> Aug 27 16:27:39 atalanta multipathd: mysql: remaining active paths: 2
> Aug 27 16:28:30 atalanta kernel: [264751.365417] device-mapper: multipath: Failing path 8:64.
> Aug 27 16:28:30 atalanta kernel: [264751.366649] device-mapper: multipath rdac: queueing MODE_SELECT command on 8:112
> Aug 27 16:28:30 atalanta multipathd: 8:64: mark as failed
> Aug 27 16:28:30 atalanta multipathd: mysql: remaining active paths: 1
> Aug 27 16:28:41 atalanta multipathd: 8:64: tur checker reports path is up
> Aug 27 16:28:41 atalanta multipathd: 8:64: reinstated
> Aug 27 16:28:41 atalanta multipathd: mysql: remaining active paths: 2
>
> The last messages appear to be looping forever. The passive
> controller never goes active, and there's nothing in the logs on the
> array that indicates any attempt to move the volume. It seems like the
> hardware handler is simply broken...
>
> Another quite nasty thing is that even though I'm using
> queue_if_no_path, I/O errors still made it up to the file system layer,
> making it read-only. Isn't that exactly what queue_if_no_path is
> supposed to prevent?
>
> Regards
> --
> Tore Anderson
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
* Re: dm-rdac not working?
2007-08-27 17:33 ` Chandra Seetharaman
@ 2007-08-27 18:31 ` Tore Anderson
2007-08-27 20:08 ` Chandra Seetharaman
0 siblings, 1 reply; 6+ messages in thread
From: Tore Anderson @ 2007-08-27 18:31 UTC
To: sekharan, device-mapper development
* Chandra Seetharaman
> What version of multipath tools are you using ?
0.4.7.
> Can you attach your multipath.conf file. You should be using the rdac
> path checker instead of the tur path checker.
Hmm, this was added in 0.4.8... What's the difference between the
rdac path checker and the tur checker? Anyway, it appears to me that
the problem here is with the kernel hardware handler, not in the
userspace path checker, wouldn't you agree? The hardware handler is
invoked upon pg init as expected, but fails to do its job.
> Can you attach the o/p of "multipath -ll"
mysql (3600a0b80002984ae0000179c46a68843)
[size=20 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:1 sdd 8:48  [active][ready]
 \_ 4:0:0:1 sdh 8:112 [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:1:1 sdg 8:96  [active][ready]
 \_ 4:0:1:1 sdj 8:144 [active][ready]

www (3600a0b80002984ae0000179b46a687fb)
[size=45 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:0 sdc 8:32  [active][ready]
 \_ 4:0:0:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:1:0 sdf 8:80  [active][ready]
 \_ 4:0:1:0 sdi 8:128 [active][ready]
This is after the machine has booted, so things have changed around a
little. 3:0:0:1 and 4:0:0:0 have switched devices, but otherwise it
looks like it did when I generated the problem.
> I presume 8:64 and 8:112 are the devices corresponding to other paths of
> the device.
That's right.
> You can see that the MODE_SELECT command is sent and immediately the
> path is failed (which means the MODE_SELECT command has failed).
>
> And this is the same thing that repeats below.
Yes. It appears to me it's sending that MODE_SELECT (which I guess is
like dm-emc's "trespass"?) command when it's trying to activate the
pg, but it simply doesn't work, so it's retrying over and over again,
but in vain. And after a while ext3 gets remounted r/o and it's game
over.
Regards
--
Tore Anderson
[-- Attachment #2: multipath.conf --]
[-- Type: text/plain, Size: 1024 bytes --]
# This file is managed by puppet, any local changes will be lost
defaults {
    user_friendly_names yes
}
defaults {
    multipath_tool "/sbin/multipath -v0"
    udev_dir /dev
    polling_interval 10
    rr_wmin_io 100
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd.*"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
    device {
        vendor IBM-ESXS
    }
}
multipaths {
    multipath {
        wwid 3600a0b80002984ae0000179b46a687fb
        alias www
    }
    multipath {
        wwid 3600a0b80002984ae0000179c46a68843
        alias mysql
    }
}
devices {
    device {
        vendor DGC
        product RAID.*
        hardware_handler "1 emc"
        prio_callout "/sbin/mpath_prio_emc /dev/%n"
        path_checker emc_clariion
        path_grouping_policy group_by_prio
        failback immediate
        no_path_retry queue
    }
    device {
        vendor SUN
        product CSM200_R
        hardware_handler "1 rdac"
        prio_callout "/usr/local/sbin/mpath_prio_rdac /dev/%n"
        path_checker tur
        path_grouping_policy group_by_serial
        failback immediate
        no_path_retry queue
    }
}
* Re: dm-rdac not working?
2007-08-27 18:31 ` Tore Anderson
@ 2007-08-27 20:08 ` Chandra Seetharaman
[not found] ` <46D33BAA.8090807@linpro.no>
0 siblings, 1 reply; 6+ messages in thread
From: Chandra Seetharaman @ 2007-08-27 20:08 UTC
To: Tore Anderson; +Cc: device-mapper development
On Mon, 2007-08-27 at 20:31 +0200, Tore Anderson wrote:
> * Chandra Seetharaman
>
> > What version of multipath tools are you using ?
>
> 0.4.7.
>
> > Can you attach your multipath.conf file. You should be using the rdac
> > path checker instead of the tur path checker.
>
> Hmm, this was added in 0.4.8... What's the difference between the
> rdac path checker and the tur checker? Anyway, it appears to me that
The tur checker just sends a TEST UNIT READY to see if the path is good,
whereas the rdac checker sends an inquiry for page 0xC9 and uses the
response to determine the state of the path.
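To make the difference concrete, here is a small Python sketch of the two CDBs involved, following the standard SCSI layouts for TEST UNIT READY (opcode 0x00) and INQUIRY with the EVPD bit set (opcode 0x12). It only builds the bytes and does not send anything to a device; the helper names and the 255-byte allocation length are assumptions for illustration and are not taken from the multipath-tools source.

```python
def tur_cdb():
    """6-byte TEST UNIT READY CDB: opcode 0x00, all other bytes zero."""
    return bytes(6)


def inquiry_vpd_cdb(page, alloc_len=255):
    """6-byte INQUIRY CDB requesting a vital product data page.

    alloc_len is an illustrative default, not what any checker necessarily uses.
    """
    if not 0 <= page <= 0xFF:
        raise ValueError("page code must fit in one byte")
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in two bytes")
    return bytes([
        0x12,                    # INQUIRY opcode
        0x01,                    # EVPD bit set: ask for a VPD page
        page,                    # page code, e.g. 0xC9
        (alloc_len >> 8) & 0xFF, # allocation length, high byte
        alloc_len & 0xFF,        # allocation length, low byte
        0x00,                    # control byte
    ])


print(tur_cdb().hex())              # 000000000000
print(inquiry_vpd_cdb(0xC9).hex())  # 1201c900ff00
```

The point of the comparison is that TEST UNIT READY only yields a ready or not-ready answer, while the inquiry returns a data page from which more path state can be read.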
> the problem here is with the kernel hardware handler, not in the
> userspace path checker, wouldn't you agree? The hardware handler is
> invoked upon pg init as expected, but fails to do its job.
That is true. But, I am wondering if what you have is based on the LSI
Engenio based controller (and hence the controller does not understand
the MODE_SELECT command).
What mode do you have your storage device configured in ? rdac or AVT ?
BTW, rdac path checker would show [ghost] instead of [ready] for the
passive path.
>
> > Can you attach the o/p of "multipath -ll"
>
> mysql (3600a0b80002984ae0000179c46a68843)
> [size=20 GB][features=1 queue_if_no_path][hwhandler=1 rdac]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 3:0:0:1 sdd 8:48  [active][ready]
>  \_ 4:0:0:1 sdh 8:112 [active][ready]
> \_ round-robin 0 [prio=6][active]
>  \_ 3:0:1:1 sdg 8:96  [active][ready]
>  \_ 4:0:1:1 sdj 8:144 [active][ready]
Where did you get mpath_prio_rdac from ?
In 0.4.7 you could use mpath_prio_tpc instead (it behaves exactly as
mpath_prio_rdac in 0.4.8).
Can you apply the attached debug patch, repeat your test and send me the
log.
Thanks,
chandra
<snip>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
end of thread
Thread overview: 6+ messages
2007-08-27 14:41 dm-rdac not working? Tore Anderson
2007-08-27 17:33 ` Chandra Seetharaman
2007-08-27 18:31 ` Tore Anderson
2007-08-27 20:08 ` Chandra Seetharaman
[not found] ` <46D33BAA.8090807@linpro.no>
2007-08-27 21:26 ` Chandra Seetharaman
[not found] ` <46D3C099.7080504@linpro.no>
[not found] ` <1188326877.12737.1.camel@linuxchandra>
2007-08-29 7:10 ` Tore Anderson