* BUG in dm/dm-mirror module?
From: malahal @ 2007-08-11 1:08 UTC
To: dm-devel
Hi, I am trying to create a mirrored disk log. I have four block
devices: two for the log (which is itself a mirror) and two for the
actual mirror device. But I can't use the mirror device at all; it just
hangs on any read or write. Here are the details of the dmsetup calls. I
am using RHEL5 (2.6.18-8.el5). It looks like a mirror module bug, and I
would appreciate any help.
dev1="/dev/sda1"
dev2="/dev/sdb1"
dev3="/dev/sdc1"
dev4="/dev/sdd1"
echo "0 8192 mirror core 1 512 2 $dev1 0 $dev2 0" | dmsetup create log
echo "0 24576 mirror disk 2 /dev/mapper/log 512 2 $dev3 0 $dev4 0" | dmsetup create mirror
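For readers unfamiliar with the mirror target's table syntax, each line above reads as `<start> <length> mirror <log_type> <#log_args> <log_args...> <#devs> <dev> <offset> ...`. The first line builds the log mirror with an in-memory (`core`) log and a 512-sector region size; the second builds the data mirror with a `disk` log on `/dev/mapper/log`. The C fragment below splits such a line into those fields. It is a hypothetical illustration of the format only: `parse_mirror_table` and `struct mirror_table` are made-up names, optional trailing feature arguments are ignored, and nothing here is part of dmsetup or libdevmapper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 32
#define MAX_LEGS 8

/* Fields of a dm "mirror" table line (optional feature args ignored). */
struct mirror_table {
    unsigned long long start;   /* first sector of the mapping */
    unsigned long long length;  /* length of the mapping in sectors */
    char log_type[16];          /* "core" or "disk" */
    int n_log_args;
    char log_args[4][64];       /* e.g. {"/dev/mapper/log", "512"} */
    int n_legs;
    char leg_dev[MAX_LEGS][64];
    unsigned long long leg_offset[MAX_LEGS];
};

/* Returns 0 on success, -1 if the line does not fit the expected layout. */
int parse_mirror_table(const char *line, struct mirror_table *t)
{
    char buf[512];
    char *tok[MAX_TOKENS];
    int n = 0, i, leg;

    assert(line != NULL && t != NULL);
    if (strlen(line) >= sizeof(buf))
        return -1;
    strcpy(buf, line);
    memset(t, 0, sizeof(*t));

    /* Split on whitespace into at most MAX_TOKENS tokens. */
    for (char *p = strtok(buf, " \t\n"); p && n < MAX_TOKENS; p = strtok(NULL, " \t\n"))
        tok[n++] = p;

    /* Minimum: start, length, "mirror", log type, #log_args, #devs. */
    if (n < 6 || strcmp(tok[2], "mirror") != 0)
        return -1;
    t->start = strtoull(tok[0], NULL, 10);
    t->length = strtoull(tok[1], NULL, 10);
    snprintf(t->log_type, sizeof(t->log_type), "%s", tok[3]);

    t->n_log_args = atoi(tok[4]);
    if (t->n_log_args < 0 || t->n_log_args > 4 || 5 + t->n_log_args >= n)
        return -1;
    for (i = 0; i < t->n_log_args; i++)
        snprintf(t->log_args[i], sizeof(t->log_args[i]), "%s", tok[5 + i]);

    i = 5 + t->n_log_args;  /* index of <#devs> */
    t->n_legs = atoi(tok[i++]);
    if (t->n_legs < 1 || t->n_legs > MAX_LEGS || i + 2 * t->n_legs > n)
        return -1;
    for (leg = 0; leg < t->n_legs; leg++) {
        snprintf(t->leg_dev[leg], sizeof(t->leg_dev[leg]), "%s", tok[i++]);
        t->leg_offset[leg] = strtoull(tok[i++], NULL, 10);
    }
    return 0;
}
```

For example, parsing the second line (with `$dev3` and `$dev4` expanded) yields log type `disk`, log arguments `/dev/mapper/log` and `512`, and two legs, `/dev/sdc1` and `/dev/sdd1`, both at offset 0.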
The following are the stack traces for kmirrord and for the
"dd if=/dev/mapper/mirror of=/dev/null" command run against the mirror:
crash> bt 312
PID: 312 TASK: c00000001dd311d0 CPU: 1 COMMAND: "kmirrord"
#0 [c00000001dd577d0] .schedule at c0000000003483c8
#1 [c00000001dd578e0] .io_schedule at c000000000349094
#2 [c00000001dd57970] .sync_io at d0000000002777f4
#3 [c00000001dd57a20] .dm_io_sync_vm at d0000000002778a0
#4 [c00000001dd57af0] .disk_flush at d0000000000808cc
#5 [c00000001dd57b90] .do_work at d00000000008342c
#6 [c00000001dd57d50] .run_workqueue at c00000000007b76c
#7 [c00000001dd57df0] .worker_thread at c00000000007c4d8
#8 [c00000001dd57ee0] .kthread at c000000000080fb8
#9 [c00000001dd57f90] .kernel_thread at c0000000000264bc
crash> bt 2489
PID: 2489 TASK: c00000000747cbc0 CPU: 1 COMMAND: "dd"
#0 [c00000000717b5c0] .schedule at c0000000003483c8
#1 [c00000000717b6d0] .io_schedule at c000000000349094
#2 [c00000000717b760] .sync_page at c0000000000aa180
#3 [c00000000717b7e0] .__wait_on_bit_lock at c0000000003492c0
#4 [c00000000717b880] .__lock_page at c0000000000a9fa4
#5 [c00000000717b950] .do_generic_mapping_read at c0000000000aad00
#6 [c00000000717baa0] .__generic_file_aio_read at c0000000000aba74
#7 [c00000000717bb70] .generic_file_read at c0000000000ad0c8
#8 [c00000000717bcf0] .vfs_read at c0000000000e0f10
#9 [c00000000717bd90] .sys_read at c0000000000e13f4
#10 [c00000000717be30] syscall_exit at c00000000000869c
syscall [c00] exception frame:
R0: 0000000000000003 R1: 00000000ffedf650 R2: 000000000fff9450
R3: 0000000000000000 R4: 0000000010030000 R5: 0000000000000200
R6: 000000001001d1a8 R7: 0000000000000000 R8: 00000000000001ff
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 0000000010025144 R14: 0000000000000000
R15: 0000000010040a60 R16: 000000001001d190 R17: 000000001001d198
R18: 0000000000000000 R19: 000000001001d1f4 R20: 000000001001d128
R21: 000000001001d178 R22: 000000001001d160 R23: 000000001001d1bc
R24: 0000000010030000 R25: 000000001001d170 R26: 000000001001d170
R27: 0000000000000000 R28: 0000000010030000 R29: 0000000000000200
R30: 000000001001d008 R31: 0000000010030200
NIP: 000000000ff1a6d4 MSR: 000000000000d032 OR3: 0000000000000000
CTR: 000000000ff1a6c0 LR: 0000000010001df4 XER: 0000000000000000
CCR: 0000000044000428 MQ: c000000000421ae8 DAR: 0000000010050c64
DSISR: 0000000042000000 Syscall Result: 0000000000000200
* Re: BUG in dm/dm-mirror module?
From: Milan Broz @ 2007-08-11 8:52 UTC
To: device-mapper development
malahal@us.ibm.com wrote:
> Hi, I am trying to create a mirrored disk log. I have four block
> devices, two for the log (which is a mirror!) and two for the actual
> mirror device. But I can't use the mirror device at all. It just hangs
> for any read/write. Here are the details of dmsetup calls. I am using
> RHEL5 (2.6.18-8.el5). Looks like a mirror module bug and I appreciate
> any help.
>
> dev1="/dev/sda1"
> dev2="/dev/sdb1"
> dev3="/dev/sdc1"
> dev4="/dev/sdd1"
> echo "0 8192 mirror core 1 512 2 $dev1 0 $dev2 0" | dmsetup create log
> echo "0 24576 mirror disk 2 /dev/mapper/log 512 2 $dev3 0 $dev4 0" | dmsetup create mirror
Hi,
yes, there is a known problem with having only one kmirrord thread when
using a mirrored log (i.e. mirror over mirror).
For problem description see this patch for upstream kernel
http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.21/dm-raid1-one-kmirrord-per-mirror.patch
All RHEL5 test kernels from 2.6.18-18 onward include this fix,
so for testing purposes you can try the RHEL5.1 beta kernel.
Milan
--
mbroz@redhat.com
* Re: BUG in dm/dm-mirror module?
From: Jonathan Brassow @ 2007-08-13 15:18 UTC
To: device-mapper development
On Aug 11, 2007, at 3:52 AM, Milan Broz wrote:
> malahal@us.ibm.com wrote:
>> Hi, I am trying to create a mirrored disk log. I have four block
>> devices, two for the log (which is a mirror!) and two for the actual
>> mirror device. But I can't use the mirror device at all. It just hangs
>> for any read/write. Here are the details of dmsetup calls. I am using
>> RHEL5 (2.6.18-8.el5). Looks like a mirror module bug and I appreciate
>> any help.
>>
>> dev1="/dev/sda1"
>> dev2="/dev/sdb1"
>> dev3="/dev/sdc1"
>> dev4="/dev/sdd1"
>> echo "0 8192 mirror core 1 512 2 $dev1 0 $dev2 0" | dmsetup create log
>> echo "0 24576 mirror disk 2 /dev/mapper/log 512 2 $dev3 0 $dev4 0" | dmsetup create mirror
>
> Hi,
>
> yes, there is known problem with one kmirrord thread and using mirrored log.
> (i.e. mirror over mirror)
>
> For problem description see this patch for upstream kernel
> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.21/dm-raid1-one-kmirrord-per-mirror.patch
>
> All testing RHEL5 kernels from 2.6.18-18 has this fix included,
> so for testing purposes you can try RHEL5.1 beta kernel.
On a different topic, why are you mirroring the log? Isn't this
somewhat dangerous?
Let's say that the primary copy of the log dies or goes offline. You
continue on because the log device is still "good". If your machine
crashes and the primary log device is "rediscovered" on bootup, what
happens? The contents of the stale side will be copied - resulting
in your log not properly reflecting the state of your mirror device
and maybe even leaving inconsistencies.
You might argue that we should update the metadata to exclude the
failed primary at the point of failure. Two things come to mind:
1) log I/O will continue until you take action - leaving you open to
the scenario above
2) it would be simpler to just allocate a new log (since you are
changing metadata anyway) and initialize the log as "in-sync" if the
mirror is already "in-sync".
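Option 2 could look roughly like the following hypothetical helper (the function names and the one-bit-per-region layout are illustrative assumptions): a replacement log starts with every region marked in sync when the mirror was already in sync, and otherwise marks every region out of sync so that a full resync follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: fill a freshly allocated log's per-region sync
 * bitmap (bit set = region in sync).  If the mirror was fully in sync
 * when the old log failed, the new log can start "all in sync" and no
 * resync is needed; otherwise every region is marked out of sync. */
void init_replacement_log(unsigned char *sync_bits, size_t nregions,
                          bool mirror_in_sync)
{
    size_t nbytes = (nregions + 7) / 8;

    assert(sync_bits != NULL);
    memset(sync_bits, mirror_in_sync ? 0xff : 0x00, nbytes);
    /* Clear padding bits past the last region so counts stay exact. */
    if (mirror_in_sync && nregions % 8)
        sync_bits[nbytes - 1] = (unsigned char)((1u << (nregions % 8)) - 1);
}

/* Number of regions the mirror would have to recover. */
size_t regions_needing_resync(const unsigned char *sync_bits, size_t nregions)
{
    size_t n = 0;

    for (size_t r = 0; r < nregions; r++)
        if (!(sync_bits[r / 8] & (1u << (r % 8))))
            n++;
    return n;
}
```

The trade-off is the one described in point 2: the new log is only trusted as "in-sync" when the mirror already was; in every other case the cost is a full resync rather than a risk of silent divergence.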
If you ignore the possibility of transient device failures, mirroring
the log might make some sense. You gain an advantage only at the
times when a log device fails and:
1) the machine fails before the initial resync has completed
2) the machine fails while assigning a new log device
Ultimately, I think that in order to have a fast solution that allows
you to do the above (as well as a whole host of other advanced
features, like real-time mirroring) you need kernel accessible device
labels on each mirror device and log. The labels would track things
like: who's the primary, who's a slave, who's in the group, who's
failed, etc. I've seen some people advocate putting this in the log,
but the log can fail. (I hope I've already conveyed why I don't
think it's a good idea to mirror the log.) I don't have any good
ideas for making this happen right now.
brassow
* Re: BUG in dm/dm-mirror module?
From: Phillip Susi @ 2007-08-13 16:48 UTC
To: device-mapper development
Jonathan Brassow wrote:
> On a different topic, why are you mirroring the log? Isn't this
> somewhat dangerous?
>
> Let's say that the primary copy of the log dies or goes offline. You
> continue on because the log device is still "good". If your machine
> crashes and the primary log device is "rediscovered" on bootup, what
> happens? The contents of the stale side will be copied - resulting in
> your log not properly reflecting the state of your mirror device and
> maybe even leaving inconsistencies.
This is a problem with any mirror, not just one holding a mirror log.
> You might argue that we should update the metadata to exclude the failed
> primary at the point of failure. Two things come to mind:
> 1) log I/O will continue until you take action - leaving you open to the
> scenario above
> 2) it would be simpler to just allocate a new log (since you are
> changing metadata anyway) and initialize the log as "in-sync" if the
> mirror is already "in-sync".
Yes, once one drive fails, the metadata on the other drive should
indicate that the mirror is broken and that this drive now holds the
most up-to-date copy.
* Re: BUG in dm/dm-mirror module?
From: malahal @ 2007-08-13 18:24 UTC
To: Jonathan Brassow; +Cc: device-mapper development
Jonathan Brassow [jbrassow@redhat.com] wrote:
>
> Let's say that the primary copy of the log dies or goes offline. You
> continue on because the log device is still "good". If your machine
> crashes and the primary log device is "rediscovered" on bootup, what
> happens? The contents of the stale side will be copied - resulting
> in your log not properly reflecting the state of your mirror device
> and maybe even leaving inconsistencies.
How does this work today with a normal mirror? Does the disk log keep
enough information to decide which device should be the master on reboot?
> Ultimately, I think that in order to have a fast solution that allows
> you to do the above (as well as a whole host of other advanced
> features, like real-time mirroring) you need kernel accessible device
> labels on each mirror device and log. The labels would track things
> like: who's the primary, who's a slave, who's in the group, who's
> failed, etc. I've seen some people advocate putting this in the log,
> but the log can fail. (I hope I've already conveyed why I don't
> think it's a good idea to mirror the log.) I don't have any good
> ideas for making this happen right now.
Yes, a kernel-accessible label on each mirror device would be the best
way to handle these kinds of scenarios. Another possible option is to
enhance the log module to support a 'mirrored log' that records log
device failures in the log itself.
Thanks, Malahal.
* Re: BUG in dm/dm-mirror module?
From: Jonathan Brassow @ 2007-08-13 20:18 UTC
To: device-mapper development
On Aug 13, 2007, at 11:48 AM, Phillip Susi wrote:
> Jonathan Brassow wrote:
>> On a different topic, why are you mirroring the log? Isn't this
>> somewhat dangerous?
>> Let's say that the primary copy of the log dies or goes offline.
>> You continue on because the log device is still "good". If your
>> machine crashes and the primary log device is "rediscovered" on
>> bootup, what happens? The contents of the stale side will be
>> copied - resulting in your log not properly reflecting the state
>> of your mirror device and maybe even leaving inconsistencies.
>
> This is a problem with any mirror, not just one holding a mirror log.
It is a special problem with the mirror log.
Mirrors will recover themselves and become consistent upon a reboot.
In the case of a mirror that holds a file system, if you lost some of
your most recent writes, journaling/fsck will take care of it. In
the case of a mirror that holds another mirror's log, you wind up
with a log that does not contain recent data - and could spell
coherency issues for the top level mirror.
>
>> You might argue that we should update the metadata to exclude the
>> failed primary at the point of failure. Two things come to mind:
>> 1) log I/O will continue until you take action - leaving you open
>> to the scenario above
>> 2) it would be simpler to just allocate a new log (since you are
>> changing metadata anyway) and initialize the log as "in-sync" if
>> the mirror is already "in-sync".
>
> Yes, once one drive fails, the metadata on the other drive should
> indicate that the mirror is broken and this is now the most up to
> date copy.
There is no metadata on the other drive; that's part of the problem.
We must distinguish between metadata that is created by LVM (or another
userspace application) and metadata areas that are known to the device
mapper target. Currently, the mirroring target only has the log
device, which I contend is insufficient.
brassow
* Re: BUG in dm/dm-mirror module?
From: Phillip Susi @ 2007-08-13 21:21 UTC
To: device-mapper development
Jonathan Brassow wrote:
> It is a special problem with the mirror log.
>
> Mirrors will recover themselves and become consistent upon a reboot. In
> the case of a mirror that holds a file system, if you lost some of your
> most recent writes, journaling/fsck will take care of it. In the case
> of a mirror that holds another mirror's log, you wind up with a log that
> does not contain recent data - and could spell coherency issues for the
> top level mirror.
A filesystem that is consistent is still not correct if it holds
older data, at least not when the newer data is available.
> There is no metadata on the other drive, that's part of the problem. We
> must discern between metadata that is made by LVM (or other userspace
> app) and meta-data areas that are known to the device mapper target.
> Currently, the mirroring target only has the log device - which I
> contend is insufficient.
LVM needs to update its metadata to indicate that the other drive failed
and that this one holds the more up-to-date information going forward.
* Re: BUG in dm/dm-mirror module?
From: malahal @ 2007-08-14 15:55 UTC
To: Jonathan Brassow; +Cc: device-mapper development
Jonathan Brassow [jbrassow@redhat.com] wrote:
>
> On Aug 13, 2007, at 11:48 AM, Phillip Susi wrote:
>
> >
> >This is a problem with any mirror, not just one holding a mirror log.
>
> It is a special problem with the mirror log.
>
> Mirrors will recover themselves and become consistent upon a reboot.
> In the case of a mirror that holds a file system, if you lost some of
> your most recent writes, journaling/fsck will take care of it. In
I believe the mirror code handles errors at the region level. So if the
disk failures are transient, one region could be out of sync while the
other regions are updated with the latest data. I don't think a disk
with a few 'out-of-sync' regions can be assumed to have consistent data.
In any case, we need a better method to select the master mirror device.
Does LVM have an extra sector or so that it could give to the kernel module?
Thanks, Malahal.