* BUG in dm/dm-mirror module?
From: malahal @ 2007-08-11 1:08 UTC
To: dm-devel

Hi, I am trying to create a mirrored disk log. I have four block
devices, two for the log (which is a mirror!) and two for the actual
mirror device. But I can't use the mirror device at all. It just hangs
for any read/write. Here are the details of dmsetup calls. I am using
RHEL5 (2.6.18-8.el5). Looks like a mirror module bug and I appreciate
any help.

dev1="/dev/sda1"
dev2="/dev/sdb1"
dev3="/dev/sdc1"
dev4="/dev/sdd1"
echo "0 8192 mirror core 1 512 2 $dev1 0 $dev2 0" | dmsetup create log
echo "0 24576 mirror disk 2 /dev/mapper/log 512 2 $dev3 0 $dev4 0" | dmsetup create mirror

The following are the stack traces for kmirrord and the
"dd if=/dev/mapper/mirror of=/dev/null" command on the mirror:

crash> bt 312
PID: 312  TASK: c00000001dd311d0  CPU: 1  COMMAND: "kmirrord"
 #0 [c00000001dd577d0] .schedule at c0000000003483c8
 #1 [c00000001dd578e0] .io_schedule at c000000000349094
 #2 [c00000001dd57970] .sync_io at d0000000002777f4
 #3 [c00000001dd57a20] .dm_io_sync_vm at d0000000002778a0
 #4 [c00000001dd57af0] .disk_flush at d0000000000808cc
 #5 [c00000001dd57b90] .do_work at d00000000008342c
 #6 [c00000001dd57d50] .run_workqueue at c00000000007b76c
 #7 [c00000001dd57df0] .worker_thread at c00000000007c4d8
 #8 [c00000001dd57ee0] .kthread at c000000000080fb8
 #9 [c00000001dd57f90] .kernel_thread at c0000000000264bc

crash> bt 2489
PID: 2489  TASK: c00000000747cbc0  CPU: 1  COMMAND: "dd"
 #0 [c00000000717b5c0] .schedule at c0000000003483c8
 #1 [c00000000717b6d0] .io_schedule at c000000000349094
 #2 [c00000000717b760] .sync_page at c0000000000aa180
 #3 [c00000000717b7e0] .__wait_on_bit_lock at c0000000003492c0
 #4 [c00000000717b880] .__lock_page at c0000000000a9fa4
 #5 [c00000000717b950] .do_generic_mapping_read at c0000000000aad00
 #6 [c00000000717baa0] .__generic_file_aio_read at c0000000000aba74
 #7 [c00000000717bb70] .generic_file_read at c0000000000ad0c8
 #8 [c00000000717bcf0] .vfs_read at c0000000000e0f10
 #9 [c00000000717bd90] .sys_read at c0000000000e13f4
#10 [c00000000717be30] syscall_exit at c00000000000869c
 syscall [c00] exception frame:
 R0:  0000000000000003  R1:  00000000ffedf650  R2:  000000000fff9450
 R3:  0000000000000000  R4:  0000000010030000  R5:  0000000000000200
 R6:  000000001001d1a8  R7:  0000000000000000  R8:  00000000000001ff
 R9:  0000000000000000  R10: 0000000000000000  R11: 0000000000000000
 R12: 0000000000000000  R13: 0000000010025144  R14: 0000000000000000
 R15: 0000000010040a60  R16: 000000001001d190  R17: 000000001001d198
 R18: 0000000000000000  R19: 000000001001d1f4  R20: 000000001001d128
 R21: 000000001001d178  R22: 000000001001d160  R23: 000000001001d1bc
 R24: 0000000010030000  R25: 000000001001d170  R26: 000000001001d170
 R27: 0000000000000000  R28: 0000000010030000  R29: 0000000000000200
 R30: 000000001001d008  R31: 0000000010030200
 NIP: 000000000ff1a6d4  MSR: 000000000000d032  OR3: 0000000000000000
 CTR: 000000000ff1a6c0  LR:  0000000010001df4  XER: 0000000000000000
 CCR: 0000000044000428  MQ:  c000000000421ae8  DAR: 0000000010050c64
 DSISR: 0000000042000000  Syscall Result: 0000000000000200
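As a reading aid, the two table lines in the post can be generated with a small shell sketch. It assumes the dm-mirror table layout `<start> <length> mirror <log_type> <#log_args> <log_args...> <#devs> <dev> <offset>...`, where the `core` log takes a region size and the `disk` log takes a log device followed by a region size. The helper names `mirror_table_core` and `mirror_table_disk` are hypothetical; they only print a table line and never call dmsetup.

```sh
#!/bin/sh
# Hypothetical helpers: they print a dm-mirror table line and create nothing.
# Assumed layout:
#   <start> <length> mirror <log_type> <#log_args> <log_args...> <#devs> <dev> <offset>...

# mirror_table_core LENGTH REGION_SIZE DEV...
mirror_table_core() {
    len=$1; region=$2; shift 2
    n=$#                          # number of mirror legs
    legs=""
    for dev in "$@"; do
        legs="$legs $dev 0"       # every leg starts at offset 0, as in the post
    done
    echo "0 $len mirror core 1 $region $n$legs"
}

# mirror_table_disk LENGTH LOG_DEV REGION_SIZE DEV...
mirror_table_disk() {
    len=$1; logdev=$2; region=$3; shift 3
    n=$#
    legs=""
    for dev in "$@"; do
        legs="$legs $dev 0"
    done
    echo "0 $len mirror disk 2 $logdev $region $n$legs"
}
```

Under these assumptions, `mirror_table_core 8192 512 /dev/sda1 /dev/sdb1` prints the table for the log device, and `mirror_table_disk 24576 /dev/mapper/log 512 /dev/sdc1 /dev/sdd1` prints the table for the top-level mirror whose disk log is `/dev/mapper/log`. Piping either into `dmsetup create <name>` would correspond to the commands in the post.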
* Re: BUG in dm/dm-mirror module?
From: Milan Broz @ 2007-08-11 8:52 UTC
To: device-mapper development

malahal@us.ibm.com wrote:
> Hi, I am trying to create a mirrored disk log. I have four block
> devices, two for the log (which is a mirror!) and two for the actual
> mirror device. But I can't use the mirror device at all. It just hangs
> for any read/write. Here are the details of dmsetup calls. I am using
> RHEL5 (2.6.18-8.el5). Looks like a mirror module bug and I appreciate
> any help.
>
> dev1="/dev/sda1"
> dev2="/dev/sdb1"
> dev3="/dev/sdc1"
> dev4="/dev/sdd1"
> echo "0 8192 mirror core 1 512 2 $dev1 0 $dev2 0" | dmsetup create log
> echo "0 24576 mirror disk 2 /dev/mapper/log 512 2 $dev3 0 $dev4 0" | dmsetup create mirror

Hi,

yes, there is a known problem with one kmirrord thread and using a
mirrored log (i.e. mirror over mirror).

For the problem description, see this patch for the upstream kernel:
http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.21/dm-raid1-one-kmirrord-per-mirror.patch

All testing RHEL5 kernels from 2.6.18-18 onward have this fix included,
so for testing purposes you can try the RHEL5.1 beta kernel.

Milan
--
mbroz@redhat.com
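The version cutoff mentioned above can be checked with a rough sketch like the following. It assumes RHEL5-style release strings of the form `2.6.18-N.el5`, as in the original post; the function name `has_kmirrord_fix` is hypothetical. For any other kernel series the sketch deliberately reports that it cannot tell rather than guessing.

```sh
#!/bin/sh
# has_kmirrord_fix RELEASE
# Exit status: 0 if RELEASE looks like 2.6.18-N with N >= 18,
#              1 if N < 18 (or N is missing),
#              2 if RELEASE is not a 2.6.18-* string (this sketch cannot tell).
has_kmirrord_fix() {
    case $1 in
        2.6.18-*) ;;
        *) return 2 ;;
    esac
    build=${1#2.6.18-}          # "8.el5" from "2.6.18-8.el5"
    build=${build%%[!0-9]*}     # keep only the leading digits: "8"
    [ -n "$build" ] && [ "$build" -ge 18 ]
}
```

A possible use is `has_kmirrord_fix "$(uname -r)" && echo "kernel should include the one-kmirrord-per-mirror fix"`. The 2.6.18-8.el5 kernel from the original report fails the check.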
* Re: BUG in dm/dm-mirror module?
From: Jonathan Brassow @ 2007-08-13 15:18 UTC
To: device-mapper development

On Aug 11, 2007, at 3:52 AM, Milan Broz wrote:

> malahal@us.ibm.com wrote:
>> Hi, I am trying to create a mirrored disk log. I have four block
>> devices, two for the log (which is a mirror!) and two for the actual
>> mirror device. But I can't use the mirror device at all. It just hangs
>> for any read/write. Here are the details of dmsetup calls. I am using
>> RHEL5 (2.6.18-8.el5). Looks like a mirror module bug and I appreciate
>> any help.
>>
>> dev1="/dev/sda1"
>> dev2="/dev/sdb1"
>> dev3="/dev/sdc1"
>> dev4="/dev/sdd1"
>> echo "0 8192 mirror core 1 512 2 $dev1 0 $dev2 0" | dmsetup create log
>> echo "0 24576 mirror disk 2 /dev/mapper/log 512 2 $dev3 0 $dev4 0" | dmsetup create mirror
>
> Hi,
>
> yes, there is a known problem with one kmirrord thread and using a
> mirrored log (i.e. mirror over mirror).
>
> For the problem description, see this patch for the upstream kernel:
> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.21/dm-raid1-one-kmirrord-per-mirror.patch
>
> All testing RHEL5 kernels from 2.6.18-18 onward have this fix included,
> so for testing purposes you can try the RHEL5.1 beta kernel.

On a different topic, why are you mirroring the log? Isn't this
somewhat dangerous?

Let's say that the primary copy of the log dies or goes offline. You
continue on because the log device is still "good". If your machine
crashes and the primary log device is "rediscovered" on bootup, what
happens? The contents of the stale side will be copied - resulting in
your log not properly reflecting the state of your mirror device and
maybe even leaving inconsistencies.

You might argue that we should update the metadata to exclude the
failed primary at the point of failure. Two things come to mind:
1) log I/O will continue until you take action - leaving you open to
   the scenario above
2) it would be simpler to just allocate a new log (since you are
   changing metadata anyway) and initialize the log as "in-sync" if the
   mirror is already "in-sync".

If you ignore the possibility of transient device failures, mirroring
the log might make some sense. You gain an advantage only at the times
when a log device fails and:
1) the machine fails before the initial resync has completed
2) the machine fails while assigning a new log device

Ultimately, I think that in order to have a fast solution that allows
you to do the above (as well as a whole host of other advanced
features, like real-time mirroring) you need kernel accessible device
labels on each mirror device and log. The labels would track things
like: who's the primary, who's a slave, who's in the group, who's
failed, etc. I've seen some people advocate putting this in the log,
but the log can fail. (I hope I've already conveyed why I don't think
it's a good idea to mirror the log.) I don't have any good ideas for
making this happen right now.

 brassow
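The stale-log scenario described in this message can be walked through with a toy model. This is only a sketch: representing each log leg as a string with one bit per region, the variable names, and the `set_bit` and `mark_dirty` helpers are all assumptions made for illustration. It does not reflect dm-mirror's on-disk log format or its resync logic.

```sh
#!/bin/sh
# Toy model of a mirrored log with two legs. Each leg is a string with one
# character per mirror region: "0" = region in sync, "1" = region dirty.
primary_leg="0000"
secondary_leg="0000"
primary_online=1

# set_bit STRING INDEX  -> prints STRING with character INDEX (1-based) set to 1
set_bit() {
    printf '%s\n' "$1" | sed "s/./1/$2"
}

# mark_dirty REGION  -> record a dirty region on every log leg that is online
mark_dirty() {
    secondary_leg=$(set_bit "$secondary_leg" "$1")
    if [ "$primary_online" -eq 1 ]; then
        primary_leg=$(set_bit "$primary_leg" "$1")
    fi
}

# 1. The primary log leg drops out; the log mirror carries on with the other leg.
primary_online=0
# 2. A write to region 3 is in flight, so the log marks region 3 dirty.
mark_dirty 3
echo "before crash:  primary=$primary_leg secondary=$secondary_leg"
# 3. The machine crashes. On reboot the old primary is rediscovered and,
#    being the primary, its stale contents are copied over the secondary leg.
primary_online=1
secondary_leg=$primary_leg
echo "after reboot:  primary=$primary_leg secondary=$secondary_leg"
```

In this model the dirty bit for region 3 exists only on the surviving leg, so copying the rediscovered primary over it leaves the log claiming that every region is in sync even though region 3 of the top-level mirror may not be.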
* Re: BUG in dm/dm-mirror module?
From: Phillip Susi @ 2007-08-13 16:48 UTC
To: device-mapper development

Jonathan Brassow wrote:
> On a different topic, why are you mirroring the log? Isn't this
> somewhat dangerous?
>
> Let's say that the primary copy of the log dies or goes offline. You
> continue on because the log device is still "good". If your machine
> crashes and the primary log device is "rediscovered" on bootup, what
> happens? The contents of the stale side will be copied - resulting in
> your log not properly reflecting the state of your mirror device and
> maybe even leaving inconsistencies.

This is a problem with any mirror, not just one holding a mirror log.

> You might argue that we should update the metadata to exclude the failed
> primary at the point of failure. Two things come to mind:
> 1) log I/O will continue until you take action - leaving you open to the
>    scenario above
> 2) it would be simpler to just allocate a new log (since you are
>    changing metadata anyway) and initialize the log as "in-sync" if the
>    mirror is already "in-sync".

Yes, once one drive fails, the metadata on the other drive should
indicate that the mirror is broken and this is now the most up to date
copy.
* Re: BUG in dm/dm-mirror module?
From: Jonathan Brassow @ 2007-08-13 20:18 UTC
To: device-mapper development

On Aug 13, 2007, at 11:48 AM, Phillip Susi wrote:

> Jonathan Brassow wrote:
>> On a different topic, why are you mirroring the log? Isn't this
>> somewhat dangerous?
>> Let's say that the primary copy of the log dies or goes offline.
>> You continue on because the log device is still "good". If your
>> machine crashes and the primary log device is "rediscovered" on
>> bootup, what happens? The contents of the stale side will be
>> copied - resulting in your log not properly reflecting the state
>> of your mirror device and maybe even leaving inconsistencies.
>
> This is a problem with any mirror, not just one holding a mirror log.

It is a special problem with the mirror log.

Mirrors will recover themselves and become consistent upon a reboot.
In the case of a mirror that holds a file system, if you lost some of
your most recent writes, journaling/fsck will take care of it. In the
case of a mirror that holds another mirror's log, you wind up with a
log that does not contain recent data - and could spell coherency
issues for the top level mirror.

>> You might argue that we should update the metadata to exclude the
>> failed primary at the point of failure. Two things come to mind:
>> 1) log I/O will continue until you take action - leaving you open
>>    to the scenario above
>> 2) it would be simpler to just allocate a new log (since you are
>>    changing metadata anyway) and initialize the log as "in-sync" if
>>    the mirror is already "in-sync".
>
> Yes, once one drive fails, the metadata on the other drive should
> indicate that the mirror is broken and this is now the most up to
> date copy.

There is no metadata on the other drive, that's part of the problem.
We must discern between metadata that is made by LVM (or other
userspace app) and meta-data areas that are known to the device mapper
target. Currently, the mirroring target only has the log device - which
I contend is insufficient.

 brassow
* Re: BUG in dm/dm-mirror module?
From: Phillip Susi @ 2007-08-13 21:21 UTC
To: device-mapper development

Jonathan Brassow wrote:
> It is a special problem with the mirror log.
>
> Mirrors will recover themselves and become consistent upon a reboot. In
> the case of a mirror that holds a file system, if you lost some of your
> most recent writes, journaling/fsck will take care of it. In the case
> of a mirror that holds another mirror's log, you wind up with a log that
> does not contain recent data - and could spell coherency issues for the
> top level mirror.

Having a filesystem that is consistent is still not correct if it is
older data, at least not when the newer data is available.

> There is no metadata on the other drive, that's part of the problem. We
> must discern between metadata that is made by LVM (or other userspace
> app) and meta-data areas that are known to the device mapper target.
> Currently, the mirroring target only has the log device - which I
> contend is insufficient.

LVM needs to update its metadata to indicate that the other drive
failed and this one now contains more up to date information going
forward.
* Re: BUG in dm/dm-mirror module?
From: malahal @ 2007-08-14 15:55 UTC
To: Jonathan Brassow; +Cc: device-mapper development

Jonathan Brassow [jbrassow@redhat.com] wrote:
>
> On Aug 13, 2007, at 11:48 AM, Phillip Susi wrote:
>
> >This is a problem with any mirror, not just one holding a mirror log.
>
> It is a special problem with the mirror log.
>
> Mirrors will recover themselves and become consistent upon a reboot.
> In the case of a mirror that holds a file system, if you lost some of
> your most recent writes, journaling/fsck will take care of it. In

I believe the mirror code handles errors at region level. So one region
could be out of sync while the other regions are updated with the
latest data if the disk failure(s) are transient. I don't think a disk
with a few 'out-of-sync' regions can be assumed to have consistent
data.

In any case, we need a better method to select the master mirror
device. Does LVM have an extra sector or so that it could give to the
kernel module?

Thanks, Malahal.
* Re: BUG in dm/dm-mirror module?
From: malahal @ 2007-08-13 18:24 UTC
To: Jonathan Brassow; +Cc: device-mapper development

Jonathan Brassow [jbrassow@redhat.com] wrote:
>
> Let's say that the primary copy of the log dies or goes offline. You
> continue on because the log device is still "good". If your machine
> crashes and the primary log device is "rediscovered" on bootup, what
> happens? The contents of the stale side will be copied - resulting
> in your log not properly reflecting the state of your mirror device
> and maybe even leaving inconsistencies.

How does this work today with a normal mirror? Does the disk log keep
enough info about who should be the master on reboot?

> Ultimately, I think that in order to have a fast solution that allows
> you to do the above (as well as a whole host of other advanced
> features, like real-time mirroring) you need kernel accessible device
> labels on each mirror device and log. The labels would track things
> like: who's the primary, who's a slave, who's in the group, who's
> failed, etc. I've seen some people advocate putting this in the log,
> but the log can fail. (I hope I've already conveyed why I don't
> think it's a good idea to mirror the log.) I don't have any good
> ideas for making this happen right now.

Yes, having a kernel accessible label on the mirror device would be the
best way to handle these kinds of scenarios. Another possible option is
to enhance the log module to handle a 'mirrored log', which can record
log device failures in the log itself.

Thanks, Malahal.
end of thread

Thread overview: 8+ messages
2007-08-11 1:08 BUG in dm/dm-mirror module? malahal
2007-08-11 8:52 ` Milan Broz
2007-08-13 15:18   ` Jonathan Brassow
2007-08-13 16:48     ` Phillip Susi
2007-08-13 20:18       ` Jonathan Brassow
2007-08-13 21:21         ` Phillip Susi
2007-08-14 15:55         ` malahal
2007-08-13 18:24     ` malahal