* [PATCH lvconvert 0/2] Fixes suspend/resume ordering of lvconvert
@ 2008-02-06 21:55 Jun'ichi Nomura
2008-02-06 22:04 ` [PATCH lvconvert 1/2] Fix resume/suspend ordering after temporary mirror insertion Jun'ichi Nomura
2008-02-06 22:05 ` [PATCH lvconvert 2/2] Update dm table of off-tree layer LV on lvconvert Jun'ichi Nomura
From: Jun'ichi Nomura @ 2008-02-06 21:55 UTC
To: lvm-devel
lvconvert has problems in which 2 active mirror maps coexist
for a short while, sharing the same log device.
This is critical for cluster mirror, which detects such a situation,
and it is also dangerous for non-clustered mirror.
(Many thanks to Jon Brassow for the reports, testing and
analysis from the cluster mirror viewpoint.)
The problems are:
1. resume before suspend
When a layer is inserted beneath an LV, the layer is
resumed before the LV is suspended.
That is, if the LV is active, lvconvert calls suspend_lv() to
suspend the LV in preparation for the update:
  suspend_lv()
    _lv_suspend()
      _lv_preload()
        dev_manager_preload()
          dm_tree_preload_children()
            Load tables for devices from bottom to top.
            If a device has parents, resume the device, too.
      _lv_suspend_lv()
        dev_manager_suspend()
However, before actually suspending the LV, suspend_lv() ends
up calling dm_tree_preload_children(), which resumes the layer.
2. off-tree device not updated
When a layer is removed, a new table with the "error" target
is not loaded/resumed for the layer during the update
of the LV.
So the layer continues to have the old table.
  _remove_mirror_images()
    remove_layer_from_lv()
      Update the in-memory VG metadata.
      The layer is no longer a part of the LV in the metadata.
    vg_write()
      The metadata is pre-committed.
    suspend_lv()
    vg_commit()
      The metadata is committed.
    resume_lv()
      Load new tables based on the new metadata and resume.
      It doesn't load a new table for the layer.
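The effect described in item 2 can be illustrated with a small model.
This is only a sketch, not LVM2 code: the dictionaries and the
reachable() helper are hypothetical stand-ins for the VG metadata and
for the dm tree walk. It shows that once the layer is detached in the
metadata, a walk that starts from the top LV never visits it, so its
in-kernel table stays as it was.

```python
# Toy model (not LVM2 code): "metadata" maps each LV to the LVs it sits on.
def reachable(metadata, top):
    """Return every LV visited by a walk that starts at `top`."""
    seen, stack = set(), [top]
    while stack:
        lv = stack.pop()
        if lv not in seen:
            seen.add(lv)
            stack.extend(metadata.get(lv, []))
    return seen

LAYER = "vg-lvol0_mimagetmp_2"
OLD_MIRROR = ["vg-lvol0_mimage_0", "vg-lvol0_mimage_1", "vg-lvol0_mlog"]

# Metadata while the temporary layer is still part of the stack.
before = {
    "vg-lvol0": ["vg-lvol0_mimage_2", LAYER],
    LAYER: list(OLD_MIRROR),
}
# Metadata after remove_layer_from_lv(): vg-lvol0 owns the segments and
# the layer only has an error segment (no dependencies).
after = {
    "vg-lvol0": OLD_MIRROR + ["vg-lvol0_mimage_2"],
    LAYER: [],
}

# The in-kernel tables start out matching the old metadata.
kernel = {lv: list(deps) for lv, deps in before.items()}

# Resuming vg-lvol0 reloads only what the walk from vg-lvol0 can reach.
for lv in reachable(after, "vg-lvol0"):
    if lv in after:
        kernel[lv] = list(after[lv])

# Devices whose in-kernel table no longer matches the metadata.
stale = [lv for lv in kernel if kernel[lv] != after[lv]]
print(stale)
```

In this model the only stale device is the detached layer, which still
carries the old mirror table.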
Thanks,
--
Jun'ichi Nomura, NEC Corporation of America
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lvconvert-bad.log
Type: text/x-log
Size: 228564 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/lvm-devel/attachments/20080206/402cb516/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lvconvert-good.log
Type: text/x-log
Size: 230023 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/lvm-devel/attachments/20080206/402cb516/attachment-0001.bin>
* [PATCH lvconvert 1/2] Fix resume/suspend ordering after temporary mirror insertion
From: Jun'ichi Nomura @ 2008-02-06 22:04 UTC
To: lvm-devel
This patch is an updated version of the following:
https://www.redhat.com/archives/lvm-devel/2008-January/msg00134.html
While the in-kernel dm tables of a stacked LV are being updated,
there is a small window in which the upper device and the lower
device have identical active mappings.
With the current LVM2 features, only lvconvert suffers from
this problem, when adding mirror image(s) to a mirror LV.
The attached patch works around the lvconvert problem.
Details are below.
When updating the structure of an active LV,
LVM2 preloads a new dm table for each device from bottom to top,
then suspends top-down and resumes bottom-up.
The preloading includes resuming the lower device so that
the new table for the upper device can see the attributes of the
new lower device (e.g. its new size).
The point is that the resuming of the lower device happens
before the suspending of the upper device.
If the new table of the lower device and the old table of the
upper device are the same and the table contains a target that has
side effects after resume (e.g. mirror and snapshot),
it causes a problem.
In the current LVM2 code, the problem can only occur when
lvconvert adds mirrors to an existing mirror.
dev_manager_preload() can check the CONVERTING flag in lv->status
to see whether a layer LV has been inserted or not.
If so, it skips preloading and lets the resume code handle it.
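The ordering issue and the workaround can be sketched with a toy model.
This is not LVM2 code: update_sequence() and its skip_preload parameter
are hypothetical names, with skip_preload standing in for "CONVERTING is
set". The model only records the order of load/suspend/resume events for
a two-level stack and checks whether the lower device is resumed while
the upper device is still active.

```python
# Toy model (not LVM2 code) of the update sequence for a two-level stack.
def update_sequence(upper, lower, skip_preload):
    """Return the ordered (action, device) events for one table update."""
    events = []
    if not skip_preload:
        # Preload runs bottom-up; a device that has a parent is resumed too.
        events += [("load", lower), ("resume", lower), ("load", upper)]
    events.append(("suspend", upper))
    if skip_preload:
        # With preloading skipped, the resume path loads the tables instead.
        events += [("load", lower), ("resume", lower), ("load", upper)]
    events.append(("resume", upper))
    return events

def lower_resumed_while_upper_active(events, upper, lower):
    """True if `lower` is resumed before `upper` has been suspended."""
    upper_suspended = False
    for action, dev in events:
        if action == "suspend" and dev == upper:
            upper_suspended = True
        if action == "resume" and dev == lower and not upper_suspended:
            return True
    return False

UPPER, LOWER = "vg-lvol0", "vg-lvol0_mimagetmp_2"
for converting in (False, True):
    ev = update_sequence(UPPER, LOWER, skip_preload=converting)
    print(converting, lower_resumed_while_upper_active(ev, UPPER, LOWER))
```

With preloading enabled the model reports the overlap; with preloading
skipped the upper device is suspended before the layer is resumed.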
Below, I'm trying to explain what's happening using the 'dmsetup ls --tree'
output during "lvconvert adds 1 mirror to 2-way mirrored LV".
lvconvert will change the device tree as follows:
1. Before lvconvert
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
2. During lvconvert
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 `-vg-lvol0_mimagetmp_2 (253:5)
    |-vg-lvol0_mimage_1 (253:3)
    |-vg-lvol0_mimage_0 (253:2)
    `-vg-lvol0_mlog (253:1)
3. After lvconvert
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
While moving from stage 1 to stage 2,
lvconvert will create an LV 'vg-lvol0_mimage_2' as a new mirror image
and a layer 'vg-lvol0_mimagetmp_2' to hold the original mirror map:
vg-lvol0_mimage_2 (253:6)
vg-lvol0_mimagetmp_2 (253:5)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
And vg-lvol0 will mirror them:
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 `-vg-lvol0_mimagetmp_2 (253:5)
The device-mapper operations for the above are actually as follows:
(excerpt from lvconvert-bad.log)
#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0
#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)
^^^^HERE
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:1470 Loading vg-lvol0 table
#libdm-deptree.c:1421 Adding target: 0 4096 mirror core 2 1024 block_on_error 2 253:5 0 253:6 0
#libdm-deptree.c:940 Suspending vg-lvol0 (253:4)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:940 Suspending vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_1 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:33 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_0 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:65 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:1470 Loading vg-lvol0_mlog table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:34 384
#libdm-deptree.c:897 Resuming vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897 Resuming vg-lvol0 (253:4)
Note that at the line marked with "HERE" above,
both vg-lvol0 and vg-lvol0_mimagetmp_2 are active and
have the same structure:
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
This happens because the preloading is done before the suspending.
The attached patch disables preloading if CONVERTING is set.
With the patch, vg-lvol0 is suspended first.
So the operations look like this: (excerpt from lvconvert-good.log)
#libdm-deptree.c:940 Suspending vg-lvol0 (253:4)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:940 Suspending vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_1 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:33 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_0 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:65 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:1470 Loading vg-lvol0_mlog table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:34 384
#libdm-deptree.c:897 Resuming vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0
#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:1470 Loading vg-lvol0 table
#libdm-deptree.c:1421 Adding target: 0 4096 mirror core 2 1024 block_on_error 2 253:5 0 253:6 0
#libdm-deptree.c:897 Resuming vg-lvol0 (253:4)
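The overlap can also be checked mechanically by replaying such a log.
The following is a rough sketch, not part of LVM2 or of the patch:
find_shared_logs() and mirror_log_dev() are hypothetical helpers, and
the initial table of vg-lvol0 is an assumption (it is not shown in the
excerpts; it is taken to be the same mirror table that is later loaded
into vg-lvol0_mimagetmp_2, as the description above implies). The sample
lines are shortened from the two excerpts.

```python
import re

def mirror_log_dev(target):
    """Return the log device of a 'mirror disk ...' target line, else None."""
    fields = target.split()
    # fields: start, length, type, log type, number of log args, log args...
    if len(fields) > 5 and fields[2] == "mirror" and fields[3] == "disk":
        return fields[5]
    return None

def find_shared_logs(lines, initial):
    """Replay log lines; report (device, device, log) whenever two active
    devices run mirror tables that use the same log device."""
    live = dict(initial)      # device -> table currently in use
    active = set(initial)     # devices that are not suspended
    pending, current, found = {}, None, []
    for line in lines:
        if m := re.search(r"Loading (\S+) table", line):
            current = m.group(1)
        elif m := re.search(r"Adding target: (.+)", line):
            pending[current] = m.group(1)
        elif m := re.search(r"Suspending (\S+)", line):
            active.discard(m.group(1))
        elif m := re.search(r"Resuming (\S+)", line):
            dev = m.group(1)
            # A resume switches the device to its pending table, if any.
            live[dev] = pending.pop(dev, live.get(dev))
            active.add(dev)
            log = mirror_log_dev(live[dev] or "")
            for other in sorted(active - {dev}):
                if log and mirror_log_dev(live.get(other) or "") == log:
                    found.append((dev, other, log))
    return found

# Assumed initial table of vg-lvol0 (not shown in the excerpts).
MIRROR = "0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0"
CORE = "0 4096 mirror core 2 1024 block_on_error 2 253:5 0 253:6 0"

bad = [
    "#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table",
    "#libdm-deptree.c:1421 Adding target: " + MIRROR,
    "#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)",
    "#libdm-deptree.c:940 Suspending vg-lvol0 (253:4)",
]
good = [
    "#libdm-deptree.c:940 Suspending vg-lvol0 (253:4)",
    "#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table",
    "#libdm-deptree.c:1421 Adding target: " + MIRROR,
    "#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)",
    "#libdm-deptree.c:1470 Loading vg-lvol0 table",
    "#libdm-deptree.c:1421 Adding target: " + CORE,
    "#libdm-deptree.c:897 Resuming vg-lvol0 (253:4)",
]
print(find_shared_logs(bad, {"vg-lvol0": MIRROR}))
print(find_shared_logs(good, {"vg-lvol0": MIRROR}))
```

For the shortened "bad" sequence the replay reports that
vg-lvol0_mimagetmp_2 and vg-lvol0 both use log device 253:1; for the
"good" sequence it reports nothing.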
Thanks,
--
Jun'ichi Nomura, NEC Corporation of America
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-dont-preload-after-layer-insertion.patch
Type: text/x-patch
Size: 1757 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/lvm-devel/attachments/20080206/bb3a228e/attachment.bin>
* [PATCH lvconvert 2/2] Update dm table of off-tree layer LV on lvconvert
From: Jun'ichi Nomura @ 2008-02-06 22:05 UTC
To: lvm-devel
If an LV in the middle of a stacked LV is removed,
the suspend/resume of the stacked LV doesn't update the removed
LV's dm table in the kernel.
With the current LVM2 features, this is not a problem except when
lvconvert finishes adding mirror image(s) to a mirror LV.
The attached patch works around the lvconvert problem.
For details, see below.
> When updating a structure of active LV,
> LVM2 preloads new dm table for each device from bottom to top,
> then suspend top-down and resume bottom-up.
When a layer LV is being removed from the tree,
the problem is that the removed layer LV is resumed
with the same table it had before the suspend.
remove_layer_from_lv() will set an error segment for the layer LV.
However, since the layer LV is no longer a part of the LV stack,
neither preloading nor resuming loads the new table with
the error segment.
The upper device will load and resume a new table, which is
usually very similar to that of the layer LV.
The layer LV will be removed later. However, until then,
there are 2 active tables working on the same resource
(e.g. mirror log, snapshot metadata).
In the current LVM2 code, the problem can only occur when
lvconvert finishes adding a mirror to an existing mirror.
_remove_mirror_images() activates the layer LV, which is
removed from the mirror LV, before resuming the mirror LV.
Below, I'm trying to explain what's happening using the 'dmsetup ls --tree'
output during "lvconvert adds 1 mirror to 2-way mirrored LV".
lvconvert will change the device tree as follows:
1. Before lvconvert
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
2. During lvconvert
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 `-vg-lvol0_mimagetmp_2 (253:5)
    |-vg-lvol0_mimage_1 (253:3)
    |-vg-lvol0_mimage_0 (253:2)
    `-vg-lvol0_mlog (253:1)
3. After lvconvert
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
While moving from stage 2 to stage 3,
lvconvert will move the segments of the layer 'vg-lvol0_mimagetmp_2'
to 'vg-lvol0' and put an error segment in the layer instead.
Thus, vg-lvol0_mimagetmp_2 is free to be removed.
vg-lvol0_mimagetmp_2 (253:5)
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
However, since the load/suspend/resume operation is done
only on vg-lvol0 and vg-lvol0_mimagetmp_2 is already out of
the tree, the table of vg-lvol0_mimagetmp_2 is unchanged
from the stage 2:
vg-lvol0_mimagetmp_2 (253:5)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
vg-lvol0 (253:4)
 |-vg-lvol0_mimage_2 (253:6)
 |-vg-lvol0_mimage_1 (253:3)
 |-vg-lvol0_mimage_0 (253:2)
 `-vg-lvol0_mlog (253:1)
So we have 2 active mirrors with the same mirror log for a short while
until lvconvert removes vg-lvol0_mimagetmp_2.
The attached patch updates the table of vg-lvol0_mimagetmp_2
before updating that of vg-lvol0 to avoid this situation.
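Why the order of the two updates matters can be sketched as follows.
This is a toy model, not the patch itself: shared_mirror_logs() and
finish() are hypothetical helpers, and each active device is reduced to
a (target type, log device) pair. During the conversion vg-lvol0 uses a
core log and only the temporary layer uses the disk log.

```python
# Toy model (not LVM2 code): active device -> (target type, log device).
def shared_mirror_logs(tables):
    """Return log devices referenced by more than one active mirror table."""
    users = {}
    for dev, (target, log) in tables.items():
        if target == "mirror" and log is not None:
            users.setdefault(log, []).append(dev)
    return {log: sorted(devs) for log, devs in users.items() if len(devs) > 1}

LOG = "vg-lvol0_mlog"
LAYER = "vg-lvol0_mimagetmp_2"

# During lvconvert: vg-lvol0 mirrors with a core log, the layer owns the disk log.
during = {"vg-lvol0": ("mirror", None), LAYER: ("mirror", LOG)}

def finish(tables, update_layer_first):
    """Return the active tables right after vg-lvol0 takes over the mirror."""
    tables = dict(tables)
    if update_layer_first:
        # The layer gets its error table before vg-lvol0 is updated.
        tables[LAYER] = ("error", None)
    tables["vg-lvol0"] = ("mirror", LOG)
    return tables

print(shared_mirror_logs(finish(during, update_layer_first=False)))
print(shared_mirror_logs(finish(during, update_layer_first=True)))
```

Without the extra update both devices end up as users of the same log
in this model; with it, no log has more than one user.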
Without the patch, you can see vg-lvol0_mimagetmp_2
is resumed without loading a new table.
(excerpt from lvconvert-bad.log)
#libdm-deptree.c:940 Suspending vg-lvol0 (253:4)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:940 Suspending vg-lvol0_mimagetmp_2 (253:5)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:940 Suspending vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_1 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:33 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_0 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:65 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:1470 Loading vg-lvol0_mlog table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:34 384
#libdm-deptree.c:897 Resuming vg-lvol0_mlog (253:1)
#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)
#libdm-deptree.c:897 Resuming vg-lvol0 (253:4)
OTOH, with the patch, it shows that the error target is loaded
for vg-lvol0_mimagetmp_2.
(excerpt from lvconvert-good.log)
#libdm-deptree.c:940 Suspending vg-lvol0 (253:4)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:940 Suspending vg-lvol0_mimagetmp_2 (253:5)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:940 Suspending vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:940 Suspending vg-lvol0_mlog (253:1)
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:897 Resuming vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 error
#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897 Resuming vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:1470 Loading vg-lvol0_mlog table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:34 384
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_0 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:65 384
#libdm-deptree.c:1470 Loading vg-lvol0_mimage_1 table
#libdm-deptree.c:1421 Adding target: 0 4096 linear 8:33 384
#libdm-deptree.c:897 Resuming vg-lvol0 (253:4)
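The difference between the two excerpts can be checked with a few lines
of script. This is a minimal sketch, not an LVM2 tool: table_at_resume()
is a hypothetical helper that returns the target line loaded for one
device before its first resume. The sample lines are copied from the
excerpts above.

```python
import re

def table_at_resume(lines, dev):
    """Return the target loaded for `dev` before its first resume, or None
    if `dev` is resumed without a new table (or is never resumed)."""
    current, pending = None, None
    for line in lines:
        if m := re.search(r"Loading (\S+) table", line):
            current = m.group(1)
        elif m := re.search(r"Adding target: (.+)", line):
            if current == dev:
                pending = m.group(1).strip()
        elif m := re.search(r"Resuming (\S+)", line):
            if m.group(1) == dev:
                return pending
    return None

bad_tail = [
    "#libdm-deptree.c:897 Resuming vg-lvol0_mlog (253:1)",
    "#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)",
    "#libdm-deptree.c:897 Resuming vg-lvol0 (253:4)",
]
good_part = [
    "#libdm-deptree.c:897 Resuming vg-lvol0_mlog (253:1)",
    "#libdm-deptree.c:1470 Loading vg-lvol0_mimagetmp_2 table",
    "#libdm-deptree.c:1421 Adding target: 0 4096 error",
    "#libdm-deptree.c:897 Resuming vg-lvol0_mimagetmp_2 (253:5)",
]
print(table_at_resume(bad_tail, "vg-lvol0_mimagetmp_2"))
print(table_at_resume(good_part, "vg-lvol0_mimagetmp_2"))
```

For the lines from lvconvert-bad.log no table is found for the layer,
while for the lines from lvconvert-good.log the error target is found.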
Thanks,
--
Jun'ichi Nomura, NEC Corporation of America
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-explicit-activation-of-offtree-lv.patch
Type: text/x-patch
Size: 1517 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/lvm-devel/attachments/20080206/146d25c1/attachment.bin>