From: Zdenek Kabelac
Date: Wed, 26 Aug 2015 14:44:13 +0200
Message-ID: <55DDB49D.7060501@redhat.com>
In-Reply-To: <55DDAF71.5080307@shockmedia.nl>
References: <55DC3EBF.4030703@shockmedia.nl> <55DD9C1C.9070105@redhat.com> <55DDAF71.5080307@shockmedia.nl>
Subject: Re: [linux-lvm] Snapshots on clustered LVM
To: LVM general discussion and development

On 26.8.2015 14:22, Bram Klein Gunnewiek wrote:
> On 08/26/2015 12:59 PM, Zdenek Kabelac wrote:
>> On 25.8.2015 12:09, Bram Klein Gunnewiek wrote:
>>> Currently we are using LVM as backing storage for our DRBD disks in HA
>>> set-ups. We use QEMU instances on our nodes with (local) DRBD targets
>>> for storage. This enables us to do live migrations between the DRBD
>>> primary/secondary nodes.
>>>
>>> We want to support iSCSI targets in our HA environment. We are trying
>>> to see if we can use (c)LVM for that by creating a volume group of our
>>> iSCSI block devices and using that volume group on all nodes to create
>>> logical volumes. This seems to work fine if we handle locking etc.
>>> properly and make sure we only activate the logical volumes on one node
>>> at a time. As long as a volume is active on only one node, snapshots
>>> also seem to work fine.
>>>
>>> However, we run into problems when we want to perform a live migration
>>> of a running QEMU instance. In order to do a live migration we have to
>>> start a second, similar QEMU on the node we want to migrate to and
>>> start a QEMU live migration. For that we have to make the logical
>>> volume active on the target node, otherwise we can't start the QEMU
>>> instance. During the live migration QEMU ensures that data is only
>>> written on one node (e.g. during the live migration data will be
>>> written on the source node; QEMU will then pause the instance for a
>>> short while when copying the last data and will then continue the
>>> instance on the target node).
>>>
>>> This use case works fine with a clustered LVM set-up except for
>>> snapshots. Changes are not saved in the snapshot when the logical
>>> volume is active on both nodes (as expected if the manual is correct:
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html-single/Logical_Volume_Manager_Administration/#snapshot_volumes).
>>>
>>> If we are correct it means we can use LVM as a clustered "file system"
>>> but can't trust our snapshots to be 100% reliable if a volume group has
>>> been made active on more than one node, e.g. when doing a live
>>> migration of a QEMU instance between two nodes our snapshots become
>>> unreliable.
>>>
>>> Are these conclusions correct? Is there a solution for this problem, or
>>> is this simply a known limitation of clustered LVM without a
>>> work-around?
>>
>> Yes - snapshots are supported ONLY for exclusively activated volumes
>> (meaning the LV with the snapshot is active on only a single node in the
>> cluster).
>>
>> There is no dm target which would support clustered usage of snapshots.
>>
>> Zdenek
>>
>
> Thanks for the confirmation. It's a pity we can't get this done with LVM
> ... we will try to find an alternative.
>
> Out of curiosity, how does a node know the volume is opened on another
> node? In our test set-up we don't use CLVM or anything (we are just
> testing), so there is no communication between the nodes. Is this done
> through metadata in the volume group / logical volume?

I have no idea what you are using then - I am clearly talking only about
the lvm2 solution, which is at the moment based on clvmd usage (there is
now also integrated support for another locking manager - sanlock).

If you are using some other locking mechanism, then it is purely up to you
to maintain the integrity of the whole system - i.e. to ensure there are no
concurrent metadata writes from different nodes and to control where and
how the LVs are activated.

There are also already existing solutions for what you describe, but I
assume you prefer your own home-brewed solution - it's a long journey ahead
of you...

Zdenek
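
For reference, a minimal sketch of the exclusive-activation flow discussed
above, assuming a clvmd-managed clustered VG and an origin LV; the names
"clustervg", "vm_disk" and the snapshot size are made up for the example:

  # take the LV's cluster lock exclusively on this node
  lvchange -aey clustervg/vm_disk

  # snapshot the origin; it is only reliable while the origin stays
  # exclusively active on this node
  lvcreate -s -L 1G -n vm_disk_snap clustervg/vm_disk

  # before a live migration: drop the snapshot and release the origin here...
  lvremove clustervg/vm_disk_snap
  lvchange -an clustervg/vm_disk

  # ...then activate the origin exclusively on the target node
  lvchange -aey clustervg/vm_disk

With the newer sanlock-based locking the idea is the same: the snapshot can
only be trusted while its origin is exclusively active on a single node.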