linux-lvm.redhat.com archive mirror
* [linux-lvm] Snapshots on clustered LVM
@ 2015-08-25 10:09 Bram Klein Gunnewiek
  2015-08-26 10:59 ` Zdenek Kabelac
  2015-08-26 16:35 ` Digimer
  0 siblings, 2 replies; 7+ messages in thread
From: Bram Klein Gunnewiek @ 2015-08-25 10:09 UTC (permalink / raw)
  To: LVM general discussion and development


Currently we are using LVM as backing storage for our DRBD disks in HA 
set-ups. We run QEMU instances on our nodes using (local) DRBD targets 
for storage. This enables us to do live migrations between the DRBD 
primary/secondary nodes.

We want to support iSCSI targets in our HA environment. We are trying 
to see if we can use (c)lvm for that by creating a volume group of our 
iSCSI block devices and using that volume group on all nodes to create 
logical volumes. This seems to work fine if we handle locking etc. 
properly and make sure we only activate the logical volumes on one node 
at a time. As long as a volume is active on only one node, snapshots 
also seem to work fine.

However, we run into problems when we want to perform a live migration 
of a running QEMU instance. In order to do a live migration we have to 
start a second, similar QEMU on the node we want to migrate to and then 
start a QEMU live migration. For that we have to make the logical 
volume active on the target node, otherwise we can't start the second 
QEMU instance. During the live migration QEMU ensures that data is only 
written on one node (e.g. during the live migration data is written on 
the source node; QEMU will then pause the instance for a short while 
while copying the last data, and will then continue the instance on the 
target node).
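
Roughly, the migration sequence looks like this (host names, ports and 
volume names are just examples):

   # on the target node: activate the LV and start a receiving QEMU
   lvchange -ay vg_iscsi/vm_disk
   qemu-system-x86_64 ... -incoming tcp:0:4444

   # on the source node, in the QEMU monitor: start the live migration
   migrate -d tcp:target-node:4444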

This use case works fine with a clustered LVM set-up, except for 
snapshots. Changes are not saved in the snapshot when the logical 
volume is active on both nodes (as expected, if the manual is correct: 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html-single/Logical_Volume_Manager_Administration/#snapshot_volumes).

If we are correct, it means we can use LVM as a clustered "file system" 
but can't trust our snapshots to be 100% reliable if a volume group has 
been made active on more than one node; e.g. when doing a live 
migration of a QEMU instance between two nodes, our snapshots become 
unreliable.

Are these conclusions correct? Is there a solution for this problem, or 
is this simply a known limitation of clustered LVM without a 
work-around?

-- 
Met vriendelijke groet / Kind regards,
Bram Klein Gunnewiek | Shock Media B.V.

Tel: +31 (0)546 - 714360
Fax: +31 (0)546 - 714361
Web: https://www.shockmedia.nl/




* Re: [linux-lvm] Snapshots on clustered LVM
  2015-08-25 10:09 [linux-lvm] Snapshots on clustered LVM Bram Klein Gunnewiek
@ 2015-08-26 10:59 ` Zdenek Kabelac
  2015-08-26 12:22   ` Bram Klein Gunnewiek
  2015-08-26 16:35 ` Digimer
  1 sibling, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2015-08-26 10:59 UTC (permalink / raw)
  To: LVM general discussion and development

On 25.8.2015 at 12:09, Bram Klein Gunnewiek wrote:
> [...]
>
> If we are correct, it means we can use LVM as a clustered "file system"
> but can't trust our snapshots to be 100% reliable if a volume group has
> been made active on more than one node; e.g. when doing a live migration
> of a QEMU instance between two nodes, our snapshots become unreliable.
>
> Are these conclusions correct? Is there a solution for this problem, or
> is this simply a known limitation of clustered LVM without a work-around?

Yes - snapshots are supported ONLY for exclusively activated volumes 
(meaning the LV with the snapshot is active on only a single node in 
the cluster).

There is no dm target which would support clustered usage of snapshots.
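
In practice that means a snapshot is only valid when taken roughly like 
this (a sketch - volume names are examples):

   lvchange -an vg_iscsi/vm_disk    # deactivate the LV on all nodes
   lvchange -aey vg_iscsi/vm_disk   # re-activate exclusively on this node
   lvcreate -s -L 5G -n vm_disk_snap vg_iscsi/vm_disk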

Zdenek


* Re: [linux-lvm] Snapshots on clustered LVM
  2015-08-26 10:59 ` Zdenek Kabelac
@ 2015-08-26 12:22   ` Bram Klein Gunnewiek
  2015-08-26 12:44     ` Zdenek Kabelac
  2015-08-26 16:23     ` Digimer
  0 siblings, 2 replies; 7+ messages in thread
From: Bram Klein Gunnewiek @ 2015-08-26 12:22 UTC (permalink / raw)
  To: LVM general discussion and development

On 08/26/2015 12:59 PM, Zdenek Kabelac wrote:
> On 25.8.2015 at 12:09, Bram Klein Gunnewiek wrote:
>> [...]
>
> Yes - snapshots are supported ONLY for exclusively activated volumes
> (meaning the LV with the snapshot is active on only a single node in
> the cluster).
>
> There is no dm target which would support clustered usage of snapshots.
>
> Zdenek
>

Thanks for the confirmation. It's a pity we can't get this done with 
LVM ... we will try to find an alternative.

Out of curiosity, how does a node know the volume is opened on another 
node? In our test set-up we don't use CLVM or anything (we are just 
testing), so there is no communication between the nodes. Is this done 
through metadata in the volume group / logical volume?


* Re: [linux-lvm] Snapshots on clustered LVM
  2015-08-26 12:22   ` Bram Klein Gunnewiek
@ 2015-08-26 12:44     ` Zdenek Kabelac
  2015-08-26 14:17       ` David Teigland
  2015-08-26 16:23     ` Digimer
  1 sibling, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2015-08-26 12:44 UTC (permalink / raw)
  To: LVM general discussion and development

On 26.8.2015 at 14:22, Bram Klein Gunnewiek wrote:
> On 08/26/2015 12:59 PM, Zdenek Kabelac wrote:
>> On 25.8.2015 at 12:09, Bram Klein Gunnewiek wrote:
>>> [...]
>>
>> Yes - snapshots are supported ONLY for exclusively activated volumes
>> (meaning the LV with the snapshot is active on only a single node in
>> the cluster).
>>
>> There is no dm target which would support clustered usage of snapshots.
>>
>> Zdenek
>>
>
> Thanks for the confirmation. It's a pity we can't get this done with
> LVM ... we will try to find an alternative.
>
> Out of curiosity, how does a node know the volume is opened on another
> node? In our test set-up we don't use CLVM or anything (we are just
> testing), so there is no communication between the nodes. Is this done
> through metadata in the volume group / logical volume?


I've no idea what you are using then - I'm talking here only about the 
lvm2 solution, which is ATM based on clvmd usage (there is now also 
integrated support for another locking manager - sanlock).

If you are using some other locking mechanism, it's then purely up to 
you to maintain the integrity of the whole system - i.e. to ensure 
there are no concurrent metadata writes from different nodes, and to 
control where and how the LVs are activated.
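
With the lvm2 tools, clustered locking is selected in lvm.conf - a 
minimal sketch:

   # /etc/lvm/lvm.conf (excerpt)
   global {
       # 1 = local file-based locking, 3 = clustered locking via clvmd
       locking_type = 3
   }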

There are also already existing solutions for what you describe, but I 
assume you prefer your own home-brewed solution - it's a long journey 
ahead of you...

Zdenek


* Re: [linux-lvm] Snapshots on clustered LVM
  2015-08-26 12:44     ` Zdenek Kabelac
@ 2015-08-26 14:17       ` David Teigland
  0 siblings, 0 replies; 7+ messages in thread
From: David Teigland @ 2015-08-26 14:17 UTC (permalink / raw)
  To: Bram Klein Gunnewiek, Zdenek Kabelac
  Cc: LVM general discussion and development

On Wed, Aug 26, 2015 at 02:44:13PM +0200, Zdenek Kabelac wrote:
> There are also already existing solutions for what you describe, but
> I assume you prefer your own home-brewed solution - it's a long
> journey ahead of you...

RHEV/ovirt is an existing solution that uses lvm on multiple hosts and
does live migration.  They have quite a bit of very specialized lvm code
to do that right -- not typical lvm usage at all.


* Re: [linux-lvm] Snapshots on clustered LVM
  2015-08-26 12:22   ` Bram Klein Gunnewiek
  2015-08-26 12:44     ` Zdenek Kabelac
@ 2015-08-26 16:23     ` Digimer
  1 sibling, 0 replies; 7+ messages in thread
From: Digimer @ 2015-08-26 16:23 UTC (permalink / raw)
  To: LVM general discussion and development

On 26/08/15 08:22 AM, Bram Klein Gunnewiek wrote:
> On 08/26/2015 12:59 PM, Zdenek Kabelac wrote:
>> On 25.8.2015 at 12:09, Bram Klein Gunnewiek wrote:
>>> [...]
>>
>> Yes - snapshots are supported ONLY for exclusively activated volumes
>> (meaning the LV with the snapshot is active on only a single node in
>> the cluster).
>>
>> There is no dm target which would support clustered usage of snapshots.
>>
>> Zdenek
>>
> 
> Thanks for the confirmation. It's a pity we can't get this done with
> LVM ... we will try to find an alternative.
>
> Out of curiosity, how does a node know the volume is opened on another
> node? In our test set-up we don't use CLVM or anything (we are just
> testing), so there is no communication between the nodes. Is this done
> through metadata in the volume group / logical volume?

Clustered LVM uses DLM. You can see which nodes are using a given lock
space with 'dlm_tool ls'. When a node joins or leaves, it joins or
leaves whatever lock spaces it has resources in.

A node doesn't have to be actively using a resource, but if it's in the
cluster, it needs to coordinate with the other nodes, even if just to
say "I ACK the changes" or "I'm not using the resource" when locks are
being coordinated.
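
A quick sketch of how to inspect this ('clvmd' is the lock space that
clvmd registers; anything else here is an example):

   dlm_tool ls               # list the lock spaces this node has joined
   dlm_tool lockdebug clvmd  # dump the locks held in the clvmd lock space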

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


* Re: [linux-lvm] Snapshots on clustered LVM
  2015-08-25 10:09 [linux-lvm] Snapshots on clustered LVM Bram Klein Gunnewiek
  2015-08-26 10:59 ` Zdenek Kabelac
@ 2015-08-26 16:35 ` Digimer
  1 sibling, 0 replies; 7+ messages in thread
From: Digimer @ 2015-08-26 16:35 UTC (permalink / raw)
  To: LVM general discussion and development

On 25/08/15 06:09 AM, Bram Klein Gunnewiek wrote:
> Currently we are using LVM as backing storage for our DRBD disks in HA
> set-ups. We run QEMU instances on our nodes using (local) DRBD targets
> for storage. This enables us to do live migrations between the DRBD
> primary/secondary nodes.
> 
> We want to support iSCSI targets in our HA environment. We are trying
> to see if we can use (c)lvm for that by creating a volume group of our
> iSCSI block devices and using that volume group on all nodes to create
> logical volumes. This seems to work fine if we handle locking etc.
> properly and make sure we only activate the logical volumes on one node
> at a time. As long as a volume is active on only one node, snapshots
> also seem to work fine.

DRBD, like an iSCSI LUN, is just another block device to LVM, so I see
no reason why clvmd won't work just fine. The main advantage is that
you can scale iSCSI to 3+ nodes, but you lose the data replication that
DRBD gives you unless you have a very nice SAN.

Once the LVs are visible on all nodes, though, it's up to you to make
sure they're used by apps/filesystems that understand clustering. I use
clustered LVs to back gfs2 and to back VMs (an LV dedicated to each VM,
with the cluster resource manager ensuring that a VM runs on only one
node at a time).
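
In pacemaker terms that looks something like this (a sketch - paths and
resource names are examples):

   pcs resource create vm1 ocf:heartbeat:VirtualDomain \
       config=/shared/definitions/vm1.xml \
       migration_transport=ssh \
       meta allow-migrate=true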

> However, we run into problems when we want to perform a live migration
> of a running QEMU instance. In order to do a live migration we have to
> start a second, similar QEMU on the node we want to migrate to and then
> start a QEMU live migration. For that we have to make the logical
> volume active on the target node, otherwise we can't start the second
> QEMU instance. During the live migration QEMU ensures that data is only
> written on one node (e.g. during the live migration data is written on
> the source node; QEMU will then pause the instance for a short while
> while copying the last data, and will then continue the instance on the
> target node).

If you're using clustered LVM, live migration will work just fine. This
is exactly what I do. The LV will need to be ACTIVE on both nodes though.
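
E.g. with libvirt (host and volume names are examples):

   # activate the LV on both nodes first
   lvchange -ay vg_iscsi/vm_disk

   # then live-migrate the running guest from node1 to node2
   virsh migrate --live vm1 qemu+ssh://node2/system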

> This use case works fine with a clustered LVM set-up, except for
> snapshots. Changes are not saved in the snapshot when the logical
> volume is active on both nodes (as expected, if the manual is correct:
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html-single/Logical_Volume_Manager_Administration/#snapshot_volumes).

Note that your link is very old, for RHEL 5.

Snapshotting is a problem. As Zdenek said, you have to set the other
nodes to inactive and then set the current host node's LV to
'exclusive'. The trick I found, though, is that you can't mark it as
exclusive while it's ACTIVE, and you can't make the LV inactive while
it's hosting a VM... So in practical terms, snapshotting clustered LVs
is not feasible.
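
In other words, the sequence you would need is refused at both points
(names are examples):

   lvchange -aey vg_iscsi/vm_disk   # refused while the LV is ACTIVE elsewhere
   lvchange -an vg_iscsi/vm_disk    # refused while a VM still has the LV open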

> If we are correct, it means we can use LVM as a clustered "file system"
> but can't trust our snapshots to be 100% reliable if a volume group has
> been made active on more than one node; e.g. when doing a live migration
> of a QEMU instance between two nodes, our snapshots become unreliable.

You can never trust a snapshot 100%; it doesn't capture information in
the VM's memory. So at best, recovering from a snapshot is like
recovering from a sudden power loss. It's then up to your apps and OS
to recover, and many DBs won't do that cleanly unless they're carefully
configured.

This is the core reason why our company won't support snapshots at all.
It gives people a false sense of having good backups.

> Are these conclusions correct? Is there a solution for this problem,
> or is this simply a known limitation of clustered LVM without a
> work-around?

Clustered LVs over a SAN-backed PV will work perfectly fine for live
migrations. Snapshots are not feasible though, and not recommended in
any case.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

