From: Vladislav Bogdanov <bubble@hoster-ok.com>
To: Zdenek Kabelac <zkabelac@redhat.com>
Cc: David Teigland <teigland@redhat.com>, linux-lvm@redhat.com
Subject: Re: [linux-lvm] [PATCH 10/10] man: document --node option to lvchange
Date: Wed, 20 Mar 2013 15:12:59 +0300	[thread overview]
Message-ID: <5149A7CB.9090703@hoster-ok.com> (raw)
In-Reply-To: <51497718.5080209@redhat.com>

20.03.2013 11:45, Zdenek Kabelac wrote:
> On 19.3.2013 18:36, Vladislav Bogdanov wrote:
>> 19.03.2013 20:16, David Teigland wrote:
>>> On Tue, Mar 19, 2013 at 07:52:14PM +0300, Vladislav Bogdanov wrote:
>>>> And do you have any estimate of how long it may take to have your
>>>> ideas ready for production use?
>>>
>>> It'll be quite a while (and the new locking scheme I'm working on
>>> will not
>>> include remote command execution.)
>>>
>>>> Also, as you're not satisfied with this implementation, what
>>>> alternative way do you see? (Calling ssh from libvirt or the LVM API
>>>> is not a good idea at all, I think.)
>>>
>>> Apart from using ovirt/rhev, I'd try one of the following behind the
>>> libvirt locking api: sanlock, dlm, file locks on nfs, file locks on
>>> gfs2.
>>
>> Unfortunately none of these solves the main thing I need: allowing LVM
>> snapshots without breaking live VM migration :(
>>
>> Cluster-wide snapshots (with shared lock) would solve this, but I do not
>> expect to see this implemented soon.
>>
> 
> Before I go any deeper with reviewing the patches myself - I'd like to
> make myself clear about this 'snapshot' issue.
> 
> (BTW there is already one thing which will surely not pass - the
> 'node' option for the lvm command - this would have to be done
> differently).
> 
> But back to snapshots -
> 
> What would be the point of having (old, non-thinp) snapshots active at
> the same time on more than one node?

There is no need for this.

I need the source volume itself to be active on two nodes to perform
live VM migration. libvirt/qemu controls which instance has its CPUs
turned on, but the qemu processes on both nodes need to have the LV open
simultaneously.

I'm able to take a snapshot only when the volume is activated
exclusively, and I can open that snapshot (and take a backup) only on
the node where the source volume is held exclusively.

And I ultimately do not want to take the VM down just to lock the LV
exclusively for a snapshot (if it runs on a shared-locked LV), and I do
not want to do offline migration (with an exclusive lock on the LV). To
satisfy both, lock conversion is needed.
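
In clvmd terms, the two activation states and the conversion between
them look roughly like this (a sketch only; -aey and -ay are stock
clvmd activation modes, the in-place conversion is what this patch
series proposes and is shown with its proposed --force syntax, and the
volume name is made up):

  lvchange -aey vg/vm-disk           # exclusive: snapshots work, migration not
  lvchange -ay vg/vm-disk            # shared: migration works, snapshots not
  # convert in place, without deactivating (proposed in this series):
  lvchange -ay --force vg/vm-disk    # exclusive -> shared, before migration
  lvchange -aey --force vg/vm-disk   # shared -> exclusive, after migration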

I'm still new to thinp, because it was introduced relatively recently
and I had no chance to look at it more closely (I once tried to allocate
a pool on a clustered VG and the whole test cluster got stuck because of
it).

Does it work on clustered VGs now?
And is it now possible to take/activate/open a thinp snapshot on a node
different from the one where the source volume is open?

> 
> That would simply not work - since you would have to ensure that no one
> writes to the snapshot & origin on either of those nodes?
> 
> Is your code doing some transition which needs the device active on
> both nodes while treating them in a read-only way?

Yes, but it is not my code, it is libvirt.
It opens the block device (LV) on both the source and destination nodes
(it runs a qemu process in a paused state on the destination node, and
that process opens the block device).
After that, the memory state is transferred to the destination node,
then the qemu process on the source node is paused (turns off the
virtual CPUs), then the qemu process on the destination node is resumed
(turns on the virtual CPUs), and then the qemu process on the source
node is killed, thus releasing the LV.
Adding one more migration phase ("confirm confirmation"), and thus
introducing one more migration protocol version, seems like overkill to
me.
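
For concreteness, the whole hand-over described above is driven by one
standard libvirt command; the paused destination qemu, the memory
transfer, the CPU switch and the final kill of the source qemu all
happen inside it (the domain name and hostname are made up):

  # run on the source node; libvirt starts a paused qemu on the
  # destination, which opens the LV there, then streams memory, swaps
  # the running CPUs over and kills the source qemu, releasing the LV
  virsh migrate --live myvm qemu+ssh://dest.example.com/system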

When a qemu process is paused on a node, the LV is effectively read-only
(well, almost read-only: libvirt still tries to set DAC permissions and
an SELinux label on it, but no data is written).

There is only a brief window when both the source and destination
processes are paused (less than a millisecond).

When qemu is running, it writes to the device.

As for my code in libvirt:
I added one more "logical" pool subtype - clvm - which starts with all
LVs deactivated.
I also wrote a locking driver (which works similarly to the sanlock and
virtlockd ones) which, as sketched below:
* activates the volume exclusively on VM start
* converts the lock to shared on the source node before migration
* activates the volume in shared mode on the migration target
* deactivates the volume on the source node after migration is finished
* converts the lock from shared to exclusive on the destination node,
remotely from the source node
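
Roughly, the lvchange sequence the driver issues over a VM's lifetime is
the following (a sketch: plain -aey/-ay/-an are standard clvmd-era
activation modes, while the --force conversion and --node remote
execution are the syntax proposed in this very patch series, so treat
them as illustrative rather than as merged LVM options; names are made
up):

  lvchange -aey vg/vm-disk              # VM start: take exclusive lock
  lvchange -ay --force vg/vm-disk       # source, pre-migration: to shared
  lvchange -ay vg/vm-disk               # target: activate shared
  lvchange -an vg/vm-disk               # source, post-migration: deactivate
  lvchange -aey --force --node target vg/vm-disk
                                        # from source: make target exclusive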

It also has a local locking concept, to prevent an LV from being opened
more than once on the node where it is activated exclusively.

As I wrote above, there is no event like "you may now convert the lock
to exclusive" available on the destination node.

> 
> Since the metadata for a snapshot are only parsed during the first
> activation of the snapshot, there is no way the second node could
> resync if you had written to the snapshot/origin on the first node.
> 
> So could you please describe in more details how it's supposed to work?

It is OK for me to lose the snapshot during migration. I just need to be
able to back up VM data while it runs continuously on one node. If
pacemaker decides to migrate the VM, then the backup simply fails and
will be restarted (together with creation of a new snapshot) from the
beginning after migration is finished.
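
The backup cycle itself is then plain snapshot handling on the node that
holds the exclusive lock (a minimal sketch; the size, path and names are
made up):

  # on the node where vg/vm-disk is active exclusively:
  lvcreate -s -L 2G -n vm-disk-snap vg/vm-disk   # COW snapshot of the origin
  dd if=/dev/vg/vm-disk-snap of=/backup/vm-disk.img bs=4M
  lvremove -f vg/vm-disk-snap                    # drop the snapshot when done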

Vladislav


Thread overview: 36+ messages
2013-03-19 13:32 [linux-lvm] [PATCH 00/10] Enhancements to a clustered logical volume activation Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 01/10] lvchange: Allow cluster lock conversion Vladislav Bogdanov
2013-03-19 15:23   ` David Teigland
2013-03-19 15:33     ` Vladislav Bogdanov
2013-03-19 15:44       ` Vladislav Bogdanov
2013-03-19 16:03         ` David Teigland
2013-03-19 16:36           ` Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 02/10] clvmd: Fix buffer size Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 03/10] clvmd: Allow node names to be obtained from corosync's CMAP Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 04/10] clvmd: fix positive return value is not an error in csid->name translation Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 05/10] clvmd: use correct flags for local command execution Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 06/10] clvmd: additional debugging - print message bodies Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 07/10] locking: Allow lock management (activation, deactivation, conversion) on a remote nodes Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 08/10] lvchange: implement remote lock management Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 09/10] man: document --force option to lvchange, provide examples Vladislav Bogdanov
2013-03-19 13:32 ` [linux-lvm] [PATCH 10/10] man: document --node option to lvchange Vladislav Bogdanov
2013-03-19 15:32   ` David Teigland
2013-03-19 15:42     ` Vladislav Bogdanov
2013-03-19 15:54       ` David Teigland
2013-03-19 16:52         ` Vladislav Bogdanov
2013-03-19 17:16           ` David Teigland
2013-03-19 17:36             ` Vladislav Bogdanov
2013-03-20  8:45               ` Zdenek Kabelac
2013-03-20 12:12                 ` Vladislav Bogdanov [this message]
2013-03-21 18:31                 ` Vladislav Bogdanov
2013-03-21 19:01                   ` Zdenek Kabelac
2013-03-21 19:16                     ` Vladislav Bogdanov
2013-03-21 18:23     ` Vladislav Bogdanov
2013-03-19 16:42 ` [linux-lvm] [PATCH 00/10] Enhancements to a clustered logical volume activation Alasdair G Kergon
2013-03-19 17:42   ` Vladislav Bogdanov
2013-06-05 13:23     ` [linux-lvm] clvmd leaving kernel dlm uncontrolled lockspace Andreas Pflug
2013-06-05 15:13       ` David Teigland
2013-06-05 17:29         ` Andreas Pflug
2013-06-06  6:17         ` Andreas Pflug
2013-06-06 11:06           ` matthew patton
2013-06-06 17:54             ` Andreas Pflug
