Re: [Qemu-devel] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a'

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Liu Yuan <namei.unix@gmail.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	sheepdog@lists.wpkg.org, qemu-devel@nongnu.org,
	MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Subject: Re: [Qemu-devel] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a'
Date: Sat, 08 Jun 2013 00:14:26 +0800	[thread overview]
Message-ID: <51B206E2.6020007@gmail.com> (raw)
In-Reply-To: <20130607152246.GH3658@dhcp-200-207.str.redhat.com>

On 06/07/2013 11:22 PM, Kevin Wolf wrote:
> Am 07.06.2013 um 15:48 hat Liu Yuan geschrieben:
>> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
>>> Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
>>>> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
>>>>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
>>>> Only when the write comes from VM, we do the following stuff
>>>>  - delete active vdi A
>>>>  - created a new inode based on the previously reloaded As1's inode
>>>
>>> Thanks, this is the part that I missed.
>>>
>>> I'm not sure however why the actual switch is delayed until the first
>>> write. This seems inconsistent with qcow2 snapshots.
>>>
>>> Do you know if there is a reason why we can't always do this already
>>> during bdrv_snapshot_goto()?
>>>
>>
>> I think the reason is sd_load_vmstate() need to load vm state objects
>> with the correct inode object.
>>
>> I tried to remove
>>
>>   if (!s->inode.vm_state_size)
>>
>> and make sd_create_branch unconditional. This means 'loadvm' command
>> will try to call sd_create_branch() inside sd_snapshot_goto(). But
>> failed with reading the wrong snapshot because the vdi's inode object is
>> changed by sd_create_branch().
> 
> Ok, I think I start to understand how these things work. Basically,
> qemu's savevm functionality is designed to work with qcow2 like this:
> 
> 1. Save the VM state to the active layer
> 2. create_snapshot saves the active layer including VM state
> 3. [ Forget to remove the VM state from the active layer :-) ]
> 4. loadvm loads the snapshot again, including VM state
> 5. VM state is restored from the active layer
> 
> So for Sheepdog, the problem is that step 2 doesn't include the VM state,
> right? So our options are either to change Sheepdog so that step 2 does
> involve moving the VM state (might end up rather ugly) or you need to
> swap steps 1 and 2, so that you first switch to the new snapshot and
> then write the VM state.
> 

Sorry, I didn't fully understand the above example. If sheepdog takes
snapshots, vmstate will go with the exact current active disk into the
cluster storage, for e.g

1 we take two snapshots on the disk 'test' with tag A and B respectively
2 disk(A) + vmstate(A) will be stored as indexed by A's vdi
  disk(B) + vmstate(B) will be stored as indexed by B's vdi

The chain is A --> B --> C, C is the current active vdi. Note, both A, B
and C has different vdi_id.

The problem show up when we do loadvm A, the process is:

1 reload inode of A (in snapshot_goto)
2 read vmstate via A's vdi_id (loadvm_state)
3 delete C and create C1, reload inode of C1 (sd_create_branch on write)

So if at stage 1, we call sd_create_branch(), the inode will point to C1
and stage 2 will fail to read vmstate because vdi_id is C1's.

This is why we rely on the write to call sd_create_branch(). I am not
sure if we can fix sheepdog's loadvm without touching the core code of QEMU.


> Because I think what we're doing currently is not only hard to
> understand, but actually wrong. Consider this:
> 
> 1. savevm with a Sheepdog image
> 2. Exit qemu before any write request is sent
> 3. Restart qemu with the Sheepdog image and notice how the wrong
>    snapshot is active. Oops.
> 

Sheepdog indeed has this problem.

> Or doesn't Sheepdog have any notion of an "active snapshot" and this is
> only a runtime decision?
> 


>>>> The chain will look like:
>>>>
>>>> As1 --> As2
>>>>  |
>>>>  V
>>>>  A
>>>>
>>>> This is how sheepdog handles savevm/loadvm.
>>>>
>>>> So for 'qemu-img snapshot -a', we should create the branch in the
>>>> .bdrv_snapshot_goto.
>>>>
>>>> As you pointed out, we need to consider vm state even for 'snapshot -a',
>>>> so I need to rework the patch 2/2.
>>>
>>> Yes, the presence of VM state is independent from whether you're using
>>> qemu-img or loadvm. And it actually goes both ways: qemu-img can revert
>>> to snapshots that have a VM state, and loadvm can be used with images
>>> that don't have a VM state (if you have multiple images, only one of
>>> them has the VM state).
>>>
>>
>> Seems not true of current code. If I 'loadvm' a snapshot without a
>> vmstate, I'll get 'qemu-system-x86_64: This is a disk-only snapshot.
>> Revert to it offline using qemu-img'.
> 
> Try again with multiple disks. Only one disk gets the VM state, the
> other ones get only the disk snapshot.
> 
>> But 'qemu-img snapshot -a' works as you said, it can rollback to the
>> snapshot regardless of vmstate.
>>
>> Also this is a difference to store vmstate for sheepdog images. *Every*
>> snapshot image can have its own vmstate stored in sheepdog cluster. That
>> is, we can have multiple snapshot with its own private vmstate for sheepdog.
> 
> Sorry, can't parse. :-(
> 

I misunderstood your statements, please ignore it.

Thanks,
Yuan

next prev parent reply	other threads:[~2013-06-07 16:14 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-06 11:57 [Qemu-devel] [PATCH 0/2] fix 'qemu-img snapshot -a' operation for sheepdog Liu Yuan
2013-06-06 11:57 ` [Qemu-devel] [PATCH 1/2] sheepdog: fix snapshot tag initialization Liu Yuan
2013-06-06 12:46   ` Kevin Wolf
2013-06-06 11:57 ` [Qemu-devel] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a' Liu Yuan
2013-06-06 12:46   ` Kevin Wolf
2013-06-06 13:09     ` Liu Yuan
2013-06-07  7:31       ` Kevin Wolf
2013-06-07 13:48         ` Liu Yuan
2013-06-07 15:22           ` Kevin Wolf
2013-06-07 16:14             ` Liu Yuan [this message]
2013-06-07 16:48               ` Kevin Wolf
2013-06-07 17:23                 ` Liu Yuan
2013-06-06 13:41     ` Liu Yuan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51B206E2.6020007@gmail.com \
    --to=namei.unix@gmail.com \
    --cc=kwolf@redhat.com \
    --cc=morita.kazutaka@lab.ntt.co.jp \
    --cc=qemu-devel@nongnu.org \
    --cc=sheepdog@lists.wpkg.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.