Re: [Qemu-devel] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a'

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Liu Yuan <namei.unix@gmail.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	sheepdog@lists.wpkg.org, qemu-devel@nongnu.org,
	MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Subject: Re: [Qemu-devel] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a'
Date: Sat, 08 Jun 2013 00:14:26 +0800	[thread overview]
Message-ID: <51B206E2.6020007@gmail.com> (raw)
In-Reply-To: <20130607152246.GH3658@dhcp-200-207.str.redhat.com>

On 06/07/2013 11:22 PM, Kevin Wolf wrote:
> Am 07.06.2013 um 15:48 hat Liu Yuan geschrieben:
>> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
>>> Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
>>>> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
>>>>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
>>>> Only when the write comes from VM, we do the following stuff
>>>>  - delete active vdi A
>>>>  - created a new inode based on the previously reloaded As1's inode
>>>
>>> Thanks, this is the part that I missed.
>>>
>>> I'm not sure however why the actual switch is delayed until the first
>>> write. This seems inconsistent with qcow2 snapshots.
>>>
>>> Do you know if there is a reason why we can't always do this already
>>> during bdrv_snapshot_goto()?
>>>
>>
>> I think the reason is sd_load_vmstate() need to load vm state objects
>> with the correct inode object.
>>
>> I tried to remove
>>
>>   if (!s->inode.vm_state_size)
>>
>> and make sd_create_branch unconditional. This means 'loadvm' command
>> will try to call sd_create_branch() inside sd_snapshot_goto(). But
>> failed with reading the wrong snapshot because the vdi's inode object is
>> changed by sd_create_branch().
> 
> Ok, I think I start to understand how these things work. Basically,
> qemu's savevm functionality is designed to work with qcow2 like this:
> 
> 1. Save the VM state to the active layer
> 2. create_snapshot saves the active layer including VM state
> 3. [ Forget to remove the VM state from the active layer :-) ]
> 4. loadvm loads the snapshot again, including VM state
> 5. VM state is restored from the active layer
> 
> So for Sheepdog, the problem is that step 2 doesn't include the VM state,
> right? So our options are either to change Sheepdog so that step 2 does
> involve moving the VM state (might end up rather ugly) or you need to
> swap steps 1 and 2, so that you first switch to the new snapshot and
> then write the VM state.
> 

Sorry, I didn't fully understand the above example. If sheepdog takes
snapshots, vmstate will go with the exact current active disk into the
cluster storage, for e.g

1 we take two snapshots on the disk 'test' with tag A and B respectively
2 disk(A) + vmstate(A) will be stored as indexed by A's vdi
  disk(B) + vmstate(B) will be stored as indexed by B's vdi

The chain is A --> B --> C, C is the current active vdi. Note, both A, B
and C has different vdi_id.

The problem show up when we do loadvm A, the process is:

1 reload inode of A (in snapshot_goto)
2 read vmstate via A's vdi_id (loadvm_state)
3 delete C and create C1, reload inode of C1 (sd_create_branch on write)

So if at stage 1, we call sd_create_branch(), the inode will point to C1
and stage 2 will fail to read vmstate because vdi_id is C1's.

This is why we rely on the write to call sd_create_branch(). I am not
sure if we can fix sheepdog's loadvm without touching the core code of QEMU.


> Because I think what we're doing currently is not only hard to
> understand, but actually wrong. Consider this:
> 
> 1. savevm with a Sheepdog image
> 2. Exit qemu before any write request is sent
> 3. Restart qemu with the Sheepdog image and notice how the wrong
>    snapshot is active. Oops.
> 

Sheepdog indeed has this problem.

> Or doesn't Sheepdog have any notion of an "active snapshot" and this is
> only a runtime decision?
> 


>>>> The chain will look like:
>>>>
>>>> As1 --> As2
>>>>  |
>>>>  V
>>>>  A
>>>>
>>>> This is how sheepdog handles savevm/loadvm.
>>>>
>>>> So for 'qemu-img snapshot -a', we should create the branch in the
>>>> .bdrv_snapshot_goto.
>>>>
>>>> As you pointed out, we need to consider vm state even for 'snapshot -a',
>>>> so I need to rework the patch 2/2.
>>>
>>> Yes, the presence of VM state is independent from whether you're using
>>> qemu-img or loadvm. And it actually goes both ways: qemu-img can revert
>>> to snapshots that have a VM state, and loadvm can be used with images
>>> that don't have a VM state (if you have multiple images, only one of
>>> them has the VM state).
>>>
>>
>> Seems not true of current code. If I 'loadvm' a snapshot without a
>> vmstate, I'll get 'qemu-system-x86_64: This is a disk-only snapshot.
>> Revert to it offline using qemu-img'.
> 
> Try again with multiple disks. Only one disk gets the VM state, the
> other ones get only the disk snapshot.
> 
>> But 'qemu-img snapshot -a' works as you said, it can rollback to the
>> snapshot regardless of vmstate.
>>
>> Also this is a difference to store vmstate for sheepdog images. *Every*
>> snapshot image can have its own vmstate stored in sheepdog cluster. That
>> is, we can have multiple snapshot with its own private vmstate for sheepdog.
> 
> Sorry, can't parse. :-(
> 

I misunderstood your statements, please ignore it.

Thanks,
Yuan

next prev parent reply	other threads:[~2013-06-07 16:14 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-06 11:57 [Qemu-devel] [PATCH 0/2] fix 'qemu-img snapshot -a' operation for sheepdog Liu Yuan
2013-06-06 11:57 ` [Qemu-devel] [PATCH 1/2] sheepdog: fix snapshot tag initialization Liu Yuan
2013-06-06 12:46   ` Kevin Wolf
2013-06-06 11:57 ` [Qemu-devel] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a' Liu Yuan
2013-06-06 12:46   ` Kevin Wolf
2013-06-06 13:09     ` Liu Yuan
2013-06-07  7:31       ` Kevin Wolf
2013-06-07 13:48         ` Liu Yuan
2013-06-07 15:22           ` Kevin Wolf
2013-06-07 16:14             ` Liu Yuan [this message]
2013-06-07 16:48               ` Kevin Wolf
2013-06-07 17:23                 ` Liu Yuan
2013-06-06 13:41     ` Liu Yuan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51B206E2.6020007@gmail.com \
    --to=namei.unix@gmail.com \
    --cc=kwolf@redhat.com \
    --cc=morita.kazutaka@lab.ntt.co.jp \
    --cc=qemu-devel@nongnu.org \
    --cc=sheepdog@lists.wpkg.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).