Date: Mon, 23 Feb 2015 14:16:45 -0500
From: John Snow
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, famz@redhat.com, armbru@redhat.com,
 qemu-devel@nongnu.org, vsementsov@parallels.com, Stefan Hajnoczi,
 mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH v13 00/17] block: incremental backup series

On 02/23/2015 12:52 PM, Stefan Hajnoczi wrote:
> On Fri, Feb 20, 2015 at 12:20:53PM -0500, John Snow wrote:
>
> Eric: I've added you on CC for a libvirt perspective on the APIs that
> this series introduces. We're discussing how incremental backup
> failures should be handled when there are multiple drives attached to
> the guest. Please see towards the bottom of this email.
>
>> On 02/20/2015 06:09 AM, Stefan Hajnoczi wrote:
>>> On Fri, Feb 13, 2015 at 05:08:41PM -0500, John Snow wrote:
>>>> This series requires: [PATCH v3] blkdebug: fix "once" rule
>>>>
>>>> Welcome to the "incremental backup" newsletter, where we discuss
>>>> exciting developments in non-redundant backup technology.
>>>> This issue is called the "Max Reitz Fixup" issue.
>>>>
>>>> This patchset enables the in-memory part of the incremental backup
>>>> feature. There are two series on the mailing list now by Vladimir
>>>> Sementsov-Ogievskiy that enable the migration and persistence of
>>>> dirty bitmaps.
>>>>
>>>> This series and most patches were originally by Fam Zheng.
>>>
>>> Please add docs/incremental-backup.txt explaining how the commands
>>> are intended to be used to perform backups. The QAPI documentation
>>> is sparse, and QMP clients would probably need to read the code to
>>> understand how these commands work.
>>
>> OK. I wrote a markdown-formatted file that explains it all pretty
>> well; should I check it in as-is, or should I convert it to some
>> other format?
>>
>> https://github.com/jnsnow/qemu/blob/bitmap-demo/docs/bitmaps.md
>
> Thanks, that is helpful. The multi-disk use case is the most complex;
> please include it.
>
> Please submit to docs/ as plain text.
>
>>> I'm not sure I understand the need for all the commands: add,
>>> remove, enable, disable, clear. Why are these commands necessary?
>>
>> add/remove are self-explanatory.
>>
>> clear allows you to re-sync a bitmap to a full backup after you have
>> already been running for some time. See the README for the use case.
>> Yes, you *COULD* delete and re-create the bitmap with add/remove, but
>> why? Clear is nice, I stand by it.
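
For reference (and for Eric's benefit), the primitives under discussion
look roughly like this on the wire. Caveat: the command and argument
spellings are as of this revision of the series and may still change;
"drive0" and "bitmap0" are illustrative names:

  { "execute": "block-dirty-bitmap-add",
    "arguments": { "node": "drive0", "name": "bitmap0" } }

  { "execute": "block-dirty-bitmap-clear",
    "arguments": { "node": "drive0", "name": "bitmap0" } }

clear is meant to be issued alongside a fresh full backup, so that the
emptied bitmap and the new reference image describe the same point in
time.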
>>
>> enable/disable: Not necessarily useful for the simple case at this
>> exact moment; they can be used to track writes during a period of
>> time if desired. You could use them with transactions to record
>> activity through different periods of time. They were originally used
>> for what the "frozen" case covers now, but I left them in.
>>
>> I could stand to part with them if you think they detract more than
>> they help. They seemed like useful primitives. My default action will
>> still be to leave them in.
>>
>>> I think just add and remove should be enough:
>>>
>>> Initial full backup
>>> -------------------
>>> Use transaction to add bitmaps to all drives atomically (i.e.
>>> point-in-time snapshot across all drives) and launch drive-backup
>>> (full) on all drives.
>>>
>>> In your patch the bitmap starts disabled. In my example bitmaps are
>>> always enabled unless they have a successor.
>>
>> No, they don't:
>>
>>     bitmap->disabled = false;
>
> Great, I missed that. In that case enable/disable are truly optional.
> They can be added in the future without needing to change the 'add'
> command's default.

I will remove the interface, but the status will remain. Vladimir uses
the read-only mode for migrations.

> All code in qemu.git requires code review, testing, and maintenance,
> and in the case of QMP, a commitment to keep the command forever.
> Please drop enable/disable for now.
>
>>> Incremental backup
>>> ------------------
>>> Use transaction to launch drive-backup (bitmap mode) on all drives.
>>>
>>> If backup completes successfully, we'll be able to run the same
>>> command again for the next incremental backup.
>>>
>>> If backup fails, I see a problem when taking consistent incremental
>>> backups across multiple drives:
>>>
>>> Imagine 2 drives (A and B). Transaction is used to launch
>>> drive-backup on both with their respective dirty bitmaps.
>>> drive-backup A fails and merges successor dirty bits so nothing is
>>> lost, but what do we do with drive-backup B?
>>>
>>> drive-backup B could be in progress, or it could have completed
>>> before drive-backup A failed. Now we're in trouble, because we need
>>> to take consistent snapshots across all drives but we've thrown away
>>> the dirty bitmap for drive B!
>>
>> Just re-run the transaction. If one failed but one succeeded, just
>> run it again. The one that succeeded prior will now have a trivial
>> incremental backup, and the one that failed will have a new valid
>> incremental backup. The two new incrementals will be point-in-time
>> consistent.
>>
>> E.g.:
>>
>> [full_a] <-- [incremental_a_0: FAILED]
>> [full_b] <-- [incremental_b_0: SUCCESS]
>>
>> then re-run:
>>
>> [full_a] <------------------------ [incremental_a_1]
>> [full_b] <-- [incremental_b_0] <-- [incremental_b_1]
>>
>> If the extra image in the chain for the one that succeeded is
>> problematic, you can use other tools to consolidate the images later.
>
> Each client implementation needs to duplicate the logic for partial
> backups. There can be arbitrarily many levels of partial backups,
> since failure could occur twice, three times, etc. in a row. We're
> forcing all clients to deal with failure themselves or to fall back to
> full backup on failure.
>
> The "client" may be a backup application that wants QEMU to send data
> directly to an NBD server. In order for that to work, the backup
> application also needs to keep state or fall back to full backup on
> any error.
>
> In other words, making the API require a stateful client is ugly.
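
The only state the client needs is the last transaction it issued and
whether it completed; recovery is re-issuing that same transaction. As
a sketch (the sync mode is spelled "dirty-bitmap" in this revision and
may yet be renamed; device and target names are illustrative):

  { "execute": "transaction",
    "arguments": { "actions": [
      { "type": "drive-backup",
        "data": { "device": "drive-a", "target": "inc.a.1.qcow2",
                  "sync": "dirty-bitmap", "bitmap": "bitmap0" } },
      { "type": "drive-backup",
        "data": { "device": "drive-b", "target": "inc.b.1.qcow2",
                  "sync": "dirty-bitmap", "bitmap": "bitmap0" } } ] } }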
Every interface is beautiful in its own special way.

>> Either way, how do you propose getting a consistent snapshot across
>> multiple drives after a failure? The only recovery option is to just
>> create a new snapshot *later*; you can never go back to what was,
>> just like you can't for single drives.
>
> This is the same problem as for a single drive. There the solution is
> to merge the old and new bitmaps, and the client has to take a later
> snapshot, like you explained.
>
> That policy is fine and should apply to multiple drives too. Take a
> new snapshot if there was a failure.

OK; I will need to cook something up to allow us to run another action
on all the drive backups after they all complete...

>> I am not convinced there is a problem. Since we don't want filename
>> management (&c) as part of this solution inside of QEMU, there is
>> nothing left to do except within libvirt.
>
> No filenames are involved. Just don't throw away the frozen dirty
> bitmap until the entire group of backup operations completes. It's
> equivalent to what we do in the single-drive case.
>
>>> Stopping incremental backup
>>> ---------------------------
>>> Use transaction to remove bitmaps on all drives. This case is easy.
>>>
>>> Finally, what about bdrv_truncate()? When the disk size changes, the
>>> dirty bitmaps keep the old size. That seems likely to cause
>>> problems. Have you thought about this case?
>>
>> Only just recently, when reviewing Vladimir's bitmap persistence
>> patches, actually.
>
> This patch series needs to address the issue, since the QMP API to
> create bitmaps is exposed here.
>
> The drive cannot be resized during the backup operation. Luckily,
> we're already protected against that by the op blocker API.
>
> The question is what happens to existing bitmaps (enabled, disabled,
> or frozen) when there is no backup blockjob running and the drive is
> resized. I think it makes sense to resize the bitmaps along with the
> drive.
>
> Stefan

A resize should never encounter a frozen bitmap, since bitmaps only
freeze during a backup operation, and the op blockers already forbid
resizing while one is in flight. I agree that enabled and read-only
bitmaps should just go ahead and perform the resize, however.
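
For concreteness, the open question is what this sequence does to
bitmap0 when no backup job is running (names and the new size are
illustrative):

  { "execute": "block-dirty-bitmap-add",
    "arguments": { "node": "drive0", "name": "bitmap0" } }

  { "execute": "block_resize",
    "arguments": { "device": "drive0", "size": 21474836480 } }

As above: the bitmap should simply be resized along with the drive.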