qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/21] new backup architecture
@ 2016-12-23 14:28 Vladimir Sementsov-Ogievskiy
  2016-12-23 14:28 ` [Qemu-devel] [PATCH 01/21] backup: move from done_bitmap to copy_bitmap Vladimir Sementsov-Ogievskiy
                   ` (21 more replies)
  0 siblings, 22 replies; 59+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-12-23 14:28 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, mreitz, jsnow, famz, den, stefanha, vsementsov, pbonzini,
	jcody

Hi all.

This is a new architecture for backup. It solves some current problems:
1. intersecting requests: for now at request start we wait for all intersecting requests, which means that
    a. we may wait even for unrelated to our request clusters
    b. not full async: if we are going to copy clusters 1,2,3,4, when 2 and 4 are in flight, why should we wait for 2 and 4 to be fully copied? Why not to start 1 and 3 in parallel with 2 and 4?

2. notifier request is internally synchronous: if notifier starts copying clusters 1,2,3,4, they will be copied one by one in synchronous loop.

3. notifier wait full copying of corresponding clusters (when actually it may wait only for _read_ operations to be finished)

In short, what is done:
1. full async scheme
4. no intersecting requests
3. notifiers can wait only for read, not for write
4. notifiers wait only for corresponding clusters
5. time limit for notifiers
5. skip unallocated clusters for full mode
6. use HBitmap as main backup bitmap and just init it from dirty bitmap for incremental case
7. retrying: do not reread on write fail

# Intro

Instead of sync-copying + async-notifiers as in old backup, or aio requests like in mirror, this scheme just start 24 workers - separate coroutines, each of them copying clusters synchronously. Copying is only done by one cluster, there are no large requests.
The only difference for clusters, awaited by write notifiers, is larger priority. So, notifiers do not start io requests, they just mark some clusters as awaited and yield. Then, when some worker completes read of last cluster, awaited by this notifier it will enter it.

# Some data structures

Instead of done_bitmap - copy_bitmap, like in mirror.
HBitmap copy_bitmap
    Exactly, what should be copied:
    0 - may mean one of three things:
        - this cluster should not be copied at all
        - this cluster is in flight
        - this cluster is already copied
    1 - means that cluster should be copied, but not touched yet (no async io exists for it)

New bitmap: notif_wait_bitmap - not HBitmap, just Bitmap.
    Exactly, in flight clusters, waiting for read operation:
    0 - may mean one of three things:
        - this cluster should not be copied at all
        - this cluster is in flight and it is _already_ read to memory
        - this cluster is already copied
    1 - means that cluster is in flight, but read operation have not finished
        yet
    The only exception is none-mode: in this case 1 means in flight: in io read or write. This is needed for image fleecing.

Cluster states (copy_bitmap, notif_wait_bitmap)

0, 0 - Ignored (should not be copied at all) or In flight (read done) or Copied
0, 1 - In flight, read operation not finished (or write op. - for none-mode)
1, 0 - Should be copied, but not touched yet
1, 1 - Impossible state

NotifierRequest - request from notifier, it changes sequence of cluster copying by workers.
NotifierRequest {
    int64_t start;
    int64_t end;
    int nb_wait; // nb clusters (in specified range) that should be copied but not already read, i.e. clusters awaited by this notifier
    Coroutine *notif; // corresponding notifier coroutine
}

notifier_reqs - list of notifier requests

# More info

At backup start copy_bitmap is inited to sync_bitmap for incremental backup. For top/full backup it is inited to all ones, but in parallel with workers main coroutine skips not allocated clusters.

Worker coroutines are copying clusters, preferable awaited by notifiers (for which NotifierRequest exists in the list). Function get_work helps them.
Workers will copy clusters, awaited by notifiers even if block-job is paused - it is the same behaviour  as in old architecture.

Old backup fails guest-write if notifier fails to backup corresponding clusters. In the new scheme there is a little difference: notifier just wait for 5s and if backup can't copy all corresponding clusters in this time - guest-write fails.
Error scenarios was considered on list, the final solution was to provide user a possibility to chose what should be failed: backup or guest-write. I'll add this later.

Worker can exit (no more clusters to copy or fatal error) or pause (error or user pause or throttling). When last worker goes to pause it rings up main block-job coroutine, which will handle user pause or errors. We need to handle errors in main coroutine because of nature of block_job_error_action, which may yield.

There also is a bonus: new io-retrying scheme: if there is an error on read or write, worker just yield in the retrying loop and if it will be resumed (with job->error_exit = false) it will continue from the same place, so if we have failed write after successful read we will not reread.

Vladimir Sementsov-Ogievskiy (21):
  backup: move from done_bitmap to copy_bitmap
  backup: init copy_bitmap from sync_bitmap for incremental
  backup: improve non-dirty bits progress processing
  backup: use copy_bitmap in incremental backup
  hbitmap: improve dirty iter
  backup: rewrite top mode cluster skipping
  backup: refactor: merge top/full/incremental backup code
  backup: skip unallocated clusters for full mode
  backup: separate copy function
  backup: refactor backup_copy_cluster()
  backup: move r/w error handling code to r/w functions
  iotests: add supported_cache_modes to main function
  coroutine: add qemu_coroutine_add_next
  block: add trace point on bdrv_close_all
  bitmap: add bitmap_count_between() function
  hbitmap: add hbitmap_count_between() function
  backup: make all reads not serializing
  backup: new async architecture
  backup: refactor backup_do_cow
  backup: move bitmap handling from backup_do_cow to get_work
  backup: refactor: remove backup_do_cow()

 block.c                       |   1 +
 block/backup.c                | 871 +++++++++++++++++++++++++++++++-----------
 block/trace-events            |  34 +-
 blockjob.c                    |  29 +-
 include/block/blockjob.h      |  15 +-
 include/qemu/bitmap.h         |   4 +
 include/qemu/coroutine.h      |   2 +
 include/qemu/hbitmap.h        |  26 +-
 tests/qemu-iotests/055        |   4 +-
 tests/qemu-iotests/129        |   6 +-
 tests/qemu-iotests/iotests.py |   7 +-
 util/bitmap.c                 |  27 ++
 util/hbitmap.c                |  32 +-
 util/qemu-coroutine.c         |   7 +
 14 files changed, 805 insertions(+), 260 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2017-02-01 16:13 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-23 14:28 [Qemu-devel] [PATCH 00/21] new backup architecture Vladimir Sementsov-Ogievskiy
2016-12-23 14:28 ` [Qemu-devel] [PATCH 01/21] backup: move from done_bitmap to copy_bitmap Vladimir Sementsov-Ogievskiy
2017-01-23  5:34   ` Jeff Cody
2017-01-23 12:20   ` Vladimir Sementsov-Ogievskiy
2017-01-31 10:25   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 02/21] backup: init copy_bitmap from sync_bitmap for incremental Vladimir Sementsov-Ogievskiy
2017-01-24  7:09   ` Fam Zheng
2017-01-24  9:00     ` Vladimir Sementsov-Ogievskiy
2017-01-24  9:46       ` Fam Zheng
2017-01-24 10:16         ` Vladimir Sementsov-Ogievskiy
2017-01-31 10:36   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 03/21] backup: improve non-dirty bits progress processing Vladimir Sementsov-Ogievskiy
2017-01-24  7:17   ` Fam Zheng
2017-01-24  9:12     ` Vladimir Sementsov-Ogievskiy
2017-01-31 10:56       ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 04/21] backup: use copy_bitmap in incremental backup Vladimir Sementsov-Ogievskiy
2017-01-31 11:01   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 05/21] hbitmap: improve dirty iter Vladimir Sementsov-Ogievskiy
2017-01-31 11:20   ` Stefan Hajnoczi
2017-01-31 11:29   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 06/21] backup: rewrite top mode cluster skipping Vladimir Sementsov-Ogievskiy
2017-01-31 13:31   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 07/21] backup: refactor: merge top/full/incremental backup code Vladimir Sementsov-Ogievskiy
2017-01-31 14:26   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 08/21] backup: skip unallocated clusters for full mode Vladimir Sementsov-Ogievskiy
2017-01-24  7:59   ` Fam Zheng
2017-01-24  9:18     ` Vladimir Sementsov-Ogievskiy
2017-01-24  9:36       ` Fam Zheng
2017-01-24 10:13         ` Vladimir Sementsov-Ogievskiy
2017-01-24 11:12           ` Fam Zheng
2017-01-31 14:33   ` Stefan Hajnoczi
2017-01-31 14:38   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 09/21] backup: separate copy function Vladimir Sementsov-Ogievskiy
2017-01-31 14:40   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 10/21] backup: refactor backup_copy_cluster() Vladimir Sementsov-Ogievskiy
2017-01-31 14:57   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 11/21] backup: move r/w error handling code to r/w functions Vladimir Sementsov-Ogievskiy
2017-01-31 14:57   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 12/21] iotests: add supported_cache_modes to main function Vladimir Sementsov-Ogievskiy
2017-01-31 14:58   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 13/21] coroutine: add qemu_coroutine_add_next Vladimir Sementsov-Ogievskiy
2017-01-31 15:03   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 14/21] block: add trace point on bdrv_close_all Vladimir Sementsov-Ogievskiy
2017-01-31 15:03   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 15/21] bitmap: add bitmap_count_between() function Vladimir Sementsov-Ogievskiy
2017-01-31 15:15   ` Stefan Hajnoczi
2016-12-23 14:28 ` [Qemu-devel] [PATCH 16/21] hbitmap: add hbitmap_count_between() function Vladimir Sementsov-Ogievskiy
2017-01-31 15:56   ` Stefan Hajnoczi
2016-12-23 14:29 ` [Qemu-devel] [PATCH 17/21] backup: make all reads not serializing Vladimir Sementsov-Ogievskiy
2017-01-31 16:30   ` Stefan Hajnoczi
2016-12-23 14:29 ` [Qemu-devel] [PATCH 18/21] backup: new async architecture Vladimir Sementsov-Ogievskiy
2017-01-31 16:46   ` Stefan Hajnoczi
2017-02-01 16:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-12-23 14:29 ` [Qemu-devel] [PATCH 20/21] backup: move bitmap handling from backup_do_cow to get_work Vladimir Sementsov-Ogievskiy
2016-12-23 14:29 ` [Qemu-devel] [PATCH 21/21] backup: refactor: remove backup_do_cow() Vladimir Sementsov-Ogievskiy
2017-01-09 11:04 ` [Qemu-devel] [PATCH 00/21] new backup architecture Stefan Hajnoczi
2017-01-10  6:05   ` Jeff Cody
2017-01-10 18:48     ` John Snow
2017-01-31 10:20 ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).