From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Wen Congyang <wency@cn.fujitsu.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Changlong Xie <xiecl.fnst@cn.fujitsu.com>,
	Fam Zheng <famz@redhat.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>,
	qemu block <qemu-block@nongnu.org>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	qemu devel <qemu-devel@nongnu.org>,
	"Michael R. Hines" <mrhines@linux.vnet.ibm.com>,
	Max Reitz <mreitz@redhat.com>, Gonglei <arei.gonglei@huawei.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
Date: Fri, 29 Jan 2016 10:47:54 +0000
Message-ID: <20160129104754.GB2410@work-vm>
In-Reply-To: <56AB3EAD.60609@cn.fujitsu.com>

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> >>> Hi,
> >>>   I've got a block error if I kill the secondary.
> >>>
> >>> Start both primary & secondary
> >>> kill -9 secondary qemu
> >>> x_colo_lost_heartbeat on primary
> >>>
> >>> The guest sees a block error and the ext4 root switches to read-only.
> >>>
> >>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
> >>> backtrace below.
> >>> (This is based on colo-v2.4-periodic-mode of the framework
> >>> code with the block and network proxy merged in; so it could be my
> >>> merging but I don't think so ?)
> >>>
> >>>
> >>> (gdb) where
> >>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> >>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
> >>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
> >>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
> >>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> >>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> >>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> >>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> >>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> >>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> >>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> >>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> >>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> >>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >>>
> >>> (gdb) p s->num_children
> >>> $1 = 2
> >>> (gdb) p acb->success_count
> >>> $2 = 0
> >>> (gdb) p acb->is_read
> >>> $5 = false
> >>
> >> Sorry for the late reply.
> > 
> > No problem.
> > 
> >> What is the value of acb->count?
> > 
> > (gdb) p acb->count
> > $1 = 1
> 
> Note, the count is 1, not 2. Writing to children.0 is in flight. If writing to children.0 succeeds,
> the guest never sees this error.
> >> If secondary host is down, you should remove quorum's children.1. Otherwise, you will get
> >> I/O error event.
> > 
> > Is that safe?  If the secondary fails, do you always have time to issue the command to
> > remove the children.1  before the guest sees the error?
> 
> We will write to two children, and expect that writing to children.0 will succeed. If so,
> the guest never sees this error. You just get the I/O error event.

I think children.0 is the disk, and that should be OK - so only the children.1/replication should
be failing - so in that case why do I see the error?
The 'node0' in the backtrace above is the name of the replication, so it does look like the error
is coming from the replication.

> > Anyway, I tried removing children.1 but it segfaults now, I guess the replication is unhappy:
> > 
> > (qemu) x_block_change colo-disk0 -d children.1
> > (qemu) x_colo_lost_heartbeat 
> 
> Hmm, you should not remove the child before failover. I will check how to avoid that in the code.

But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
what you meant?

> > 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
> > 
> > #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
> >     at /root/colo/jan-2016/qemu/block.c:4426
> > 
> > (gdb) p drv
> > $1 = (BlockDriver *) 0x5d2a
> > 
> >   it looks like the whole of bs is bogus.
> > 
> > #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>, 
> >     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
> > 
> > (gdb) p s->replication_index
> > $3 = 1
> > 
> > I guess quorum_del_child needs to stop replication before it removes the child?
> 
> Yes, but in the newest version, quorum doesn't know about block replication, and I think
> we should add a reference to the bs when starting block replication.

Do you have a new version ready to test?  I'm interested to try it (and also interested
to try the latest version of the colo-proxy)

Dave

> Thanks
> Wen Congyang
> 
> > (although it would have to be careful not to block on the dead nbd).
> > 
> > #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
> >     at /root/colo/jan-2016/qemu/block.c:4504
> > #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
> > #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
> > #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> > #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
> > #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> > #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> > #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> > #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> > #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> > #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> > #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> > #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> > 
> > Dave
> > 
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>> (qemu) info block
> >>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
> >>>     Cache mode:       writeback, direct
> >>>
> >>> Dave
> >>>
> >>> * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote:
> >>>> Block replication is an important feature used for
> >>>> continuous checkpoints (for example: COLO).
> >>>>
> >>>> You can get the detailed information about block replication from here:
> >>>> http://wiki.qemu.org/Features/BlockReplication
> >>>>
> >>>> Usage:
> >>>> Please refer to docs/block-replication.txt
> >>>>
> >>>> This patch series is based on the following patch series:
> >>>> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html
> >>>>
> >>>> You can get the patch here:
> >>>> https://github.com/Pating/qemu/tree/changlox/block-replication-v13
> >>>>
> >>>> You can get the patch with framework here:
> >>>> https://github.com/Pating/qemu/tree/changlox/colo_framework_v12
> >>>>
> >>>> TODO:
> >>>> 1. Continuous block replication. It will be started after basic functions
> >>>>    are accepted.
> >>>>
> >>>> Change Log:
> >>>> V13:
> >>>> 1. Rebase to the newest code
> >>>> 2. Remove redundant macros and semicolon in replication.c
> >>>> 3. Fix typos in block-replication.txt
> >>>> V12:
> >>>> 1. Rebase to the newest code
> >>>> 2. Use backing reference to replace 'allow-write-backing-file'
> >>>> V11:
> >>>> 1. Reopen the backing file when starting block replication if it is not
> >>>>    opened in R/W mode
> >>>> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
> >>>>    when opening backing file
> >>>> 3. Block the top BDS so there is only one block job for the top BDS and
> >>>>    its backing chain.
> >>>> V10:
> >>>> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
> >>>>    reference.
> >>>> 2. Address the comments from Eric Blake
> >>>> V9:
> >>>> 1. Update the error messages
> >>>> 2. Rebase to the newest qemu
> >>>> 3. Split child add/delete support. These patches are sent in another patchset.
> >>>> V8:
> >>>> 1. Address Alberto Garcia's comments
> >>>> V7:
> >>>> 1. Implement adding/removing quorum child. Remove the option non-connect.
> >>>> 2. Simplify the backing reference option according to Stefan Hajnoczi's suggestion
> >>>> V6:
> >>>> 1. Rebase to the newest qemu.
> >>>> V5:
> >>>> 1. Address the comments from Gong Lei
> >>>> 2. Speed the failover up. The secondary vm can take over very quickly even
> >>>>    if there are too many I/O requests.
> >>>> V4:
> >>>> 1. Introduce a new driver replication to avoid touching nbd and qcow2.
> >>>> V3:
> >>>> 1. Use error_setg() instead of error_set()
> >>>> 2. Add a new block job API
> >>>> 3. Active disk, hidden disk and nbd target uses the same AioContext
> >>>> 4. Add a testcase to test new hbitmap API
> >>>> V2:
> >>>> 1. Redesign the secondary qemu (use image-fleecing)
> >>>> 2. Use Error objects to return error message
> >>>> 3. Address the comments from Max Reitz and Eric Blake
> >>>>
> >>>> Wen Congyang (10):
> >>>>   unblock backup operations in backing file
> >>>>   Store parent BDS in BdrvChild
> >>>>   Backup: clear all bitmap when doing block checkpoint
> >>>>   Allow creating backup jobs when opening BDS
> >>>>   docs: block replication's description
> >>>>   Add new block driver interfaces to control block replication
> >>>>   quorum: implement block driver interfaces for block replication
> >>>>   Implement new driver for block replication
> >>>>   support replication driver in blockdev-add
> >>>>   Add a new API to start/stop replication, do checkpoint to all BDSes
> >>>>
> >>>>  block.c                    | 145 ++++++++++++
> >>>>  block/Makefile.objs        |   3 +-
> >>>>  block/backup.c             |  14 ++
> >>>>  block/quorum.c             |  78 +++++++
> >>>>  block/replication.c        | 545 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>  blockjob.c                 |  11 +
> >>>>  docs/block-replication.txt | 227 +++++++++++++++++++
> >>>>  include/block/block.h      |   9 +
> >>>>  include/block/block_int.h  |  15 ++
> >>>>  include/block/blockjob.h   |  12 +
> >>>>  qapi/block-core.json       |  33 ++-
> >>>>  11 files changed, 1089 insertions(+), 3 deletions(-)
> >>>>  create mode 100644 block/replication.c
> >>>>  create mode 100644 docs/block-replication.txt
> >>>>
> >>>> -- 
> >>>> 1.9.3
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
