qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Zhang Chen <zhangckid@gmail.com>
Cc: "Lukas Straub" <lukasstraub2@web.de>,
	"Hailiang Zhang" <zhanghailiang@xfusion.com>,
	qemu-devel@nongnu.org,
	"Dr . David Alan Gilbert" <dave@treblig.org>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Yury Kotov" <yury-kotov@yandex-team.ru>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Prasad Pandit" <ppandit@redhat.com>,
	"Li Zhijian" <lizhijian@fujitsu.com>,
	"Juraj Marcin" <jmarcin@redhat.com>
Subject: Re: [PATCH RFC 0/9] migration: Threadify loadvm process
Date: Tue, 21 Oct 2025 09:58:00 -0400
Message-ID: <aPeRaOv3BkRYohqA@x1.local>
In-Reply-To: <CAK3tnv+dREd=iNsZdK-Th9rrF1wRvxZrR=Pc+yE21V=2Bc=s+g@mail.gmail.com>

On Tue, Oct 21, 2025 at 10:31:50AM +0800, Zhang Chen wrote:
> On Tue, Oct 21, 2025 at 6:09 AM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > On Mon, 20 Oct 2025 17:41:30 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> >
> > > On Wed, Oct 08, 2025 at 05:26:13PM -0400, Peter Xu wrote:
> > > > On Thu, Sep 04, 2025 at 04:27:39PM +0800, Zhang Chen wrote:
> > > > > > I confess I didn't test anything on COLO; this is based only on code
> > > > > > observation and analysis.  COLO maintainers: could you add some unit tests
> > > > > > to QEMU's qtests?
> > > > >
> > > > > For the COLO part, I think removing the coroutine-related code is OK,
> > > > > because the original coroutine still needs to call
> > > > > "colo_process_incoming_thread".
> > > >
> > > > Chen, thanks for the comment.  It's still reassuring.
> > > >
> > > > >
> > > > > Hi Hailiang, any comments on this part?
> > > >
> > > > Any further comments on this series would always be helpful.
> > > >
> > > > It would also be great if anyone could come up with a selftest for COLO.
> > > > Any new migration feature now needs both a unit test and documentation to
> > > > get merged.  COLO was merged earlier, so it wasn't required to have them,
> > > > but they would certainly help make sure COLO doesn't break easily.
> > >
> > > Chen/Hailiang:
> > >
> > > I could use some help from the COLO side.
> > >
> > > Just now I gave it a shot following the current docs/COLO-FT.txt, and it
> > > didn't work for me.
> > >
> > > The cmdlines I used mostly followed the doc, but I changed a few things.
> > > For example, on the secondary VM I added "file.locking=off" to drive
> > > "parent0", because otherwise the "nbd-server-add" command fails to take
> > > the lock and it never boots.  I also switched from a tap netdev to a
> > > socket netdev, since in my case I only plan to run the COLO main routine;
> > > I hope that's harmless too, but let me know if it is a problem.
> > >
> > > Below are the final cmdlines I used:
> > >
> > > For primary:
> > >
> > > bin=~/git/qemu/bin/qemu-system-x86_64
> > > $bin -enable-kvm -cpu qemu64,kvmclock=on \
> > >      -m 512 -smp 1 -qmp stdio \
> > >      -device piix3-usb-uhci -device usb-tablet -name primary \
> > >      -netdev socket,id=hn0,listen=127.0.0.1:10000 \
> > >      -device rtl8139,id=e0,netdev=hn0 \
> > >      -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
> > >      -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
> > >      -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
> > >      -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> > >      -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
> > >      -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> > >      -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> > >      -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> > >      -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> > >      -object iothread,id=iothread1 \
> > >      -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,iothread=iothread1 \
> > >      -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=./primary.qcow2,children.0.driver=qcow2
> > >
> > > For secondary (testing locally, hence using 127.0.0.1 as primary_ip):
> > >
> > > bin=~/git/qemu/bin/qemu-system-x86_64
> > > primary_ip=127.0.0.1
> > > $bin -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
> > >      -device piix3-usb-uhci -device usb-tablet -name secondary \
> > >      -netdev socket,id=hn0,connect=127.0.0.1:10000 \
> > >      -device rtl8139,id=e0,netdev=hn0 \
> > >      -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
> > >      -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
> > >      -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
> > >      -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
> > >      -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
> > >      -drive if=none,id=parent0,file.filename=primary.qcow2,driver=qcow2,file.locking=off \
> > >      -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,top-id=colo-disk0,file.file.filename=secondary-active.qcow2,file.backing.driver=qcow2,file.backing.file.filename=secondary-hidden.qcow2,file.backing.backing=parent0 \
> > >      -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=childs0 \
> > >      -incoming tcp:0.0.0.0:9998
> > >
> >
> > Hi Peter,
> > You have to use -incoming defer and enable x-colo on the
> > secondary side before starting migration.
> >
> > And primary.qcow2 should be a separate image (with the same content) for
> > each QEMU instance.
> 
> Yes, Lukas is right. QEMU can't allow two VMs to touch one image.
> So you can try "cp primary.qcow2 secondary.qcow2", then change the
> secondary side to:
>
>   -drive if=none,id=parent0,file.filename=secondary.qcow2,driver=qcow2,file.locking=off \

Thanks, both.

I think the doc says otherwise.  Do you mean the doc is wrong and at least
needs fixing?

I created secondary.qcow2 and switched to it, but it still hit the same
error.

Step 1: start primary QEMU

bin=~/git/qemu/bin/qemu-system-x86_64
$bin -enable-kvm -cpu qemu64,kvmclock=on \
     -m 512 -smp 1 -qmp stdio \
     -device piix3-usb-uhci -device usb-tablet -name primary \
     -netdev socket,id=hn0,listen=127.0.0.1:10000 \
     -device rtl8139,id=e0,netdev=hn0 \
     -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
     -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
     -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
     -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
     -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
     -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
     -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
     -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
     -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
     -object iothread,id=iothread1 \
     -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,iothread=iothread1 \
     -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=./primary.qcow2,children.0.driver=qcow2

Step 2: start secondary QEMU

bin=~/git/qemu/bin/qemu-system-x86_64
primary_ip=127.0.0.1
$bin -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
     -device piix3-usb-uhci -device usb-tablet -name secondary \
     -netdev socket,id=hn0,connect=127.0.0.1:10000 \
     -device rtl8139,id=e0,netdev=hn0 \
     -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
     -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
     -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
     -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
     -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
     -drive if=none,id=parent0,file.filename=secondary.qcow2,driver=qcow2,file.locking=off \
     -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,top-id=colo-disk0,file.file.filename=secondary-active.qcow2,file.backing.driver=qcow2,file.backing.file.filename=secondary-hidden.qcow2,file.backing.backing=parent0 \
     -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=childs0 \
     -incoming tcp:0.0.0.0:9998

Step 3: Run these commands on secondary QEMU

{"execute":"qmp_capabilities"}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }

Step 4: Run these commands on primary QEMU

{"execute":"qmp_capabilities"}
{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }

What I got:

Primary QEMU output:

qemu-system-x86_64: -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on: info: QEMU waiting for connection on: disconnected:tcp:0.0.0.0:9004,server=on
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 10}, "package": "v10.1.0-1513-g94586867df"}, "capabilities": ["oob"]}}
VNC server running on ::1:5901
{"error": {"class": "GenericError", "desc": "JSON parse error, stray '\f'"}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
{"return": ""}
{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"return": {}}
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
{"return": {}}
{"timestamp": {"seconds": 1761054720, "microseconds": 515770}, "event": "STOP"}

Secondary QEMU output:

{"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 10}, "package": "v10.1.0-1513-g94586867df"}, "capabilities": ["oob"]}}
VNC server running on ::1:5900
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"return": {}}
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
{"return": {}}
{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
{"return": {}}
{"timestamp": {"seconds": 1761054721, "microseconds": 188336}, "event": "RESUME"}
qemu-system-x86_64: Can't receive COLO message: Input/output error
{"timestamp": {"seconds": 1761054721, "microseconds": 188883}, "event": "COLO_EXIT", "data": {"mode": "secondary", "reason": "error"}}

Thanks,

-- 
Peter Xu