From: Peter Xu <peterx@redhat.com>
To: Zhang Chen <zhangckid@gmail.com>
Cc: "Lukas Straub" <lukasstraub2@web.de>,
	"Hailiang Zhang" <zhanghailiang@xfusion.com>,
	qemu-devel@nongnu.org,
	"Dr . David Alan Gilbert" <dave@treblig.org>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Yury Kotov" <yury-kotov@yandex-team.ru>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Prasad Pandit" <ppandit@redhat.com>,
	"Li Zhijian" <lizhijian@fujitsu.com>,
	"Juraj Marcin" <jmarcin@redhat.com>
Subject: Re: [PATCH RFC 0/9] migration: Threadify loadvm process
Date: Tue, 21 Oct 2025 09:58:00 -0400
Message-ID: <aPeRaOv3BkRYohqA@x1.local>
In-Reply-To: <CAK3tnv+dREd=iNsZdK-Th9rrF1wRvxZrR=Pc+yE21V=2Bc=s+g@mail.gmail.com>

On Tue, Oct 21, 2025 at 10:31:50AM +0800, Zhang Chen wrote:
> On Tue, Oct 21, 2025 at 6:09 AM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > On Mon, 20 Oct 2025 17:41:30 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> >
> > > On Wed, Oct 08, 2025 at 05:26:13PM -0400, Peter Xu wrote:
> > > > On Thu, Sep 04, 2025 at 04:27:39PM +0800, Zhang Chen wrote:
> > > > > > I confess I didn't test anything on COLO but only from code observations
> > > > > > and analysis.  COLO maintainers: could you add some unit tests to QEMU's
> > > > > > qtests?
> > > > >
> > > > > For the COLO part, I think removing the coroutine-related code is OK with me,
> > > > > because the original coroutine still needs to call
> > > > > "colo_process_incoming_thread".
> > > >
> > > > Chen, thanks for the comment.  It's still reassuring.
> > > >
> > > > >
> > > > > Hi Hailiang, any comments for this part?
> > > >
> > > > Any further comment on this series would always be helpful.
> > > >
> > > > It would also be great if anyone could come up with a selftest for COLO.
> > > > Nowadays any new migration feature needs both a unit test and docs to get
> > > > merged.  COLO was merged earlier, so it wasn't required to have them;
> > > > however, they would certainly help make sure COLO doesn't get broken easily.
> > >
> > > Chen/Hailiang:
> > >
> > > I could use some help from the COLO side.
> > >
> > > Just now, I did give it a shot with the current docs/COLO-FT.txt and it
> > > didn't really work for me.
> > >
> > > The cmdlines I used mostly followed the doc, but I changed a few
> > > things.  For example, on the secondary VM I added "file.locking=off" for
> > > drive "parent0", because otherwise the "nbd-server-add" command fails to
> > > take the lock and the VM never boots.  I also switched from a tap netdev
> > > to a socket netdev; since I only plan to run the COLO main routine, I hope
> > > that's harmless too, but let me know if it is a problem.
> > >
> > > Below are the final cmdlines I used:
> > >
> > > For primary:
> > >
> > > bin=~/git/qemu/bin/qemu-system-x86_64
> > > $bin -enable-kvm -cpu qemu64,kvmclock=on \
> > >      -m 512 -smp 1 -qmp stdio \
> > >      -device piix3-usb-uhci -device usb-tablet -name primary \
> > >      -netdev socket,id=hn0,listen=127.0.0.1:10000 \
> > >      -device rtl8139,id=e0,netdev=hn0 \
> > >      -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
> > >      -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
> > >      -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
> > >      -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> > >      -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
> > >      -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> > >      -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> > >      -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> > >      -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> > >      -object iothread,id=iothread1 \
> > >      -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,iothread=iothread1 \
> > >      -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=./primary.qcow2,children.0.driver=qcow2
> > >
> > > For secondary (testing locally, hence using 127.0.0.1 as primary_ip):
> > >
> > > bin=~/git/qemu/bin/qemu-system-x86_64
> > > primary_ip=127.0.0.1
> > > $bin -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
> > >      -device piix3-usb-uhci -device usb-tablet -name secondary \
> > >      -netdev socket,id=hn0,connect=127.0.0.1:10000 \
> > >      -device rtl8139,id=e0,netdev=hn0 \
> > >      -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
> > >      -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
> > >      -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
> > >      -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
> > >      -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
> > >      -drive if=none,id=parent0,file.filename=primary.qcow2,driver=qcow2,file.locking=off \
> > >      -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,top-id=colo-disk0,file.file.filename=secondary-active.qcow2,file.backing.driver=qcow2,file.backing.file.filename=secondary-hidden.qcow2,file.backing.backing=parent0 \
> > >      -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=childs0 \
> > >      -incoming tcp:0.0.0.0:9998
> > >
> >
> > Hi Peter,
> > You have to use -incoming defer and enable x-colo on the
> > secondary side before starting migration.
> >
> > And primary.qcow2 should be a separate image (with same content) for
> > each qemu instance.
> 
> Yes, Lukas is right. QEMU doesn't allow two VMs to use one image.
> So you can try "cp primary.qcow2 secondary.qcow2",
> then change the secondary side to:
>   -drive if=none,id=parent0,file.filename=secondary.qcow2,driver=qcow2,file.locking=off \

Thanks both.

I think the doc says otherwise.  Do you mean the doc is wrong and at least
needs fixing?

I created secondary.qcow2 and switched the secondary to it, but it still
hit the same error.
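
A minimal sketch of the image preparation Lukas and Zhang describe, assuming bash and that all images sit in the current directory: the secondary gets its own identical copy of the parent image, plus empty active and hidden overlays (the files referenced by the replication drive in Step 2). The `size` value (10G) and the `DRY_RUN` switch are assumptions for illustration; `size` must match the virtual size of primary.qcow2. Run it before starting either QEMU, so the copy is taken from an image nobody is writing to.

```shell
#!/usr/bin/env bash
# Sketch: prepare per-instance disk images for a local COLO test.
# Assumptions (not from the thread): images live in the current directory,
# and $size equals the virtual size of primary.qcow2.
set -euo pipefail

size=${size:-10G}        # hypothetical default; must match the parent image
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = execute them

run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

prepare_images() {
  # The secondary must not share the primary's image: give it an identical copy.
  run cp primary.qcow2 secondary.qcow2
  # Empty active and hidden overlays used by the secondary's replication driver.
  run qemu-img create -f qcow2 secondary-active.qcow2 "$size"
  run qemu-img create -f qcow2 secondary-hidden.qcow2 "$size"
}

prepare_images
```

Invoking it as `DRY_RUN=0 ./prepare-images.sh` (a hypothetical file name) would actually create the files.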

Step 1: start primary QEMU

bin=~/git/qemu/bin/qemu-system-x86_64
$bin -enable-kvm -cpu qemu64,kvmclock=on \
     -m 512 -smp 1 -qmp stdio \
     -device piix3-usb-uhci -device usb-tablet -name primary \
     -netdev socket,id=hn0,listen=127.0.0.1:10000 \
     -device rtl8139,id=e0,netdev=hn0 \
     -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
     -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
     -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
     -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
     -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
     -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
     -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
     -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
     -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
     -object iothread,id=iothread1 \
     -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,iothread=iothread1 \
     -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=./primary.qcow2,children.0.driver=qcow2

Step 2: start secondary QEMU

bin=~/git/qemu/bin/qemu-system-x86_64
primary_ip=127.0.0.1
$bin -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
     -device piix3-usb-uhci -device usb-tablet -name secondary \
     -netdev socket,id=hn0,connect=127.0.0.1:10000 \
     -device rtl8139,id=e0,netdev=hn0 \
     -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
     -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
     -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
     -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
     -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
     -drive if=none,id=parent0,file.filename=secondary.qcow2,driver=qcow2,file.locking=off \
     -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,top-id=colo-disk0,file.file.filename=secondary-active.qcow2,file.backing.driver=qcow2,file.backing.file.filename=secondary-hidden.qcow2,file.backing.backing=parent0 \
     -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=childs0 \
     -incoming tcp:0.0.0.0:9998
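
A sketch of Step 2 following Lukas's suggestion, assuming bash: the only intended difference is `-incoming defer` instead of `-incoming tcp:0.0.0.0:9998`, so the secondary does not open the migration socket until it is told to. Keeping the options in an array avoids line-continuation mistakes. The `RUN` switch and the `print_secondary_cmd` helper are hypothetical; by default the script only prints the command line for inspection.

```shell
#!/usr/bin/env bash
# Sketch: Step 2 with "-incoming defer". Paths and addresses are the ones
# used in the reproduction above; RUN and print_secondary_cmd are hypothetical.
set -euo pipefail

bin=${bin:-"$HOME/git/qemu/bin/qemu-system-x86_64"}
primary_ip=${primary_ip:-127.0.0.1}

secondary_args=(
  -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio
  -device piix3-usb-uhci -device usb-tablet -name secondary
  -netdev socket,id=hn0,connect=127.0.0.1:10000
  -device rtl8139,id=e0,netdev=hn0
  -chardev "socket,id=red0,host=${primary_ip},port=9003,reconnect-ms=1000"
  -chardev "socket,id=red1,host=${primary_ip},port=9004,reconnect-ms=1000"
  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
  -object filter-rewriter,id=rew0,netdev=hn0,queue=all
  -drive if=none,id=parent0,file.filename=secondary.qcow2,driver=qcow2,file.locking=off
  -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,top-id=colo-disk0,file.file.filename=secondary-active.qcow2,file.backing.driver=qcow2,file.backing.file.filename=secondary-hidden.qcow2,file.backing.backing=parent0
  -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=childs0
  -incoming defer
)

# Print the command line (space separated) for a dry run.
print_secondary_cmd() {
  printf '%s ' "$bin" "${secondary_args[@]}"
  printf '\n'
}

if [[ "${RUN:-0}" == 1 ]]; then
  exec "$bin" "${secondary_args[@]}"
else
  print_secondary_cmd
fi
```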

Step 3: Run these commands on the secondary QEMU

{"execute":"qmp_capabilities"}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
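
With `-incoming defer`, the secondary only starts listening for the migration stream once it receives the QMP `migrate-incoming` command, which gives a well-defined point to enable `x-colo` and set up the NBD export first. A sketch of that ordering, assuming bash; the `secondary_qmp_sequence` function and the port variables are hypothetical helpers, and the commands are those of Step 3 plus a final `migrate-incoming`.

```shell
#!/usr/bin/env bash
# Sketch: secondary-side QMP sequence for a QEMU started with "-incoming defer".
# Ports default to the ones used in this reproduction (assumption).
set -euo pipefail

nbd_port=${nbd_port:-9999}
mig_port=${mig_port:-9998}

secondary_qmp_sequence() {
  # Order matters: capabilities and NBD export first, start listening last.
  cat <<EOF
{"execute": "qmp_capabilities"}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "${nbd_port}"} } } }
{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:0.0.0.0:${mig_port}"} }
EOF
}

secondary_qmp_sequence
```

If the secondary were started with a QMP UNIX socket instead of `-qmp stdio` (e.g. `-qmp unix:/tmp/qmp-secondary.sock,server=on,wait=off`, a hypothetical path), the sequence could be sent with something like `secondary_qmp_sequence | socat - UNIX-CONNECT:/tmp/qmp-secondary.sock`.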

Step 4: Run these commands on the primary QEMU

{"execute":"qmp_capabilities"}
{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }

What I got:

Primary QEMU output:

qemu-system-x86_64: -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on: info: QEMU waiting for connection on: disconnected:tcp:0.0.0.0:9004,server=on
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 10}, "package": "v10.1.0-1513-g94586867df"}, "capabilities": ["oob"]}}
VNC server running on ::1:5901
{"error": {"class": "GenericError", "desc": "JSON parse error, stray '\f'"}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
{"return": ""}
{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"return": {}}
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
{"return": {}}
{"timestamp": {"seconds": 1761054720, "microseconds": 515770}, "event": "STOP"}

Secondary QEMU output:

{"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 10}, "package": "v10.1.0-1513-g94586867df"}, "capabilities": ["oob"]}}
VNC server running on ::1:5900
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"return": {}}
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
{"return": {}}
{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
{"return": {}}
{"timestamp": {"seconds": 1761054721, "microseconds": 188336}, "event": "RESUME"}
qemu-system-x86_64: Can't receive COLO message: Input/output error
{"timestamp": {"seconds": 1761054721, "microseconds": 188883}, "event": "COLO_EXIT", "data": {"mode": "secondary", "reason": "error"}}
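
For an automated COLO test (as requested earlier in this thread), one simple pass/fail signal is the `COLO_EXIT` event shown above. A sketch of extracting its mode and reason from captured QMP output, assuming bash, sed, and the single-line event format printed by `-qmp stdio`; `colo_exit_reason` is a hypothetical helper, and a real test would parse the JSON properly rather than match text.

```shell
#!/usr/bin/env bash
# Sketch: print "<mode> <reason>" for each COLO_EXIT event found on stdin.
# Relies on the exact one-line QMP event layout seen in the log above.
set -euo pipefail

colo_exit_reason() {
  sed -n 's/.*"event": "COLO_EXIT", "data": {"mode": "\([a-z]*\)", "reason": "\([a-z]*\)"}.*/\1 \2/p'
}

# Demo on the event captured from the secondary above.
sample='{"timestamp": {"seconds": 1761054721, "microseconds": 188883}, "event": "COLO_EXIT", "data": {"mode": "secondary", "reason": "error"}}'
printf '%s\n' "$sample" | colo_exit_reason   # prints: secondary error
```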

Thanks,

-- 
Peter Xu


