From: "Gonglei (Arei)" via <qemu-devel@nongnu.org>
To: Peter Xu <peterx@redhat.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"yu.zhang@ionos.com" <yu.zhang@ionos.com>,
"mgalaxy@akamai.com" <mgalaxy@akamai.com>,
"elmar.gerdes@ionos.com" <elmar.gerdes@ionos.com>,
zhengchuan <zhengchuan@huawei.com>,
"berrange@redhat.com" <berrange@redhat.com>,
"armbru@redhat.com" <armbru@redhat.com>,
"lizhijian@fujitsu.com" <lizhijian@fujitsu.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
Xiexiangyou <xiexiangyou@huawei.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"lixiao (H)" <lixiao91@huawei.com>,
"jinpu.wang@ionos.com" <jinpu.wang@ionos.com>,
Wangjialin <wangjialin23@huawei.com>,
Fabiano Rosas <farosas@suse.de>
Subject: RE: [PATCH 0/6] refactor RDMA live migration based on rsocket API
Date: Fri, 7 Jun 2024 08:49:01 +0000
Message-ID: <2fa61f902c244211af7d1316b67fe0a1@huawei.com>
In-Reply-To: <ZmBzusHyxLYqMeQg@x1n>
> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, June 5, 2024 10:19 PM
> To: Gonglei (Arei) <arei.gonglei@huawei.com>
> Cc: qemu-devel@nongnu.org; yu.zhang@ionos.com; mgalaxy@akamai.com;
> elmar.gerdes@ionos.com; zhengchuan <zhengchuan@huawei.com>;
> berrange@redhat.com; armbru@redhat.com; lizhijian@fujitsu.com;
> pbonzini@redhat.com; mst@redhat.com; Xiexiangyou
> <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H)
> <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin
> <wangjialin23@huawei.com>; Fabiano Rosas <farosas@suse.de>
> Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
>
> On Wed, Jun 05, 2024 at 10:09:43AM +0000, Gonglei (Arei) wrote:
> > Hi Peter,
> >
> > > -----Original Message-----
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Wednesday, June 5, 2024 3:32 AM
> > > To: Gonglei (Arei) <arei.gonglei@huawei.com>
> > > Cc: qemu-devel@nongnu.org; yu.zhang@ionos.com; mgalaxy@akamai.com;
> > > elmar.gerdes@ionos.com; zhengchuan <zhengchuan@huawei.com>;
> > > berrange@redhat.com; armbru@redhat.com; lizhijian@fujitsu.com;
> > > pbonzini@redhat.com; mst@redhat.com; Xiexiangyou
> > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H)
> > > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin
> > > <wangjialin23@huawei.com>; Fabiano Rosas <farosas@suse.de>
> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on
> > > rsocket API
> > >
> > > Hi, Lei, Jialin,
> > >
> > > Thanks a lot for working on this!
> > >
> > > I think we'll need to wait a bit for feedback from Jinpu and his
> > > team on the RDMA side, and from Daniel on iochannels. Also, please
> > > remember to copy Fabiano Rosas in any relevant future posts. We'd
> > > also like to know whether he has any comments. I have him copied in
> > > this reply.
> > >
> > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote:
> > > > From: Jialin Wang <wangjialin23@huawei.com>
> > > >
> > > > Hi,
> > > >
> > > > This patch series attempts to refactor RDMA live migration by
> > > > introducing a new QIOChannelRDMA class based on the rsocket API.
> > > >
> > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket
> > > > API that is a 1-1 match of the normal kernel 'sockets' API, which
> > > > hides the detail of rdma protocol into rsocket and allows us to
> > > > add support for some modern features like multifd more easily.
> > > >
> > > > Here is the previous discussion on refactoring RDMA live migration
> > > > using the rsocket API:
> > > >
> > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linaro.org/
> > > >
> > > > We have encountered some bugs when using rsocket and plan to
> > > > submit them to the rdma-core community.
> > > >
> > > > In addition, using rsocket makes our programming more
> > > > convenient, but it must be noted that this method introduces
> > > > multiple memory copies, so some performance degradation is to be
> > > > expected. We hope that those with RDMA network cards can help
> > > > verify this. Thank you!
> > >
> > > It'll be good to elaborate if you tested it in-house. What numbers
> > > should people expect, exactly? Is that okay from Huawei's POV?
> > >
> > > Besides that, the code looks pretty good at a first glance to me.
> > > Before others chime in, here are some high-level comments.
> > >
> > > Firstly, can we avoid using a coroutine for listen()? It might be
> > > relevant because I see that rdma_accept_incoming_migration() runs in
> > > a loop to do raccept(), but would that also hang the qemu main loop
> > > even with the coroutine, before all channels are ready? I'm not a
> > > coroutine person, but I think the hope is that we can make dest QEMU
> > > run in a thread in the future just like the src QEMU, so the less
> > > coroutine the better in this path.
> > >
> >
> > Because rsocket is set to non-blocking, raccept() will return EAGAIN
> > when no connection is pending; the coroutine will then yield and will
> > not hang the qemu main loop.
>
> Ah that's ok. And also I just noticed it may not be a big deal either as long as
> we're before migration_incoming_process().
>
> I'm wondering whether it can be done similarly to what we do with sockets in
> qio_net_listener_set_client_func_full(). After all, rsocket wants to mimic the
> socket API. It'll make sense if the rsocket code tries to match the socket
> code, or even reuse it.
>
Actually, we tried this solution, but it didn't work. Please see the
description in patch 3/6:

Known limitations:

  For a blocking rsocket fd, if we use io_create_watch to wait for
  POLLIN or POLLOUT events, since the rsocket fd is blocking, we
  cannot determine when it is not ready to read/write, as we can with
  non-blocking fds. Therefore, once an event occurs, it keeps
  occurring, potentially leaving qemu hanging. So we need to be
  cautious to avoid hanging when using io_create_watch.
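
For contrast, here is a minimal sketch of the non-blocking pattern
discussed earlier in this thread. It uses plain POSIX sockets rather
than rsocket (an assumption made for illustration, since rsocket mirrors
the sockets API); it is not the code from patch 3/6, and the helper
names are hypothetical. On a non-blocking listener, accept() fails with
EAGAIN/EWOULDBLOCK when nothing is pending, which is the signal a caller
can use to yield instead of hanging:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a non-blocking TCP listener bound to
 * 127.0.0.1 on an ephemeral port. Returns the fd, or -1 on error.
 */
static int make_nonblocking_listener(void)
{
    struct sockaddr_in addr;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: try to accept once on a fresh listener that no
 * client has connected to.
 * Returns 1 if accept() reported "would block" (the caller may yield),
 *         0 if a connection was unexpectedly accepted,
 *        -1 on any other error.
 */
static int accept_would_block(void)
{
    int ret = -1;
    int cfd;
    int lfd = make_nonblocking_listener();

    if (lfd < 0) {
        return -1;
    }

    cfd = accept(lfd, NULL, NULL);
    if (cfd >= 0) {
        close(cfd);
        ret = 0;
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        ret = 1; /* nothing pending: return to the event loop */
    }

    close(lfd);
    return ret;
}
```

With a blocking fd, the same accept() call would simply not return until
a client connects, so there is no EAGAIN to tell the caller that the fd
is not ready.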
Regards,
-Gonglei
> >
> > > I think I also left a comment elsewhere on whether it would be
> > > possible to allow iochannels implement their own poll() functions to
> > > avoid the per-channel poll thread that is proposed in this series.
> > >
> > > https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n
> > >
> >
> > We noticed that, and it's a big operation. I'm not sure that's a better way.
> >
> > > Personally I think even with the thread proposal it's better than
> > > the old rdma code, but I just still want to double check with you
> > > guys. E.g., maybe that just won't work at all? Again, that'll also
> > > be based on the fact that we move migration incoming into a thread
> > > first to keep the dest QEMU main loop intact, I think, but I hope we
> > > will reach that regardless of rdma, IOW it'll be nice for it to
> > > happen even earlier if possible.
> > >
> > Yep. This is a fairly big change. I wonder what other people's
> > suggestions are?
>
> Yes we can wait for others' opinions. And btw I'm not asking for it and I don't
> think it'll be a blocker for this approach to land, as I said this is better than the
> current code so it's definitely an improvement to me.
>
> I'm purely curious, because if you're not going to do it for rdma, maybe
> someday I'll try to do that, and I want to know what the "big change" could be,
> as I didn't dig further. It would help me if you shared what issues you've found.
>
> Thanks,
>
> --
> Peter Xu