From: "Gonglei (Arei)" via <qemu-devel@nongnu.org>
To: Peter Xu <peterx@redhat.com>,
"Dr. David Alan Gilbert" <dave@treblig.org>
Cc: "Michael Galaxy" <mgalaxy@akamai.com>,
zhengchuan <zhengchuan@huawei.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Yu Zhang" <yu.zhang@ionos.com>,
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
"Jinpu Wang" <jinpu.wang@ionos.com>,
"Elmar Gerdes" <elmar.gerdes@ionos.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Yuval Shaia" <yuval.shaia.ml@gmail.com>,
"Kevin Wolf" <kwolf@redhat.com>,
"Prasanna Kumar Kalever" <prasanna.kalever@redhat.com>,
"Cornelia Huck" <cohuck@redhat.com>,
"Michael Roth" <michael.roth@amd.com>,
"Prasanna Kumar Kalever" <prasanna4324@gmail.com>,
"integration@gluster.org" <integration@gluster.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
"devel@lists.libvirt.org" <devel@lists.libvirt.org>,
"Hanna Reitz" <hreitz@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Song Gao" <gaosong@loongson.cn>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
"Beraldo Leal" <bleal@redhat.com>,
Pannengyuan <pannengyuan@huawei.com>,
Xiexiangyou <xiexiangyou@huawei.com>,
Wangjialin <wangjialin23@huawei.com>
Subject: RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Date: Fri, 7 Jun 2024 08:57:30 +0000
Message-ID: <36fc28e07101464db670eebc3833baac@huawei.com>
In-Reply-To: <ZmDWLkBKISvQcA8I@x1n>
Hi,
> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Thursday, June 6, 2024 5:19 AM
> To: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Michael Galaxy <mgalaxy@akamai.com>; zhengchuan
> <zhengchuan@huawei.com>; Gonglei (Arei) <arei.gonglei@huawei.com>;
> Daniel P. Berrangé <berrange@redhat.com>; Markus Armbruster
> <armbru@redhat.com>; Yu Zhang <yu.zhang@ionos.com>; Zhijian Li (Fujitsu)
> <lizhijian@fujitsu.com>; Jinpu Wang <jinpu.wang@ionos.com>; Elmar Gerdes
> <elmar.gerdes@ionos.com>; qemu-devel@nongnu.org; Yuval Shaia
> <yuval.shaia.ml@gmail.com>; Kevin Wolf <kwolf@redhat.com>; Prasanna
> Kumar Kalever <prasanna.kalever@redhat.com>; Cornelia Huck
> <cohuck@redhat.com>; Michael Roth <michael.roth@amd.com>; Prasanna
> Kumar Kalever <prasanna4324@gmail.com>; integration@gluster.org; Paolo
> Bonzini <pbonzini@redhat.com>; qemu-block@nongnu.org;
> devel@lists.libvirt.org; Hanna Reitz <hreitz@redhat.com>; Michael S. Tsirkin
> <mst@redhat.com>; Thomas Huth <thuth@redhat.com>; Eric Blake
> <eblake@redhat.com>; Song Gao <gaosong@loongson.cn>; Marc-André
> Lureau <marcandre.lureau@redhat.com>; Alex Bennée
> <alex.bennee@linaro.org>; Wainer dos Santos Moschetta
> <wainersm@redhat.com>; Beraldo Leal <bleal@redhat.com>; Pannengyuan
> <pannengyuan@huawei.com>; Xiexiangyou <xiexiangyou@huawei.com>
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
>
> On Wed, Jun 05, 2024 at 08:48:28PM +0000, Dr. David Alan Gilbert wrote:
> > > > I just noticed this thread; some random notes from a somewhat
> > > > fragmented memory of this:
> > > >
> > > > a) Long long ago, I also tried rsocket;
> > > > https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html
> > > > as I remember the library was quite flaky at the time.
> > >
> > > Hmm interesting. There also looks like a thread doing rpoll().
> >
> > Yeh, I can't actually remember much more about what I did back then!
>
> Heh, that's understandable and fair. :)
>
> > > I hope Lei and his team has tested >4G mem, otherwise definitely
> > > worth checking. Lei also mentioned there're rsocket bugs they found
> > > in the cover letter, but not sure what's that about.
> >
> > It would probably be a good idea to keep track of what bugs are in
> > flight with it, and try it on a few RDMA cards to see what problems
> > get triggered.
> > I think I reported a few at the time, but I gave up after feeling it
> > was getting very hacky.
>
> Agreed. Maybe we can have a list of that in the cover letter or even QEMU's
> migration/rdma doc page.
>
> Lei, if you think that makes sense please do so in your upcoming posts.
> There'll need to have a list of things you encountered in the kernel driver and
> it'll be even better if there're further links to read on each problem.
>
OK, no problem. There are two bugs:

Bug 1:
https://github.com/linux-rdma/rdma-core/commit/23985e25aebb559b761872313f8cab4e811c5a3d#diff-5ddbf83c6f021688166096ca96c9bba874dffc3cab88ded2e9d8b2176faa084cR3302-R3303
This commit introduces a bug that causes QEMU to hang.
When the timeout parameter of rpoll() is neither -1 nor 0, the program occasionally hangs.

Problem analysis:
During the first rpoll():
At line 3297, rs_poll_enter() performs pollcnt++. The value of pollcnt is now 1.
At line 3302, the timeout expires and the function returns. Note that rs_poll_exit() is not called here, so --pollcnt is not performed and pollcnt stays at 1.
During the second rpoll():
At line 3297, rs_poll_enter() performs pollcnt++ again. The value of pollcnt is now 2.
If the timeout has not expired and poll() returns a value greater than 0, rs_poll_stop() is executed. It decrements pollcnt to 1 rather than 0, so suspendpoll = 1 is executed.
Back in the do-while loop inside rpoll(), rs_poll_enter() is called again. Now the if (suspendpoll) condition is true, so it calls pthread_yield() and returns -EBUSY. Because the if (rs_poll_enter()) condition is true, rpoll() executes continue and then calls rs_poll_enter() again, with the same result. The loop keeps retrying and the program hangs.

Root cause: at line 3302, rs_poll_exit() is not executed before the function returns on timeout.
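
To make the counter leak easier to follow, here is a minimal single-threaded model of the sequence described above. It is only a sketch: the model_* functions are hypothetical names invented for this illustration, they leave out the locking and the wake-up signalling, and they are not the rdma-core code. The only assumption taken from the analysis is that pollcnt is incremented on enter, decremented on exit/stop, and that suspendpoll is set when the stop path leaves pollcnt nonzero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the two counters discussed in the analysis. */
static int pollcnt;     /* number of callers currently inside the poll section */
static int suspendpoll; /* when nonzero, new callers must back off and retry */

static void model_reset(void)
{
    pollcnt = 0;
    suspendpoll = 0;
}

/* Models rs_poll_enter(): refuse entry while polling is suspended. */
static int model_poll_enter(void)
{
    if (suspendpoll)
        return -EBUSY; /* the real function also yields the CPU first */
    pollcnt++;
    return 0;
}

/* Models rs_poll_exit(): drop our count, clear the flag when we are last. */
static void model_poll_exit(void)
{
    assert(pollcnt > 0);
    if (--pollcnt == 0)
        suspendpoll = 0;
}

/* Models rs_poll_stop(): like exit, but suspends polling if others remain. */
static void model_poll_stop(void)
{
    assert(pollcnt > 0);
    if (--pollcnt == 0)
        suspendpoll = 0;
    else
        suspendpoll = 1;
}

/* First rpoll(): enters, then returns early because the timeout expired. */
static void model_timed_out_call(bool exit_on_timeout)
{
    if (model_poll_enter() == 0 && exit_on_timeout)
        model_poll_exit(); /* the call that is missing in the buggy path */
}

/*
 * Second rpoll(): one pass that ends in the stop path, then another pass
 * through the loop. Returns false if the second pass cannot enter within
 * max_spins retries, which corresponds to the hang.
 */
static bool model_two_pass_call(int max_spins)
{
    if (model_poll_enter() != 0)
        return false;
    model_poll_stop();

    for (int spin = 0; spin < max_spins; spin++) {
        if (model_poll_enter() != 0)
            continue; /* retry, as the do-while loop does */
        model_poll_stop();
        return true;
    }
    return false;
}

/* Runs both calls back to back, with or without the exit on timeout. */
static bool model_scenario(bool exit_on_timeout, int max_spins)
{
    model_reset();
    model_timed_out_call(exit_on_timeout);
    return model_two_pass_call(max_spins);
}
```

In this model, model_scenario(false, N) never completes for any N because the leaked count keeps pollcnt above zero, while model_scenario(true, N) completes on the first retry.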

Bug 2:
In rsocket.c, there is an accept queue, int accept_queue[2], implemented with socketpair(). The listen_svc thread in rsocket.c is responsible for receiving connections and writing them to accept_queue[1]. When raccept() is called, a connection is read from accept_queue[0].
In the test case, qio_channel_wait(QIO_CHANNEL(lioc), G_IO_IN); waits for a readable event (an incoming connection). rpoll() is expected to check whether accept_queue[0] is readable; however, the underlying poll does not poll accept_queue[0]. Only after the timeout expires does rpoll() pick up the readable event on accept_queue[0], from rs_poll_arm.

Impact:
The accept operation can be performed only after 5000 ms. Of course, this time can be shortened by echoing a value in milliseconds to /etc/rdma/rsocket/wake_up_interval.
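
The delay can be reproduced in miniature with plain POSIX calls. The sketch below is an assumption-laden model, not rsocket code: model_fd_readable() and model_accept_delay_rounds() are hypothetical names, a socketpair stands in for accept_queue, and an idle pipe stands in for whatever the sleeping poll actually watches. It only shows that when the queue descriptor is missing from the sleeping poll set, a connection queued just after the readiness check is noticed only at the next check, one full interval later.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Zero-timeout readiness check, standing in for the scan done when arming. */
static int model_fd_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, 0) > 0 && (p.revents & POLLIN);
}

/*
 * Returns how many sleep intervals passed before the queued connection was
 * noticed, or -1 on error or if it was not noticed within max_rounds.
 */
static int model_accept_delay_rounds(int interval_ms, int max_rounds)
{
    int queue[2]; /* stands in for accept_queue[2] */
    int idle[2];  /* a pipe that never becomes readable */
    int rounds = -1;
    char token = 1;

    assert(interval_ms >= 0 && max_rounds > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, queue) < 0)
        return -1;
    if (pipe(idle) < 0) {
        close(queue[0]);
        close(queue[1]);
        return -1;
    }

    /* The check sees nothing; the connection is queued right afterwards. */
    if (!model_fd_readable(queue[0]) && write(queue[1], &token, 1) == 1) {
        for (int i = 1; i <= max_rounds; i++) {
            /* queue[0] is not in this set, so the sleep is not cut short. */
            struct pollfd p = { .fd = idle[0], .events = POLLIN };
            poll(&p, 1, interval_ms);

            /* Only the next readiness check notices the pending entry. */
            if (model_fd_readable(queue[0])) {
                rounds = i;
                break;
            }
        }
    }

    close(queue[0]);
    close(queue[1]);
    close(idle[0]);
    close(idle[1]);
    return rounds;
}
```

With interval_ms set to 5000 the model waits about five seconds before noticing the entry, which matches the delay seen in the test case; a smaller interval shortens the wait in the same way as lowering wake_up_interval.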
Regards,
-Gonglei
> > > >
> > > > e) Someone made a good suggestion (sorry can't remember who) - that the
> > > > RDMA migration structure was the wrong way around - it should be the
> > > > destination which initiates an RDMA read, rather than the source
> > > > doing a write; then things might become a LOT simpler; you just need
> > > > to send page ranges to the destination and it can pull it.
> > > > That might work nicely for postcopy.
> > >
> > > I'm not sure whether it'll still be a problem if rdma recv side is
> > > based on zero-copy. It would be a matter of whether atomicity can
> > > be guaranteed so that we don't want the guest vcpus to see a
> > > partially copied page during on-flight DMAs. UFFDIO_COPY (or
> > > friend) is currently the only solution for that.
> >
> > Yes, but even ignoring that (and the UFFDIO_CONTINUE idea you
> > mention), if the destination can issue an RDMA read itself, it doesn't
> > need to send messages to the source to ask for a page fetch; it just
> > goes and grabs it itself, that's got to be good for latency.
>
> Oh, that's pretty internal stuff of rdma to me and beyond my knowledge..
> but from what I can tell it sounds very reasonable indeed!
>
> Thanks!
>
> --
> Peter Xu
>