qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Sean Hefty <shefty@nvidia.com>
Cc: Michael Galaxy <mgalaxy@akamai.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"yu.zhang@ionos.com" <yu.zhang@ionos.com>,
	"elmar.gerdes@ionos.com" <elmar.gerdes@ionos.com>,
	zhengchuan <zhengchuan@huawei.com>,
	"berrange@redhat.com" <berrange@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"lizhijian@fujitsu.com" <lizhijian@fujitsu.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	Xiexiangyou <xiexiangyou@huawei.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"lixiao (H)" <lixiao91@huawei.com>,
	"jinpu.wang@ionos.com" <jinpu.wang@ionos.com>,
	Wangjialin <wangjialin23@huawei.com>
Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
Date: Mon, 30 Sep 2024 15:47:35 -0400	[thread overview]
Message-ID: <ZvsAV0MugV85HuZf@x1n> (raw)
In-Reply-To: <DM6PR12MB4313D6BA256740DE1ACA29E9BD762@DM6PR12MB4313.namprd12.prod.outlook.com>

On Mon, Sep 30, 2024 at 07:20:56PM +0000, Sean Hefty wrote:
> > > I'm sure rsocket has its place with much smaller transfer sizes, but
> > > this is very different.
> > 
> > Is it possible to make rsocket be friendly with large buffers (>4GB) like the VM
> > use case?
> 
> If you can perform large VM migrations using streaming sockets, rsockets is likely usable, but it will involve data copies.  The problem is the socket API semantics.
> 
> There are rsocket API extensions (riowrite, riomap) to support RDMA write operations.  This avoids the data copy at the target, but not the sender.   (riowrite follows the socket send semantics on buffer ownership.)
> 
> It may be possible to enhance rsockets with MSG_ZEROCOPY or io_uring extensions to enable zero-copy for large transfers, but that's not something I've looked at.  True zero copy may require combining MSG_ZEROCOPY with riowrite, but then that moves further away from using traditional socket calls.

Thanks, Sean.

One thing to mention is that QEMU has QIO_CHANNEL_WRITE_FLAG_ZERO_COPY,
which already supports MSG_ZEROCOPY but only on sender side, and only if
when multifd is enabled, because it requires page pinning and alignments,
while it's more challenging to pin a random buffer than a guest page.

Nobody moved on yet with zerocopy recv for TCP; there might be similar
challenges that normal socket APIs may not work easily on top of current
iochannel design, but I don't know well to say..

Not sure whether it means there can be a shared goal with QEMU ultimately
supporting better zerocopy via either TCP or RDMA.  If that's true, maybe
there's chance we can move towards rsocket with all the above facilities,
meanwhile RDMA can, ideally, run similiarly like TCP with the same (to be
enhanced..) iochannel API, so that it can do zerocopy on both sides with
either transport.

-- 
Peter Xu



  reply	other threads:[~2024-09-30 19:48 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-04 12:14 [PATCH 0/6] refactor RDMA live migration based on rsocket API Gonglei via
2024-06-04 12:14 ` [PATCH 1/6] migration: remove RDMA live migration temporarily Gonglei via
2024-06-04 14:01   ` David Hildenbrand
2024-06-05 10:02     ` Gonglei (Arei) via
2024-06-10 11:45   ` Markus Armbruster
2024-06-04 12:14 ` [PATCH 2/6] io: add QIOChannelRDMA class Gonglei via
2024-06-10  6:54   ` Jinpu Wang
2024-06-04 12:14 ` [PATCH 3/6] io/channel-rdma: support working in coroutine Gonglei via
2024-06-06 13:34   ` Haris Iqbal
2024-06-07  8:45     ` Gonglei (Arei) via
2024-06-07 10:01       ` Haris Iqbal
2024-06-07  9:04   ` Daniel P. Berrangé
2024-06-07  9:28     ` Gonglei (Arei) via
2024-06-04 12:14 ` [PATCH 4/6] tests/unit: add test-io-channel-rdma.c Gonglei via
2024-06-04 12:14 ` [PATCH 5/6] migration: introduce new RDMA live migration Gonglei via
2024-06-04 12:14 ` [PATCH 6/6] migration/rdma: support multifd for RDMA migration Gonglei via
2024-06-04 19:32 ` [PATCH 0/6] refactor RDMA live migration based on rsocket API Peter Xu
2024-06-05 10:09   ` Gonglei (Arei) via
2024-06-05 14:18     ` Peter Xu
2024-06-07  8:49       ` Gonglei (Arei) via
2024-06-10 16:35         ` Peter Xu
2024-06-07 10:06   ` Daniel P. Berrangé
2024-06-05  7:57 ` Michael S. Tsirkin
2024-06-05 10:00   ` Gonglei (Arei) via
2024-06-05 10:23     ` Michael S. Tsirkin
2024-06-06 11:31     ` Leon Romanovsky
2024-06-07  1:04       ` Zhijian Li (Fujitsu) via
2024-06-07 16:24     ` Yu Zhang
2024-06-07  5:53 ` Jinpu Wang
2024-06-07  8:28   ` Gonglei (Arei) via
2024-06-10 16:31     ` Peter Xu
2024-08-27 20:15 ` Peter Xu
2024-08-27 20:57   ` Michael S. Tsirkin
2024-09-22 19:29     ` Michael Galaxy
2024-09-23  1:04       ` Gonglei (Arei) via
2024-09-25 15:08         ` Peter Xu
2024-09-27 21:45           ` Sean Hefty
2024-09-28 17:52             ` Michael Galaxy
2024-09-29 18:14               ` Michael S. Tsirkin
2024-09-29 20:26                 ` Michael Galaxy
2024-09-29 22:26                   ` Michael S. Tsirkin
2024-09-30 15:00                     ` Michael Galaxy
2024-09-30 15:31                       ` Yu Zhang
2024-09-30 18:16               ` Peter Xu
2024-09-30 19:20                 ` Sean Hefty
2024-09-30 19:47                   ` Peter Xu [this message]
2024-10-03 21:26                     ` Michael Galaxy
2024-10-03 21:43                       ` Peter Xu
2024-10-04 14:04                         ` Michael Galaxy
2024-10-07  8:47                           ` Yu Zhang
2024-10-07 13:45                             ` Michael Galaxy
2024-10-07 18:15                               ` Leon Romanovsky
2024-10-08  9:31                                 ` Zhu Yanjun
2024-10-23 13:42                               ` Michael Galaxy
2024-09-27 20:34         ` Michael Galaxy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZvsAV0MugV85HuZf@x1n \
    --to=peterx@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=elmar.gerdes@ionos.com \
    --cc=jinpu.wang@ionos.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=lixiao91@huawei.com \
    --cc=lizhijian@fujitsu.com \
    --cc=mgalaxy@akamai.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shefty@nvidia.com \
    --cc=wangjialin23@huawei.com \
    --cc=xiexiangyou@huawei.com \
    --cc=yu.zhang@ionos.com \
    --cc=zhengchuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).