Re: [PATCH v3] migration/rdma: add x-rdma-chunk-size parameter

From: Peter Xu <peterx@redhat.com>
To: "Zhang, GuoQing (Sam)" <guoqzhan@amd.com>
Cc: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
	Samuel Zhang <guoqing.zhang@amd.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"farosas@suse.de" <farosas@suse.de>,
	"eblake@redhat.com" <eblake@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"Emily.Deng@amd.com" <Emily.Deng@amd.com>,
	"Victor.Zhao@amd.com" <Victor.Zhao@amd.com>,
	"PengJu.Zhou@amd.com" <PengJu.Zhou@amd.com>,
	"Qing.Ma@amd.com" <Qing.Ma@amd.com>
Subject: Re: [PATCH v3] migration/rdma: add x-rdma-chunk-size parameter
Date: Wed, 1 Apr 2026 11:56:03 -0400
Message-ID: <ac1AE4vrAlKmawlb@x1.local>
In-Reply-To: <02a44178-eebc-4fef-a8fb-802dac76c11f@amd.com>

On Tue, Mar 31, 2026 at 06:33:23PM +0800, Zhang, GuoQing (Sam) wrote:
> 
> On 2026/3/31 11:30, Zhijian Li (Fujitsu) wrote:
> > 
> > On 31/03/2026 00:10, Peter Xu wrote:
> > > Hi, Samuel,
> > > 
> > > On Fri, Mar 27, 2026 at 02:50:06PM +0800, Samuel Zhang wrote:
> > > > The default 1MB RDMA chunk size causes slow live migration because
> > > > each chunk triggers a write_flush (ibv_post_send). For 8GB RAM,
> > > > 1MB chunk size produces ~15000 flushes vs ~3700 with 1024MB chunk size.
> > > > 
> > > > Add x-rdma-chunk-size parameter to configure the RDMA chunk size for
> > > > faster migration.
> > > > Usage: `migrate_set_parameter x-rdma-chunk-size 1024M`
> > > > 
> > > > Performance with RDMA live migration of 8GB RAM VM:
> > > > 
> > > > | x-rdma-chunk-size (B) | time (s) | throughput (MB/s) |
> > > > |-----------------------|----------|-------------------|
> > > > | 1M (default)          | 37.915   |  1,007            |
> > > This is the default. It surprised me a bit that the current code base
> > > can only reach 1GB/s throughput.  Do you know why?  I expected RDMA
> > > throughput to be much higher than this on any hardware setup.
> > 
> > Regarding the baseline performance, Samuel's numbers look reasonable. I checked
> > some of my old test data on a ConnectX-4 Lx card years ago, and the throughput
> > was around 10 Gbps (~1.25 GB/s), which is consistent with the 1 GB/s he reported.
> > 
> > > > | 32M                   | 17.880   |  2,260            |
> > > > | 1024M                 |  4.368   | 17,529            |
> > My guess for the dramatic performance improvement is that a larger chunk size
> > allows qemu_rdma_write() to batch more *contiguous dirty pages* into a single,
> > more efficient RDMA send operation.
> 
> The `throughput` figures were collected from the `info migrate` QEMU monitor
> command after live migration.
> 
> Yes, Zhijian is right. Each chunk triggers a write_flush, and each flush
> posts an RDMA WRITE and waits for its completion, so every flush adds
> software overhead.
> 
> For an 8GB RAM VM migration, a 1MB chunk size produces ~15000 flushes. The
> software overhead adds up and prevents the RDMA hardware from sustaining
> high throughput.
> 
> With a 1GB chunk size, there are ~3700 flushes. A lower flush count means
> less software overhead and higher overall throughput.
> 

OK, thanks both.

> 
> > 
> > Are there any workloads running on the guest during the migration, or is the guest idle? @Samuel
> 
> 
> The guest was idle while I tested the migration and collected the data.
> 
> 
> > 
> > Given the significant benefit and the fact that the patch itself is straightforward,
> > I think it's a worthwhile addition.
> > 
> > Acked-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> 
> Thank you for the ack, Zhijian!
> 
> 
> > 
> > 
> > 
> > > > Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
> > > One thing to mention is that RDMA migration is in the odd-fixes stage;
> > > it doesn't have a real maintainer, so it is kind of "orphaned".  Given
> > > that, I wouldn't suggest adding any new knobs for performance reasons.
> > > 
> > > Do you have a strong reason for landing this patch upstream?  Is it used
> > > in production systems, and does it solve real problems for you?
> 
> 
> We have VMs with large amounts of RAM. TCP live migration is not fast enough
> for them, and we expected RDMA migration to be faster.
> 
> However, we found that RDMA migration is slower than TCP migration. See the
> following data.
> 
> 
> 8GB RAM idle VM live-migration performance:
> | transport mode       | time (s) | throughput (MB/s) |
> |----------------------|----------|-------------------|
> | TCP                  | 36.89    |  1,081            |

What is the NIC setup?  Did you try enabling multifd to offload zero-page
detection?  Or is that not feasible for some reason?

> | RDMA, 1MB chunk size | 37.915   |  1,007            |
> | RDMA, 1GB chunk size |  4.368   | 17,529            |
> 
> This patch allows us to use a larger chunk size for faster RDMA migration.

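To make the comparison in the quoted table easier to read, here is a small
sketch that works out the ratios. The figures are copied from the table as
reported; the `results` dictionary and the `ratios` helper are illustrative
names, not anything from QEMU.

```python
# Figures copied from the quoted table: (time in s, throughput in MB/s).
results = {
    "TCP": (36.89, 1081),
    "RDMA, 1MB chunk size": (37.915, 1007),
    "RDMA, 1GB chunk size": (4.368, 17529),
}


def ratios(baseline: str, candidate: str) -> tuple[float, float]:
    """Return (time speedup, throughput gain) of candidate over baseline."""
    base_time, base_tput = results[baseline]
    cand_time, cand_tput = results[candidate]
    return base_time / cand_time, cand_tput / base_tput


for baseline in ("TCP", "RDMA, 1MB chunk size"):
    speedup, gain = ratios(baseline, "RDMA, 1GB chunk size")
    print(f"vs {baseline}: {speedup:.1f}x faster, {gain:.1f}x throughput")
```
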
Sure, Zhijian's point is reasonable.  If he's fine, I'm OK.

Thanks,

> 
> 
> Regards
> Sam
> 
> 
> > > 
> > > I also wonder what Zhijian would say on this.
> > > 
> > > Thanks,
> > > 
> 

-- 
Peter Xu
