From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Zhang, GuoQing (Sam)" <guoqzhan@amd.com>
Cc: Samuel Zhang <guoqing.zhang@amd.com>,
	qemu-devel@nongnu.org, peterx@redhat.com, farosas@suse.de,
	lizhijian@fujitsu.com, eblake@redhat.com, armbru@redhat.com,
	Emily.Deng@amd.com, Victor.Zhao@amd.com, PengJu.Zhou@amd.com,
	Qing.Ma@amd.com, Guoqing Zhang <simpzan@gmail.com>
Subject: Re: [PATCH v6] migration/rdma: add x-rdma-chunk-size parameter
Date: Thu, 30 Apr 2026 11:36:35 +0100
Message-ID: <afMwszhgeMN8Mtl2@redhat.com>
In-Reply-To: <5481c478-1957-4a6a-9732-806b81100e1d@amd.com>

On Thu, Apr 30, 2026 at 05:46:43PM +0800, Zhang, GuoQing (Sam) wrote:
> 
> On 2026/4/27 15:17, Daniel P. Berrangé wrote:
> > 
> > On Mon, Apr 27, 2026 at 11:14:01AM +0800, Samuel Zhang wrote:
> > > The default 1MB RDMA chunk size causes slow live migration because
> > > each chunk triggers a write_flush (ibv_post_send). For 8GB RAM,
> > > 1MB chunk size produces ~15000 flushes vs ~3700 with 1024MB chunk size.
> > > 
> > > Add x-rdma-chunk-size parameter to configure the RDMA chunk size for
> > > faster migration.
> > > Usage: `migrate_set_parameter x-rdma-chunk-size 1024M`
> > > 
> > > Performance with RDMA live migration of 8GB RAM VM:
> > > 
> > > | x-rdma-chunk-size (B) | time (s) | throughput (MB/s) |
> > > |-----------------------|----------|-------------------|
> > > | 1M (default)          | 37.915   |  1,007            |
> > > | 32M                   | 17.880   |  2,260            |
> > > | 1024M                 |  4.368   | 17,529            |
> > What is the downside of setting a larger chunk size ?
> > 
> > IOW, why should we keep 1M as the default when it gives
> > such terrible relative performance ?  Why not make 1G
> > be the default instead of creating this flag and requiring
> > people to know about setting it ?
> > 
> Hi Daniel,
> 
> Thank you for the very good question.
> I dug into the git history. The 1M chunk size dates back to the original
> RDMA implementation by Michael R. Hines in 2013 (commit 2da776db48).
> I agree 1M is too conservative for modern hardware. However, I found that 1G
> is not necessarily the optimal chunk size either.
> 
> I collected the following performance data on my server:
> 
> NIC: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller
> qemu version: v11.0.0-rc2-139-g25fcd86805
> qemu config: pin-all off (default setting)
> VM system RAM size: 8GB
> guest workload: `stress-ng --vm 4 --vm-bytes 1G --vm-method rand-set --timeout 0`
> ```
> chunk_size  total(ms)  setup(ms)  down(ms)  Throughput(Mbps)  total_size  transferred
> 1m             45,156        864     1,166          1,252.50    8.02 GiB     6.46 GiB
> 2m             41,848        853     1,161          1,354.11    8.02 GiB     6.46 GiB
> 4m             37,836        861     1,435          1,523.33    8.02 GiB     6.56 GiB
> 8m             37,684        852     1,176          1,537.98    8.02 GiB     6.59 GiB
> 16m            37,620        852     1,173          1,538.96    8.02 GiB     6.59 GiB
> 32m            15,034        963     1,864          3,401.26    8.02 GiB     5.57 GiB
> 64m             4,492        868     1,554         13,637.46    8.02 GiB     5.75 GiB
> 128m            3,940        851     1,662         16,860.59    8.02 GiB     6.06 GiB
> 256m            3,640        852     2,206         19,390.99    8.02 GiB     6.29 GiB
> 512m            3,645        852     2,179         23,200.67    8.02 GiB     7.54 GiB
> 1024m           3,665        865     2,238         24,676.59    8.02 GiB     8.04 GiB
> ```
> 
> The downside of a larger chunk size:
> A larger chunk causes more data to be transferred per dirty region. For
> example, with a 1G chunk size, a single dirty page (4K) causes the full
> 1G chunk to be transferred. As a result, the largest chunk size does not
> necessarily give the shortest total migration time. Compare the 256m and
> 1024m rows in the table above.
> 
> Based on my data, 128m appears to be the sweet spot for my hardware and
> workload, but different configurations may have different optimal values.
> I think increasing the default (e.g., to 64m or 128m) while keeping this
> parameter for user tuning would be a good approach.
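
The dirty-page trade-off described above can be sketched with a little
arithmetic. This is only an illustrative model, not QEMU's RDMA code: it
assumes, as the example in the quoted text does, that any chunk containing
at least one dirty 4K page is resent in full. The helper name
chunk_bytes_sent and the page indices are hypothetical.

```python
PAGE_SIZE = 4 * 1024   # 4K guest page, as in the example above
MiB = 1024 * 1024


def chunk_bytes_sent(dirty_page_indices, chunk_size, page_size=PAGE_SIZE):
    """Bytes sent if every chunk holding a dirty page is resent whole.

    Simplified model (assumption): one dirty page marks its entire
    enclosing chunk for transfer.
    """
    pages_per_chunk = chunk_size // page_size
    # Map each dirty page to the chunk that contains it, deduplicated.
    dirty_chunks = {idx // pages_per_chunk for idx in dirty_page_indices}
    return len(dirty_chunks) * chunk_size


if __name__ == "__main__":
    one_dirty_page = [0]
    for size in (1 * MiB, 128 * MiB, 1024 * MiB):
        sent = chunk_bytes_sent(one_dirty_page, size)
        # Ratio of bytes sent to bytes actually dirtied (one 4K page).
        print(f"{size // MiB:>5} MiB chunk: {sent // PAGE_SIZE}x amplification")
```

Under this model the cost of a single dirty page grows linearly with the
chunk size, which is consistent with the suggestion to keep the value
tunable rather than simply defaulting to the largest chunk.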

Yep, that sounds reasonable.  Also I think it possibly justifies
adding this tunable without the 'x-' prefix.

With regards,
Daniel



