Date: Mon, 15 Dec 2025 11:26:13 -0500
From: Peter Xu
To: Chuang Xu
Cc: qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com, richard.henderson@linaro.org, pbonzini@redhat.com, david@kernel.org, philmd@linaro.org, farosas@suse.de
Subject: Re: [PATCH v2 0/1] migration: reduce bitmap sync time and make dirty pages converge much more easily
References: <20251215140611.16180-1-xuchuangxclwt@bytedance.com>
In-Reply-To: <20251215140611.16180-1-xuchuangxclwt@bytedance.com>

On Mon, Dec 15, 2025 at 10:06:10PM +0800, Chuang Xu wrote:
> In this version:
>
> - drop duplicate vhost_log_sync optimization
> - refactor physical_memory_test_and_clear_dirty
> - provide more detailed bitmap sync time for each part in this cover
>
>
> In our long-term experience in Bytedance, we've found that under the same load,
> live migration of larger VMs with more devices is often more difficult to
> converge (requiring a larger downtime limit).
>
> We've observed that the live migration bandwidth of large, multi-device VMs is
> severely distorted, a phenomenon likely similar to the problem described in this link
> (https://wiki.qemu.org/ToDo/LiveMigration#Optimize_migration_bandwidth_calculation).
>
> Through some testing and calculations, we conclude that bitmap sync time affects
> the calculation of live migration bandwidth.
>
> Now, let me use formulaic reasoning to illustrate the relationship between the downtime
> limit required to achieve the stop conditions and the bitmap sync time.
>
> Assume the actual live migration bandwidth is B, the dirty page rate is D,
> the bitmap sync time is x (ms), the transfer time per iteration is t (ms), and the
> downtime limit is y (ms).
>
> To simplify the calculation, we assume all of dirty pages are not zero page and only
> consider the case B > D.
>
> When x + t > 100ms, the bandwidth calculated by qemu is R = B * t / (x + t).
> When x + t < 100ms, the bandwidth calculated by qemu is R = B * (100 - x) / 100.
>
> If there is a critical convergence state, then we have:
> (1) B * t = D * (x + t)
> (2) t = D * x / (B - D)
> For the stop condition to be successfully determined, then we have two cases:
> When:
> (3) x + t > 100
> (4) x + D * x / (B - D) > 100
> (5) x > 100 - 100 * D / B
> Then:
> (6) R * y > D * (x + t)
> (7) B * t * y / (x + t) > D * (x + t)
> (8) (B * (D * x / (B - D)) * y) / (x + D * x / (B - D)) > D * (x + D * x / (B - D))
> (9) D * y > D * (x + D * x / (B - D))
> (10) y > x + D * x / (B - D)
> (11) (B - D) * y > B * x
> (12) y > B * x / (B - D)
>
> When:
> (13) x + t < 100
> (14) x + D * x / (B - D) < 100
> (15) x < 100 - 100 * D / B
> Then:
> (16) R * y > D * (x + t)
> (17) B * (100 - x) * y / 100 > D * (x + t)
> (18) B * (100 - x) * y / 100 > D * (x + D * x / (B - D))
> (19) y > 100 * D * x / ((B - D) * (100 - x))
>
> After deriving the formula, we can use some data for comparison.
>
> For a 64C256G vm with 8 vhost-user-net(32 queue per nic) and 16 vhost-user-blk(4 queue per blk),
> the sync time is as high as *73ms* (tested with 10GBps dirty rate, the sync time increases as the dirty page rate increases),
> Here are each part of the sync time:
>
> - sync from kvm to ram_list: 2.5ms
> - vhost_log_sync:3ms
> - sync aligned memory from ram_list to RAMBlock: 5ms
> - sync misaligned memory from ram_list to RAMBlock: 61ms
>
> After applying this patch, syncing misaligned memory from ram_list to RAMBlock takes only about 1ms,
> and the total sync time is only *12ms*.

These numbers are greatly helpful, thanks a lot.  Please put them into the
commit message of the patch.

OTOH, IMHO you can drop the formula and bw calculation complexities.  Your
numbers here already justify this patch as very useful.

I could have amended the commit message myself when queuing, but there's a
code change I want to double check with you.  I'll reply there soon.

>
> *First case, assume our maximum bandwidth can reach 15GBps and the dirty page rate is 10GBps.
>
> If x = 73 ms, when there is a critical convergence state,
> we use formula(2) get t = D * x / (B - D) = 146 ms,
> because x + t = 219ms > 100ms,
> so we get y > B * x / (B - D) = 219ms.
>
> If x = 12 ms, when there is a critical convergence state,
> we use formula(2) get t = D * x / (B - D) = 24 ms,
> because x + t = 36ms < 100ms,
> so we get y > 100 * D * x / ((B - D) * (100 - x)) = 27.2ms.
>
> We can see that after optimization, under the same bandwidth and dirty rate scenario,
> the downtime limit required for dirty page convergence is significantly reduced.
>
> *Second case, assume our maximum bandwidth can reach 15GBps and the downtime limit is set to 150ms.
> If x = 73 ms,
> when x + t > 100ms,
> we use formula(12) get D < B * (y - x) / y = 15 * (150 - 73) / 150 = 7.7GBps,
> when x + t < 100ms,
> we use formula(19) get D < 5.35GBps
>
> If x = 12 ms,
> when x + t > 100ms,
> we use formula(12) get D < B * (y - x) / y = 15 * (150 - 12) / 150 = 13.8GBps,
> when x + t < 100ms,
> we use formula(19) get D < 13.75GBps
>
> We can see that after optimization, under the same bandwidth and downtime limit scenario,
> the convergent dirty page rate is significantly improved.
>
> Through the above formula derivation, we have proven that reducing bitmap sync time
> can significantly improve dirty page convergence capability.
>
> This patch only optimizes bitmap sync time for part of scenarios.
> There may still be many scenarios where bitmap sync time negatively impacts dirty page
> convergence capability, and we can also try to optimize using this approach.
>

-- 
Peter Xu
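
For anyone who wants to re-check the quoted arithmetic, formulas (2), (12)
and (19) can be evaluated directly. The following is only a minimal sketch of
the cover letter's model, not QEMU code: the function names and the window_ms
parameter are made up for illustration, and the 100 ms window is taken from
the quoted text. B and D only need to share a unit (GBps here); x and the
result are in milliseconds.

```python
def critical_transfer_time(B, D, x):
    """Formula (2): transfer time t at the critical convergence state."""
    if not 0 < D < B:
        raise ValueError("the quoted model assumes 0 < D < B")
    return D * x / (B - D)


def min_downtime_limit(B, D, x, window_ms=100.0):
    """Lower bound on the downtime limit y for the stop condition to hold.

    Uses formula (12) when x + t exceeds the window and formula (19)
    otherwise. window_ms is a hypothetical parameter standing in for the
    100 ms constant in the quoted derivation.
    """
    t = critical_transfer_time(B, D, x)
    if x + t > window_ms:
        # Formula (12): y > B * x / (B - D)
        return B * x / (B - D)
    # Formula (19): y > 100 * D * x / ((B - D) * (100 - x))
    return window_ms * D * x / ((B - D) * (window_ms - x))


# First case from the cover letter: B = 15 GBps, D = 10 GBps.
print(min_downtime_limit(15, 10, 73))            # 219.0
print(round(min_downtime_limit(15, 10, 12), 2))  # 27.27
```

The second value is the one the cover letter quotes as 27.2ms.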