From: Yong Huang <yong.huang@smartx.com>
Date: Fri, 27 Sep 2024 10:50:01 +0800
Subject: Re: [PATCH v1 1/7] migration: Introduce structs for background sync
To: Peter Xu <peterx@redhat.com>
Cc: Fabiano Rosas, qemu-devel@nongnu.org, Eric Blake, Markus Armbruster,
    David Hildenbrand, Philippe Mathieu-Daudé, Paolo Bonzini

On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <peterx@redhat.com> wrote:

> On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > Yes, invoking migration_bitmap_sync_precopy more frequently was also
> > > > my first idea, but it involves bitmap updating and interferes with
> > > > the behavior of page sending. It also affects the migration stats
> > > > and other migration logic such as migration_update_rates().
> > >
> > > Could you elaborate?
> > >
> > > For example, what happens if we start to sync in ram_save_iterate() at
> > > some time interval (e.g. every 5 seconds)?
> > >
> >
> > I didn't try to sync in ram_save_iterate(), but in
> > migration_bitmap_sync_precopy.
> >
> > If we call migration_bitmap_sync_precopy from the ram_save_iterate
> > function, the approach seems correct. However, the bitmap will then be
> > updated while the migration thread iterates through each dirty page in
> > the RAMBlock list. Compared to the existing implementation this is
> > different, but still straightforward; I'll give it a shot soon to see
> > if it works.
>
> It's still serialized in the migration thread, so I'd expect it is similar

What does "serialized" mean?

How about we:
1. invoke migration_bitmap_sync_precopy in a timer (bg_sync_timer) hook,
   every 5 seconds;
2. register bg_sync_timer in the main loop when the machine starts, like
   throttle_timer;
3. activate the timer when ram_save_iterate() gets called and deactivate it
   gracefully in ram_save_cleanup() during migration.

I think it is simple enough and also isn't "serialized"? (A rough sketch is
included further down in this mail.)

> to e.g. ->state_pending_exact() calls when QEMU flushed most dirty pages in
> the current bitmap.
>
> > > Btw, we shouldn't have this extra sync exist if auto converge is
> > > disabled, no matter which way we use, because it's pure overhead when
> > > auto converge is not in use.
> >
> > Ok, I'll add the check in the next version.
>
> Let's start simple, and if there's anything unsure we can discuss upfront,
> just to avoid coding something and changing direction later. Again, I
> personally think we shouldn't add too much new code to auto converge
> (unless it's very well justified, but I think that's just hard,
> fundamentally, with any pure throttling solution); hopefully something
> small can make it start to work for huge VMs.
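To make the idea above a bit more concrete, here is a rough, untested sketch
of the timer-based approach. All the names (bg_sync_timer, the
setup/activate/deactivate helpers, the interval macro) are only illustrative,
the clock type is a placeholder, and the callback leaves the actual call into
migration_bitmap_sync_precopy() as a stub, since the real patch would have to
follow whatever signature and locking the current migration code requires:

/* Rough sketch only -- illustrative, untested, names are placeholders. */

#include "qemu/osdep.h"
#include "qemu/timer.h"

#define BG_SYNC_INTERVAL_MS 5000    /* background sync period: 5 seconds */

static QEMUTimer *bg_sync_timer;

/*
 * Stub for the real work: the actual patch would call
 * migration_bitmap_sync_precopy() here (with whatever arguments and
 * locking the current code needs) and skip it when auto converge is off.
 */
static void bg_sync_do_sync(void *opaque)
{
}

static void bg_sync_timer_cb(void *opaque)
{
    bg_sync_do_sync(opaque);

    /* Re-arm so the sync keeps firing every BG_SYNC_INTERVAL_MS. */
    timer_mod(bg_sync_timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + BG_SYNC_INTERVAL_MS);
}

/* Step 2: create the timer once at startup, similar to throttle_timer. */
void bg_sync_timer_setup(void)
{
    bg_sync_timer = timer_new_ms(QEMU_CLOCK_REALTIME, bg_sync_timer_cb, NULL);
}

/* Step 3a: arm the timer when ram_save_iterate() starts being called. */
void bg_sync_timer_activate(void)
{
    timer_mod(bg_sync_timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + BG_SYNC_INTERVAL_MS);
}

/* Step 3b: stop it from ram_save_cleanup() when migration finishes. */
void bg_sync_timer_deactivate(void)
{
    timer_del(bg_sync_timer);
}

Whichever clock throttle_timer uses would probably be the natural choice
here too, and how the callback synchronizes with ram_save_cleanup() still
needs to be worked out; this is just to show the shape of the idea.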
>
> Thanks,
>
> --
> Peter Xu

--
Best regards