Date: Thu, 20 Apr 2023 15:19:39 -0400
From: Peter Xu
To: Daniel P. Berrangé
Cc: Fabiano Rosas, qemu-devel@nongnu.org, Claudio Fontana, jfehlig@suse.com,
 dfaggioli@suse.com, dgilbert@redhat.com, Juan Quintela
Subject: Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram
References: <878rfdos4a.fsf@suse.de>

On Thu, Apr 20, 2023 at 10:02:43AM +0100, Daniel P. Berrangé wrote:
> On Wed, Apr 19, 2023 at 03:07:19PM -0400, Peter Xu wrote:
> > On Wed, Apr 19, 2023 at 06:12:05PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Apr 18, 2023 at 03:26:45PM -0400, Peter Xu wrote:
> > > > On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P.
> > > > Berrangé wrote:
> > > > > Libvirt has multiple APIs where it currently uses its migrate-to-file
> > > > > approach
> > > > >
> > > > >   * virDomainManagedSave()
> > > > >
> > > > >     This saves VM state to a libvirt managed file, stops the VM, and the
> > > > >     file state is auto-restored on next request to start the VM, and the
> > > > >     file deleted. The VM CPUs are stopped during both save + restore
> > > > >     phase
> > > > >
> > > > >   * virDomainSave/virDomainRestore
> > > > >
> > > > >     The former saves VM state to a file specified by the mgmt app/user.
> > > > >     A later call to virDomainRestore starts the VM using that saved
> > > > >     state. The mgmt app / user can delete the file state, or re-use
> > > > >     it many times as they desire. The VM CPUs are stopped during both
> > > > >     save + restore phase
> > > > >
> > > > >   * virDomainSnapshotXXX
> > > > >
> > > > >     This family of APIs takes snapshots of the VM disks, optionally
> > > > >     also including the full VM state to a separate file. The snapshots
> > > > >     can later be restored. The VM CPUs remain running during the
> > > > >     save phase, but are stopped during restore phase
> > > >
> > > > For this one IMHO it'll be good if Libvirt can consider leveraging the new
> > > > background-snapshot capability (QEMU 6.0+, so not very new..). Or is there
> > > > perhaps any reason why a generic migrate:fd approach is better?
> > >
> > > I'm not sure I fully understand the implications of 'background-snapshot' ?
> > >
> > > Based on what the QAPI comment says, it sounds potentially interesting,
> > > as conceptually it would be nicer to have the memory / state snapshot
> > > represent the VM at the point where we started the snapshot operation,
> > > rather than where we finished the snapshot operation.
> > >
> > > It would not solve the performance problems that the work in this thread
> > > was intended to address though.
> > > With large VMs (100's of GB of RAM),
> > > saving all the RAM state to disk takes a very long time, regardless of
> > > whether the VM vCPUs are paused or running.
> >
> > I think it solves the performance problem by only copying each of the guest
> > pages once, even if the guest is running.
>
> I think we're talking about different performance problems.
>
> What you describe here is about ensuring the snapshot is of finite size
> and completes in linear time, by ensuring each page is written only
> once.
>
> What I'm talking about is being able to parallelize the writing of all
> RAM, so if a single thread can saturate the storage, using multiple
> threads will make the overall process faster, even when we're only
> writing each page once.

It depends on how much we want it. Here the live snapshot scenario could
probably leverage the same multi-threading framework as the vm suspend case,
because it can assume all the pages are static and only saved once. But I
agree it's at least not there yet.. so we can directly leverage multifd at
least for now.

>
> > Different from almost all the rest of the "migrate" use cases, background
> > snapshot does not use the generic dirty tracking at all (for KVM that's
> > get-dirty-log), instead it uses userfaultfd wr-protects, so that when
> > taking the snapshot all the guest pages will be protected once.
>
> Oh, so that means this 'background-snapshot' feature only works on
> Linux, and only when permissions allow it. The migration parameter
> probably should be marked with 'CONFIG_LINUX' in the QAPI schema
> to make it clear this is a non-portable feature.

Indeed, I can have a follow up patch for this. But it'll be the same as
some other features, like, postcopy (and all its sub-features including
postcopy-blocktime and postcopy-preempt)?

>
> > It guarantees the best efficiency of creating a snapshot with VM running,
> > afaict.
> > I sincerely think Libvirt should have someone investigating and
> > see whether virDomainSnapshotXXX() can be implemented by this cap rather
> > than the default migration.
>
> Since the background-snapshot feature is not universally available,
> it will only ever be possible to use it as an optional enhancement
> with virDomainSnapshotXXX, we'll need the portable impl to be the
> default / fallback.

I am actually curious about how a live snapshot can be implemented correctly
without something like background snapshot. I raised this question in
another reply here:

https://lore.kernel.org/all/ZDWBSuGDU9IMohEf@x1n/

I was using fixed-ram and vm suspend as the example, but I assume it applies
to any live snapshot that is based on the current default migration scheme.

For a real live snapshot (not vm suspend), IIUC we have similar challenges.

The problem is that when migration completes (snapshot taken) the VM is still
running with a live disk image. Then how can we take a disk snapshot at
exactly the same time as we got the guest image mirrored in the vm dump?
What guarantees that there are no IO changes after the VM image is created
but before we take a snapshot of the disk image?

In short, it's a question of how libvirt can make sure the VM image and the
disk snapshot image are taken at exactly the same time for a live snapshot.

Thanks,

-- 
Peter Xu
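
To make the write-protect idea discussed above a bit more concrete, here is a toy model in Python. It is only a sketch of the concept, not QEMU code: the ToyGuest class and its method names are hypothetical, and the "write fault" is simulated in plain Python, whereas the real feature relies on userfaultfd write protection in the kernel. What it tries to show is that every page is protected once when the snapshot starts, a guest write to a still-protected page saves the old content first, and so each page is copied exactly once and the image reflects the guest at snapshot start.

```python
class ToyGuest:
    """Toy guest RAM with per-page write protection (hypothetical model)."""

    def __init__(self, pages):
        self.pages = list(pages)   # current guest memory
        self.protected = set()     # pages not yet saved to the snapshot
        self.snapshot = {}         # page index -> content at snapshot start
        self.copies = 0            # how many page copies were made

    def start_snapshot(self):
        # Protect every page once, at the moment the snapshot begins.
        self.protected = set(range(len(self.pages)))
        self.snapshot = {}
        self.copies = 0

    def _save_page(self, idx):
        # Save a page only if it is still protected, then unprotect it.
        if idx in self.protected:
            self.snapshot[idx] = self.pages[idx]
            self.protected.discard(idx)
            self.copies += 1

    def guest_write(self, idx, value):
        # Stand-in for a write fault: old content is saved before the write.
        self._save_page(idx)
        self.pages[idx] = value

    def background_step(self):
        # Background saver: store one remaining page, False when finished.
        if not self.protected:
            return False
        self._save_page(min(self.protected))
        return True


guest = ToyGuest(["a", "b", "c", "d"])
guest.start_snapshot()
guest.guest_write(2, "C1")      # guest keeps running and dirties page 2
guest.background_step()         # saver stores page 0
guest.guest_write(2, "C2")      # page 2 already saved, no second copy
guest.guest_write(0, "A1")      # page 0 already saved, no second copy
while guest.background_step():
    pass

image = [guest.snapshot[i] for i in range(len(guest.pages))]
```

In this model the saved image holds the contents from the moment start_snapshot() was called, while guest.pages carries the later writes. The number of copies equals the number of pages no matter how often the guest rewrites a page, which is the "each page saved only once" property. It says nothing about parallelizing the writes, which is the separate problem multifd addresses.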