From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4FF51C02192 for ; Mon, 3 Feb 2025 23:03:29 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tf5Sw-0003uP-Bt; Mon, 03 Feb 2025 18:02:54 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tf5Sl-0003tk-F8 for qemu-devel@nongnu.org; Mon, 03 Feb 2025 18:02:46 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tf5SZ-0008Gd-B0 for qemu-devel@nongnu.org; Mon, 03 Feb 2025 18:02:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738623749; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=MAABnQKr/x7DrGJ28M0o2ddxX2cJt65sLXs9fg04kKc=; b=cORbE7r+tjdlLE24qdg7DqMQVHSQO7Pchh+otLILX3AN/xbRXuKp3IZy0emp/qgT4akOZz q+DB9dH8Jqst9aIs8hl6UiOizZ//i+kcP08xXCenz352eRJR5oKLi4ouiUDpQ+SJnpidZy 68KsRrTFZdnffKTeVI+DjMm/0X9BMLs= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-541-1VE_g2wWNXWwyy3F4JTubQ-1; Mon, 03 Feb 2025 18:02:27 -0500 X-MC-Unique: 1VE_g2wWNXWwyy3F4JTubQ-1 X-Mimecast-MFC-AGG-ID: 1VE_g2wWNXWwyy3F4JTubQ Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-7b6ef813ed1so725824885a.3 for ; Mon, 03 Feb 2025 15:02:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738623747; x=1739228547; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=MAABnQKr/x7DrGJ28M0o2ddxX2cJt65sLXs9fg04kKc=; b=QI3bpkfC5EGnFAQganUNaKy5s1VW0LIQRNbSbn1ElZLAxcd6zSTarBVV5FYPdgN5bC PuSzGl3noS6MfpCL/hWJ1XG/mj1rWZ35egupPY2c1ltnzonHUwXHiU4/1+mBnCIfGbf4 pyaVwAya9wE5Q+bXSPUo4VfworchznCOelLgr1g+MfhF70T2EC66luTKI4Ag0Go81woV i+0fj/WZs1lgpoLZ8y2UbOxJxI4m1ZCKeyIccCBsFL54i/6bw9w5/SP7+UNwRuRqodE7 hzGAocUBI2Im0RZcs+pFkKkXcYkQVyzBQdLF75MSgmGFOjLuYWjiYG89BPAf4RSHnIbE C4rg== X-Forwarded-Encrypted: i=1; AJvYcCVOMXwP9DnxAO8EJJ+Ttpd10yWAheop6X7ZlYTgndTraDm7/5TLuf5S1zx8Y3xYJZvPvRV9SWOOygIL@nongnu.org X-Gm-Message-State: AOJu0YxG0ZcdtorYAWcX8FmYPuy5Dki9W6phgp/fJaZ4X9sKNavpjb+5 IKc3L8W/LamM35LJGWDkr44jOLQiXYkOb2+wHS3LRpE9qCQKygemgGdUjKOwkysJMz5uCKFav+I DdivhqdqRSYXkcsBf5K71/KLzk/6CdefxwvWw+9pFij7y5lgG9iAk X-Gm-Gg: ASbGncutdGCZgABIFERf7MrNEb81C0kjPNtHp0D/xtZoAevrmld14oOj7zxSNfuDGEP ObT9d2tg1EJi104PhM/QNRD0nbd5ZDkwNZa6b9hRZUiTN6gWg+NFbfy1wyRDBtQP0rdrPMAw8jW uCmcfRH/Aal+NZJX7AWq5EZBUrqHyWJjkyemMxcLtXLj38uqWiRrbZRt6644sqikH5+dVtS1VZ1 qGQF655rn9Gv4hKmoZuww9iwGlyMNiBuHQoZGcydS2iD4Yt2xdK5tAM7e5eLPrmGdF1BK9T2+GB yqmFK0bnUvIVnYM6SPzDfnahC8ERPmjsvEEly1Bjhkenjr1Z X-Received: by 2002:a05:620a:31aa:b0:7b6:f024:3ca with SMTP id af79cd13be357-7bffcdabe56mr3730589485a.58.1738623747042; Mon, 03 Feb 2025 15:02:27 -0800 (PST) X-Google-Smtp-Source: AGHT+IHwSPqmEhQS9bFeGagyy/xQS4t6LhbhQXYriATQ6ar93iPm//c+kZ0htUViFK24SBC8F8UrVw== X-Received: by 2002:a05:620a:31aa:b0:7b6:f024:3ca with SMTP id af79cd13be357-7bffcdabe56mr3730585185a.58.1738623746624; Mon, 03 Feb 2025 15:02:26 -0800 (PST) Received: from x1.local (pool-99-254-114-190.cpe.net.cable.rogers.com. [99.254.114.190]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7c00a904781sm580290885a.78.2025.02.03.15.02.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Feb 2025 15:02:25 -0800 (PST) Date: Mon, 3 Feb 2025 18:02:23 -0500 From: Peter Xu To: "Maciej S. Szmigiero" Cc: "Dr. David Alan Gilbert" , Fabiano Rosas , Alex Williamson , =?utf-8?Q?C=C3=A9dric?= Le Goater , Eric Blake , Markus Armbruster , Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls Message-ID: References: <139bf266dbd1e25a1e5a050ecb82e3e59120d705.1738171076.git.maciej.szmigiero@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 21:36, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: > > > On 3.02.2025 20:58, Peter Xu wrote: > > > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > > > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > > > > > * Maciej S. Szmigiero (mail@maciej.szmigiero.name) wrote: > > > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > > > > > * Maciej S. Szmigiero (mail@maciej.szmigiero.name) wrote: > > > > > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to > > > > > > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > > > > > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it ultimately calls > > > > > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > > > > > --- > > > > > > > > > migration/savevm.c | 4 ++++ > > > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > > > > > --- a/migration/savevm.c > > > > > > > > > +++ b/migration/savevm.c > > > > > > > > > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) > > > > > > > > > * in qemu_file, and thus we must be blocking now. > > > > > > > > > */ > > > > > > > > > qemu_file_set_blocking(f, true); > > > > > > > > > + bql_lock(); > > > > > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > > > > > + bql_unlock(); > > > > > > > > > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > > > > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > > > > > think the reason for that is qemu_loadvm_state_main() and > > > > > > > its children don't drop BQL while waiting for I/O. > > > > > > > > > > > > > > I've described this case in more detail in my reply to Fabiano here: > > > > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd250a8@maciej.szmigiero.name/ > > > > > > > > > > > > While it might be the cause in this case, my feeling is it's more fundamental > > > > > > here - it's the whole reason that postcopy has a separate ram listen > > > > > > thread. As the destination is running, after it loads it's devices > > > > > > and as it starts up the destination will be still loading RAM > > > > > > (and other postcopiable devices) potentially for quite a while. > > > > > > Holding the bql around the ram listen thread means that the > > > > > > execution of the destination won't be able to take that lock > > > > > > until the postcopy load has finished; so while that might apparently > > > > > > complete, it'll lead to the destination stalling until that's finished > > > > > > which defeats the whole point of postcopy. > > > > > > That last one probably won't fail a test but it will lead to a long stall > > > > > > if you give it a nice big guest with lots of RAM that it's rapidly > > > > > > changing. > > > > > > > > > > Okay, I understand the postcopy case/flow now. > > > > > Thanks for explaining it clearly. > > > > > > > > > > > > I still think that "load_state" SaveVMHandlers need to be called > > > > > > > with BQL held since implementations apparently expect it that way: > > > > > > > for example, I think PCI device configuration restore calls > > > > > > > address space manipulation methods which abort() if called > > > > > > > without BQL held. > > > > > > > > > > > > However, the only devices that *should* be arriving on the channel > > > > > > that the postcopy_ram_listen_thread is reading from are those > > > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > > > > > Those load handlers are safe to be run while the other devices > > > > > > are being changed. Note the *should* - you could add a check > > > > > > to fail if any other device arrives on that channel. > > > > > > > > > > I think ultimately there should be either an explicit check, or, > > > > > as you suggest in the paragraph below, a separate SaveVMHandler > > > > > that runs without BQL held. > > > > > > > > To me those are bugs happening during postcopy, so those abort()s in > > > > memory.c are indeed for catching these issues too. > > > > > > > > > Since the current state of just running these SaveVMHandlers > > > > > without BQL in this case and hoping that nothing breaks is > > > > > clearly sub-optimal. > > > > > > > > > > > > I have previously even submitted a patch to explicitly document > > > > > > > "load_state" SaveVMHandler as requiring BQL (which was also > > > > > > > included in the previous version of this patch set) and it > > > > > > > received a "Reviewed-by:" tag: > > > > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigiero@oracle.com/ > > > > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigiero@oracle.com/ > > > > > > > https://lore.kernel.org/qemu-devel/87o732bti7.fsf@suse.de/ > > > > > > > > > > > > It happens! > > > > > > You could make this safer by having a load_state and a load_state_postcopy > > > > > > member, and only mark the load_state as requiring the lock. > > > > > > > > > > To not digress too much from the subject of this patch set > > > > > (multifd VFIO device state transfer) for now I've just updated the > > > > > TODO comment around that qemu_loadvm_state_main(), so hopefully this > > > > > discussion won't get forgotten: > > > > > https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 > > > > > > > > The commit message may still need some touch ups, e.g.: > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > > > This sentence is still not correct, IMHO. As Dave explained, the ram load > > > > thread is designed to run without BQL at least for the major workloads it > > > > runs. > > > > > > So what's your proposed wording of this commit then? > > > > Perhaps dropping it? As either it implies qemu_loadvm_state_main() needs to > > take bql (which could be wrong in case of postcopy at least from > > design.. not sanity check pov), or it provides no real meaning to suggest > > where to take it.. > > > > Personally I would put the comment as easy as possible - the large portion > > isn't helping me to understand the code but only made it slightly more > > confusing.. > > > > /* > > * TODO: qemu_loadvm_state_main() could call "load_state" SaveVMHandlers > > * that are expecting BQL to be held, which isn't in this case. > > * > > * In principle, the only devices that should be arriving on this channel > > * now are those that are postcopiable and whose load handlers are safe > > * to be called without BQL being held. > > * > > * But nothing currently prevents the source from sending data for "unsafe" > > * devices which would cause trouble here. > > */ > > > > IMHO we could put it very simple if you think we need such sanity check > > later: > > > > /* TODO: sanity check that only postcopiable data will be loaded here */ > > I think I will change that comment wording to the one ^^^^ you suggested above, > since we still need to have this commit to take BQL around that > migration_incoming_state_destroy() call in postcopy_ram_listen_thread(). > > > > > > > > I don't worry on src sending something that crashes the dest: if that > > > > happens, that's a bug, we need to fix it.. In that case abort() either in > > > > memory.c or migration/ would be the same. > > > > > > Yeah, but it would be a bug in the source (or just bit stream corruption for > > > any reason), yet it's the destination which would abort() or crash. > > > > > > I think cases like that in principle should be handled more gracefully, > > > like exiting the destination QEMU with an error. > > > But that's something outside of the scope of this patch set. > > > > Yes I agree. It's just that postcopy normally cannot gracefully quits on > > dest anyway.. as src QEMU cannot continue with a dead dest QEMU. For > > obvious programming errors, I think abort() is still ok in this case, on > > either src or dest if postcopy already started. > > > > For this series, we could always stick with precopy, it could help converge > > the series. > > To be clear I'm messing with postcopy only because without adding that > BQL lock around migration_incoming_state_destroy() in > postcopy_ram_listen_thread() other changes in this patch set would break > postcopy. > And that's obviously not acceptable. Ah, of course. > > > > > > > > We could add some explicit check > > > > in migration code, but I don't expect it to catch anything real, at least > > > > such never happened since postcopy introduced.. so it's roughly 10 years > > > > without anything like that happens. > > > > > > > > Taking BQL for migration_incoming_state_destroy() looks all safe. There's > > > > one qemu_ram_block_writeback() which made me a bit nervous initially, but > > > > then looks like RAM backends should be almost noop (for shmem and > > > > hugetlbfs) but except pmem. > > > > > > That's the only part where taking BQL is actually necessary for the > > > functionality of this patch set to work properly, so it's fine to leave > > > that call to qemu_loadvm_state_main() as-is (without BQL) for time being. > > > > > > > > > > > The other alternative is we define load_cleanup() to not rely on BQL (which > > > > I believe is true before this series?), then take it only when VFIO's path > > > > needs it. > > > > > > I think other paths always call load_cleanup() with BQL so it's probably > > > safer to have consistent semantics here. > > > > IMHO we don't necessarily need to make it the default that vmstate handler > > hooks will need BQL by default - we can always properly define them to best > > suite our need. > > But I think consistency is important - if other callers take BQL for > load_cleanup() then it makes sense to take it in all places (only if to make > the code simpler). I assume current QEMU master branch doesn't need bql for all existing (only RAM and VFIO..) load_cleanup(), or am I wrong? Thanks, -- Peter Xu