Date: Wed, 22 Mar 2023 14:05:06 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: peter.maydell@linaro.org, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: s390 migration crash

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Mar 21, 2023 at 08:24:37PM +0000, Dr. David Alan Gilbert wrote:
> > Hi Peters,
> >   Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > it:
> >   * On an s390 host
>
> How easy to reproduce?

Pretty much every time when run as:
  make check -j 4

> >   * only as part of a make check - running migration-test by itself
> >     doesn't trigger for me.
> >   * It looks like it's postcopy preempt
> >
> > (gdb) bt full
> > #0  iov_size (iov=iov@entry=0x2aa00e60670, iov_cnt=<optimized out>) at ../util/iov.c:88
> >         len = 13517923312037845750
> >         i = 17305
> > #1  0x000002aa004d068c in qemu_fflush (f=0x2aa00e58630) at ../migration/qemu-file.c:307
> >         local_error = 0x0
> > #2  0x000002aa004d0e04 in qemu_fflush (f=<optimized out>) at ../migration/qemu-file.c:297
> > #3  0x000002aa00613962 in postcopy_preempt_shutdown_file (s=s@entry=0x2aa00d1b4e0) at ../migration/ram.c:4657
> > #4  0x000002aa004e12b4 in migration_completion (s=0x2aa00d1b4e0) at ../migration/migration.c:3469
> >         ret = <optimized out>
> >         current_active_state = 5
> >         must_precopy = 0
> >         can_postcopy = 0
> >         in_postcopy = true
> >         pending_size = 0
> >         __func__ = "migration_iteration_run"
> >         iter_state = <optimized out>
> >         s = 0x2aa00d1b4e0
> >         thread = <optimized out>
> >         setup_start = <optimized out>
> >         thr_error = <optimized out>
> >         urgent = <optimized out>
> > #5  migration_iteration_run (s=0x2aa00d1b4e0) at ../migration/migration.c:3882
> >         must_precopy = 0
> >         can_postcopy = 0
> >         in_postcopy = true
> >         pending_size = 0
> >         __func__ = "migration_iteration_run"
> >         iter_state = <optimized out>
> >         s = 0x2aa00d1b4e0
> >         thread = <optimized out>
> >         setup_start = <optimized out>
> >         thr_error = <optimized out>
> >         urgent = <optimized out>
> > #6  migration_thread (opaque=opaque@entry=0x2aa00d1b4e0) at ../migration/migration.c:4124
> >         iter_state = <optimized out>
> >         s = 0x2aa00d1b4e0
> > --Type <RET> for more, q to quit, c to continue without paging--
> >         thread = <optimized out>
> >         setup_start = <optimized out>
> >         thr_error = <optimized out>
> >         urgent = <optimized out>
> > #7  0x000002aa00819b8c in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:541
> >         __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {{__gregs = {4396782422080,
> >               4393751543808, 4397299389454, 4396844235904, 2929182727824, 2929182933488,
> >               4396843986792, 4397299389455, 33679382915066768, 33678512846981306},
> >               __fpregs = {4396774031360, 8392704, 2929182933488, 0, 4396782422272,
> >               2929172491858, 4396774031360, 1}}}, __mask_was_saved = 0}},
> >           __pad = {0x3ffb4a77a60, 0x0, 0x0, 0x0}}
> >         __cancel_routine = 0x2aa00819bf0
> >         __not_first_call = <optimized out>
> >         start_routine = 0x2aa004e08f0
> >         arg = 0x2aa00d1b4e0
> >         r = <optimized out>
> > #8  0x000003ffb7b1e2e6 in start_thread () at /lib64/libc.so.6
> > #9  0x000003ffb7aafdbe in thread_start () at /lib64/libc.so.6
> >
> > It looks like it's in the preempt test:
> >
> > (gdb) where
> > #0  0x000003ffb17a0126 in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x000003ffb1750890 in raise () from /lib64/libc.so.6
> > #2  0x000003ffb172a340 in abort () from /lib64/libc.so.6
> > #3  0x000002aa0041c130 in qtest_check_status (s=<optimized out>) at ../tests/qtest/libqtest.c:194
> > #4  0x000003ffb1a3b5de in g_hook_list_invoke () from /lib64/libglib-2.0.so.0
> > #5  <signal handler called>
> > #6  0x000003ffb17a0126 in __pthread_kill_implementation () from /lib64/libc.so.6
> > #7  0x000003ffb1750890 in raise () from /lib64/libc.so.6
> > #8  0x000003ffb172a340 in abort () from /lib64/libc.so.6
> > #9  0x000002aa00420318 in qmp_fd_receive (fd=<optimized out>) at ../tests/qtest/libqmp.c:80
> > #10 0x000002aa0041d5ee in qtest_qmp_receive_dict (s=0x2aa01eb2700) at ../tests/qtest/libqtest.c:713
> > #11 qtest_qmp_receive (s=0x2aa01eb2700) at ../tests/qtest/libqtest.c:701
> > #12 qtest_vqmp (s=s@entry=0x2aa01eb2700, fmt=fmt@entry=0x2aa00487100 "{ 'execute': 'query-migrate' }", ap=ap@entry=0x3ffc247cc68)
> >     at ../tests/qtest/libqtest.c:765
> > #13 0x000002aa00413f1e in wait_command (who=who@entry=0x2aa01eb2700, command=command@entry=0x2aa00487100 "{ 'execute': 'query-migrate' }")
> >     at ../tests/qtest/migration-helpers.c:73
> > #14 0x000002aa00414078 in migrate_query (who=who@entry=0x2aa01eb2700) at ../tests/qtest/migration-helpers.c:139
> > #15 migrate_query_status (who=who@entry=0x2aa01eb2700) at ../tests/qtest/migration-helpers.c:161
> > #16 0x000002aa00414480 in check_migration_status (ungoals=0x0, goal=0x2aa00495c7e "completed", who=0x2aa01eb2700)
> >     at ../tests/qtest/migration-helpers.c:177
> > #17 wait_for_migration_status (who=0x2aa01eb2700, goal=<optimized out>, ungoals=0x0) at ../tests/qtest/migration-helpers.c:202
> > #18 0x000002aa0041300e in migrate_postcopy_complete (from=from@entry=0x2aa01eb2700, to=to@entry=0x2aa01eb3000, args=args@entry=0x3ffc247cf48)
> >     at ../tests/qtest/migration-test.c:1137
> > #19 0x000002aa004131a4 in test_postcopy_common (args=0x3ffc247cf48) at ../tests/qtest/migration-test.c:1162
> > #20 test_postcopy_preempt () at ../tests/qtest/migration-test.c:1178
> >
> > Looking at the iov and file it's garbage; so it makes me think this is
> > something like a flush on a closed file.
>
> I didn't figure out how that could be closed, but I think there's indeed a
> possible race in that the qemufile can be accessed by both the return path
> thread and the migration thread concurrently, while qemufile is not thread
> safe.
>
> What postcopy_preempt_shutdown_file() does is: the src uses this EOS to
> kick the dest QEMU preempt thread out of the migration and shut it off.
> After some thought I think this is unnecessary complexity, since postcopy
> should end at the point where dest has received all the data, at which
> point it sends a SHUT to src.  So it's probably not good to have dest rely
> on anything from src to shut anything down (the preempt thread here),
> because it's the dest qemu that makes the final decision to finish.
> Ideally the preempt thread on dest should be able to shut itself down.
>
> The trick here is that the preempt thread will be blocked at read() (aka,
> recvmsg()) on the channel at that time, and the only way to kick it out of
> that is a shutdown() on dest.  I attached a patch that does it.  I'm not
> 100% sure whether it'll already resolve our problem, but it's worth
> trying.  This also made me notice that we forgot to enable the SHUTDOWN
> feature on the tls server when I was running patch 1 with qtest, so two
> patches are needed.

Well, it seems to fix it:

Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Will this break interaction with an old qemu that's still waiting for the
EOS?
Dave

> --
> Peter Xu

> From 0e317fa78e9671c119f6be78a0e0a36201517dc2 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 21 Mar 2023 19:58:42 -0400
> Subject: [PATCH 1/2] io: tls: Inherit QIO_CHANNEL_FEATURE_SHUTDOWN on server
>  side
>
> A TLS iochannel will inherit io_shutdown() from the master ioc; however we
> missed to do that on the server side.
>
> This will e.g. allow qemu_file_shutdown() to work on dest QEMU too for
> migration.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  io/channel-tls.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 5a7a3d48d6..9805dd0a3f 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -74,6 +74,9 @@ qio_channel_tls_new_server(QIOChannel *master,
>      ioc = QIO_CHANNEL_TLS(object_new(TYPE_QIO_CHANNEL_TLS));
>
>      ioc->master = master;
> +    if (qio_channel_has_feature(master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
> +        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
> +    }
>      object_ref(OBJECT(master));
>
>      ioc->session = qcrypto_tls_session_new(
> --
> 2.39.1
>
> From 73b267c17e689d0dafbde1d93c0125b81a43184e Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 21 Mar 2023 19:11:22 -0400
> Subject: [PATCH 2/2] migration: Fix potential race on postcopy_qemufile_src
>
> The postcopy_qemufile_src object should be owned by one thread, either the
> main thread (e.g. at the beginning or at the end of migration), or the
> return path thread (during a preempt-enabled postcopy migration).  If
> that's not the case, access to the object might be racy.
>
> postcopy_preempt_shutdown_file() is potentially racy, because it's called
> in the end phase of migration on the main thread, at a point where the
> return path thread hasn't yet been recycled; the recycle happens in
> await_return_path_close_on_source(), which is after this point.
>
> It means that, logically, it's possible for the main thread and the return
> path thread to both operate on the same qemufile, while qemufile is not
> thread safe at all.
>
> postcopy_preempt_shutdown_file() used to be needed because that's where we
> send EOS to dest so that dest can safely shut down the preempt thread.
>
> To avoid the possible race, remove the only place where the race can
> happen.  Instead, find another way to safely close the preempt thread on
> dest.
>
> The core idea during postcopy on deciding "when to stop" is that dest will
> send a postcopy SHUT message to src, telling src that all data is there.
> Hence, to shut the dest preempt thread it's better to do it directly on
> the dest node.
>
> This patch proposes such a way: change postcopy_prio_thread_created into
> PreemptThreadStatus, so that we kick the preempt thread on dest qemu by a
> sequence of:
>
>   mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
>   qemu_file_shutdown(mis->postcopy_qemufile_dst);
>
> Here shutdown() is probably so far the easiest way to kick the preempt
> thread out of a blocked qemu_get_be64().  The thread then reads
> preempt_thread_status to make sure it's not a network failure but a
> willingness to quit the thread.
>
> We could have avoided the extra status and just relied on the migration
> status.  The problem is that postcopy_ram_incoming_cleanup() is called
> early enough that we're still in POSTCOPY_ACTIVE no matter what, so just
> keep it simple and introduce the status.
>
> Fixes: 36f62f11e4 ("migration: Postcopy preemption preparation on channel creation")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c    |  7 -------
>  migration/migration.h    | 13 ++++++++++++-
>  migration/postcopy-ram.c | 20 +++++++++++++++-----
>  migration/ram.c          |  6 ------
>  migration/ram.h          |  1 -
>  5 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index ae2025d9d8..bbc36100de 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3459,16 +3459,9 @@ static void migration_completion(MigrationState *s)
>          }
>      } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>          trace_migration_completion_postcopy_end();
> -
>          qemu_mutex_lock_iothread();
>          qemu_savevm_state_complete_postcopy(s->to_dst_file);
>          qemu_mutex_unlock_iothread();
> -
> -        /* Shutdown the postcopy fast path thread */
> -        if (migrate_postcopy_preempt()) {
> -            postcopy_preempt_shutdown_file(s);
> -        }
> -
>          trace_migration_completion_postcopy_end_after_complete();
>      } else {
>          goto fail;
> diff --git a/migration/migration.h b/migration/migration.h
> index 2da2f8a164..7228163cc8 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -65,6 +65,12 @@ typedef struct {
>      bool all_zero;
>  } PostcopyTmpPage;
>
> +typedef enum {
> +    PREEMPT_THREAD_NONE = 0,
> +    PREEMPT_THREAD_CREATED,
> +    PREEMPT_THREAD_QUIT,
> +} PreemptThreadStatus;
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> @@ -124,7 +130,12 @@ struct MigrationIncomingState {
>      QemuSemaphore postcopy_qemufile_dst_done;
>      /* Postcopy priority thread is used to receive postcopy requested pages */
>      QemuThread postcopy_prio_thread;
> -    bool postcopy_prio_thread_created;
> +    /*
> +     * Always set by the main vm load thread only, but can be read by the
> +     * postcopy preempt thread.  "volatile" makes sure all reads will be
> +     * uptodate across cores.
> +     */
> +    volatile PreemptThreadStatus preempt_thread_status;
>      /*
>       * Used to sync between the ram load main thread and the fast ram load
>       * thread. It protects postcopy_qemufile_dst, which is the postcopy
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 41c0713650..263bab75ec 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -568,9 +568,14 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>  {
>      trace_postcopy_ram_incoming_cleanup_entry();
>
> -    if (mis->postcopy_prio_thread_created) {
> +    if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
> +        /* Notify the fast load thread to quit */
> +        mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
> +        if (mis->postcopy_qemufile_dst) {
> +            qemu_file_shutdown(mis->postcopy_qemufile_dst);
> +        }
>          qemu_thread_join(&mis->postcopy_prio_thread);
> -        mis->postcopy_prio_thread_created = false;
> +        mis->preempt_thread_status = PREEMPT_THREAD_NONE;
>      }
>
>      if (mis->have_fault_thread) {
> @@ -1203,7 +1208,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>           */
>          postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
>                                 postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
> -        mis->postcopy_prio_thread_created = true;
> +        mis->preempt_thread_status = PREEMPT_THREAD_CREATED;
>      }
>
>      trace_postcopy_ram_enable_notify();
> @@ -1652,6 +1657,11 @@ static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
>      trace_postcopy_pause_fast_load_continued();
>  }
>
> +static bool preempt_thread_should_run(MigrationIncomingState *mis)
> +{
> +    return mis->preempt_thread_status != PREEMPT_THREAD_QUIT;
> +}
> +
>  void *postcopy_preempt_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -1671,11 +1681,11 @@ void *postcopy_preempt_thread(void *opaque)
>
>      /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
>      qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
> -    while (1) {
> +    while (preempt_thread_should_run(mis)) {
>          ret = ram_load_postcopy(mis->postcopy_qemufile_dst,
>                                  RAM_CHANNEL_POSTCOPY);
>          /* If error happened, go into recovery routine */
> -        if (ret) {
> +        if (ret && preempt_thread_should_run(mis)) {
>              postcopy_pause_ram_fast_load(mis);
>          } else {
>              /* We're done */
> diff --git a/migration/ram.c b/migration/ram.c
> index 96e8a19a58..8c316ffed2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4651,12 +4651,6 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
>      return 0;
>  }
>
> -void postcopy_preempt_shutdown_file(MigrationState *s)
> -{
> -    qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
> -    qemu_fflush(s->postcopy_qemufile_src);
> -}
> -
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/migration/ram.h b/migration/ram.h
> index 81cbb0947c..93b071a1a7 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -96,7 +96,6 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>                                    const char *block_name);
>  int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
> -void postcopy_preempt_shutdown_file(MigrationState *s);
>  void *postcopy_preempt_thread(void *opaque);
>
>  /* ram cache */
> --
> 2.39.1
>
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK