From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 81AE0C433EF for ; Tue, 1 Mar 2022 09:27:27 +0000 (UTC) Received: from localhost ([::1]:43048 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nOyn0-0006iF-DR for qemu-devel@archiver.kernel.org; Tue, 01 Mar 2022 04:27:26 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54834) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nOy3c-0004R0-AM for qemu-devel@nongnu.org; Tue, 01 Mar 2022 03:40:33 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:29181) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nOy3a-0008Kh-9f for qemu-devel@nongnu.org; Tue, 01 Mar 2022 03:40:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646124029; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vDW0WR/YILw7vbC/woY+9lmdtY42QRLuti/BpXnPTr8=; b=Qq8vO92LCp3rEb0IjczHK8BwhwMKdh9WXRhWe4i5IwQ7CPTzMVWJFn143s2hktztpmdYZu PJj8M8fgsH6XAcI/S1zIVQ4TeNFl2/NDRNhU1dM4NnO5+/JWts2+GW583P/UEn/CXlaw99 JxjyxEc196Pv3srj9mPuLXye+eX/Yzc= Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-497-RAr-iqF-M7aErH-lKRaK2Q-1; Tue, 01 Mar 2022 03:40:28 -0500 X-MC-Unique: RAr-iqF-M7aErH-lKRaK2Q-1 Received: by mail-pf1-f198.google.com with SMTP id 62-20020a621541000000b004f110fdb1aeso9381026pfv.13 for ; Tue, 01 Mar 2022 00:40:28 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vDW0WR/YILw7vbC/woY+9lmdtY42QRLuti/BpXnPTr8=; b=YcM9CFLpDA3tOqrC9BwIooL28NrXr4HO8r4hQ0H8yEGYffExv1CijzKPuLqyv5xTgo Q6i8QDEz8C1Byc+dSSrkTpIl6PzhNmn98AYBsnNXkAXTuE7V+EzsWTiFTFJDHHC5PvvC 8o/0qfOVKDoYBaE0MYNj6fvH4PEOm1DKNBpF+ZWSM4GW1lRYC3bCJmR3aFkVuYdWJ9iS x325rcMzc8SGucdoD07GHrsfJt5bpk318up586nidJDjOrUW5xmep0a4UAHmc5MY8dKC sSaQR9xsByvAgLYMw0BzVtlF4WQIjhbGg5YewZQ/J0laXJOJlKPXh+p6iYveuMJGeMbW SNIA== X-Gm-Message-State: AOAM532uHApoCMFoLmLpgpVNN6+ImH3ZM0b14bgn+Dz1b31SmPfo9L50 AnjwZBCb2e2mR7xAWKK72+qJTMdYtlChCiTfwjupUexxOV6yUogVPaQfhyQatO+s3oqVCijrCXh Wnmz51Xf0hlvwt01FG3EWJeDqb5NQInOxWeEmF5Sz2JgENXSrPM5Va0RVSy/ucV6b X-Received: by 2002:a17:90a:5302:b0:1b9:ba0a:27e5 with SMTP id x2-20020a17090a530200b001b9ba0a27e5mr20438249pjh.91.1646124027504; Tue, 01 Mar 2022 00:40:27 -0800 (PST) X-Google-Smtp-Source: ABdhPJzZmThyYMXmNROFLEjD7sgE0XxQM4+KKwqGvmdKpr8PFTA82glYwZ+u9lVFB5ZA4EgAH4YO6g== X-Received: by 2002:a17:90a:5302:b0:1b9:ba0a:27e5 with SMTP id x2-20020a17090a530200b001b9ba0a27e5mr20438213pjh.91.1646124026975; Tue, 01 Mar 2022 00:40:26 -0800 (PST) Received: from localhost.localdomain ([94.177.118.144]) by smtp.gmail.com with ESMTPSA id l1-20020a17090aec0100b001bc6d8bb27dsm1439987pjy.37.2022.03.01.00.40.24 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Tue, 01 Mar 2022 00:40:26 -0800 (PST) From: Peter Xu To: qemu-devel@nongnu.org Subject: [PATCH v2 19/25] migration: Postcopy recover with preempt enabled Date: Tue, 1 Mar 2022 16:39:19 +0800 Message-Id: <20220301083925.33483-20-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220301083925.33483-1-peterx@redhat.com> References: <20220301083925.33483-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII" Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.082, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Juan Quintela , "Dr . David Alan Gilbert" , peterx@redhat.com, Leonardo Bras Soares Passos Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread needs similar handling on fault tolerance. When ram_load_postcopy() fails, instead of stopping the thread it halts with a semaphore, preparing to be kicked again when recovery is detected. A mutex is introduced to make sure there's no concurrent operation upon the socket. To make it simple, the fast ram load thread will take the mutex during its whole procedure, and only release it if it's paused. The fast-path socket will be properly released by the main loading thread safely when there's network failures during postcopy with that mutex held. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 27 +++++++++++++++++++++++---- migration/migration.h | 19 +++++++++++++++++++ migration/postcopy-ram.c | 24 ++++++++++++++++++++++-- migration/qemu-file.c | 27 +++++++++++++++++++++++++++ migration/qemu-file.h | 1 + migration/savevm.c | 21 +++++++++++++++++++-- migration/trace-events | 2 ++ 7 files changed, 113 insertions(+), 8 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index d20db04097..69778cab23 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -215,9 +215,11 @@ void migration_object_init(void) current_incoming->postcopy_remote_fds = g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD)); qemu_mutex_init(¤t_incoming->rp_mutex); + qemu_mutex_init(¤t_incoming->postcopy_prio_thread_mutex); qemu_event_init(¤t_incoming->main_thread_load_event, false); qemu_sem_init(¤t_incoming->postcopy_pause_sem_dst, 0); qemu_sem_init(¤t_incoming->postcopy_pause_sem_fault, 0); + qemu_sem_init(¤t_incoming->postcopy_pause_sem_fast_load, 0); qemu_mutex_init(¤t_incoming->page_request_mutex); current_incoming->page_requested = g_tree_new(page_request_addr_cmp); @@ -697,9 +699,9 @@ static bool postcopy_try_recover(void) /* * Here, we only wake up the main loading thread (while the - * fault thread will still be waiting), so that we can receive + * rest threads will still be waiting), so that we can receive * commands from source now, and answer it if needed. The - * fault thread will be woken up afterwards until we are sure + * rest threads will be woken up afterwards until we are sure * that source is ready to reply to page requests. */ qemu_sem_post(&mis->postcopy_pause_sem_dst); @@ -3466,6 +3468,18 @@ static MigThrError postcopy_pause(MigrationState *s) qemu_file_shutdown(file); qemu_fclose(file); + /* + * Do the same to postcopy fast path socket too if there is. No + * locking needed because no racer as long as we do this before setting + * status to paused. + */ + if (s->postcopy_qemufile_src) { + migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src); + qemu_file_shutdown(s->postcopy_qemufile_src); + qemu_fclose(s->postcopy_qemufile_src); + s->postcopy_qemufile_src = NULL; + } + migrate_set_state(&s->state, s->state, MIGRATION_STATUS_POSTCOPY_PAUSED); @@ -3521,8 +3535,13 @@ static MigThrError migration_detect_error(MigrationState *s) return MIG_THR_ERR_FATAL; } - /* Try to detect any file errors */ - ret = qemu_file_get_error_obj(s->to_dst_file, &local_error); + /* + * Try to detect any file errors. Note that postcopy_qemufile_src will + * be NULL when postcopy preempt is not enabled. + */ + ret = qemu_file_get_error_obj_any(s->to_dst_file, + s->postcopy_qemufile_src, + &local_error); if (!ret) { /* Everything is fine */ assert(!local_error); diff --git a/migration/migration.h b/migration/migration.h index b8aacfe3af..91f845e9e4 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -118,6 +118,18 @@ struct MigrationIncomingState { /* Postcopy priority thread is used to receive postcopy requested pages */ QemuThread postcopy_prio_thread; bool postcopy_prio_thread_created; + /* + * Used to sync between the ram load main thread and the fast ram load + * thread. It protects postcopy_qemufile_dst, which is the postcopy + * fast channel. + * + * The ram fast load thread will take it mostly for the whole lifecycle + * because it needs to continuously read data from the channel, and + * it'll only release this mutex if postcopy is interrupted, so that + * the ram load main thread will take this mutex over and properly + * release the broken channel. + */ + QemuMutex postcopy_prio_thread_mutex; /* * An array of temp host huge pages to be used, one for each postcopy * channel. @@ -147,6 +159,13 @@ struct MigrationIncomingState { /* notify PAUSED postcopy incoming migrations to try to continue */ QemuSemaphore postcopy_pause_sem_dst; QemuSemaphore postcopy_pause_sem_fault; + /* + * This semaphore is used to allow the ram fast load thread (only when + * postcopy preempt is enabled) fall into sleep when there's network + * interruption detected. When the recovery is done, the main load + * thread will kick the fast ram load thread using this semaphore. + */ + QemuSemaphore postcopy_pause_sem_fast_load; /* List of listening socket addresses */ SocketAddressList *socket_address_list; diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index df0c02f729..e20305a9e2 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -1580,6 +1580,15 @@ int postcopy_preempt_setup(MigrationState *s, Error **errp) return 0; } +static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis) +{ + trace_postcopy_pause_fast_load(); + qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex); + qemu_sem_wait(&mis->postcopy_pause_sem_fast_load); + qemu_mutex_lock(&mis->postcopy_prio_thread_mutex); + trace_postcopy_pause_fast_load_continued(); +} + void *postcopy_preempt_thread(void *opaque) { MigrationIncomingState *mis = opaque; @@ -1592,11 +1601,22 @@ void *postcopy_preempt_thread(void *opaque) qemu_sem_post(&mis->thread_sync_sem); /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */ - ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY); + qemu_mutex_lock(&mis->postcopy_prio_thread_mutex); + while (1) { + ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY); + /* If error happened, go into recovery routine */ + if (ret) { + postcopy_pause_ram_fast_load(mis); + } else { + /* We're done */ + break; + } + } + qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex); rcu_unregister_thread(); trace_postcopy_preempt_thread_exit(); - return ret == 0 ? NULL : (void *)-1; + return NULL; } diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 1479cddad9..397652f0ba 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -139,6 +139,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp) return f->last_error; } +/* + * Get last error for either stream f1 or f2 with optional Error*. + * The error returned (non-zero) can be either from f1 or f2. + * + * If any of the qemufile* is NULL, then skip the check on that file. + * + * When there is no error on both qemufile, zero is returned. + */ +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp) +{ + int ret = 0; + + if (f1) { + ret = qemu_file_get_error_obj(f1, errp); + /* If there's already error detected, return */ + if (ret) { + return ret; + } + } + + if (f2) { + ret = qemu_file_get_error_obj(f2, errp); + } + + return ret; +} + /* * Set the last error for stream f with optional Error* */ diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 3f36d4dc8c..2564e5e1c7 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -156,6 +156,7 @@ void qemu_file_update_transfer(QEMUFile *f, int64_t len); void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate); int64_t qemu_file_get_rate_limit(QEMUFile *f); int qemu_file_get_error_obj(QEMUFile *f, Error **errp); +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp); void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err); void qemu_file_set_error(QEMUFile *f, int ret); int qemu_file_shutdown(QEMUFile *f); diff --git a/migration/savevm.c b/migration/savevm.c index 254aa78234..24b69a1008 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2152,6 +2152,13 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) */ qemu_sem_post(&mis->postcopy_pause_sem_fault); + if (migrate_postcopy_preempt()) { + /* The channel should already be setup again; make sure of it */ + assert(mis->postcopy_qemufile_dst); + /* Kick the fast ram load thread too */ + qemu_sem_post(&mis->postcopy_pause_sem_fast_load); + } + return 0; } @@ -2607,6 +2614,16 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis) mis->to_src_file = NULL; qemu_mutex_unlock(&mis->rp_mutex); + if (mis->postcopy_qemufile_dst) { + qemu_file_shutdown(mis->postcopy_qemufile_dst); + /* Take the mutex to make sure the fast ram load thread halted */ + qemu_mutex_lock(&mis->postcopy_prio_thread_mutex); + migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst); + qemu_fclose(mis->postcopy_qemufile_dst); + mis->postcopy_qemufile_dst = NULL; + qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex); + } + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, MIGRATION_STATUS_POSTCOPY_PAUSED); @@ -2634,8 +2651,8 @@ retry: while (true) { section_type = qemu_get_byte(f); - if (qemu_file_get_error(f)) { - ret = qemu_file_get_error(f); + ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL); + if (ret) { break; } diff --git a/migration/trace-events b/migration/trace-events index b1155d9da0..dfe36a3340 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -270,6 +270,8 @@ mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, i mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d" postcopy_pause_fault_thread(void) "" postcopy_pause_fault_thread_continued(void) "" +postcopy_pause_fast_load(void) "" +postcopy_pause_fast_load_continued(void) "" postcopy_ram_fault_thread_entry(void) "" postcopy_ram_fault_thread_exit(void) "" postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d" -- 2.32.0