From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org, linux-mm@kvack.org, "Michael S . Tsirkin", David Hildenbrand, Wei Yang, Jason Wang, Pankaj Gupta, Michal Hocko, Oscar Salvador, Andrew Morton
Subject: [PATCH v2 29/29] virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
Date: Thu, 12 Nov 2020 14:38:15 +0100
Message-Id: <20201112133815.13332-30-david@redhat.com>
In-Reply-To: <20201112133815.13332-1-david@redhat.com>
References: <20201112133815.13332-1-david@redhat.com>

Let's add a safe mechanism to unplug memory that avoids long or endless
loops when trying to offline memory, similar to what SBM already does.

Fake-offline all memory (via alloc_contig_range()) before trying to
offline and remove it. Use this mode by default, but allow enabling the
other mode explicitly (which could give better memory hotunplug
guarantees in some environments).

The "unsafe" mode can be enabled, e.g., via
virtio_mem.bbm_safe_unplug=0 on the kernel command line.

Reviewed-by: Wei Yang
Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Cc: Pankaj Gupta
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Wei Yang
Cc: Andrew Morton
Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c | 97 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f1696cdb7b0c..9fc9ec4a25f5 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -37,6 +37,11 @@ module_param(bbm_block_size, ulong, 0444);
 MODULE_PARM_DESC(bbm_block_size,
 		 "Big Block size in bytes. Default is 0 (auto-detection).");
 
+static bool bbm_safe_unplug = true;
+module_param(bbm_safe_unplug, bool, 0444);
+MODULE_PARM_DESC(bbm_safe_unplug,
+	     "Use a safe unplug mechanism in BBM, avoiding long/endless loops");
+
 /*
  * virtio-mem currently supports the following modes of operation:
  *
@@ -87,6 +92,8 @@ enum virtio_mem_bbm_bb_state {
 	VIRTIO_MEM_BBM_BB_PLUGGED,
 	/* Plugged and added to Linux. */
 	VIRTIO_MEM_BBM_BB_ADDED,
+	/* All online parts are fake-offline, ready to remove. */
+	VIRTIO_MEM_BBM_BB_FAKE_OFFLINE,
 	VIRTIO_MEM_BBM_BB_COUNT
 };
 
@@ -889,6 +896,32 @@ static void virtio_mem_sbm_notify_cancel_offline(struct virtio_mem *vm,
 	}
 }
 
+static void virtio_mem_bbm_notify_going_offline(struct virtio_mem *vm,
+						unsigned long bb_id,
+						unsigned long pfn,
+						unsigned long nr_pages)
+{
+	/*
+	 * When marked as "fake-offline", all online memory of this device block
+	 * is allocated by us. Otherwise, we don't have any memory allocated.
+	 */
+	if (virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+	    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
+		return;
+	virtio_mem_fake_offline_going_offline(pfn, nr_pages);
+}
+
+static void virtio_mem_bbm_notify_cancel_offline(struct virtio_mem *vm,
+						 unsigned long bb_id,
+						 unsigned long pfn,
+						 unsigned long nr_pages)
+{
+	if (virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+	    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
+		return;
+	virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
+}
+
 /*
  * This callback will either be called synchronously from add_memory() or
  * asynchronously (e.g., triggered via user space). We have to be careful
@@ -949,6 +982,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		vm->hotplug_active = true;
 		if (vm->in_sbm)
 			virtio_mem_sbm_notify_going_offline(vm, id);
+		else
+			virtio_mem_bbm_notify_going_offline(vm, id,
+							    mhp->start_pfn,
+							    mhp->nr_pages);
 		break;
 	case MEM_GOING_ONLINE:
 		mutex_lock(&vm->hotplug_mutex);
@@ -999,6 +1036,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		if (vm->in_sbm)
 			virtio_mem_sbm_notify_cancel_offline(vm, id);
+		else
+			virtio_mem_bbm_notify_cancel_offline(vm, id,
+							     mhp->start_pfn,
+							     mhp->nr_pages);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -1189,7 +1230,13 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
 								   sb_id, 1);
 		} else {
-			do_online = true;
+			/*
+			 * If the whole block is marked fake offline, keep
+			 * everything that way.
+			 */
+			id = virtio_mem_phys_to_bb_id(vm, addr);
+			do_online = virtio_mem_bbm_get_bb_state(vm, id) !=
+				    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE;
 		}
 		if (do_online)
 			generic_online_page(page, order);
@@ -1969,15 +2016,50 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
 						       unsigned long bb_id)
 {
+	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
+	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
+	unsigned long end_pfn = start_pfn + nr_pages;
+	unsigned long pfn;
+	struct page *page;
 	int rc;
 
 	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
 			 VIRTIO_MEM_BBM_BB_ADDED))
 		return -EINVAL;
 
+	if (bbm_safe_unplug) {
+		/*
+		 * Start by fake-offlining all memory. Once we marked the device
+		 * block as fake-offline, all newly onlined memory will
+		 * automatically be kept fake-offline. Protect from concurrent
+		 * onlining/offlining until we have a consistent state.
+		 */
+		mutex_lock(&vm->hotplug_mutex);
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE);
+
+		for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+			page = pfn_to_online_page(pfn);
+			if (!page)
+				continue;
+
+			rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION);
+			if (rc) {
+				end_pfn = pfn;
+				goto rollback_safe_unplug;
+			}
+		}
+		mutex_unlock(&vm->hotplug_mutex);
+	}
+
 	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
-	if (rc)
+	if (rc) {
+		if (bbm_safe_unplug) {
+			mutex_lock(&vm->hotplug_mutex);
+			goto rollback_safe_unplug;
+		}
 		return rc;
+	}
 
 	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
 	if (rc)
@@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
 	virtio_mem_bbm_set_bb_state(vm, bb_id,
 				    VIRTIO_MEM_BBM_BB_UNUSED);
 	return rc;
+
+rollback_safe_unplug:
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		page = pfn_to_online_page(pfn);
+		if (!page)
+			continue;
+		virtio_mem_fake_online(pfn, PAGES_PER_SECTION);
+	}
+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
+	mutex_unlock(&vm->hotplug_mutex);
+	return rc;
 }
 
 /*
-- 
2.26.2