From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
 "Michael S .
 Tsirkin", David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko,
 Oscar Salvador, Wei Yang, Andrew Morton
Subject: [PATCH v2 28/29] virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
Date: Thu, 12 Nov 2020 14:38:14 +0100
Message-Id: <20201112133815.13332-29-david@redhat.com>
In-Reply-To: <20201112133815.13332-1-david@redhat.com>
References: <20201112133815.13332-1-david@redhat.com>
MIME-Version: 1.0

Let's try to unplug completely offline big blocks first. Then (if enabled
via unplug_online), try to offline and remove whole big blocks.

No locking necessary - we can deal with concurrent onlining/offlining just
fine.

Note1: This is sub-optimal and might be dangerous in some environments: we
could end up in an infinite loop when offlining (e.g., long-term pinnings),
similar to DIMMs. We'll introduce safe memory hotunplug via fake-offlining
next, and use this basic mode only when explicitly enabled.

Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable
with bigger block sizes.

Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Cc: Pankaj Gupta
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Wei Yang
Cc: Andrew Morton
Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c | 156 +++++++++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 861149acafe5..f1696cdb7b0c 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -390,6 +390,12 @@ static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
 	     _bb_id++) \
 		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
 
+#define virtio_mem_bbm_for_each_bb_rev(_vm, _bb_id, _state) \
+	for (_bb_id = _vm->bbm.next_bb_id - 1; \
+	     _bb_id >= _vm->bbm.first_bb_id && _vm->bbm.bb_count[_state]; \
+	     _bb_id--) \
+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
+
 /*
  * Set the state of a memory block, taking care of the state counter.
  */
@@ -685,6 +691,18 @@ static int virtio_mem_sbm_remove_mb(struct virtio_mem *vm, unsigned long mb_id)
 	return virtio_mem_remove_memory(vm, addr, size);
 }
 
+/*
+ * See virtio_mem_remove_memory(): Try to remove all Linux memory blocks covered
+ * by the big block.
+ */
+static int virtio_mem_bbm_remove_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_remove_memory(vm, addr, size);
+}
+
 /*
  * Try offlining and removing memory from Linux.
  *
@@ -731,6 +749,19 @@ static int virtio_mem_sbm_offline_and_remove_mb(struct virtio_mem *vm,
 	return virtio_mem_offline_and_remove_memory(vm, addr, size);
 }
 
+/*
+ * See virtio_mem_offline_and_remove_memory(): Try to offline and remove
+ * all Linux memory blocks covered by the big block.
+ */
+static int virtio_mem_bbm_offline_and_remove_bb(struct virtio_mem *vm,
+						unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_offline_and_remove_memory(vm, addr, size);
+}
+
 /*
  * Trigger the workqueue so the device can perform its magic.
  */
@@ -1928,6 +1959,129 @@ static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	return rc;
 }
 
+/*
+ * Try to offline and remove a big block from Linux and unplug it. Will fail
+ * with -EBUSY if some memory is busy and cannot get unplugged.
+ *
+ * Will modify the state of the big block. Might temporarily drop the
+ * hotplug_mutex.
+ */
+static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
+						       unsigned long bb_id)
+{
+	int rc;
+
+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+			 VIRTIO_MEM_BBM_BB_ADDED))
+		return -EINVAL;
+
+	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
+	if (rc)
+		return rc;
+
+	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
+	if (rc)
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_PLUGGED);
+	else
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_UNUSED);
+	return rc;
+}
+
+/*
+ * Try to remove a big block from Linux and unplug it. Will fail with
+ * -EBUSY if some memory is online.
+ *
+ * Will modify the state of the big block.
+ */
+static int virtio_mem_bbm_remove_and_unplug_bb(struct virtio_mem *vm,
+					       unsigned long bb_id)
+{
+	int rc;
+
+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+			 VIRTIO_MEM_BBM_BB_ADDED))
+		return -EINVAL;
+
+	rc = virtio_mem_bbm_remove_bb(vm, bb_id);
+	if (rc)
+		return -EBUSY;
+
+	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
+	if (rc)
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_PLUGGED);
+	else
+		virtio_mem_bbm_set_bb_state(vm, bb_id,
+					    VIRTIO_MEM_BBM_BB_UNUSED);
+	return rc;
+}
+
+/*
+ * Test if a big block is completely offline.
+ */
+static bool virtio_mem_bbm_bb_is_offline(struct virtio_mem *vm,
+					 unsigned long bb_id)
+{
+	const unsigned long start_pfn = PFN_DOWN(virtio_mem_bb_id_to_phys(vm, bb_id));
+	const unsigned long nr_pages = PFN_DOWN(vm->bbm.bb_size);
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages;
+	     pfn += PAGES_PER_SECTION) {
+		if (pfn_to_online_page(pfn))
+			return false;
+	}
+
+	return true;
+}
+
+static int virtio_mem_bbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	uint64_t nb_bb = diff / vm->bbm.bb_size;
+	uint64_t bb_id;
+	int rc;
+
+	if (!nb_bb)
+		return 0;
+
+	/* Try to unplug completely offline big blocks first. */
+	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
+		cond_resched();
+		/*
+		 * As we're holding no locks, this check is racy as memory
+		 * can get onlined in the meantime - but we'll fail gracefully.
+		 */
+		if (!virtio_mem_bbm_bb_is_offline(vm, bb_id))
+			continue;
+		rc = virtio_mem_bbm_remove_and_unplug_bb(vm, bb_id);
+		if (rc == -EBUSY)
+			continue;
+		if (!rc)
+			nb_bb--;
+		if (rc || !nb_bb)
+			return rc;
+	}
+
+	if (!unplug_online)
+		return 0;
+
+	/* Try to unplug any big blocks. */
+	virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
+		cond_resched();
+		rc = virtio_mem_bbm_offline_remove_and_unplug_bb(vm, bb_id);
+		if (rc == -EBUSY)
+			continue;
+		if (!rc)
+			nb_bb--;
+		if (rc || !nb_bb)
+			return rc;
+	}
+
+	return nb_bb ? -EBUSY : 0;
+}
+
 /*
  * Try to unplug the requested amount of memory.
  */
@@ -1935,7 +2089,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 {
 	if (vm->in_sbm)
 		return virtio_mem_sbm_unplug_request(vm, diff);
-	return -EBUSY;
+	return virtio_mem_bbm_unplug_request(vm, diff);
 }
 
 /*
-- 
2.26.2
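
For readers who want to play with the two-pass strategy outside the kernel, here is a minimal user-space sketch of the unplug request flow. It is not driver code: struct sim_bb, the sim_* helpers, and the "busy" flag are hypothetical stand-ins invented for illustration. The sketch assumes that a busy block simply fails offlining with -EBUSY and that no other error can occur. Like the patch, it scans big blocks from the highest index down, tries completely offline blocks first, and only touches online blocks when unplug_online is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a big block; not the driver's data structure. */
enum sim_bb_state { SIM_BB_UNUSED, SIM_BB_ADDED };

struct sim_bb {
	enum sim_bb_state state;
	bool online;	/* some memory in the block is online */
	bool busy;	/* assumption: offlining fails, e.g., long-term pinning */
};

/* Pass 1 helper: removal only works while the block is fully offline. */
static int sim_remove_and_unplug(struct sim_bb *bb)
{
	if (bb->online)
		return -EBUSY;
	bb->state = SIM_BB_UNUSED;
	return 0;
}

/* Pass 2 helper: offline first; busy online blocks cannot be offlined. */
static int sim_offline_remove_and_unplug(struct sim_bb *bb)
{
	if (bb->online && bb->busy)
		return -EBUSY;
	bb->online = false;
	bb->state = SIM_BB_UNUSED;
	return 0;
}

/*
 * Unplug nb_bb blocks, scanning from the highest index downwards.
 * Returns 0 on success or when the second pass is disabled, and
 * -EBUSY if blocks remain to be unplugged after both passes.
 */
static int sim_unplug_request(struct sim_bb *bbs, size_t n, size_t nb_bb,
			      bool unplug_online)
{
	size_t i;

	assert(bbs || !n);
	if (!nb_bb)
		return 0;

	/* Pass 1: completely offline blocks only. */
	for (i = n; i-- > 0; ) {
		if (bbs[i].state != SIM_BB_ADDED || bbs[i].online)
			continue;
		if (!sim_remove_and_unplug(&bbs[i]) && !--nb_bb)
			return 0;
	}

	if (!unplug_online)
		return 0;

	/* Pass 2: any added block, offlining it first. */
	for (i = n; i-- > 0; ) {
		if (bbs[i].state != SIM_BB_ADDED)
			continue;
		if (!sim_offline_remove_and_unplug(&bbs[i]) && !--nb_bb)
			return 0;
	}

	return nb_bb ? -EBUSY : 0;
}
```

Note that with unplug_online disabled the request reports success even if fewer blocks than requested were unplugged, which mirrors the early "return 0" in virtio_mem_bbm_unplug_request().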