From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 14 May 2025 16:41:43 -0700
Mime-Version: 1.0
X-Mailer: git-send-email 2.49.0.1045.g170613ef41-goog
Subject: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, linux-fsdevel@vger.kernel.org
Cc: ackerleytng@google.com, aik@amd.com, ajones@ventanamicro.com,
	akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com,
	anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com,
	binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com,
	chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com,
	david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk,
	erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com,
	haibo1.xu@intel.com, hch@infradead.org, hughd@google.com,
	ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
	james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
	jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
	jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
	kent.overstreet@linux.dev, kirill.shutemov@intel.com,
	liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
	mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
	michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
	nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
	palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
	pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
	pgonda@google.com, pvorel@suse.cz, qperret@google.com,
	quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
	quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
	rick.p.edgecombe@intel.com, rientjes@google.com,
	roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
	steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
	tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com,
	vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
	vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
	willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com,
	yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Content-Type: text/plain; charset="UTF-8"

The two new guest_memfd ioctls KVM_GMEM_CONVERT_SHARED and
KVM_GMEM_CONVERT_PRIVATE convert the requested memory ranges to shared
and private respectively.

A guest_memfd ioctl is used because shareability is a property of the
memory, and this property should be modifiable independently of the
attached struct kvm. This allows shareability to be modified even if
the memory is not yet bound using memslots.

For shared to private conversions, if refcounts on any of the folios
within the range are elevated, fail the conversion with -EAGAIN.

At the point of shared to private conversion, all folios in range are
also unmapped. The filemap_invalidate_lock() is held, so no faulting
can occur. Hence, from that point on, only transient refcounts can be
taken on the folios associated with that guest_memfd, and it is safe
to do the conversion from shared to private. After conversion is
complete, refcounts may become elevated, but that is fine since users
of transient refcounts don't actually access memory.

For private to shared conversions, there are no refcount checks. Any
holders of transient refcounts are expected to drop them soon; the
conversion process will spin waiting for these transient refcounts to
go away.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I3546aaf6c1b795de6dc9ba09e816b64934221918
---
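As an illustration (not part of this patch), userspace could drive the
two ioctls along the lines of the sketch below. The gmem_convert()
helper and gmem_fd are placeholder names; gmem_fd is assumed to come
from KVM_CREATE_GUEST_MEMFD, while struct kvm_gmem_convert and the
ioctl numbers are the ones introduced below. error_offset must be zero
on entry, and the shared-to-private path returns -EAGAIN while folio
refcounts are elevated, so the sketch clears the field and retries:

/*
 * Illustrative sketch only, not part of this patch. Convert a
 * guest_memfd range, retrying while the kernel reports elevated
 * refcounts. gmem_fd is assumed to have been returned by
 * KVM_CREATE_GUEST_MEMFD; offset and size must be PAGE_SIZE aligned.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int gmem_convert(int gmem_fd, unsigned long req, __u64 offset, __u64 size)
{
	struct kvm_gmem_convert param = {
		.offset = offset,
		.size = size,
	};
	int ret;

	do {
		param.error_offset = 0;	/* must be zero on entry */
		ret = ioctl(gmem_fd, req, &param);
	} while (ret < 0 && errno == EAGAIN);	/* elevated refcounts, retry */

	if (ret < 0)
		fprintf(stderr, "convert failed near offset 0x%llx: %s\n",
			(unsigned long long)param.error_offset, strerror(errno));
	return ret;
}

/*
 * Example: flip a 2M range at offset 0 to private, then back to shared.
 *
 *	gmem_convert(gmem_fd, KVM_GMEM_CONVERT_PRIVATE, 0, 0x200000);
 *	gmem_convert(gmem_fd, KVM_GMEM_CONVERT_SHARED, 0, 0x200000);
 */

For the -EAGAIN case, error_offset identifies the page whose refcount
blocked the conversion.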
 include/uapi/linux/kvm.h |  11 ++
 virt/kvm/guest_memfd.c   | 357 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 366 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d7df312479aa..5b28e17f6f14 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1577,6 +1577,17 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_GMEM_IO	0xAF
+#define KVM_GMEM_CONVERT_SHARED		_IOWR(KVM_GMEM_IO, 0x41, struct kvm_gmem_convert)
+#define KVM_GMEM_CONVERT_PRIVATE	_IOWR(KVM_GMEM_IO, 0x42, struct kvm_gmem_convert)
+
+struct kvm_gmem_convert {
+	__u64 offset;
+	__u64 size;
+	__u64 error_offset;
+	__u64 reserved[5];
+};
+
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
 
 struct kvm_pre_fault_memory {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 590932499eba..f802116290ce 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -30,6 +30,10 @@ enum shareability {
 };
 
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index);
+static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
+				      pgoff_t end);
+static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
+				    pgoff_t end);
 
 static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
 {
@@ -85,6 +89,306 @@ static struct folio *kvm_gmem_get_shared_folio(struct inode *inode, pgoff_t inde
 	return kvm_gmem_get_folio(inode, index);
 }
 
+/**
+ * kvm_gmem_shareability_store() - Sets shareability to @value for range.
+ *
+ * @mt: the shareability maple tree.
+ * @index: the range begins at this index in the inode.
+ * @nr_pages: number of PAGE_SIZE pages in this range.
+ * @value: the shareability value to set for this range.
+ *
+ * Unlike mtree_store_range(), this function also merges adjacent ranges that
+ * have the same values as an optimization. Assumes that all stores to @mt go
+ * through this function, such that adjacent ranges are always merged.
+ *
+ * Return: 0 on success and negative error otherwise.
+ */
+static int kvm_gmem_shareability_store(struct maple_tree *mt, pgoff_t index,
+				       size_t nr_pages, enum shareability value)
+{
+	MA_STATE(mas, mt, 0, 0);
+	unsigned long start;
+	unsigned long last;
+	void *entry;
+	int ret;
+
+	start = index;
+	last = start + nr_pages - 1;
+
+	mas_lock(&mas);
+
+	/* Try extending range. entry is NULL on overflow/wrap-around. */
+	mas_set_range(&mas, last + 1, last + 1);
+	entry = mas_find(&mas, last + 1);
+	if (entry && xa_to_value(entry) == value)
+		last = mas.last;
+
+	mas_set_range(&mas, start - 1, start - 1);
+	entry = mas_find(&mas, start - 1);
+	if (entry && xa_to_value(entry) == value)
+		start = mas.index;
+
+	mas_set_range(&mas, start, last);
+	ret = mas_store_gfp(&mas, xa_mk_value(value), GFP_KERNEL);
+
+	mas_unlock(&mas);
+
+	return ret;
+}
+
+struct conversion_work {
+	struct list_head list;
+	pgoff_t start;
+	size_t nr_pages;
+};
+
+static int add_to_work_list(struct list_head *list, pgoff_t start, pgoff_t last)
+{
+	struct conversion_work *work;
+
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	work->start = start;
+	work->nr_pages = last + 1 - start;
+
+	list_add_tail(&work->list, list);
+
+	return 0;
+}
+
+static bool kvm_gmem_has_safe_refcount(struct address_space *mapping, pgoff_t start,
+				       size_t nr_pages, pgoff_t *error_index)
+{
+	const int filemap_get_folios_refcount = 1;
+	struct folio_batch fbatch;
+	bool refcount_safe;
+	pgoff_t last;
+	int i;
+
+	last = start + nr_pages - 1;
+	refcount_safe = true;
+
+	folio_batch_init(&fbatch);
+	while (refcount_safe &&
+	       filemap_get_folios(mapping, &start, last, &fbatch)) {
+
+		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+			int filemap_refcount;
+			int safe_refcount;
+			struct folio *f;
+
+			f = fbatch.folios[i];
+			filemap_refcount = folio_nr_pages(f);
+
+			safe_refcount = filemap_refcount + filemap_get_folios_refcount;
+			if (folio_ref_count(f) != safe_refcount) {
+				refcount_safe = false;
+				*error_index = f->index;
+				break;
+			}
+		}
+
+		folio_batch_release(&fbatch);
+	}
+
+	return refcount_safe;
+}
+
+static int kvm_gmem_shareability_apply(struct inode *inode,
+				       struct conversion_work *work,
+				       enum shareability m)
+{
+	struct maple_tree *mt;
+
+	mt = &kvm_gmem_private(inode)->shareability;
+	return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
+}
+
+static int kvm_gmem_convert_compute_work(struct inode *inode, pgoff_t start,
+					 size_t nr_pages, enum shareability m,
+					 struct list_head *work_list)
+{
+	struct maple_tree *mt;
+	struct ma_state mas;
+	pgoff_t last;
+	void *entry;
+	int ret;
+
+	last = start + nr_pages - 1;
+
+	mt = &kvm_gmem_private(inode)->shareability;
+	ret = 0;
+
+	mas_init(&mas, mt, start);
+
+	rcu_read_lock();
+	mas_for_each(&mas, entry, last) {
+		enum shareability current_m;
+		pgoff_t m_range_index;
+		pgoff_t m_range_last;
+		int ret;
+
+		m_range_index = max(mas.index, start);
+		m_range_last = min(mas.last, last);
+
+		current_m = xa_to_value(entry);
+		if (m == current_m)
+			continue;
+
+		mas_pause(&mas);
+		rcu_read_unlock();
+		/* Caller will clean this up on error. */
+		ret = add_to_work_list(work_list, m_range_index, m_range_last);
+		rcu_read_lock();
+		if (ret)
+			break;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static void kvm_gmem_convert_invalidate_begin(struct inode *inode,
+					      struct conversion_work *work)
+{
+	struct list_head *gmem_list;
+	struct kvm_gmem *gmem;
+	pgoff_t end;
+
+	end = work->start + work->nr_pages;
+
+	gmem_list = &inode->i_mapping->i_private_list;
+	list_for_each_entry(gmem, gmem_list, entry)
+		kvm_gmem_invalidate_begin(gmem, work->start, end);
+}
+
+static void kvm_gmem_convert_invalidate_end(struct inode *inode,
+					    struct conversion_work *work)
+{
+	struct list_head *gmem_list;
+	struct kvm_gmem *gmem;
+	pgoff_t end;
+
+	end = work->start + work->nr_pages;
+
+	gmem_list = &inode->i_mapping->i_private_list;
+	list_for_each_entry(gmem, gmem_list, entry)
+		kvm_gmem_invalidate_end(gmem, work->start, end);
+}
+
+static int kvm_gmem_convert_should_proceed(struct inode *inode,
+					   struct conversion_work *work,
+					   bool to_shared, pgoff_t *error_index)
+{
+	if (!to_shared) {
+		unmap_mapping_pages(inode->i_mapping, work->start,
+				    work->nr_pages, false);
+
+		if (!kvm_gmem_has_safe_refcount(inode->i_mapping, work->start,
+						work->nr_pages, error_index)) {
+			return -EAGAIN;
+		}
+	}
+
+	return 0;
+}
+
+static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
+				  size_t nr_pages, bool shared,
+				  pgoff_t *error_index)
+{
+	struct conversion_work *work, *tmp, *rollback_stop_item;
+	LIST_HEAD(work_list);
+	struct inode *inode;
+	enum shareability m;
+	int ret;
+
+	inode = file_inode(file);
+
+	filemap_invalidate_lock(inode->i_mapping);
+
+	m = shared ? SHAREABILITY_ALL : SHAREABILITY_GUEST;
+	ret = kvm_gmem_convert_compute_work(inode, start, nr_pages, m, &work_list);
+	if (ret || list_empty(&work_list))
+		goto out;
+
+	list_for_each_entry(work, &work_list, list)
+		kvm_gmem_convert_invalidate_begin(inode, work);
+
+	list_for_each_entry(work, &work_list, list) {
+		ret = kvm_gmem_convert_should_proceed(inode, work, shared,
+						      error_index);
+		if (ret)
+			goto invalidate_end;
+	}
+
+	list_for_each_entry(work, &work_list, list) {
+		rollback_stop_item = work;
+		ret = kvm_gmem_shareability_apply(inode, work, m);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		m = shared ? SHAREABILITY_GUEST : SHAREABILITY_ALL;
+		list_for_each_entry(work, &work_list, list) {
+			if (work == rollback_stop_item)
+				break;
+
+			WARN_ON(kvm_gmem_shareability_apply(inode, work, m));
+		}
+	}
+
+invalidate_end:
+	list_for_each_entry(work, &work_list, list)
+		kvm_gmem_convert_invalidate_end(inode, work);
+out:
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	list_for_each_entry_safe(work, tmp, &work_list, list) {
+		list_del(&work->list);
+		kfree(work);
+	}
+
+	return ret;
+}
+
+static int kvm_gmem_ioctl_convert_range(struct file *file,
+					struct kvm_gmem_convert *param,
+					bool shared)
+{
+	pgoff_t error_index;
+	size_t nr_pages;
+	pgoff_t start;
+	int ret;
+
+	if (param->error_offset)
+		return -EINVAL;
+
+	if (param->size == 0)
+		return 0;
+
+	if (param->offset + param->size < param->offset ||
+	    param->offset > file_inode(file)->i_size ||
+	    param->offset + param->size > file_inode(file)->i_size)
+		return -EINVAL;
+
+	if (!IS_ALIGNED(param->offset, PAGE_SIZE) ||
+	    !IS_ALIGNED(param->size, PAGE_SIZE))
+		return -EINVAL;
+
+	start = param->offset >> PAGE_SHIFT;
+	nr_pages = param->size >> PAGE_SHIFT;
+
+	ret = kvm_gmem_convert_range(file, start, nr_pages, shared, &error_index);
+	if (ret)
+		param->error_offset = error_index << PAGE_SHIFT;
+
+	return ret;
+}
+
 #else
 
 static int kvm_gmem_shareability_setup(struct maple_tree *mt, loff_t size, u64 flags)
@@ -186,15 +490,26 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
 	unsigned long index;
 
 	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
+		enum kvm_gfn_range_filter filter;
 		pgoff_t pgoff = slot->gmem.pgoff;
 
+		filter = KVM_FILTER_PRIVATE;
+		if (kvm_gmem_memslot_supports_shared(slot)) {
+			/*
+			 * Unmapping would also cause invalidation, but cannot
+			 * rely on mmu_notifiers to do invalidation via
+			 * unmapping, since memory may not be mapped to
+			 * userspace.
+			 */
+			filter |= KVM_FILTER_SHARED;
+		}
+
 		struct kvm_gfn_range gfn_range = {
 			.start = slot->base_gfn + max(pgoff, start) - pgoff,
 			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
 			.slot = slot,
 			.may_block = true,
-			/* guest memfd is relevant to only private mappings. */
-			.attr_filter = KVM_FILTER_PRIVATE,
+			.attr_filter = filter,
 		};
 
 		if (!found_memslot) {
@@ -484,11 +799,49 @@ EXPORT_SYMBOL_GPL(kvm_gmem_memslot_supports_shared);
 #define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
 
+static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
+			   unsigned long arg)
+{
+	void __user *argp;
+	int r;
+
+	argp = (void __user *)arg;
+
+	switch (ioctl) {
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	case KVM_GMEM_CONVERT_SHARED:
+	case KVM_GMEM_CONVERT_PRIVATE: {
+		struct kvm_gmem_convert param;
+		bool to_shared;
+
+		r = -EFAULT;
+		if (copy_from_user(&param, argp, sizeof(param)))
+			goto out;
+
+		to_shared = ioctl == KVM_GMEM_CONVERT_SHARED;
+		r = kvm_gmem_ioctl_convert_range(file, &param, to_shared);
+		if (r) {
+			if (copy_to_user(argp, &param, sizeof(param))) {
+				r = -EFAULT;
+				goto out;
+			}
+		}
+		break;
+	}
+#endif
+	default:
+		r = -ENOTTY;
+	}
+out:
+	return r;
+}
+
 static struct file_operations kvm_gmem_fops = {
 	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
+	.unlocked_ioctl = kvm_gmem_ioctl,
 };
 
 static void kvm_gmem_free_inode(struct inode *inode)
-- 
2.49.0.1045.g170613ef41-goog