From: Alex Bennée
To: Chao Peng
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins,
 Jeff Layton, "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport,
 Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve,
 Yu Zhang, "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
 dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
 aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
 Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com,
 Muchun Song, wei.w.wang@intel.com, Viresh Kumar, Mathieu Poirier,
 AKASHI Takahiro
Subject: Re: [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM
Date: Mon, 14 Nov 2022 11:43:37 +0000
Message-ID: <87k03xbvkt.fsf@linaro.org>
In-reply-to: <20221025151344.3784230-1-chao.p.peng@linux.intel.com>

Chao Peng writes:

> Introduction
> ============
> KVM userspace being able to crash the host is horrible. Under the current
> KVM architecture, all guest memory is inherently accessible from KVM
> userspace and is exposed to the mentioned crash issue. The goal of this
> series is to provide a solution that aligns mm and KVM on a
> userspace-inaccessible approach to exposing guest memory.
>
> Normally, KVM populates the secondary page table (e.g. EPT) by using a
> host virtual address (hva) from the core mm page table (e.g. the x86
> userspace page table). This requires guest memory to be mmaped into KVM
> userspace, which is also where the mentioned crash issue can originate.
> In theory, apart from the 'shared' memory used for device emulation etc.,
> guest memory doesn't have to be mmaped into KVM userspace.
>
> This series introduces fd-based guest memory which will not be mmaped
> into KVM userspace. KVM populates the secondary page table by using an
> fd/offset pair backed by a memory file system.
> The fd can be created from a supported memory filesystem like
> tmpfs/hugetlbfs, and KVM can interact with it directly through a newly
> introduced in-kernel interface, thereby removing KVM userspace from the
> path of accessing/mmaping the guest memory.
>
> Kirill had a patch [2] to address the same issue in a different way. It
> tracks guest encrypted memory at the 'struct page' level and relies on
> HWPOISON to reject userspace access. The patch has been discussed in
> several online and offline threads and resulted in a design document [3],
> which is also the original proposal for this series. The series later
> evolved as more comments were received from the community, but the major
> concepts in [3] still hold true, so it is recommended reading.
>
> The patch series may also be useful for other usages; for example, a pure
> software approach may use it to harden itself against unintentional
> access to guest memory. This series is designed with these usages in
> mind but doesn't include code that directly supports them, and extensions
> might be needed.

There are a couple of additional use cases where having a consistent
memory interface with the kernel would be useful.

  - Xen DomU guests providing other domains with VirtIO backends

    Xen by default doesn't give other domains special access to a
    domain's memory. The guest can grant access to regions of its memory
    to other domains for this purpose.

  - pKVM on ARM

    Similar to Xen, pKVM moves the management of the page tables into the
    hypervisor and again doesn't allow those domains to share memory by
    default.

  - VirtIO loopback

    This allows VirtIO devices for the host kernel to be serviced by
    backends running in userspace. Obviously the memory userspace is
    allowed to access is strictly limited to the buffers and queues,
    because giving userspace unrestricted access to the host kernel would
    have serious security consequences.
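
In each of these cases the backend is a separate userspace process that has
to be handed guest memory. With vhost-user this is done by passing memfd file
descriptors over a UNIX domain socket. The C sketch here is a minimal,
hypothetical illustration of that mechanism (Linux-only, assuming glibc 2.27
or later for memfd_create()): a "VMM" backs a page of guest RAM with a memfd,
writes to it, and sends the fd with SCM_RIGHTS to a "backend", which maps it
and reads the same bytes. The helper names send_fd(), recv_fd() and
share_guest_ram_demo() are illustrative only and are not taken from any
vhost-user implementation; both roles run in one process for brevity.

```c
#define _GNU_SOURCE /* memfd_create() and MFD_CLOEXEC (glibc 2.27+) */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* SCM_RIGHTS must accompany at least one real data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *cmsg;
    int fd = -1;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/*
 * Hypothetical demo: the "VMM" backs one page of guest RAM with a memfd and
 * writes to it; the "backend" receives the fd, maps the same pages read-only
 * and copies out what it sees. Returns 0 on success, -1 on any failure.
 */
static int share_guest_ram_demo(char *out, size_t outlen)
{
    const size_t ram_size = 4096;
    int sv[2], ram, backend_fd = -1, ret = -1;
    char *vmm_view, *backend_view;

    assert(out != NULL && outlen > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* VMM side: fd-backed guest RAM, mapped read/write. */
    ram = memfd_create("guest-ram", MFD_CLOEXEC);
    if (ram < 0)
        goto out_sock;
    if (ftruncate(ram, ram_size) < 0)
        goto out_ram;
    vmm_view = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_SHARED, ram, 0);
    if (vmm_view == MAP_FAILED)
        goto out_ram;
    strcpy(vmm_view, "virtqueue data");

    /* Backend side: receive the fd and map the same memory read-only. */
    if (send_fd(sv[0], ram) < 0 || (backend_fd = recv_fd(sv[1])) < 0)
        goto out_vmm;
    backend_view = mmap(NULL, ram_size, PROT_READ, MAP_SHARED, backend_fd, 0);
    if (backend_view != MAP_FAILED) {
        strncpy(out, backend_view, outlen - 1);
        out[outlen - 1] = '\0';
        munmap(backend_view, ram_size);
        ret = 0;
    }
    close(backend_fd);
out_vmm:
    munmap(vmm_view, ram_size);
out_ram:
    close(ram);
out_sock:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Calling share_guest_ram_demo(buf, sizeof buf) with a 32-byte buffer should
return 0 and leave "virtqueue data" in buf. The backend's plain mmap() of the
whole fd is exactly the kind of userspace access a memfd_restricted fd is
meant to refuse, which is presumably why these use cases would need an
interface that grants a backend access to just the shared buffers and queues.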

All of these VirtIO backends work with vhost-user, which uses memfds to
pass references to guest memory from the VMM to the backend
implementation.

> mm change
> =========
> Introduces a new memfd_restricted system call which creates a memory
> file that is restricted from userspace access via normal MMU operations
> like read(), write() or mmap(). The only way to use it is to pass it to
> a third kernel module like KVM and rely on that module to access the fd
> through the newly added restrictedmem kernel interface. The restrictedmem
> interface bridges the memory file subsystems (tmpfs/hugetlbfs etc.) and
> their users (KVM in this case) and provides bi-directional communication
> between them.
>
> KVM change
> ==========
> Extends the KVM memslot to provide guest private (encrypted) memory from
> an fd. With this extension, a single memslot can maintain both private
> memory through a private fd (restricted_fd/restricted_offset) and shared
> (unencrypted) memory through a userspace-mmaped host virtual address
> (userspace_addr). For a particular guest page, the corresponding page in
> the KVM memslot can be either private or shared but not both, and only
> one of the shared/private parts of the memslot is visible to the guest.
> For how this new extension is used in QEMU, please refer to
> kvm_set_phys_mem() in the TDX-enabled QEMU repo below.
>
> Introduces a new KVM_EXIT_MEMORY_FAULT exit to give userspace the chance
> to decide on shared <-> private memory conversion. The exit can be
> triggered by an implicit conversion in the KVM page fault handler or by
> an explicit conversion request from the guest OS.
>
> Extends the existing SEV ioctls KVM_MEMORY_ENCRYPT_{UN,}REG_REGION to
> convert a guest page between private <-> shared.
> The data maintained by these ioctls is the source of truth for whether a
> guest page is private or shared, and this information will be used by the
> KVM page fault handler to decide whether the private or the shared part
> of the memslot is visible to the guest.

-- 
Alex Bennée