Date: Thu, 11 Feb 2021 13:27:02 +0200
From: Mike Rapoport
To: David Hildenbrand
Cc: Michal Hocko, Mike Rapoport, Andrew Morton, Alexander Viro,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Catalin Marinas,
	Christopher Lameter, Dan Williams, Dave Hansen, Elena Reshetova,
	"H. Peter Anvin", Ingo Molnar, James Bottomley,
	"Kirill A. Shutemov", Matthew Wilcox, Mark Rutland, Michael Kerrisk,
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Rick Edgecombe,
	Roman Gushchin, Shakeel Butt, Shuah Khan, Thomas Gleixner,
	Tycho Andersen, Will Deacon, linux-api@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, Hagen Paul Pfeifer, Palmer Dabbelt
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
Message-ID: <20210211112702.GI242749@kernel.org>
References: <20210208084920.2884-1-rppt@kernel.org>
	<20210208084920.2884-8-rppt@kernel.org>
	<20210208212605.GX242749@kernel.org>
	<20210209090938.GP299309@linux.ibm.com>
	<20210211071319.GF242749@kernel.org>
	<0d66baec-1898-987b-7eaf-68a015c027ff@redhat.com>
In-Reply-To: <0d66baec-1898-987b-7eaf-68a015c027ff@redhat.com>

On Thu, Feb 11, 2021 at 10:01:32AM +0100, David Hildenbrand wrote:
> On 11.02.21 09:39, Michal Hocko wrote:
> > On Thu 11-02-21 09:13:19, Mike Rapoport wrote:
> > > On Tue, Feb 09, 2021 at 02:17:11PM +0100, Michal Hocko wrote:
> > > > On Tue 09-02-21 11:09:38, Mike Rapoport wrote:
> > [...]
> > > > > Citing my older email:
> > > > >
> > > > >     I've hesitated whether to continue to use new flags to memfd_create() or to
> > > > >     add a new system call and I've decided to use a new system call after I've
> > > > >     started to look into man pages update. There would have been two completely
> > > > >     independent descriptions and I think it would have been very confusing.
> > > >
> > > > Could you elaborate? Unmapping from the kernel address space can work
> > > > both for sealed or hugetlb memfds, no? Those features are completely
> > > > orthogonal AFAICS. With a dedicated syscall you will need to introduce
> > > > this functionality on top if that is required. Have you considered that?
> > > > I mean hugetlb pages are used to back guest memory very often. Is this
> > > > something that will be a secret memory usecase?
> > > >
> > > > Please be really specific when giving arguments to back a new syscall
> > > > decision.
> > >
> > > Isn't "syscalls have completely independent description" specific enough?
> >
> > No, it's not as you can see from questions I've had above. More on that
> > below.
> >
> > > We are talking about API here, not the implementation details whether
> > > secretmem supports large pages or not.
> > >
> > > The purpose of memfd_create() is to create a file-like access to memory.
> > > The purpose of memfd_secret() is to create a way to access memory hidden
> > > from the kernel.
> > >
> > > I don't think overloading memfd_create() with the secretmem flags because
> > > they happen to return a file descriptor will be better for users, but
> > > rather will be more confusing.
> >
> > This is quite a subjective conclusion. I could very well argue that it
> > would be much better to have a single syscall to get a fd backed memory
> > with specific requirements (sealing, unmapping from the kernel address
> > space). Neither of us would be clearly right or wrong. A more important
> > point is a future extensibility and usability, though. So let's just
> > think of few usecases I have outlined above. Is it unrealistic to expect
> > that secret memory should be sealable? What about hugetlb?
> > Because if the answer is no then a new API is a clear win as the
> > combination of flags would never work and then we would just suffer
> > from the syscall multiplexing without much gain. On the other hand if
> > combination of the functionality is to be expected then you will have
> > to jam it into memfd_create and copy the interface likely causing more
> > confusion. See what I mean?
> >
> > I by no means do not insist one way or the other but from what I have
> > seen so far I have a feeling that the interface hasn't been thought
> > through enough. Sure you have landed with fd based approach and that
> > seems fair. But how to get that fd seems to still have some gaps IMHO.
> >
>
> I agree with Michal. This has been raised by different
> people already, including on LWN (https://lwn.net/Articles/835342/).
>
> I can follow Mike's reasoning (man page), and I am also fine if there is
> a valid reason. However, IMHO the basic description seems to match quite good:
>
>        memfd_create() creates an anonymous file and returns a file descriptor that refers to it. The
>        file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on.
>        However, unlike a regular file, it lives in RAM and has a volatile backing storage. Once all
>        references to the file are dropped, it is automatically released. Anonymous memory is used
>        for all backing pages of the file. Therefore, files created by memfd_create() have the same
>        semantics as other anonymous memory allocations such as those allocated using mmap(2) with the
>        MAP_ANONYMOUS flag.

Even despite my laziness and huge amount of copy-paste you can spot the
differences (this is a very old version, update is due):

       memfd_secret() creates an anonymous file and returns a file descriptor
       that refers to it.
       The file can only be memory-mapped; the memory in such
       mapping will have stronger protection than usual memory mapped files,
       and so it can be used to store application secrets. Unlike a regular
       file, a file created with memfd_secret() lives in RAM and has a volatile
       backing storage. Once all references to the file are dropped, it is
       automatically released. The initial size of the file is set to 0.
       Following the call, the file size should be set using ftruncate(2).

       The memory areas obtained with mmap(2) from the file descriptor are
       exclusive to the owning context. These areas are removed from the kernel
       page tables and only the page table of the process holding the file
       descriptor maps the corresponding physical memory.

> AFAIKS, we would need MFD_SECRET and disallow
> MFD_ALLOW_SEALING and MFD_HUGETLB. So here we start to multiplex.
> In addition, we could add MFD_SECRET_NEVER_MAP, which could disallow any kind of
> temporary mappings (eor migration). TBC.

Never map is the default. When we need to map, we'll add an explicit flag
for it.

-- 
Sincerely yours,
Mike.
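
For readers following the man page draft quoted above, here is a minimal
sketch of the sequence it describes: obtain the descriptor, set its size
with ftruncate(2), then mmap(2) it. This is not code from the patch series.
It assumes syscall number 447, which is the number memfd_secret() received
when it was eventually merged, and it goes through the raw syscall via
ctypes because Python has no wrapper for it. The helper names
memfd_secret() and store_secret() are made up for illustration, and both
return None when the kernel does not provide secretmem or refuses the
mapping.

```python
import ctypes
import mmap
import os

# Assumption: 447 is the number memfd_secret() was given when it was merged
# upstream; the v17 series discussed in this thread predates that.
SYS_MEMFD_SECRET = 447


def memfd_secret(flags=0):
    """Return a secretmem file descriptor, or None if the call fails."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = libc.syscall(ctypes.c_long(SYS_MEMFD_SECRET), ctypes.c_long(flags))
    if fd < 0:
        # Typically ENOSYS: kernel too old, or secretmem not enabled.
        return None
    return fd


def store_secret(secret):
    """Round-trip `secret` (at most one page) through a secret mapping.

    Returns the bytes read back, or None if secretmem is unavailable.
    """
    fd = memfd_secret()
    if fd is None:
        return None
    try:
        size = mmap.PAGESIZE
        # The initial file size is 0, so it has to be set before mapping.
        os.ftruncate(fd, size)
        # mmap.mmap() defaults to a shared, read/write mapping.
        with mmap.mmap(fd, size) as area:
            area[:len(secret)] = secret
            return bytes(area[:len(secret)])
    except OSError:
        # For example, the locked-memory limit does not allow the mapping.
        return None
    finally:
        os.close(fd)
```

Note that there is no read(2) or write(2) step anywhere in the sketch: per
the description above, the file can only be memory-mapped, which is one of
the visible differences from a memfd_create() descriptor.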