From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5795C433DB for ; Mon, 8 Feb 2021 21:48:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C01264DE1 for ; Mon, 8 Feb 2021 21:48:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236990AbhBHVrk (ORCPT ); Mon, 8 Feb 2021 16:47:40 -0500 Received: from mail.kernel.org ([198.145.29.99]:35556 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236260AbhBHV1F (ORCPT ); Mon, 8 Feb 2021 16:27:05 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id E27F164E6C; Mon, 8 Feb 2021 21:26:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1612819582; bh=Y8G7gf2IwGjBXBF5ErxQfWKVOLSdVIOpGEIG67JA2AY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Eerdn/7TBUJNWh5JSQGQanhkCbXO6r8i+07ORvc0kNlXTv/gMu3WAJsIb2RYRWrDl NzMmvcDn3iTmjaUTOACFOeq+fWRq6saXTmy51YQtoTk38Gte4b+P8L19Xbwx0L6ml+ Ea3Z7Qd3D+MldYIlVvyh5Yb26npCORZwTE+3wJTir9k1lgaS6eZYPXQ4zFo0WChRGs fUiJgsx/cCeor6w8T2f3XtMihtpacVS0+tPYSqs4uM0k3MR4YWV+vaapoyzoPPxAl+ yFpPIEmRq4D/vlPxshoMK56Hm/B9URPT2mlstPmtBfgreYHioAYDlS64KLgxO+QeoG FSPZy/knOHzIQ== Date: Mon, 8 Feb 2021 23:26:05 +0200 From: Mike Rapoport To: Michal Hocko Cc: Andrew Morton , Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Mark Rutland , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas Message-ID: <20210208212605.GX242749@kernel.org> References: <20210208084920.2884-1-rppt@kernel.org> <20210208084920.2884-8-rppt@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 08, 2021 at 11:49:22AM +0100, Michal Hocko wrote: > On Mon 08-02-21 10:49:17, Mike Rapoport wrote: > > From: Mike Rapoport > > > > Introduce "memfd_secret" system call with the ability to create memory > > areas visible only in the context of the owning process and not mapped not > > only to other processes but in the kernel page tables as well. > > > > The secretmem feature is off by default and the user must explicitly enable > > it at the boot time. > > > > Once secretmem is enabled, the user will be able to create a file > > descriptor using the memfd_secret() system call. The memory areas created > > by mmap() calls from this file descriptor will be unmapped from the kernel > > direct map and they will be only mapped in the page table of the owning mm. > > Is this really true? I guess you meant to say that the memory will > visible only via page tables to anybody who can mmap the respective file > descriptor. There is nothing like an owning mm as the fd is inherently a > shareable resource and the ownership becomes a very vague and hard to > define term. Hmm, it seems I've been dragging this paragraph from the very first mmap(MAP_EXCLUSIVE) rfc and nobody (including myself) noticed the inconsistency. > > The file descriptor based memory has several advantages over the > > "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). It > > paves the way for VMMs to remove the secret memory range from the process; > > I do not understand how it helps to remove the memory from the process > as the interface explicitly allows to add a memory that is removed from > all other processes via direct map. The current implementation does not help to remove the memory from the process, but using fd-backed memory seems a better interface to remove guest memory from host mappings than mmap. As Andy nicely put it: "Getting fd-backed memory into a guest will take some possibly major work in the kernel, but getting vma-backed memory into a guest without mapping it in the host user address space seems much, much worse." > > As secret memory implementation is not an extension of tmpfs or hugetlbfs, > > usage of a dedicated system call rather than hooking new functionality into > > memfd_create(2) emphasises that memfd_secret(2) has different semantics and > > allows better upwards compatibility. > > What is this supposed to mean? What are differences? Well, the phrasing could be better indeed. That supposed to mean that they differ in the semantics behind the file descriptor: memfd_create implements sealing for shmem and hugetlbfs while memfd_secret implements memory hidden from the kernel. > > The secretmem mappings are locked in memory so they cannot exceed > > RLIMIT_MEMLOCK. Since these mappings are already locked an attempt to > > mlock() secretmem range would fail and mlockall() will ignore secretmem > > mappings. > > What about munlock? Isn't this implied? ;-) I'll add a sentence about it. -- Sincerely yours, Mike.