Subject: Re: Sealed memfd & no-fault mmap
To: Linus Torvalds, Simon Ser
Cc: Peter Xu, "Kirill A. Shutemov", Matthew Wilcox, Dan Williams,
 "Kirill A. Shutemov", Will Deacon, Linux Kernel Mailing List,
 David Herrmann, "linux-mm@kvack.org", Greg Kroah-Hartman, "tytso@mit.edu"
References: <20210429154807.hptls4vnmq2svuea@box> <20210429183836.GF8339@xz-x1>
From: "Lin, Ming"
Message-ID: <7718ec5b-0a9e-ffa6-16f2-bc0b6afbd9ab@gmail.com>
Date: Fri, 28 May 2021 10:07:02 -0700

On 5/5/2021 11:42 AM, Linus Torvalds wrote:
> On Wed, May 5, 2021 at 3:21 AM Simon Ser wrote:
>>>
>>> Is there some very specific and targeted pattern for that "shared
>>> mapping" case? For example, if it's always a shared anonymous mapping
>>> with no filesystem backing, then that would possibly be a simpler case
>>> than the "random arbitrary shared file descriptor".
>>
>> Yes. I don't know of any Wayland client using buffers with real
>> filesystem backing.
>> I think the main cases are:
>>
>> - shm_open(3) immediately followed by shm_unlink(3). On Linux, this is
>>   implemented with /dev/shm which is a tmpfs.
>> - Abusing /tmp or /run's tmpfs by creating a file there and unlinking
>>   it immediately afterwards. Kind of similar to the first case.
>> - memfd_create(2) on Linux.
>>
>> Is this enough to make it work on shared memory mappings? Is it
>> important that the mapping is anonymous?
>
> All of those should be anonymous in the sense that the backing store
> is all the kernel's notion of anonymous pages, and there is no actual
> file backing. The mappings may then be shared, of course.
>
> So that does make Peter's idea to have some inode flag for "don't
> SIGBUS on fault" be more reasonable, because there isn't some random
> actual filesystem involved, only the core VM layer.
>
> I'm not going to write the patch, though, but maybe you can convince
> somebody else to try it..

Is something like the following draft patch on the right track?

1. The application sets the S_NOFAULT flag on the shm fd that it is going
   to mmap:

    #define S_NOFAULT (1 << 17)

    fd = shm_open(shmpath, O_RDONLY, S_IRUSR | S_IWUSR);
    ioctl(fd, FS_IOC_GETFLAGS, &flags);
    flags |= S_NOFAULT;
    ioctl(fd, FS_IOC_SETFLAGS, &flags);

2. Don't SIGBUS on a read beyond i_size if the S_NOFAULT flag is set in
   the inode. Use the zero page instead.
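
For reference, the behaviour that step 2 changes can be reproduced from
userspace on a current kernel. This is only a minimal sketch, assuming
Linux and glibc's memfd_create(2) wrapper; the helper name
read_past_eof_raises_sigbus() is made up for illustration and is not part
of the draft patch. It maps two pages of a one-page memfd and reads the
first byte beyond i_size:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

/* Jump back into the probe instead of letting SIGBUS kill the process. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

/*
 * Hypothetical helper: returns 1 if reading the first byte past i_size of a
 * shared memfd mapping raises SIGBUS, 0 if the read succeeds, and -1 if the
 * setup (memfd_create, ftruncate, mmap) fails.
 */
static int read_past_eof_raises_sigbus(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile unsigned char *map;
	volatile int result = 0;
	int fd;

	if (page <= 0)
		return -1;
	fd = memfd_create("nofault-demo", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	/* Map two pages although the file is only one page long. */
	map = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(sigbus_env, 1) == 0) {
		assert(map[0] == 0);	/* inside i_size: zero-filled page */
		(void)map[page];	/* first byte beyond i_size */
	} else {
		result = 1;		/* we got here through the handler */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap((void *)map, 2 * page);
	return result;
}
```

Without the flag the helper is expected to return 1. With the draft
applied and S_NOFAULT set on the inode, the intent is that the same read
would return zeroes and the helper would return 0.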
---

[RFC DRAFT PATCH] shm: no SIGBUS fault on out-of-bounds mmap read

---
 include/linux/fs.h |  2 ++
 mm/shmem.c         | 44 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..a9be7cd71b94 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2202,6 +2202,7 @@ struct super_operations {
 #define S_ENCRYPTED	(1 << 14) /* Encrypted file (using fs/crypto/) */
 #define S_CASEFOLD	(1 << 15) /* Casefolded file */
 #define S_VERITY	(1 << 16) /* Verity file (using fs/verity/) */
+#define S_NOFAULT	(1 << 17) /* No SIGBUS fault on out-of-bounds mmap read */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -2244,6 +2245,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
 #define IS_ENCRYPTED(inode)	((inode)->i_flags & S_ENCRYPTED)
 #define IS_CASEFOLDED(inode)	((inode)->i_flags & S_CASEFOLD)
 #define IS_VERITY(inode)	((inode)->i_flags & S_VERITY)
+#define IS_NOFAULT(inode)	((inode)->i_flags & S_NOFAULT)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
diff --git a/mm/shmem.c b/mm/shmem.c
index 5d46611cba8d..856d2d8d4cdf 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -38,8 +38,11 @@
 #include
 #include
 #include
+#include
+#include
 
 #include /* for arch/microblaze update_mmu_cache() */
+#include
 
 static struct vfsmount *shm_mnt;
 
@@ -1812,7 +1815,27 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 repeat:
 	if (sgp <= SGP_CACHE &&
 	    ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) {
-		return -EINVAL;
+		unsigned long dst_addr = vmf->address;
+		pte_t _dst_pte, *dst_pte;
+		spinlock_t *ptl;
+		int ret;
+
+		if (!IS_NOFAULT(inode))
+			return -EINVAL;
+
+		_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
+						 vma->vm_page_prot));
+		dst_pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, dst_addr, &ptl);
+		ret = -EEXIST;
+		if (!pte_none(*dst_pte))
+			goto out_unlock;
+		set_pte_at(vma->vm_mm, dst_addr, dst_pte, _dst_pte);
+		update_mmu_cache(vma, dst_addr, dst_pte);
+		*fault_type = VM_FAULT_NOPAGE;
+		ret = 0;
+out_unlock:
+		pte_unmap_unlock(dst_pte, ptl);
+		return ret;
 	}
 
 	sbinfo = SHMEM_SB(inode->i_sb);
@@ -3819,6 +3842,23 @@ const struct address_space_operations shmem_aops = {
 };
 EXPORT_SYMBOL(shmem_aops);
 
+static int shmem_fileattr_get(struct dentry *dentry, struct fileattr *fa)
+{
+	struct inode *inode = d_inode(dentry);
+
+	fileattr_fill_flags(fa, inode->i_flags);
+
+	return 0;
+}
+
+static int shmem_fileattr_set(struct user_namespace *mnt_userns,
+			      struct dentry *dentry, struct fileattr *fa)
+{
+	struct inode *inode = d_inode(dentry);
+	inode->i_flags = fa->flags;
+	return 0;
+}
+
 static const struct file_operations shmem_file_operations = {
 	.mmap		= shmem_mmap,
 	.get_unmapped_area = shmem_get_unmapped_area,
@@ -3836,6 +3876,8 @@ static const struct file_operations shmem_file_operations = {
 static const struct inode_operations shmem_inode_operations = {
 	.getattr	= shmem_getattr,
 	.setattr	= shmem_setattr,
+	.fileattr_get	= shmem_fileattr_get,
+	.fileattr_set	= shmem_fileattr_set,
 #ifdef CONFIG_TMPFS_XATTR
 	.listxattr	= shmem_listxattr,
 	.set_acl	= simple_set_acl,
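
The decision made by the shmem_getpage_gfp() hunk can be modelled in a few
lines of userspace C. This is only an illustrative sketch, not kernel code:
model_fault(), enum fault_outcome and MODEL_PAGE_SHIFT are names invented
here, a 4 KiB page size is assumed, and the sgp <= SGP_CACHE condition is
left out on the assumption that the call comes from the fault path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

enum fault_outcome {
	FAULT_NORMAL,		/* index is inside i_size: regular page lookup */
	FAULT_SIGBUS,		/* -EINVAL, which the fault path reports as SIGBUS */
	FAULT_ZERO_PAGE,	/* draft behaviour: map the zero page instead */
};

/*
 * Mirrors the check
 *	((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)
 * followed by the IS_NOFAULT(inode) test added by the draft.
 */
static enum fault_outcome model_fault(uint64_t index, int64_t i_size,
				      bool nofault)
{
	int64_t offset = (int64_t)(index << MODEL_PAGE_SHIFT);

	if (offset >= i_size)
		return nofault ? FAULT_ZERO_PAGE : FAULT_SIGBUS;
	return FAULT_NORMAL;
}
```

Note that a page which is only partly covered by i_size still counts as
inside the file: with i_size = 4097, page index 1 takes the normal path and
only index 2 and above reach the new branch.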