From: "Darrick J. Wong" <djwong@kernel.org>
To: Khalid Aziz <khalid.aziz@oracle.com>
Cc: akpm@linux-foundation.org, willy@infradead.org,
aneesh.kumar@linux.ibm.com, arnd@arndb.de, 21cnbao@gmail.com,
corbet@lwn.net, dave.hansen@linux.intel.com, david@redhat.com,
ebiederm@xmission.com, hagen@jauu.net, jack@suse.cz,
keescook@chromium.org, kirill@shutemov.name, kucharsk@gmail.com,
linkinjeon@kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
longpeng2@huawei.com, luto@kernel.org, markhemm@googlemail.com,
pcc@google.com, rppt@kernel.org, sieberf@amazon.com,
sjpark@amazon.de, surenb@google.com, tst@schoebel-theuer.de,
yzaikin@google.com
Subject: Re: [PATCH v2 1/9] mm: Add msharefs filesystem
Date: Thu, 30 Jun 2022 14:53:08 -0700
Message-ID: <Yr4bREHJQV0oISSo@magnolia>
In-Reply-To: <de5566e71e038d95342d00364c6760c7078cb091.1656531090.git.khalid.aziz@oracle.com>
On Wed, Jun 29, 2022 at 04:53:52PM -0600, Khalid Aziz wrote:
> Add a ram-based filesystem that contains page table sharing
> information and files that enables processes to share page tables.
> This patch adds the basic filesystem that can be mounted.
>
> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> ---
> Documentation/filesystems/msharefs.rst | 19 +++++
> include/uapi/linux/magic.h | 1 +
> mm/Makefile | 2 +-
> mm/mshare.c | 103 +++++++++++++++++++++++++
> 4 files changed, 124 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/filesystems/msharefs.rst
> create mode 100644 mm/mshare.c
>
> diff --git a/Documentation/filesystems/msharefs.rst b/Documentation/filesystems/msharefs.rst
> new file mode 100644
> index 000000000000..fd161f67045d
> --- /dev/null
> +++ b/Documentation/filesystems/msharefs.rst
> @@ -0,0 +1,19 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================================================
> +msharefs - a filesystem to support shared page tables
> +=====================================================
> +
> +msharefs is a ram-based filesystem that allows multiple processes to
> +share page table entries for shared pages.
> +
> +msharefs is typically mounted like this::
> +
> + mount -t msharefs none /sys/fs/mshare
> +
> +When a process calls mshare syscall with a name for the shared address
> +range,
You mean creat()?
> a file with the same name is created under msharefs with that
> +name. This file can be opened by another process, if permissions
> +allow, to query the addresses shared under this range. These files are
> +removed by mshare_unlink syscall and can not be deleted directly.
Oh?
> +Hence these files are created as immutable files.
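
To make the creat() question concrete: if regions are meant to be created
with ordinary file operations rather than a dedicated syscall, I'd expect
userspace to look roughly like the sketch below. The helper name
mshare_region_create() and the 4096-byte path buffer are made up for
illustration, and nothing here is msharefs-specific; it is plain
open(2) with O_CREAT, which is what the documentation would need to
describe if that is the intended interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): create the file that
 * names a shared region under the directory "dir".
 * Returns an open fd on success, or -1 with errno set.
 */
static int mshare_region_create(const char *dir, const char *name, mode_t mode)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* O_EXCL: two processes racing on one name cannot both "create" it. */
	return open(path, O_CREAT | O_EXCL | O_RDWR, mode);
}
```

A caller would do something like
mshare_region_create("/sys/fs/mshare", "myregion", 0600), assuming the
mount point from the documentation above, and a second creator of the same
name would get EEXIST.
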
> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> index f724129c0425..2a57a6ec6f3e 100644
> --- a/include/uapi/linux/magic.h
> +++ b/include/uapi/linux/magic.h
> @@ -105,5 +105,6 @@
> #define Z3FOLD_MAGIC 0x33
> #define PPC_CMM_MAGIC 0xc7571590
> #define SECRETMEM_MAGIC 0x5345434d /* "SECM" */
> +#define MSHARE_MAGIC 0x4d534852 /* "MSHR" */
>
> #endif /* __LINUX_MAGIC_H__ */
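
As a sanity check on the new magic, here is a minimal userspace sketch that
decodes the value into its ASCII tag and shows how a program could use
statfs(2) to tell whether a path sits on msharefs. MSHARE_MAGIC is copied
from this patch and is not in any released uapi header; the helper names
magic_to_tag() and path_is_msharefs() are invented for the example.

```c
#include <stdint.h>
#include <sys/vfs.h>

/* Value proposed by this patch; an assumption until it is merged. */
#define MSHARE_MAGIC 0x4d534852

/* Decode a 32-bit magic into four ASCII bytes, most significant first. */
static void magic_to_tag(uint32_t magic, char tag[5])
{
	for (int i = 0; i < 4; i++)
		tag[i] = (char)((magic >> (24 - 8 * i)) & 0xff);
	tag[4] = '\0';
}

/* Return 1 if path is on msharefs, 0 if not, -1 if statfs() fails. */
static int path_is_msharefs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	return (uint32_t)st.f_type == MSHARE_MAGIC;
}
```

magic_to_tag(MSHARE_MAGIC, tag) yields "MSHR", so the comment in the hunk
matches the value.
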
> diff --git a/mm/Makefile b/mm/Makefile
> index 6f9ffa968a1a..51a2ab9080d9 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -37,7 +37,7 @@ CFLAGS_init-mm.o += $(call cc-disable-warning, override-init)
> CFLAGS_init-mm.o += $(call cc-disable-warning, initializer-overrides)
>
> mmu-y := nommu.o
> -mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \
> +mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o mshare.o \
> mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
> msync.o page_vma_mapped.o pagewalk.o \
> pgtable-generic.o rmap.o vmalloc.o
> diff --git a/mm/mshare.c b/mm/mshare.c
> new file mode 100644
> index 000000000000..c8fab3869bab
> --- /dev/null
> +++ b/mm/mshare.c

Filesystems are usually supposed to live under fs/; is there some reason
to put this one in mm/?

I guess tmpfs (mm/shmem.c) lives in mm/ too, so maybe this isn't much of
an objection.

Also, should this fs be selectable via a Kconfig option?

--D

> @@ -0,0 +1,103 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Enable copperating processes to share page table between
> + * them to reduce the extra memory consumed by multiple copies
> + * of page tables.
> + *
> + * This code adds an in-memory filesystem - msharefs.
> + * msharefs is used to manage page table sharing
> + *
> + *
> + * Copyright (C) 2022 Oracle Corp. All rights reserved.
> + * Author: Khalid Aziz <khalid.aziz@oracle.com>
> + *
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/mount.h>
> +#include <linux/syscalls.h>
> +#include <linux/uaccess.h>
> +#include <linux/pseudo_fs.h>
> +#include <linux/fileattr.h>
> +#include <uapi/linux/magic.h>
> +#include <uapi/linux/limits.h>
> +
> +static struct super_block *msharefs_sb;
> +
> +static const struct file_operations msharefs_file_operations = {
> + .open = simple_open,
> + .llseek = no_llseek,
> +};
> +
> +static int
> +msharefs_d_hash(const struct dentry *dentry, struct qstr *qstr)
> +{
> + unsigned long hash = init_name_hash(dentry);
> + const unsigned char *s = qstr->name;
> + unsigned int len = qstr->len;
> +
> + while (len--)
> + hash = partial_name_hash(*s++, hash);
> + qstr->hash = end_name_hash(hash);
> + return 0;
> +}
> +
> +static const struct dentry_operations msharefs_d_ops = {
> + .d_hash = msharefs_d_hash,
> +};
> +
> +static int
> +msharefs_fill_super(struct super_block *sb, struct fs_context *fc)
> +{
> + static const struct tree_descr empty_descr = {""};
> + int err;
> +
> + sb->s_d_op = &msharefs_d_ops;
> + err = simple_fill_super(sb, MSHARE_MAGIC, &empty_descr);
> + if (err)
> + return err;
> +
> + msharefs_sb = sb;
> + return 0;
> +}
> +
> +static int
> +msharefs_get_tree(struct fs_context *fc)
> +{
> + return get_tree_single(fc, msharefs_fill_super);
> +}
> +
> +static const struct fs_context_operations msharefs_context_ops = {
> + .get_tree = msharefs_get_tree,
> +};
> +
> +static int
> +mshare_init_fs_context(struct fs_context *fc)
> +{
> + fc->ops = &msharefs_context_ops;
> + return 0;
> +}
> +
> +static struct file_system_type mshare_fs = {
> + .name = "msharefs",
> + .init_fs_context = mshare_init_fs_context,
> + .kill_sb = kill_litter_super,
> +};
> +
> +static int
> +mshare_init(void)
> +{
> + int ret = 0;
> +
> + ret = sysfs_create_mount_point(fs_kobj, "mshare");
> + if (ret)
> + return ret;
> +
> + ret = register_filesystem(&mshare_fs);
> + if (ret)
> + sysfs_remove_mount_point(fs_kobj, "mshare");
> +
> + return ret;
> +}
> +
> +fs_initcall(mshare_init);
> --
> 2.32.0
>