linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall
@ 2023-02-18  0:43 Ackerley Tng
  2023-02-18  0:43 ` [RFC PATCH 1/2] mm: restrictedmem: " Ackerley Tng
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ackerley Tng @ 2023-02-18  0:43 UTC
  To: kvm, linux-api, linux-arch, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, qemu-devel
  Cc: aarcange, ak, akpm, arnd, bfields, bp, chao.p.peng, corbet,
	dave.hansen, david, ddutile, dhildenb, hpa, hughd, jlayton,
	jmattson, joro, jun.nakajima, kirill.shutemov, linmiaohe, luto,
	mail, mhocko, michael.roth, mingo, naoya.horiguchi, pbonzini,
	qperret, rppt, seanjc, shuah, steven.price, tabba, tglx,
	vannapurve, vbabka, vkuznets, wanpengli, wei.w.wang, x86,
	yu.c.zhang, Ackerley Tng

Hello,

This patchset builds upon the memfd_restricted() system call that has
been discussed in the ‘KVM: mm: fd-based approach for supporting KVM’
patch series, at
https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c

The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/restrictedmem-rmfd-hugepage

Following the RFC that provides a mount for the memfd_restricted()
syscall, at
https://lore.kernel.org/lkml/cover.1676507663.git.ackerleytng@google.com/T/#u,
this patchset adds the RMFD_HUGEPAGE flag to memfd_restricted(). The
flag hints to the kernel that Transparent HugePages should be used to
back restrictedmem pages.

This supplements the interface proposed earlier, which requires the
creation of a tmpfs mount to be passed to memfd_restricted(), with a
more direct per-file hint.

Dependencies:

+ Sean’s iteration of the ‘KVM: mm: fd-based approach for supporting
  KVM’ patch series at
  https://github.com/sean-jc/linux/tree/x86/upm_base_support
+ Proposed fix for restrictedmem_getattr() as mentioned on the mailing
  list at
  https://lore.kernel.org/lkml/diqzzga0fv96.fsf@ackerleytng-cloudtop-sg.c.googlers.com/
+ Hugh’s patch:
  https://lore.kernel.org/lkml/c140f56a-1aa3-f7ae-b7d1-93da7d5a3572@google.com/,
  which provides functionality in shmem that reads the VM_HUGEPAGE
  flag in key functions shmem_is_huge() and shmem_get_inode()

Future work/TODOs:
+ man page for the memfd_restricted() syscall
+ Support for per-file NUMA binding hints

Ackerley Tng (2):
  mm: restrictedmem: Add flag as THP allocation hint for
    memfd_restricted() syscall
  selftests: restrictedmem: Add selftest for RMFD_HUGEPAGE

 include/uapi/linux/restrictedmem.h            |  1 +
 mm/restrictedmem.c                            | 27 ++++++++++++-------
 .../restrictedmem_hugepage_test.c             | 25 +++++++++++++++++
 3 files changed, 43 insertions(+), 10 deletions(-)

--
2.39.2.637.g21b0678d19-goog


* [RFC PATCH 1/2] mm: restrictedmem: Add flag as THP allocation hint for memfd_restricted() syscall
  2023-02-18  0:43 [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall Ackerley Tng
@ 2023-02-18  0:43 ` Ackerley Tng
  2023-02-18  0:43 ` [RFC PATCH 2/2] selftests: restrictedmem: Add selftest for RMFD_HUGEPAGE Ackerley Tng
  2023-02-20  3:04 ` [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall Yuan Yao
  2 siblings, 0 replies; 5+ messages in thread
From: Ackerley Tng @ 2023-02-18  0:43 UTC
  To: kvm, linux-api, linux-arch, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, qemu-devel
  Cc: aarcange, ak, akpm, arnd, bfields, bp, chao.p.peng, corbet,
	dave.hansen, david, ddutile, dhildenb, hpa, hughd, jlayton,
	jmattson, joro, jun.nakajima, kirill.shutemov, linmiaohe, luto,
	mail, mhocko, michael.roth, mingo, naoya.horiguchi, pbonzini,
	qperret, rppt, seanjc, shuah, steven.price, tabba, tglx,
	vannapurve, vbabka, vkuznets, wanpengli, wei.w.wang, x86,
	yu.c.zhang, Ackerley Tng

Allow userspace to hint to the kernel, on a per-file basis, that
Transparent HugePages should be used to back restricted memory.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 include/uapi/linux/restrictedmem.h |  1 +
 mm/restrictedmem.c                 | 27 +++++++++++++++++----------
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/restrictedmem.h b/include/uapi/linux/restrictedmem.h
index 9f108dd1ac4c..f671ccbb43bc 100644
--- a/include/uapi/linux/restrictedmem.h
+++ b/include/uapi/linux/restrictedmem.h
@@ -4,5 +4,6 @@
 
 /* flags for memfd_restricted */
 #define RMFD_TMPFILE		0x0001U
+#define RMFD_HUGEPAGE		0x0002U
 
 #endif /* _UAPI_LINUX_RESTRICTEDMEM_H */
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 97f3e2159e8b..87c829960b31 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -190,19 +190,25 @@ static struct file *restrictedmem_file_create(struct file *memfd)
 	return file;
 }
 
-static int restrictedmem_create(struct vfsmount *mount)
+static int restrictedmem_create(unsigned int flags, struct vfsmount *mount)
 {
 	struct file *file, *restricted_file;
 	int fd, err;
+	unsigned long shmem_setup_flags = VM_NORESERVE;
 
 	fd = get_unused_fd_flags(0);
 	if (fd < 0)
 		return fd;
 
-	if (mount)
-		file = shmem_file_setup_with_mnt(mount, "memfd:restrictedmem", 0, VM_NORESERVE);
-	else
-		file = shmem_file_setup("memfd:restrictedmem", 0, VM_NORESERVE);
+	if (flags & RMFD_HUGEPAGE)
+		shmem_setup_flags |= VM_HUGEPAGE;
+
+	if (mount) {
+		file = shmem_file_setup_with_mnt(mount, "memfd:restrictedmem",
+						 0, shmem_setup_flags);
+	} else {
+		file = shmem_file_setup("memfd:restrictedmem", 0, shmem_setup_flags);
+	}
 
 	if (IS_ERR(file)) {
 		err = PTR_ERR(file);
@@ -230,7 +236,8 @@ static bool is_shmem_mount(struct vfsmount *mnt)
 	return mnt->mnt_sb->s_magic == TMPFS_MAGIC;
 }
 
-static int restrictedmem_create_from_path(const char __user *mount_path)
+static int restrictedmem_create_from_path(unsigned int flags,
+					  const char __user *mount_path)
 {
 	int ret;
 	struct path path;
@@ -250,7 +257,7 @@ static int restrictedmem_create_from_path(const char __user *mount_path)
 	if (unlikely(ret))
 		goto out;
 
-	ret = restrictedmem_create(path.mnt);
+	ret = restrictedmem_create(flags, path.mnt);
 
 	mnt_drop_write(path.mnt);
 out:
@@ -261,16 +268,16 @@ static int restrictedmem_create_from_path(const char __user *mount_path)
 
 SYSCALL_DEFINE2(memfd_restricted, unsigned int, flags, const char __user *, mount_path)
 {
-	if (flags & ~RMFD_TMPFILE)
+	if (flags & ~(RMFD_TMPFILE | RMFD_HUGEPAGE))
 		return -EINVAL;
 
 	if (flags == RMFD_TMPFILE) {
 		if (!mount_path)
 			return -EINVAL;
 
-		return restrictedmem_create_from_path(mount_path);
+		return restrictedmem_create_from_path(flags, mount_path);
 	} else {
-		return restrictedmem_create(NULL);
+		return restrictedmem_create(flags, NULL);
 	}
 }
 
-- 
2.39.2.637.g21b0678d19-goog
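
The flag handling in the last hunk can be modelled in userspace as
below. This is only a sketch, not the kernel code: rmfd_dispatch() and
enum rmfd_path are hypothetical names introduced for illustration. It
also assumes that RMFD_TMPFILE is tested as a bit (flags & RMFD_TMPFILE);
with the equality test kept as context in the hunk above,
RMFD_TMPFILE | RMFD_HUGEPAGE passes validation but takes the branch that
ignores mount_path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values as defined in the patch's include/uapi/linux/restrictedmem.h. */
#define RMFD_TMPFILE	0x0001U
#define RMFD_HUGEPAGE	0x0002U

/* Hypothetical labels for the two creation paths; not kernel identifiers. */
enum rmfd_path { RMFD_PATH_DEFAULT_MOUNT, RMFD_PATH_USER_MOUNT };

/*
 * Userspace model of the flag handling at syscall entry.
 * Returns 0 and stores the chosen path in *out, or -EINVAL.
 * Assumption: RMFD_TMPFILE is tested as a bit so that it can be
 * combined with RMFD_HUGEPAGE.
 */
static int rmfd_dispatch(unsigned int flags, const char *mount_path,
			 enum rmfd_path *out)
{
	assert(out != NULL);

	/* Reject any bit outside the two known flags. */
	if (flags & ~(RMFD_TMPFILE | RMFD_HUGEPAGE))
		return -EINVAL;

	if (flags & RMFD_TMPFILE) {
		/* A user-supplied mount needs a path. */
		if (!mount_path)
			return -EINVAL;
		*out = RMFD_PATH_USER_MOUNT;
	} else {
		*out = RMFD_PATH_DEFAULT_MOUNT;
	}

	return 0;
}
```

In this model RMFD_HUGEPAGE never changes which path is taken; it only
travels along with flags, as it does into restrictedmem_create() in the
patch.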



* [RFC PATCH 2/2] selftests: restrictedmem: Add selftest for RMFD_HUGEPAGE
  2023-02-18  0:43 [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall Ackerley Tng
  2023-02-18  0:43 ` [RFC PATCH 1/2] mm: restrictedmem: " Ackerley Tng
@ 2023-02-18  0:43 ` Ackerley Tng
  2023-02-20  3:04 ` [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall Yuan Yao
  2 siblings, 0 replies; 5+ messages in thread
From: Ackerley Tng @ 2023-02-18  0:43 UTC
  To: kvm, linux-api, linux-arch, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, qemu-devel
  Cc: aarcange, ak, akpm, arnd, bfields, bp, chao.p.peng, corbet,
	dave.hansen, david, ddutile, dhildenb, hpa, hughd, jlayton,
	jmattson, joro, jun.nakajima, kirill.shutemov, linmiaohe, luto,
	mail, mhocko, michael.roth, mingo, naoya.horiguchi, pbonzini,
	qperret, rppt, seanjc, shuah, steven.price, tabba, tglx,
	vannapurve, vbabka, vkuznets, wanpengli, wei.w.wang, x86,
	yu.c.zhang, Ackerley Tng

Test that when RMFD_HUGEPAGE is specified, restrictedmem is backed by
Transparent HugePages even when the shmem THP policy is "never", and
that invalid flags are rejected with EINVAL.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../restrictedmem_hugepage_test.c             | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/testing/selftests/restrictedmem/restrictedmem_hugepage_test.c b/tools/testing/selftests/restrictedmem/restrictedmem_hugepage_test.c
index 0d9cf2ced754..75283d68696f 100644
--- a/tools/testing/selftests/restrictedmem/restrictedmem_hugepage_test.c
+++ b/tools/testing/selftests/restrictedmem/restrictedmem_hugepage_test.c
@@ -180,6 +180,31 @@ TEST_F(reset_shmem_enabled, restrictedmem_fstat_shmem_enabled_always)
 	close(mfd);
 }
 
+TEST(restrictedmem_invalid_flags)
+{
+	int mfd = memfd_restricted(99, NULL);
+
+	ASSERT_EQ(-1, mfd);
+	ASSERT_EQ(EINVAL, errno);
+}
+
+TEST_F(reset_shmem_enabled, restrictedmem_rmfd_hugepage)
+{
+	int mfd = -1;
+	struct stat stat;
+
+	ASSERT_EQ(0, set_shmem_thp_policy("never"));
+
+	mfd = memfd_restricted(RMFD_HUGEPAGE, NULL);
+	ASSERT_NE(-1, mfd);
+
+	ASSERT_EQ(0, fstat(mfd, &stat));
+
+	ASSERT_EQ(stat.st_blksize, get_hpage_pmd_size());
+
+	close(mfd);
+}
+
 TEST(restrictedmem_tmpfile_no_mount_path)
 {
 	int mfd = memfd_restricted(RMFD_TMPFILE, NULL);
-- 
2.39.2.637.g21b0678d19-goog
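
The test above compares st_blksize against get_hpage_pmd_size(), a
helper that is not part of this diff. A helper of that kind could read
the PMD size from sysfs. The sketch below assumes the value is exposed
as a decimal byte count at
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size;
read_hpage_pmd_size() is a hypothetical name and not necessarily how
the selftest implements it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a decimal byte count from a sysfs-style file, for example
 * /sys/kernel/mm/transparent_hugepage/hpage_pmd_size.
 * Returns 0 if the file cannot be opened or parsed.
 */
static unsigned long read_hpage_pmd_size(const char *path)
{
	unsigned long size = 0;
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return 0;

	/* The file is expected to hold a single unsigned decimal number. */
	if (fscanf(f, "%lu", &size) != 1)
		size = 0;

	fclose(f);
	return size;
}
```

Taking the path as a parameter keeps the helper usable against a
temporary file in tests, rather than tying it to the running kernel's
sysfs.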



* Re: [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall
  2023-02-18  0:43 [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall Ackerley Tng
  2023-02-18  0:43 ` [RFC PATCH 1/2] mm: restrictedmem: " Ackerley Tng
  2023-02-18  0:43 ` [RFC PATCH 2/2] selftests: restrictedmem: Add selftest for RMFD_HUGEPAGE Ackerley Tng
@ 2023-02-20  3:04 ` Yuan Yao
  2023-02-23  1:31   ` Ackerley Tng
  2 siblings, 1 reply; 5+ messages in thread
From: Yuan Yao @ 2023-02-20  3:04 UTC
  To: Ackerley Tng
  Cc: kvm, linux-api, linux-arch, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, qemu-devel, aarcange, ak, akpm, arnd,
	bfields, bp, chao.p.peng, corbet, dave.hansen, david, ddutile,
	dhildenb, hpa, hughd, jlayton, jmattson, joro, jun.nakajima,
	kirill.shutemov, linmiaohe, luto, mail, mhocko, michael.roth,
	mingo, naoya.horiguchi, pbonzini, qperret, rppt, seanjc, shuah,
	steven.price, tabba, tglx, vannapurve, vbabka, vkuznets,
	wanpengli, wei.w.wang, x86, yu.c.zhang

On Sat, Feb 18, 2023 at 12:43:00AM +0000, Ackerley Tng wrote:
> Hello,
>
> This patchset builds upon the memfd_restricted() system call that has
> been discussed in the ‘KVM: mm: fd-based approach for supporting KVM’
> patch series, at
> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c
>
> The tree can be found at:
> https://github.com/googleprodkernel/linux-cc/tree/restrictedmem-rmfd-hugepage
>
> Following the RFC to provide mount for memfd_restricted() syscall at
> https://lore.kernel.org/lkml/cover.1676507663.git.ackerleytng@google.com/T/#u,
> this patchset adds the RMFD_HUGEPAGE flag to the memfd_restricted()
> syscall, which will hint the kernel to use Transparent HugePages to
> back restrictedmem pages.
>
> This supplements the interface proposed earlier, which requires the
> creation of a tmpfs mount to be passed to memfd_restricted(), with a
> more direct per-file hint.
>
> Dependencies:
>
> + Sean’s iteration of the ‘KVM: mm: fd-based approach for supporting
>   KVM’ patch series at
>   https://github.com/sean-jc/linux/tree/x86/upm_base_support
> + Proposed fix for restrictedmem_getattr() as mentioned on the mailing
>   list at
>   https://lore.kernel.org/lkml/diqzzga0fv96.fsf@ackerleytng-cloudtop-sg.c.googlers.com/
> + Hugh’s patch:
>   https://lore.kernel.org/lkml/c140f56a-1aa3-f7ae-b7d1-93da7d5a3572@google.com/,
>   which provides functionality in shmem that reads the VM_HUGEPAGE
>   flag in key functions shmem_is_huge() and shmem_get_inode()

Will Hugh's patch be merged into 6.3? I didn't find it in 6.2-rc8.
IMHO this patch won't work without Hugh's patch, or it will at least
need another way, e.g. SHMEM_SB(inode->i_sb)->huge.

>
> Future work/TODOs:
> + man page for the memfd_restricted() syscall
> + Support for per file NUMA binding hints
>
> Ackerley Tng (2):
>   mm: restrictedmem: Add flag as THP allocation hint for
>     memfd_restricted() syscall
>   selftests: restrictedmem: Add selftest for RMFD_HUGEPAGE
>
>  include/uapi/linux/restrictedmem.h            |  1 +
>  mm/restrictedmem.c                            | 27 ++++++++++++-------
>  .../restrictedmem_hugepage_test.c             | 25 +++++++++++++++++
>  3 files changed, 43 insertions(+), 10 deletions(-)
>
> --
> 2.39.2.637.g21b0678d19-goog
>


* Re: [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall
  2023-02-20  3:04 ` [RFC PATCH 0/2] Add flag as THP allocation hint for memfd_restricted() syscall Yuan Yao
@ 2023-02-23  1:31   ` Ackerley Tng
  0 siblings, 0 replies; 5+ messages in thread
From: Ackerley Tng @ 2023-02-23  1:31 UTC
  To: Yuan Yao
  Cc: kvm, linux-api, linux-arch, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, qemu-devel, aarcange, ak, akpm, arnd,
	bfields, bp, chao.p.peng, corbet, dave.hansen, david, ddutile,
	dhildenb, hpa, hughd, jlayton, jmattson, joro, jun.nakajima,
	kirill.shutemov, linmiaohe, luto, mail, mhocko, michael.roth,
	mingo, naoya.horiguchi, pbonzini, qperret, rppt, seanjc, shuah,
	steven.price, tabba, tglx, vannapurve, vbabka, vkuznets,
	wanpengli, wei.w.wang, x86, yu.c.zhang

Yuan Yao <yuan.yao@linux.intel.com> writes:

> On Sat, Feb 18, 2023 at 12:43:00AM +0000, Ackerley Tng wrote:
>> Hello,

>> This patchset builds upon the memfd_restricted() system call that has
>> been discussed in the ‘KVM: mm: fd-based approach for supporting KVM’
>> patch series, at
>> https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/#m7e944d7892afdd1d62a03a287bd488c56e377b0c

>> The tree can be found at:
>> https://github.com/googleprodkernel/linux-cc/tree/restrictedmem-rmfd-hugepage

>> Following the RFC to provide mount for memfd_restricted() syscall at
>> https://lore.kernel.org/lkml/cover.1676507663.git.ackerleytng@google.com/T/#u,
>> this patchset adds the RMFD_HUGEPAGE flag to the memfd_restricted()
>> syscall, which will hint the kernel to use Transparent HugePages to
>> back restrictedmem pages.

>> This supplements the interface proposed earlier, which requires the
>> creation of a tmpfs mount to be passed to memfd_restricted(), with a
>> more direct per-file hint.

>> Dependencies:

>> + Sean’s iteration of the ‘KVM: mm: fd-based approach for supporting
>>    KVM’ patch series at
>>    https://github.com/sean-jc/linux/tree/x86/upm_base_support
>> + Proposed fix for restrictedmem_getattr() as mentioned on the mailing
>>    list at
>>    https://lore.kernel.org/lkml/diqzzga0fv96.fsf@ackerleytng-cloudtop-sg.c.googlers.com/
>> + Hugh’s patch:
>>    https://lore.kernel.org/lkml/c140f56a-1aa3-f7ae-b7d1-93da7d5a3572@google.com/,
>>    which provides functionality in shmem that reads the VM_HUGEPAGE
>>    flag in key functions shmem_is_huge() and shmem_get_inode()

> Will Hugh's patch be merged into 6.3? I didn't find it in 6.2-rc8.
> IMHO this patch won't work without Hugh's patch, or it will at least
> need another way, e.g. SHMEM_SB(inode->i_sb)->huge.


Hugh's patch is still under discussion and may not be merged soon.
These patches will not work without it.

I would like to understand what the community thinks of the proposed
interface (RMFD_HUGEPAGE flag, passed to the memfd_restricted()
syscall). If this interface is favorably received, we can definitely
find another way for shmem to support this interface.

If I understand correctly, SHMEM_SB(inode->i_sb)->huge holds the
hugepage policy for the whole superblock. Since the proposed interface
only affects a single file, we will need something closer to
     bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
                        pgoff_t index, bool shmem_huge_force)
     {
             ...

             if (SHMEM_I(inode)->flags & VM_HUGEPAGE)
                     return true;

             ...
     }

from Hugh's patch.
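
To make the per-file versus per-superblock distinction concrete, here
is a toy userspace model. It is not kernel code: struct model_inode,
model_is_huge() and the MODEL_* constants are hypothetical, and the
placement of the per-file check ahead of the mount-wide policy is an
assumption, since the real shmem_is_huge() has further checks that are
elided in the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; this is not the kernel's VM_HUGEPAGE. */
#define MODEL_VM_HUGEPAGE	0x1UL

/* Simplified mount-wide policy, standing in for SHMEM_SB(sb)->huge. */
enum model_sb_huge { MODEL_HUGE_NEVER, MODEL_HUGE_ALWAYS };

struct model_inode {
	unsigned long flags;		/* per-file, like SHMEM_I(inode)->flags */
	enum model_sb_huge sb_huge;	/* shared by every file on the mount */
};

static bool model_is_huge(const struct model_inode *inode)
{
	assert(inode != NULL);

	/* Per-file hint: applies to this file only. */
	if (inode->flags & MODEL_VM_HUGEPAGE)
		return true;

	/* Otherwise fall back to the mount-wide policy. */
	return inode->sb_huge == MODEL_HUGE_ALWAYS;
}
```

With this ordering a file created with the per-file hint reports huge
even on a mount whose policy is "never", which matches what the
restrictedmem_rmfd_hugepage selftest expects.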

