* [PATCH v4 0/3] mm/memfd: Reserve hugetlb folios before allocation
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Gerd Hoffmann, Steve Sistare, Muchun Song,
David Hildenbrand, Andrew Morton
There are cases where we try to pin a folio but discover that it has
not been faulted in. So, we try to allocate it in memfd_alloc_folio(),
but the allocation request may fail if there are no active
reservations in the system at that instant.
Therefore, making a reservation (by calling hugetlb_reserve_pages())
associated with the allocation ensures that our request does not
fail due to a lack of reservations. This also ensures that proper
region/subpool accounting is done for our allocation.
-----------------------------
Patchset overview:
Patch 1: Return nr of updated entries from hugetlb_reserve_pages()
Patch 2: Make reservation (hugetlb_reserve_pages) in memfd_alloc_folio()
Patch 3: New udmabuf selftest to invoke memfd_alloc_folio()
This series is tested by running the new udmabuf selftest introduced
in patch #3 along with the other selftests.
Changelog:
v3 -> v4:
- Create a standalone patch to fix the BUG reported by syzbot that
can be backported to stable kernels (Andrew)
- Split the changes in memfd_alloc_folio() that add a call to
hugetlb_reserve_pages() into a separate patch
v2 -> v3:
- Call hugetlb_unreserve_pages() only if the reservation was actively
(and successfully) made from memfd_alloc_folio() (David)
v1 -> v2:
- Replace VM_BUG_ON() with WARN_ON_ONCE() in the function
alloc_hugetlb_folio_reserve() (David)
- Move the inline function subpool_inode() from hugetlb.c into the
relevant header (hugetlb.h)
- Call hugetlb_unreserve_pages() if the folio cannot be added to
the page cache as well
- Added a new udmabuf selftest to exercise the same path as that
of syzbot
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Vivek Kasireddy (3):
mm/hugetlb: Make hugetlb_reserve_pages() return nr of entries updated
mm/memfd: Reserve hugetlb folios before allocation
selftests/udmabuf: Add a test to pin first before writing to memfd
fs/hugetlbfs/inode.c | 8 +++----
include/linux/hugetlb.h | 7 +++++-
mm/hugetlb.c | 24 ++++++++++---------
mm/memfd.c | 17 ++++++++++---
.../selftests/drivers/dma-buf/udmabuf.c | 20 +++++++++++++++-
5 files changed, 56 insertions(+), 20 deletions(-)
--
2.49.0
* [PATCH v4 1/3] mm/hugetlb: Make hugetlb_reserve_pages() return nr of entries updated
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Steve Sistare, Muchun Song, David Hildenbrand,
Andrew Morton
Currently, hugetlb_reserve_pages() returns a bool to indicate whether
the reservation map update for the range [from, to] was successful or
not. This is not sufficient for the case where the caller needs to
determine how many entries were updated for the range.
Therefore, have hugetlb_reserve_pages() return the number of entries
updated in the reservation map associated with the range [from, to].
Also, update the callers of hugetlb_reserve_pages() to handle the new
return value.
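As a rough user-space illustration of the new convention (not kernel code;
toy_reserve_range() and the two failed_*() helpers are hypothetical names, and
the count is simplified to the full length of the range), the sketch below
shows why callers have to switch from a "!ret" test to a "ret < 0" test: zero
becomes a valid success value, for example in the VM_NORESERVE case.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the new return convention. */
long toy_reserve_range(long from, long to, int noreserve)
{
	long chg;

	if (from > to)
		return -EINVAL;		/* invalid range */
	if (noreserve)
		return 0;		/* success, but nothing was reserved */

	chg = to - from;		/* simplification: no entry existed yet */
	assert(chg >= 0);		/* success values are never negative */
	return chg;
}

/* Check that was correct for the old bool API: zero means failure. */
int failed_old_style(long ret)
{
	return !ret;
}

/* Check needed for the new API: only negative values are errors. */
int failed_new_style(long ret)
{
	return ret < 0;
}
```

With this model, a no-reserve call returns 0, which the old-style check would
misreport as a failure while the new-style check accepts it.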
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
fs/hugetlbfs/inode.c | 8 ++++----
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c | 19 +++++++++++++------
3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e4de5425838d..00b2d1a032fd 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -150,10 +150,10 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
if (inode->i_flags & S_PRIVATE)
vm_flags |= VM_NORESERVE;
- if (!hugetlb_reserve_pages(inode,
+ if (hugetlb_reserve_pages(inode,
vma->vm_pgoff >> huge_page_order(h),
len >> huge_page_shift(h), vma,
- vm_flags))
+ vm_flags) < 0)
goto out;
ret = 0;
@@ -1561,9 +1561,9 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
inode->i_size = size;
clear_nlink(inode);
- if (!hugetlb_reserve_pages(inode, 0,
+ if (hugetlb_reserve_pages(inode, 0,
size >> huge_page_shift(hstate_inode(inode)), NULL,
- acctflag))
+ acctflag) < 0)
file = ERR_PTR(-ENOMEM);
else
file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 42f374e828a2..d8310b0f36dd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -149,7 +149,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
uffd_flags_t flags,
struct folio **foliop);
#endif /* CONFIG_USERFAULTFD */
-bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
+long hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f74c54ecf955..6b34152744cc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7267,8 +7267,15 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
return pages > 0 ? (pages << h->order) : pages;
}
-/* Return true if reservation was successful, false otherwise. */
-bool hugetlb_reserve_pages(struct inode *inode,
+/*
+ * Update the reservation map for the range [from, to].
+ *
+ * Returns the number of entries that would be added to the reservation map
+ * associated with the range [from, to]. This number is greater than or equal
+ * to zero. -EINVAL or -ENOMEM is returned in case of any errors.
+ */
+
+long hugetlb_reserve_pages(struct inode *inode,
long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags)
@@ -7283,7 +7290,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
/* This should never happen */
if (from > to) {
VM_WARN(1, "%s called with a negative range\n", __func__);
- return false;
+ return -EINVAL;
}
/*
@@ -7298,7 +7305,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
* without using reserves
*/
if (vm_flags & VM_NORESERVE)
- return true;
+ return 0;
/*
* Shared mappings base their reservation on the number of pages that
@@ -7405,7 +7412,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
hugetlb_cgroup_put_rsvd_cgroup(h_cg);
}
}
- return true;
+ return chg;
out_put_pages:
spool_resv = chg - gbl_reserve;
@@ -7433,7 +7440,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
kref_put(&resv_map->refs, resv_map_release);
set_vma_resv_map(vma, NULL);
}
- return false;
+ return chg < 0 ? chg : add < 0 ? add : -EINVAL;
}
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
--
2.49.0
* [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before allocation
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Steve Sistare, Muchun Song, David Hildenbrand,
Andrew Morton
When we try to allocate a folio via alloc_hugetlb_folio_reserve(),
we need to ensure that there is an active reservation associated
with the allocation. Otherwise, our allocation request would fail
if there are no active reservations made at that moment against any
other allocations. This is because alloc_hugetlb_folio_reserve()
checks h->resv_huge_pages before proceeding with the allocation.
Therefore, to address this issue, we just need to make a reservation
(by calling hugetlb_reserve_pages()) before we try to allocate the
folio. This will also ensure that proper region/subpool accounting is
done associated with our allocation.
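The ordering problem can be sketched with a toy user-space model. This is an
assumption-laden illustration, not the kernel implementation: struct
toy_hstate and the toy_*() functions are hypothetical, the field names only
mirror the kernel's counters, and region/subpool accounting is left out. The
model captures just the one property described above: an allocation from the
reserved pool fails while the reservation count is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a huge page pool with a reservation counter. */
struct toy_hstate {
	long free_huge_pages;
	long resv_huge_pages;
};

/* Reserve one page if an unreserved free page is available. */
bool toy_reserve(struct toy_hstate *h)
{
	if (h->free_huge_pages - h->resv_huge_pages <= 0)
		return false;
	h->resv_huge_pages++;
	return true;
}

/* Allocate from the reserved pool; refuses when nothing is reserved. */
bool toy_alloc_reserved(struct toy_hstate *h)
{
	if (!h->resv_huge_pages)
		return false;
	/* In this model, reservations never exceed the free pages. */
	assert(h->free_huge_pages >= h->resv_huge_pages);
	h->resv_huge_pages--;
	h->free_huge_pages--;
	return true;
}
```

In this model, calling toy_alloc_reserved() on a pool with free pages but no
reservation fails, while calling toy_reserve() first makes the same
allocation succeed.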
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
include/linux/hugetlb.h | 5 +++++
mm/hugetlb.c | 5 -----
mm/memfd.c | 17 ++++++++++++++---
3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d8310b0f36dd..c6c87eae4a8d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -740,6 +740,11 @@ extern unsigned int default_hstate_idx;
#define default_hstate (hstates[default_hstate_idx])
+static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
+{
+ return HUGETLBFS_SB(inode->i_sb)->spool;
+}
+
static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio)
{
return folio->_hugetlb_subpool;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6b34152744cc..57d85af6db3f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -284,11 +284,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
return ret;
}
-static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
-{
- return HUGETLBFS_SB(inode->i_sb)->spool;
-}
-
static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
{
return subpool_inode(file_inode(vma->vm_file));
diff --git a/mm/memfd.c b/mm/memfd.c
index ab367e61553d..2c861a7ac345 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -71,7 +71,6 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
#ifdef CONFIG_HUGETLB_PAGE
struct folio *folio;
gfp_t gfp_mask;
- int err;
if (is_file_hugepages(memfd)) {
/*
@@ -80,12 +79,19 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
* alloc from. Also, the folio will be pinned for an indefinite
* amount of time, so it is not expected to be migrated away.
*/
+ struct inode *inode = file_inode(memfd);
struct hstate *h = hstate_file(memfd);
+ int err = -ENOMEM;
+ long nr_resv;
gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
idx >>= huge_page_order(h);
+ nr_resv = hugetlb_reserve_pages(inode, idx, idx + 1, NULL, 0);
+ if (nr_resv < 0)
+ return ERR_PTR(nr_resv);
+
folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
@@ -96,12 +102,17 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
idx);
if (err) {
folio_put(folio);
- return ERR_PTR(err);
+ goto err_unresv;
}
+
+ hugetlb_set_folio_subpool(folio, subpool_inode(inode));
folio_unlock(folio);
return folio;
}
- return ERR_PTR(-ENOMEM);
+err_unresv:
+ if (nr_resv > 0)
+ hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+ return ERR_PTR(err);
}
#endif
return shmem_read_folio(memfd->f_mapping, idx);
--
2.49.0
* [PATCH v4 3/3] selftests/udmabuf: Add a test to pin first before writing to memfd
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Gerd Hoffmann, Steve Sistare, Muchun Song,
David Hildenbrand, Andrew Morton
Unlike the existing tests, this new test will create a memfd (backed
by hugetlb) and pin the folios in it (a small subset) before writing/
populating it with data. This is a valid use-case that invokes the
memfd_alloc_folio() kernel API and is expected to work unless there
aren't enough hugetlb folios to satisfy the allocation needs.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
.../selftests/drivers/dma-buf/udmabuf.c | 20 ++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
index 6062723a172e..77aa2897e79f 100644
--- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -138,7 +138,7 @@ int main(int argc, char *argv[])
void *addr1, *addr2;
ksft_print_header();
- ksft_set_plan(6);
+ ksft_set_plan(7);
devfd = open("/dev/udmabuf", O_RDWR);
if (devfd < 0) {
@@ -248,6 +248,24 @@ int main(int argc, char *argv[])
else
ksft_test_result_pass("%s: [PASS,test-6]\n", TEST_PREFIX);
+ close(buf);
+ close(memfd);
+
+ /* same test as above but we pin first before writing to memfd */
+ page_size = getpagesize() * 512; /* 2 MB */
+ size = MEMFD_SIZE * page_size;
+ memfd = create_memfd_with_seals(size, true);
+ buf = create_udmabuf_list(devfd, memfd, size);
+ addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
+ addr1 = mmap_fd(memfd, size);
+ write_to_memfd(addr1, size, 'a');
+ write_to_memfd(addr1, size, 'b');
+ ret = compare_chunks(addr1, addr2, size);
+ if (ret < 0)
+ ksft_test_result_fail("%s: [FAIL,test-7]\n", TEST_PREFIX);
+ else
+ ksft_test_result_pass("%s: [PASS,test-7]\n", TEST_PREFIX);
+
close(buf);
close(memfd);
close(devfd);
--
2.49.0
* Re: [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before allocation
From: Oscar Salvador @ 2025-06-18 7:46 UTC
To: Vivek Kasireddy
Cc: dri-devel, linux-mm, Steve Sistare, Muchun Song,
David Hildenbrand, Andrew Morton
On Tue, Jun 17, 2025 at 10:30:54PM -0700, Vivek Kasireddy wrote:
> When we try to allocate a folio via alloc_hugetlb_folio_reserve(),
> we need to ensure that there is an active reservation associated
> with the allocation. Otherwise, our allocation request would fail
> if there are no active reservations made at that moment against any
> other allocations. This is because alloc_hugetlb_folio_reserve()
> checks h->resv_huge_pages before proceeding with the allocation.
>
> Therefore, to address this issue, we just need to make a reservation
> (by calling hugetlb_reserve_pages()) before we try to allocate the
> folio. This will also ensure that proper region/subpool accounting is
> done associated with our allocation.
I'm not really familiar with memfd code, but can't you make such
reservation when you create the file in alloc_file?
I see that you explicitly pass VM_NORESERVE. What's the reason for
that?
--
Oscar Salvador
SUSE Labs
* RE: [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before allocation
From: Kasireddy, Vivek @ 2025-06-19 5:34 UTC
To: Oscar Salvador
Cc: dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
Steve Sistare, Muchun Song, David Hildenbrand, Andrew Morton
Hi Oscar,
> Subject: Re: [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before
> allocation
>
> On Tue, Jun 17, 2025 at 10:30:54PM -0700, Vivek Kasireddy wrote:
> > When we try to allocate a folio via alloc_hugetlb_folio_reserve(),
> > we need to ensure that there is an active reservation associated
> > with the allocation. Otherwise, our allocation request would fail
> > if there are no active reservations made at that moment against any
> > other allocations. This is because alloc_hugetlb_folio_reserve()
> > checks h->resv_huge_pages before proceeding with the allocation.
> >
> > Therefore, to address this issue, we just need to make a reservation
> > (by calling hugetlb_reserve_pages()) before we try to allocate the
> > folio. This will also ensure that proper region/subpool accounting is
> > done associated with our allocation.
>
> I'm not really familiar with memfd code, but can't you make such
> reservation when you create the file in alloc_file?
> I see that you explicitly pass VM_NORESERVE. What's the reason for
> that?
AFAICT, there are at least two reasons:
- The initial size of memfd is 0 when it gets created. So, there is nothing
to reserve when hugetlb_file_setup() gets called from memfd_create().
- And, I think reservations are typically associated with allocations. In
other words, they are made on-demand, when a user is about to write
to a file (after calling mmap()).
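The first point can be illustrated with a small user-space sketch (an
assumption for illustration only; toy_pages_to_reserve() is a hypothetical
helper and the shift of 21 assumes 2 MB huge pages). hugetlb_file_setup()
passes size >> huge_page_shift(...) as the end of the range, so a size of 0
gives an empty range.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: huge pages covered by a file of `size` bytes. */
long toy_pages_to_reserve(size_t size, unsigned int hpage_shift)
{
	/* Shifting by the full width of the type would be undefined. */
	assert(hpage_shift < sizeof(size) * CHAR_BIT);
	return (long)(size >> hpage_shift);
}
```

For example, toy_pages_to_reserve(0, 21) evaluates to 0, whereas a file of
eight 2 MB pages would give 8.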
Thanks,
Vivek
>
>
> --
> Oscar Salvador
> SUSE Labs