* [PATCH v4 1/3] mm/hugetlb: Make hugetlb_reserve_pages() return nr of entries updated
2025-06-18 5:30 [PATCH v4 0/3] mm/memfd: Reserve hugetlb folios before allocation Vivek Kasireddy
@ 2025-06-18 5:30 ` Vivek Kasireddy
2025-06-18 5:30 ` [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before allocation Vivek Kasireddy
2025-06-18 5:30 ` [PATCH v4 3/3] selftests/udmabuf: Add a test to pin first before writing to memfd Vivek Kasireddy
2 siblings, 0 replies; 6+ messages in thread
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC (permalink / raw)
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Steve Sistare, Muchun Song, David Hildenbrand,
Andrew Morton
Currently, hugetlb_reserve_pages() returns a bool that indicates only
whether the reservation map update for the range [from, to] succeeded.
This is not sufficient when the caller needs to know how many entries
were updated for the range.
Therefore, have hugetlb_reserve_pages() return the number of entries
updated in the reservation map associated with the range [from, to].
Also, update the callers of hugetlb_reserve_pages() to handle the new
return value.
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
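Note (not part of the commit): the return convention described in the
commit message can be illustrated with a small userspace model. This is
only a sketch under stated assumptions. toy_reserve_pages(), toy_mmap(),
toy_resv_map and TOY_NORESERVE are hypothetical names invented for the
illustration, and the flat byte array is a stand-in for the real
reservation map, not how the kernel stores it.

```c
#include <errno.h>

#define TOY_NR_PAGES   16
#define TOY_NORESERVE  0x1UL	/* hypothetical; stands in for VM_NORESERVE */

/* Toy reservation map: one byte per huge page index, 1 = reserved. */
static unsigned char toy_resv_map[TOY_NR_PAGES];

/*
 * Hypothetical stand-in for hugetlb_reserve_pages(). Walks the indices
 * from 'from' up to but not including 'to' (a simplification of this toy)
 * and returns the number of entries newly added, 0 if nothing had to be
 * reserved, or a negative errno on error.
 */
static long toy_reserve_pages(long from, long to, unsigned long flags)
{
	long i, added = 0;

	if (from < 0 || from > to || to > TOY_NR_PAGES)
		return -EINVAL;
	if (flags & TOY_NORESERVE)
		return 0;
	for (i = from; i < to; i++) {
		if (!toy_resv_map[i]) {
			toy_resv_map[i] = 1;
			added++;
		}
	}
	return added;
}

/* A caller that only cares about success tests for "< 0", not "!ret". */
static int toy_mmap(long from, long to, unsigned long flags)
{
	if (toy_reserve_pages(from, to, flags) < 0)
		return -ENOMEM;
	return 0;
}
```

Because 0 is now a valid success value (for example in the no-reserve
case), a caller that kept the old "!ret" test would treat that success
as a failure, which is why the callers in this patch switch to "< 0".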
fs/hugetlbfs/inode.c | 8 ++++----
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c | 19 +++++++++++++------
3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e4de5425838d..00b2d1a032fd 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -150,10 +150,10 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
if (inode->i_flags & S_PRIVATE)
vm_flags |= VM_NORESERVE;
- if (!hugetlb_reserve_pages(inode,
+ if (hugetlb_reserve_pages(inode,
vma->vm_pgoff >> huge_page_order(h),
len >> huge_page_shift(h), vma,
- vm_flags))
+ vm_flags) < 0)
goto out;
ret = 0;
@@ -1561,9 +1561,9 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
inode->i_size = size;
clear_nlink(inode);
- if (!hugetlb_reserve_pages(inode, 0,
+ if (hugetlb_reserve_pages(inode, 0,
size >> huge_page_shift(hstate_inode(inode)), NULL,
- acctflag))
+ acctflag) < 0)
file = ERR_PTR(-ENOMEM);
else
file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 42f374e828a2..d8310b0f36dd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -149,7 +149,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
uffd_flags_t flags,
struct folio **foliop);
#endif /* CONFIG_USERFAULTFD */
-bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
+long hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f74c54ecf955..6b34152744cc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7267,8 +7267,15 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
return pages > 0 ? (pages << h->order) : pages;
}
-/* Return true if reservation was successful, false otherwise. */
-bool hugetlb_reserve_pages(struct inode *inode,
+/*
+ * Update the reservation map for the range [from, to].
+ *
+ * Returns the number of entries that would be added to the reservation map
+ * associated with the range [from, to]. This number is greater than or
+ * equal to zero. -EINVAL or -ENOMEM is returned in case of any errors.
+ */
+
+long hugetlb_reserve_pages(struct inode *inode,
long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags)
@@ -7283,7 +7290,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
/* This should never happen */
if (from > to) {
VM_WARN(1, "%s called with a negative range\n", __func__);
- return false;
+ return -EINVAL;
}
/*
@@ -7298,7 +7305,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
* without using reserves
*/
if (vm_flags & VM_NORESERVE)
- return true;
+ return 0;
/*
* Shared mappings base their reservation on the number of pages that
@@ -7405,7 +7412,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
hugetlb_cgroup_put_rsvd_cgroup(h_cg);
}
}
- return true;
+ return chg;
out_put_pages:
spool_resv = chg - gbl_reserve;
@@ -7433,7 +7440,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
kref_put(&resv_map->refs, resv_map_release);
set_vma_resv_map(vma, NULL);
}
- return false;
+ return chg < 0 ? chg : add < 0 ? add : -EINVAL;
}
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
--
2.49.0
* [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before allocation
2025-06-18 5:30 [PATCH v4 0/3] mm/memfd: Reserve hugetlb folios before allocation Vivek Kasireddy
2025-06-18 5:30 ` [PATCH v4 1/3] mm/hugetlb: Make hugetlb_reserve_pages() return nr of entries updated Vivek Kasireddy
@ 2025-06-18 5:30 ` Vivek Kasireddy
2025-06-18 7:46 ` Oscar Salvador
2025-06-18 5:30 ` [PATCH v4 3/3] selftests/udmabuf: Add a test to pin first before writing to memfd Vivek Kasireddy
2 siblings, 1 reply; 6+ messages in thread
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC (permalink / raw)
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Steve Sistare, Muchun Song, David Hildenbrand,
Andrew Morton
When we try to allocate a folio via alloc_hugetlb_folio_reserve(),
we need to ensure that there is an active reservation associated
with the allocation. Otherwise, the allocation request fails unless
a reservation made for some other allocation happens to be active at
that moment. This is because alloc_hugetlb_folio_reserve() checks
h->resv_huge_pages before proceeding with the allocation.
Therefore, to address this issue, we just need to make a reservation
(by calling hugetlb_reserve_pages()) before we try to allocate the
folio. This also ensures that the region/subpool accounting
associated with our allocation is done properly.
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
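Note (not part of the commit): the reserve-then-allocate ordering
described in the commit message, including the rollback on failure, can
be sketched with a toy pool in userspace. This is a loose model only.
struct toy_pool and the toy_*() helpers are hypothetical names made up
for the illustration, the inject_fail parameter exists only to simulate
a failure, and the accounting is far simpler than the real hugetlb code.

```c
#include <errno.h>

struct toy_pool {
	long free_pages;	/* pages still in the pool */
	long resv_pages;	/* loosely models h->resv_huge_pages */
};

/* Reserve nr pages; returns nr on success or a negative errno. */
static long toy_reserve(struct toy_pool *p, long nr)
{
	if (nr < 0)
		return -EINVAL;
	if (p->free_pages - p->resv_pages < nr)
		return -ENOMEM;
	p->resv_pages += nr;
	return nr;
}

/* Drop nr previously made reservations. */
static void toy_unreserve(struct toy_pool *p, long nr)
{
	p->resv_pages -= nr;
}

/*
 * Loosely models alloc_hugetlb_folio_reserve(): refuses to hand out a
 * page unless a reservation is outstanding. inject_fail simulates an
 * allocation failure that leaves the reservation untouched.
 */
static int toy_alloc_reserved(struct toy_pool *p, int inject_fail)
{
	if (!p->resv_pages || inject_fail)
		return -ENOMEM;
	p->resv_pages--;
	p->free_pages--;
	return 0;
}

/* Reserve first, then allocate; undo the reservation if allocation fails. */
static int toy_alloc_page(struct toy_pool *p, int inject_fail)
{
	long nr_resv;
	int err;

	nr_resv = toy_reserve(p, 1);
	if (nr_resv < 0)
		return (int)nr_resv;

	err = toy_alloc_reserved(p, inject_fail);
	if (err && nr_resv > 0)
		toy_unreserve(p, nr_resv);
	return err;
}
```

In this model, calling toy_alloc_reserved() on a pool with no
outstanding reservation fails even though free pages exist, which is
the situation the patch addresses by reserving before allocating.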
include/linux/hugetlb.h | 5 +++++
mm/hugetlb.c | 5 -----
mm/memfd.c | 17 ++++++++++++++---
3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d8310b0f36dd..c6c87eae4a8d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -740,6 +740,11 @@ extern unsigned int default_hstate_idx;
#define default_hstate (hstates[default_hstate_idx])
+static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
+{
+ return HUGETLBFS_SB(inode->i_sb)->spool;
+}
+
static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio)
{
return folio->_hugetlb_subpool;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6b34152744cc..57d85af6db3f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -284,11 +284,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
return ret;
}
-static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
-{
- return HUGETLBFS_SB(inode->i_sb)->spool;
-}
-
static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
{
return subpool_inode(file_inode(vma->vm_file));
diff --git a/mm/memfd.c b/mm/memfd.c
index ab367e61553d..2c861a7ac345 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -71,7 +71,6 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
#ifdef CONFIG_HUGETLB_PAGE
struct folio *folio;
gfp_t gfp_mask;
- int err;
if (is_file_hugepages(memfd)) {
/*
@@ -80,12 +79,19 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
* alloc from. Also, the folio will be pinned for an indefinite
* amount of time, so it is not expected to be migrated away.
*/
+ struct inode *inode = file_inode(memfd);
struct hstate *h = hstate_file(memfd);
+ int err = -ENOMEM;
+ long nr_resv;
gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
idx >>= huge_page_order(h);
+ nr_resv = hugetlb_reserve_pages(inode, idx, idx + 1, NULL, 0);
+ if (nr_resv < 0)
+ return ERR_PTR(nr_resv);
+
folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
@@ -96,12 +102,17 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
idx);
if (err) {
folio_put(folio);
- return ERR_PTR(err);
+ goto err_unresv;
}
+
+ hugetlb_set_folio_subpool(folio, subpool_inode(inode));
folio_unlock(folio);
return folio;
}
- return ERR_PTR(-ENOMEM);
+err_unresv:
+ if (nr_resv > 0)
+ hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+ return ERR_PTR(err);
}
#endif
return shmem_read_folio(memfd->f_mapping, idx);
--
2.49.0
* [PATCH v4 3/3] selftests/udmabuf: Add a test to pin first before writing to memfd
2025-06-18 5:30 [PATCH v4 0/3] mm/memfd: Reserve hugetlb folios before allocation Vivek Kasireddy
2025-06-18 5:30 ` [PATCH v4 1/3] mm/hugetlb: Make hugetlb_reserve_pages() return nr of entries updated Vivek Kasireddy
2025-06-18 5:30 ` [PATCH v4 2/3] mm/memfd: Reserve hugetlb folios before allocation Vivek Kasireddy
@ 2025-06-18 5:30 ` Vivek Kasireddy
2 siblings, 0 replies; 6+ messages in thread
From: Vivek Kasireddy @ 2025-06-18 5:30 UTC (permalink / raw)
To: dri-devel, linux-mm
Cc: Vivek Kasireddy, Gerd Hoffmann, Steve Sistare, Muchun Song,
David Hildenbrand, Andrew Morton
Unlike the existing tests, this new test creates a memfd (backed by
hugetlb) and pins a small subset of its folios before writing to it or
populating it with data. This is a valid use-case that invokes the
memfd_alloc_folio() kernel API and is expected to work unless there
aren't enough hugetlb folios to satisfy the allocation needs.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
.../selftests/drivers/dma-buf/udmabuf.c | 20 ++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
index 6062723a172e..77aa2897e79f 100644
--- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -138,7 +138,7 @@ int main(int argc, char *argv[])
void *addr1, *addr2;
ksft_print_header();
- ksft_set_plan(6);
+ ksft_set_plan(7);
devfd = open("/dev/udmabuf", O_RDWR);
if (devfd < 0) {
@@ -248,6 +248,24 @@ int main(int argc, char *argv[])
else
ksft_test_result_pass("%s: [PASS,test-6]\n", TEST_PREFIX);
+ close(buf);
+ close(memfd);
+
+ /* same test as above but we pin first before writing to memfd */
+ page_size = getpagesize() * 512; /* 2 MB */
+ size = MEMFD_SIZE * page_size;
+ memfd = create_memfd_with_seals(size, true);
+ buf = create_udmabuf_list(devfd, memfd, size);
+ addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
+ addr1 = mmap_fd(memfd, size);
+ write_to_memfd(addr1, size, 'a');
+ write_to_memfd(addr1, size, 'b');
+ ret = compare_chunks(addr1, addr2, size);
+ if (ret < 0)
+ ksft_test_result_fail("%s: [FAIL,test-7]\n", TEST_PREFIX);
+ else
+ ksft_test_result_pass("%s: [PASS,test-7]\n", TEST_PREFIX);
+
close(buf);
close(memfd);
close(devfd);
--
2.49.0