From: Jane Chu <jane.chu@oracle.com>
To: akpm@linux-foundation.org, david@kernel.org,
muchun.song@linux.dev, osalvador@suse.de
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, corbet@lwn.net, skhan@linuxfoundation.org,
hughd@google.com, baolin.wang@linux.alibaba.com,
peterx@redhat.com, linux-mm@kvack.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH 6/6] hugetlb: pass hugetlb reservation ranges in base-page indices
Date: Thu, 9 Apr 2026 17:41:57 -0600 [thread overview]
Message-ID: <20260409234158.837786-7-jane.chu@oracle.com> (raw)
In-Reply-To: <20260409234158.837786-1-jane.chu@oracle.com>
hugetlb_reserve_pages() consumes indices in hugepage granularity, although
some callers naturally compute offsets in PAGE_SIZE units.
Teach the reservation helpers to accept base-page index ranges and
convert them to hugepage indices internally before operating on the
reservation map. This keeps the internal representation unchanged while
making the API contract more uniform for callers.
Update hugetlbfs and memfd call sites to pass base-page indices, and
adjust the documentation to describe the new calling convention. Add
alignment warnings in hugetlb_reserve_pages() to catch invalid ranges
early.
No functional change intended.
Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
Documentation/mm/hugetlbfs_reserv.rst | 12 +++++------
fs/hugetlbfs/inode.c | 29 ++++++++++++---------------
mm/hugetlb.c | 26 ++++++++++++++++--------
mm/memfd.c | 9 +++++----
4 files changed, 42 insertions(+), 34 deletions(-)
diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst
index a49115db18c7..60a52b28f0b4 100644
--- a/Documentation/mm/hugetlbfs_reserv.rst
+++ b/Documentation/mm/hugetlbfs_reserv.rst
@@ -112,8 +112,8 @@ flag was specified in either the shmget() or mmap() call. If NORESERVE
was specified, then this routine returns immediately as no reservations
are desired.
-The arguments 'from' and 'to' are huge page indices into the mapping or
-underlying file. For shmget(), 'from' is always 0 and 'to' corresponds to
+The arguments 'from' and 'to' are base page indices into the mapping or
+underlying file. For shmget(), 'from' is always 0 and 'to' corresponds to
the length of the segment/mapping. For mmap(), the offset argument could
be used to specify the offset into the underlying file. In such a case,
the 'from' and 'to' arguments have been adjusted by this offset.
@@ -136,10 +136,10 @@ to indicate this VMA owns the reservations.
The reservation map is consulted to determine how many huge page reservations
are needed for the current mapping/segment. For private mappings, this is
-always the value (to - from). However, for shared mappings it is possible that
-some reservations may already exist within the range (to - from). See the
-section :ref:`Reservation Map Modifications <resv_map_modifications>`
-for details on how this is accomplished.
+always the number of huge pages covered by the range [from, to). However,
+for shared mappings it is possible that some reservations may already exist
+within the range [from, to). See the section :ref:`Reservation Map Modifications
+<resv_map_modifications>` for details on how this is accomplished.
The mapping may be associated with a subpool. If so, the subpool is consulted
to ensure there is sufficient space for the mapping. It is possible that the
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a72d46ff7980..ec05ed30b70f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -157,10 +157,8 @@ static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
if (inode->i_flags & S_PRIVATE)
vma_flags_set(&vma_flags, VMA_NORESERVE_BIT);
- if (hugetlb_reserve_pages(inode,
- desc->pgoff >> huge_page_order(h),
- len >> huge_page_shift(h), desc,
- vma_flags) < 0)
+ if (hugetlb_reserve_pages(inode, desc->pgoff, len >> PAGE_SHIFT, desc,
+ vma_flags) < 0)
goto out;
ret = 0;
@@ -408,8 +406,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
unsigned long v_end;
pgoff_t start, end;
- start = index * pages_per_huge_page(h);
- end = (index + 1) * pages_per_huge_page(h);
+ start = index;
+ end = start + pages_per_huge_page(h);
i_mmap_lock_write(mapping);
retry:
@@ -518,6 +516,8 @@ static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
struct address_space *mapping, struct folio *folio,
pgoff_t index, bool truncate_op)
{
+ pgoff_t next_index;
+
/*
* If folio is mapped, it was faulted in after being
* unmapped in caller or hugetlb_vmdelete_list() skips
@@ -540,8 +540,9 @@ static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
VM_BUG_ON_FOLIO(folio_test_hugetlb_restore_reserve(folio), folio);
hugetlb_delete_from_page_cache(folio);
if (!truncate_op) {
+ next_index = index + pages_per_huge_page(h);
if (unlikely(hugetlb_unreserve_pages(inode, index,
- index + 1, 1)))
+ next_index, 1)))
hugetlb_fix_reserve_counts(inode);
}
@@ -575,7 +576,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
struct address_space *mapping = &inode->i_data;
const pgoff_t end = lend >> PAGE_SHIFT;
struct folio_batch fbatch;
- pgoff_t next, idx;
+ pgoff_t next;
int i, freed = 0;
bool truncate_op = (lend == LLONG_MAX);
@@ -592,9 +593,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
/*
* Remove folio that was part of folio_batch.
*/
- idx = folio->index >> huge_page_order(h);
remove_inode_single_folio(h, inode, mapping, folio,
- idx, truncate_op);
+ folio->index, truncate_op);
freed++;
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
@@ -604,9 +604,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
}
if (truncate_op)
- (void)hugetlb_unreserve_pages(inode,
- lstart >> huge_page_shift(h),
- LONG_MAX, freed);
+ (void)hugetlb_unreserve_pages(inode, lstart >> PAGE_SHIFT,
+ LONG_MAX, freed);
}
static void hugetlbfs_evict_inode(struct inode *inode)
@@ -1561,9 +1560,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
inode->i_size = size;
clear_nlink(inode);
- if (hugetlb_reserve_pages(inode, 0,
- size >> huge_page_shift(hstate_inode(inode)), NULL,
- acctflag) < 0)
+ if (hugetlb_reserve_pages(inode, 0, size >> PAGE_SHIFT, NULL, acctflag) < 0)
file = ERR_PTR(-ENOMEM);
else
file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 47ef41b6fb2e..eb4ab5bd0c9f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6532,10 +6532,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
}
/*
- * Update the reservation map for the range [from, to].
+ * Update the reservation map for the range [from, to) where 'from' and 'to'
+ * are base-page indices that are expected to be huge page aligned.
*
- * Returns the number of entries that would be added to the reservation map
- * associated with the range [from, to]. This number is greater or equal to
+ * Returns the number of huge pages that would be added to the reservation map
+ * associated with the range [from, to). This number is greater or equal to
* zero. -EINVAL or -ENOMEM is returned in case of any errors.
*/
@@ -6550,6 +6551,7 @@ long hugetlb_reserve_pages(struct inode *inode,
struct resv_map *resv_map;
struct hugetlb_cgroup *h_cg = NULL;
long gbl_reserve, regions_needed = 0;
+ long from_idx, to_idx;
int err;
/* This should never happen */
@@ -6558,6 +6560,12 @@ long hugetlb_reserve_pages(struct inode *inode,
return -EINVAL;
}
+ VM_WARN_ON(!IS_ALIGNED(from, 1UL << huge_page_order(h)));
+ VM_WARN_ON(!IS_ALIGNED(to, 1UL << huge_page_order(h)));
+
+ from_idx = from >> huge_page_order(h);
+ to_idx = to >> huge_page_order(h);
+
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
@@ -6580,7 +6588,7 @@ long hugetlb_reserve_pages(struct inode *inode,
*/
resv_map = inode_resv_map(inode);
- chg = region_chg(resv_map, from, to, &regions_needed);
+ chg = region_chg(resv_map, from_idx, to_idx, &regions_needed);
} else {
/* Private mapping. */
resv_map = resv_map_alloc();
@@ -6589,7 +6597,7 @@ long hugetlb_reserve_pages(struct inode *inode,
goto out_err;
}
- chg = to - from;
+ chg = to_idx - from_idx;
set_vma_desc_resv_map(desc, resv_map);
set_vma_desc_resv_flags(desc, HPAGE_RESV_OWNER);
@@ -6644,7 +6652,7 @@ long hugetlb_reserve_pages(struct inode *inode,
* else has to be done for private mappings here
*/
if (!desc || vma_desc_test(desc, VMA_MAYSHARE_BIT)) {
- add = region_add(resv_map, from, to, regions_needed, h, h_cg);
+ add = region_add(resv_map, from_idx, to_idx, regions_needed, h, h_cg);
if (unlikely(add < 0)) {
hugetlb_acct_memory(h, -gbl_reserve);
@@ -6712,7 +6720,7 @@ long hugetlb_reserve_pages(struct inode *inode,
* region_add failed or didn't run.
*/
if (chg >= 0 && add < 0)
- region_abort(resv_map, from, to, regions_needed);
+ region_abort(resv_map, from_idx, to_idx, regions_needed);
if (desc && is_vma_desc_resv_set(desc, HPAGE_RESV_OWNER)) {
kref_put(&resv_map->refs, resv_map_release);
set_vma_desc_resv_map(desc, NULL);
@@ -6728,13 +6736,15 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long chg = 0;
struct hugepage_subpool *spool = subpool_inode(inode);
long gbl_reserve;
+ long start_idx = start >> huge_page_order(h);
+ long end_idx = end >> huge_page_order(h);
/*
* Since this routine can be called in the evict inode path for all
* hugetlbfs inodes, resv_map could be NULL.
*/
if (resv_map) {
- chg = region_del(resv_map, start, end);
+ chg = region_del(resv_map, start_idx, end_idx);
/*
* region_del() can fail in the rare case where a region
* must be split and another region descriptor can not be
diff --git a/mm/memfd.c b/mm/memfd.c
index 56c8833c4195..59c174c7533c 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -80,14 +80,15 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
struct inode *inode = file_inode(memfd);
struct hstate *h = hstate_file(memfd);
long nr_resv;
- pgoff_t idx;
+ pgoff_t next_index;
int err = -ENOMEM;
gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
- idx = index >> huge_page_order(h);
+ next_index = index + pages_per_huge_page(h);
- nr_resv = hugetlb_reserve_pages(inode, idx, idx + 1, NULL, EMPTY_VMA_FLAGS);
+ nr_resv = hugetlb_reserve_pages(inode, index, next_index, NULL,
+ EMPTY_VMA_FLAGS);
if (nr_resv < 0)
return ERR_PTR(nr_resv);
@@ -137,7 +138,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
}
err_unresv:
if (nr_resv > 0)
- hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+ hugetlb_unreserve_pages(inode, index, next_index, 0);
return ERR_PTR(err);
}
#endif
--
2.43.5
Thread overview: 7+ messages
2026-04-09 23:41 [PATCH 0/6] hugetlb: normalize exported interfaces to use base-page indices Jane Chu
2026-04-09 23:41 ` [PATCH 1/6] hugetlb: open-code hugetlb folio lookup index conversion Jane Chu
2026-04-09 23:41 ` [PATCH 2/6] hugetlb: remove the hugetlb_linear_page_index() helper Jane Chu
2026-04-09 23:41 ` [PATCH 3/6] hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index Jane Chu
2026-04-09 23:41 ` [PATCH 4/6] hugetlb: drop vma_hugecache_offset() in favor of linear_page_index() Jane Chu
2026-04-09 23:41 ` [PATCH 5/6] hugetlb: make hugetlb_add_to_page_cache() use PAGE_SIZE-based index Jane Chu
2026-04-09 23:41 ` Jane Chu [this message]