* [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages
@ 2023-07-10 23:04 Sidhartha Kumar
2023-07-10 23:04 ` [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c Sidhartha Kumar
2023-07-20 0:00 ` [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Mike Kravetz
0 siblings, 2 replies; 6+ messages in thread
From: Sidhartha Kumar @ 2023-07-10 23:04 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: akpm, willy, songmuchun, mike.kravetz, david, Sidhartha Kumar
========================== OVERVIEW ========================================
This patchset implements a listed filemap TODO: changing hugetlb folios to
have ->index in PAGE_SIZE units. This simplifies many functions within
filemap.c, which currently have to special-case hugetlb pages.
From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
a huge-page-sized index, as it is used for the reservation map and the hash
function for the hugetlb fault mutex table.

This patchset adds new wrappers for hugetlb code to interact with the
page cache. These wrappers calculate a linear page index, as this is now
what the page cache expects for hugetlb pages.
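For reference, the conversion each wrapper performs is a single shift from
the huge-page-sized index hugetlb uses internally to the linear,
PAGE_SIZE-based index the page cache now expects; the
filemap_lock_hugetlb_folio() wrapper added by the patch below illustrates
this:

	static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
			struct address_space *mapping, pgoff_t idx)
	{
		/* idx is in huge-page units; shift it into PAGE_SIZE units */
		return filemap_lock_folio(mapping, idx << huge_page_order(h));
	}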
From the discussion on HGM for hugetlb[3], there is a desire to remove
hugetlb special casing throughout the core mm code. This series accomplishes
part of that by shifting complexity from filemap.c to hugetlb.c. There are
still checks for hugetlb within the filemap code, as cgroup accounting and
hugetlb accounting remain special-cased.
=========================== PERFORMANCE =====================================
The histograms below appear to be per-call latency distributions for the
instrumented functions (the @get_folio_ns map name suggests nanoseconds),
collected before and after the series.
6.4.0-rc5
@hugetlb_add_to_page_cache:
[512, 1K) 7518 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K) 158 |@ |
[2K, 4K) 30 | |
[4K, 8K) 6 | |
[8K, 16K) 9 |
6.5.0-rc1 + this patch series
@hugetlb_add_to_page_cache:
[512, 1K) 6345 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K) 1308 |@@@@@@@@@@ |
[2K, 4K) 39 | |
[4K, 8K) 20 | |
[8K, 16K) 8 | |
[16K, 32K) 1 | |
6.4.0-rc5
@__filemap_get_folio:
[256, 512) 11292 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K) 4615 |@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 960 |@@@@ |
[2K, 4K) 188 | |
[4K, 8K) 68 | |
[8K, 16K) 14 | |
[16K, 32K) 4 | |
[2G, 4G) 4 | |
6.5.0-rc1 + this patch series
@__filemap_get_folio:
@get_folio_ns:
[128, 256) 13 | |
[256, 512) 11131 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K) 5544 |@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 996 |@@@@ |
[2K, 4K) 317 |@ |
[4K, 8K) 76 | |
[8K, 16K) 23 | |
[16K, 32K) 6 | |
[32K, 64K) 1 | |
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 0 | |
[16M, 32M) 0 | |
[32M, 64M) 0 | |
[64M, 128M) 0 | |
[128M, 256M) 0 | |
[256M, 512M) 0 | |
[512M, 1G) 0 | |
[1G, 2G) 0 | |
[2G, 4G) 3 |
=========================== TESTING ==========================================
This series passes the LTP hugetlb test cases as well as the libhugetlbfs tests.
********** TEST SUMMARY
* 2M
* 32-bit 64-bit
* Total testcases: 110 113
* Skipped: 0 0
* PASS: 107 113
* FAIL: 0 0
* Killed by signal: 3 0
* Bad configuration: 0 0
* Expected FAIL: 0 0
* Unexpected PASS: 0 0
* Test not present: 0 0
* Strange test result: 0 0
**********
RFC v2[2] -> v1:
- cleanup code style

RFC v1 -> RFC v2:
- change direction of series to maintain both a huge and a base page size
  index rather than trying to get rid of all references to a huge-page-sized
  index.

v1 -> v2:
- squash separate filemap and hugetlb patches into one to allow for
  bisection, per Matthew
- get rid of page_to_index()
- fix errors in hugetlbfs_fallocate() and remove_inode_hugepages()

Rebased on the 07/10/2023 mm-unstable branch.
Sidhartha Kumar (1):
mm/filemap: remove hugetlb special casing in filemap.c
fs/hugetlbfs/inode.c | 10 +++++-----
include/linux/hugetlb.h | 12 ++++++++++++
include/linux/pagemap.h | 26 ++------------------------
mm/filemap.c | 36 +++++++++++-------------------------
mm/hugetlb.c | 11 ++++++-----
5 files changed, 36 insertions(+), 59 deletions(-)
--
2.41.0
* [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c
2023-07-10 23:04 [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Sidhartha Kumar
@ 2023-07-10 23:04 ` Sidhartha Kumar
2023-07-11 19:32 ` Andrew Morton
2023-07-21 20:22 ` Mike Kravetz
2023-07-20 0:00 ` [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Mike Kravetz
1 sibling, 2 replies; 6+ messages in thread
From: Sidhartha Kumar @ 2023-07-10 23:04 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: akpm, willy, songmuchun, mike.kravetz, david, Sidhartha Kumar
Remove the special-cased hugetlb handling code within the page cache by
changing the granularity of ->index to the base page size rather than the
huge page size. Add new wrappers for hugetlb code to interact with the
page cache, which convert to a linear index.
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
fs/hugetlbfs/inode.c | 10 +++++-----
include/linux/hugetlb.h | 12 ++++++++++++
include/linux/pagemap.h | 26 ++------------------------
mm/filemap.c | 36 +++++++++++-------------------------
mm/hugetlb.c | 11 ++++++-----
5 files changed, 36 insertions(+), 59 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c2b807d37f852..d78c71dacf0d4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -663,20 +663,20 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
struct hstate *h = hstate_inode(inode);
struct address_space *mapping = &inode->i_data;
const pgoff_t start = lstart >> huge_page_shift(h);
- const pgoff_t end = lend >> huge_page_shift(h);
+ const pgoff_t end = lend >> PAGE_SHIFT;
struct folio_batch fbatch;
pgoff_t next, index;
int i, freed = 0;
bool truncate_op = (lend == LLONG_MAX);
folio_batch_init(&fbatch);
- next = start;
+ next = lstart >> PAGE_SHIFT;
while (filemap_get_folios(mapping, &next, end - 1, &fbatch)) {
for (i = 0; i < folio_batch_count(&fbatch); ++i) {
struct folio *folio = fbatch.folios[i];
u32 hash = 0;
- index = folio->index;
+ index = folio->index >> huge_page_order(h);
hash = hugetlb_fault_mutex_hash(mapping, index);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -742,7 +742,7 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
pgoff_t idx = start >> huge_page_shift(h);
struct folio *folio;
- folio = filemap_lock_folio(mapping, idx);
+ folio = filemap_lock_hugetlb_folio(h, mapping, idx);
if (IS_ERR(folio))
return;
@@ -887,7 +887,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
mutex_lock(&hugetlb_fault_mutex_table[hash]);
/* See if already present in mapping to avoid alloc/free */
- folio = filemap_get_folio(mapping, index);
+ folio = filemap_get_folio(mapping, index << huge_page_order(h));
if (!IS_ERR(folio)) {
folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 623e98d62df3a..57f21279c529f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -811,6 +811,12 @@ static inline unsigned int blocks_per_huge_page(struct hstate *h)
return huge_page_size(h) / 512;
}
+static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
+ struct address_space *mapping, pgoff_t idx)
+{
+ return filemap_lock_folio(mapping, idx << huge_page_order(h));
+}
+
#include <asm/hugetlb.h>
#ifndef is_hugepage_only_range
@@ -1024,6 +1030,12 @@ static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio
return NULL;
}
+static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
+ struct address_space *mapping, pgoff_t idx)
+{
+ return NULL;
+}
+
static inline int isolate_or_dissolve_huge_page(struct page *page,
struct list_head *list)
{
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 69b99b61ed72c..71d969557bd94 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -739,9 +739,6 @@ static inline struct page *folio_file_page(struct folio *folio, pgoff_t index)
*/
static inline bool folio_contains(struct folio *folio, pgoff_t index)
{
- /* HugeTLBfs indexes the page cache in units of hpage_size */
- if (folio_test_hugetlb(folio))
- return folio->index == index;
return index - folio_index(folio) < folio_nr_pages(folio);
}
@@ -799,10 +796,9 @@ static inline struct folio *read_mapping_folio(struct address_space *mapping,
}
/*
- * Get index of the page within radix-tree (but not for hugetlb pages).
- * (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE)
+ * Get the offset in PAGE_SIZE (even for hugetlb pages).
*/
-static inline pgoff_t page_to_index(struct page *page)
+static inline pgoff_t page_to_pgoff(struct page *page)
{
struct page *head;
@@ -817,19 +813,6 @@ static inline pgoff_t page_to_index(struct page *page)
return head->index + page - head;
}
-extern pgoff_t hugetlb_basepage_index(struct page *page);
-
-/*
- * Get the offset in PAGE_SIZE (even for hugetlb pages).
- * (TODO: hugetlb pages should have ->index in PAGE_SIZE)
- */
-static inline pgoff_t page_to_pgoff(struct page *page)
-{
- if (unlikely(PageHuge(page)))
- return hugetlb_basepage_index(page);
- return page_to_index(page);
-}
-
/*
* Return byte-offset into filesystem object for page.
*/
@@ -866,12 +849,9 @@ static inline loff_t folio_file_pos(struct folio *folio)
/*
* Get the offset in PAGE_SIZE (even for hugetlb folios).
- * (TODO: hugetlb folios should have ->index in PAGE_SIZE)
*/
static inline pgoff_t folio_pgoff(struct folio *folio)
{
- if (unlikely(folio_test_hugetlb(folio)))
- return hugetlb_basepage_index(&folio->page);
return folio->index;
}
@@ -882,8 +862,6 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
unsigned long address)
{
pgoff_t pgoff;
- if (unlikely(is_vm_hugetlb_page(vma)))
- return linear_hugepage_index(vma, address);
pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
pgoff += vma->vm_pgoff;
return pgoff;
diff --git a/mm/filemap.c b/mm/filemap.c
index 8040545954bc4..12f51e1b0f4d2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -134,11 +134,8 @@ static void page_cache_delete(struct address_space *mapping,
mapping_set_update(&xas, mapping);
- /* hugetlb pages are represented by a single entry in the xarray */
- if (!folio_test_hugetlb(folio)) {
- xas_set_order(&xas, folio->index, folio_order(folio));
- nr = folio_nr_pages(folio);
- }
+ xas_set_order(&xas, folio->index, folio_order(folio));
+ nr = folio_nr_pages(folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -237,7 +234,7 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio)
if (free_folio)
free_folio(folio);
- if (folio_test_large(folio) && !folio_test_hugetlb(folio))
+ if (folio_test_large(folio))
refs = folio_nr_pages(folio);
folio_put_refs(folio, refs);
}
@@ -858,14 +855,15 @@ noinline int __filemap_add_folio(struct address_space *mapping,
if (!huge) {
int error = mem_cgroup_charge(folio, NULL, gfp);
- VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
if (error)
return error;
charged = true;
- xas_set_order(&xas, index, folio_order(folio));
- nr = folio_nr_pages(folio);
}
+ VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
+ xas_set_order(&xas, index, folio_order(folio));
+ nr = folio_nr_pages(folio);
+
gfp &= GFP_RECLAIM_MASK;
folio_ref_add(folio, nr);
folio->mapping = mapping;
@@ -2038,7 +2036,7 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
int idx = folio_batch_count(fbatch) - 1;
folio = fbatch->folios[idx];
- if (!xa_is_value(folio) && !folio_test_hugetlb(folio))
+ if (!xa_is_value(folio))
nr = folio_nr_pages(folio);
*start = indices[idx] + nr;
}
@@ -2102,7 +2100,7 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
int idx = folio_batch_count(fbatch) - 1;
folio = fbatch->folios[idx];
- if (!xa_is_value(folio) && !folio_test_hugetlb(folio))
+ if (!xa_is_value(folio))
nr = folio_nr_pages(folio);
*start = indices[idx] + nr;
}
@@ -2143,9 +2141,6 @@ unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start,
continue;
if (!folio_batch_add(fbatch, folio)) {
unsigned long nr = folio_nr_pages(folio);
-
- if (folio_test_hugetlb(folio))
- nr = 1;
*start = folio->index + nr;
goto out;
}
@@ -2171,7 +2166,7 @@ EXPORT_SYMBOL(filemap_get_folios);
static inline
bool folio_more_pages(struct folio *folio, pgoff_t index, pgoff_t max)
{
- if (!folio_test_large(folio) || folio_test_hugetlb(folio))
+ if (!folio_test_large(folio))
return false;
if (index >= max)
return false;
@@ -2221,9 +2216,6 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
if (!folio_batch_add(fbatch, folio)) {
nr = folio_nr_pages(folio);
-
- if (folio_test_hugetlb(folio))
- nr = 1;
*start = folio->index + nr;
goto out;
}
@@ -2240,10 +2232,7 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
if (nr) {
folio = fbatch->folios[nr - 1];
- if (folio_test_hugetlb(folio))
- *start = folio->index + 1;
- else
- *start = folio_next_index(folio);
+ *start = folio->index + folio_nr_pages(folio);
}
out:
rcu_read_unlock();
@@ -2281,9 +2270,6 @@ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
continue;
if (!folio_batch_add(fbatch, folio)) {
unsigned long nr = folio_nr_pages(folio);
-
- if (folio_test_hugetlb(folio))
- nr = 1;
*start = folio->index + nr;
goto out;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e4a28ce0667f1..71d18bb76b4df 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -952,7 +952,7 @@ static long region_count(struct resv_map *resv, long f, long t)
/*
* Convert the address within this vma to the page offset within
- * the mapping, in pagecache page units; huge pages here.
+ * the mapping, huge page units here.
*/
static pgoff_t vma_hugecache_offset(struct hstate *h,
struct vm_area_struct *vma, unsigned long address)
@@ -5724,7 +5724,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
struct vm_area_struct *vma, unsigned long address)
{
struct address_space *mapping = vma->vm_file->f_mapping;
- pgoff_t idx = vma_hugecache_offset(h, vma, address);
+ pgoff_t idx = linear_page_index(vma, address);
struct folio *folio;
folio = filemap_get_folio(mapping, idx);
@@ -5741,6 +5741,7 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping
struct hstate *h = hstate_inode(inode);
int err;
+ idx <<= huge_page_order(h);
__folio_set_locked(folio);
err = __filemap_add_folio(mapping, folio, idx, GFP_KERNEL, NULL);
@@ -5848,7 +5849,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
* before we get page_table_lock.
*/
new_folio = false;
- folio = filemap_lock_folio(mapping, idx);
+ folio = filemap_lock_hugetlb_folio(h, mapping, idx);
if (IS_ERR(folio)) {
size = i_size_read(mapping->host) >> huge_page_shift(h);
if (idx >= size)
@@ -6151,7 +6152,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* Just decrements count, does not deallocate */
vma_end_reservation(h, vma, haddr);
- pagecache_folio = filemap_lock_folio(mapping, idx);
+ pagecache_folio = filemap_lock_hugetlb_folio(h, mapping, idx);
if (IS_ERR(pagecache_folio))
pagecache_folio = NULL;
}
@@ -6283,7 +6284,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (is_continue) {
ret = -EFAULT;
- folio = filemap_lock_folio(mapping, idx);
+ folio = filemap_lock_hugetlb_folio(h, mapping, idx);
if (IS_ERR(folio))
goto out;
folio_in_pagecache = true;
--
2.41.0
* Re: [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c
2023-07-10 23:04 ` [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c Sidhartha Kumar
@ 2023-07-11 19:32 ` Andrew Morton
2023-07-21 20:22 ` Mike Kravetz
1 sibling, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2023-07-11 19:32 UTC (permalink / raw)
To: Sidhartha Kumar
Cc: linux-kernel, linux-mm, willy, songmuchun, mike.kravetz, david
On Mon, 10 Jul 2023 16:04:50 -0700 Sidhartha Kumar <sidhartha.kumar@oracle.com> wrote:
> Remove the special-cased hugetlb handling code within the page cache by
> changing the granularity of ->index to the base page size rather than the
> huge page size. Add new wrappers for hugetlb code to interact with the
> page cache, which convert to a linear index.
folio_more_pages() was just removed by "filemap: add
filemap_map_folio_range()"
(https://lkml.kernel.org/r/20230710204339.3554919-35-willy@infradead.org).
It looks like simply dropping that hunk is OK, but please check.
I'll be pushing this all out in a couple hours I expect.
However the series which contains "filemap: add
filemap_map_folio_range()" might be about to be dropped because of
possible s390 breakage.
Also, I don't think it's necessary to have a [0/N] intro for a single
patch. Maybe crunch all the (useful) info into the single patch's
changelog?
Thanks.
* Re: [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages
2023-07-10 23:04 [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Sidhartha Kumar
2023-07-10 23:04 ` [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c Sidhartha Kumar
@ 2023-07-20 0:00 ` Mike Kravetz
2023-07-22 4:05 ` Matthew Wilcox
1 sibling, 1 reply; 6+ messages in thread
From: Mike Kravetz @ 2023-07-20 0:00 UTC (permalink / raw)
To: Sidhartha Kumar; +Cc: linux-kernel, linux-mm, akpm, willy, songmuchun, david
On 07/10/23 16:04, Sidhartha Kumar wrote:
> ========================== OVERVIEW ========================================
> This patchset implements a listed filemap TODO: changing hugetlb folios to
> have ->index in PAGE_SIZE units. This simplifies many functions within
> filemap.c, which currently have to special-case hugetlb pages.
> From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
> a huge-page-sized index, as it is used for the reservation map and the hash
> function for the hugetlb fault mutex table.
>
> This patchset adds new wrappers for hugetlb code to interact with the
> page cache. These wrappers calculate a linear page index, as this is now
> what the page cache expects for hugetlb pages.
>
> From the discussion on HGM for hugetlb[3], there is a desire to remove
> hugetlb special casing throughout the core mm code. This series accomplishes
> part of that by shifting complexity from filemap.c to hugetlb.c. There are
> still checks for hugetlb within the filemap code, as cgroup accounting and
> hugetlb accounting remain special-cased.
>
> =========================== PERFORMANCE =====================================
Hi Sid,
Sorry for being dense, but can you tell me what the performance information
below means? My concern with such a change would be any noticeable
difference in populating a large (up to TB) hugetlb file. My guess is
that it is going to take longer unless xarray is optimized for this.
We do have users that create and pre-populate hugetlb files this big.
Just want to make sure there are no surprises for them.
--
Mike Kravetz
> 6.4.0-rc5
> @hugetlb_add_to_page_cache:
> [512, 1K) 7518 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [1K, 2K) 158 |@ |
> [2K, 4K) 30 | |
> [4K, 8K) 6 | |
> [8K, 16K) 9 |
>
> 6.5.0-rc1 + this patch series
> @hugetlb_add_to_page_cache:
> [512, 1K) 6345 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [1K, 2K) 1308 |@@@@@@@@@@ |
> [2K, 4K) 39 | |
> [4K, 8K) 20 | |
> [8K, 16K) 8 | |
> [16K, 32K) 1 | |
>
> 6.4.0-rc5
> @__filemap_get_folio:
> [256, 512) 11292 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [512, 1K) 4615 |@@@@@@@@@@@@@@@@@@@@@ |
> [1K, 2K) 960 |@@@@ |
> [2K, 4K) 188 | |
> [4K, 8K) 68 | |
> [8K, 16K) 14 | |
> [16K, 32K) 4 | |
> [2G, 4G) 4 | |
>
> 6.5.0-rc1 + this patch series
> @__filemap_get_folio:
> @get_folio_ns:
> [128, 256) 13 | |
> [256, 512) 11131 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [512, 1K) 5544 |@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [1K, 2K) 996 |@@@@ |
> [2K, 4K) 317 |@ |
> [4K, 8K) 76 | |
> [8K, 16K) 23 | |
> [16K, 32K) 6 | |
> [32K, 64K) 1 | |
> [64K, 128K) 0 | |
> [128K, 256K) 0 | |
> [256K, 512K) 0 | |
> [512K, 1M) 0 | |
> [1M, 2M) 0 | |
> [2M, 4M) 0 | |
> [4M, 8M) 0 | |
> [8M, 16M) 0 | |
> [16M, 32M) 0 | |
> [32M, 64M) 0 | |
> [64M, 128M) 0 | |
> [128M, 256M) 0 | |
> [256M, 512M) 0 | |
> [512M, 1G) 0 | |
> [1G, 2G) 0 | |
> [2G, 4G) 3 |
>
> =========================== TESTING ==========================================
> This series passes the LTP hugetlb test cases as well as the libhugetlbfs tests.
>
> ********** TEST SUMMARY
> * 2M
> * 32-bit 64-bit
> * Total testcases: 110 113
> * Skipped: 0 0
> * PASS: 107 113
> * FAIL: 0 0
> * Killed by signal: 3 0
> * Bad configuration: 0 0
> * Expected FAIL: 0 0
> * Unexpected PASS: 0 0
> * Test not present: 0 0
> * Strange test result: 0 0
> **********
>
>
>
> RFC v2[2]-> v1:
> -cleanup code style
>
> RFC v1 -> v2
> -change direction of series to maintain both huge and base page size index
> rather than try to get rid of all references to a huge page sized index.
>
> v1 -> v2:
> - squash separate filemap and hugetlb patches into one to allow for
>   bisection, per Matthew
> - get rid of page_to_index()
> - fix errors in hugetlbfs_fallocate() and remove_inode_hugepages()
>
>
> rebased on 07/10/2023 mm-unstable
>
>
> Sidhartha Kumar (1):
> mm/filemap: remove hugetlb special casing in filemap.c
>
> fs/hugetlbfs/inode.c | 10 +++++-----
> include/linux/hugetlb.h | 12 ++++++++++++
> include/linux/pagemap.h | 26 ++------------------------
> mm/filemap.c | 36 +++++++++++-------------------------
> mm/hugetlb.c | 11 ++++++-----
> 5 files changed, 36 insertions(+), 59 deletions(-)
>
> --
> 2.41.0
>
* Re: [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c
2023-07-10 23:04 ` [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c Sidhartha Kumar
2023-07-11 19:32 ` Andrew Morton
@ 2023-07-21 20:22 ` Mike Kravetz
1 sibling, 0 replies; 6+ messages in thread
From: Mike Kravetz @ 2023-07-21 20:22 UTC (permalink / raw)
To: Sidhartha Kumar; +Cc: linux-kernel, linux-mm, akpm, willy, songmuchun, david
On 07/10/23 16:04, Sidhartha Kumar wrote:
> Remove the special-cased hugetlb handling code within the page cache by
> changing the granularity of ->index to the base page size rather than the
> huge page size. Add new wrappers for hugetlb code to interact with the
> page cache, which convert to a linear index.
>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> ---
> fs/hugetlbfs/inode.c | 10 +++++-----
> include/linux/hugetlb.h | 12 ++++++++++++
> include/linux/pagemap.h | 26 ++------------------------
> mm/filemap.c | 36 +++++++++++-------------------------
> mm/hugetlb.c | 11 ++++++-----
> 5 files changed, 36 insertions(+), 59 deletions(-)
>
I still want to see a better explanation of the performance impact of
this change as previously stated. However, I did take a closer look at
the actual code changes and could not find any issues. One suggestion
noted below.
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index c2b807d37f852..d78c71dacf0d4 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -663,20 +663,20 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> struct hstate *h = hstate_inode(inode);
> struct address_space *mapping = &inode->i_data;
> const pgoff_t start = lstart >> huge_page_shift(h);
> - const pgoff_t end = lend >> huge_page_shift(h);
> + const pgoff_t end = lend >> PAGE_SHIFT;
The code is correct, but when looking at this it 'appears' wrong to have
start and end in different size units. start is only used in the statement:
if (truncate_op)
(void)hugetlb_unreserve_pages(inode, start, LONG_MAX, freed);
So, to avoid confusion it might be better getting rid of start and code the
above as:
if (truncate_op)
(void)hugetlb_unreserve_pages(inode,
lstart >> huge_page_shift(h),
LONG_MAX, freed);
> struct folio_batch fbatch;
> pgoff_t next, index;
> int i, freed = 0;
> bool truncate_op = (lend == LLONG_MAX);
>
> folio_batch_init(&fbatch);
> - next = start;
> + next = lstart >> PAGE_SHIFT;
> while (filemap_get_folios(mapping, &next, end - 1, &fbatch)) {
> for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> struct folio *folio = fbatch.folios[i];
> u32 hash = 0;
>
> - index = folio->index;
> + index = folio->index >> huge_page_order(h);
> hash = hugetlb_fault_mutex_hash(mapping, index);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>
--
Mike Kravetz
* Re: [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages
2023-07-20 0:00 ` [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Mike Kravetz
@ 2023-07-22 4:05 ` Matthew Wilcox
0 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2023-07-22 4:05 UTC (permalink / raw)
To: Mike Kravetz
Cc: Sidhartha Kumar, linux-kernel, linux-mm, akpm, songmuchun, david
On Wed, Jul 19, 2023 at 05:00:11PM -0700, Mike Kravetz wrote:
> On 07/10/23 16:04, Sidhartha Kumar wrote:
> > ========================== OVERVIEW ========================================
> > This patchset implements a listed filemap TODO: changing hugetlb folios to
> > have ->index in PAGE_SIZE units. This simplifies many functions within
> > filemap.c, which currently have to special-case hugetlb pages.
> > From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
> > a huge-page-sized index, as it is used for the reservation map and the hash
> > function for the hugetlb fault mutex table.
> >
> > This patchset adds new wrappers for hugetlb code to interact with the
> > page cache. These wrappers calculate a linear page index, as this is now
> > what the page cache expects for hugetlb pages.
> >
> > From the discussion on HGM for hugetlb[3], there is a desire to remove
> > hugetlb special casing throughout the core mm code. This series accomplishes
> > part of that by shifting complexity from filemap.c to hugetlb.c. There are
> > still checks for hugetlb within the filemap code, as cgroup accounting and
> > hugetlb accounting remain special-cased.
> >
> > =========================== PERFORMANCE =====================================
>
> Hi Sid,
>
> Sorry for being dense but can you tell me what the below performance
> information means. My concern with such a change would be any noticeable
> difference in populating a large (up to TB) hugetlb file. My guess is
> that it is going to take longer unless xarray is optimized for this.
>
> We do have users that create and pre-populate hugetlb files this big.
> Just want to make sure there are no surprises for them.
It's Going To Depend. Annoyingly.
Let's say you're using 1GB pages on a 4kB PAGE_SIZE machine. That's an
order-18 folio, so we end up skipping three layers of the tree, and if
you're going up to 1TB, it's structured:
root -> node (shift 30) -> node (shift 24) -> entry
-> entry (...)
-> node (shift 24) -> entry
(...)
(...)
This is essentially no different from before where each 1GB page would
occupy a single entry. It's just that it now occupies 2^18 entries,
and everything in the tree has a different label.
Where you will (may?) see a difference is with the 2MB entries.
An order-9 page doesn't quite fit with the order-6 nodes in the tree,
so it looks like this:
root -> node (s30) -> node (s24) -> node (s18) -> node (s12) -> entry 0
-> sibling
-> sibling
(...)
-> entry 8
-> sibling
-> sibling
(...)
so all of a sudden the tree is 8x as big as it used to be. The upside
is that we lose all the calculations from filemap.c/pagemap.h. It's a
lot better than it was perhaps five years ago when each 2MB page would
occupy 512 entries, but 8 entries is still worse than 1.
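To make the slot arithmetic concrete, here is a rough userspace sketch
(assuming the XArray's default of 64 slots per node, XA_CHUNK_SHIFT == 6,
and that an entry whose order is not a multiple of XA_CHUNK_SHIFT is padded
out with sibling entries; slots_for_order() is an illustrative helper, not
a kernel API):

	#include <stdio.h>

	#define XA_CHUNK_SHIFT	6	/* 64 slots per XArray node */

	/*
	 * Slots (one canonical entry plus siblings) a multi-order entry
	 * occupies in the node that stores it.  Orders that are a multiple
	 * of XA_CHUNK_SHIFT land on a node boundary and take a single slot.
	 */
	static unsigned long slots_for_order(unsigned int order)
	{
		return 1UL << (order % XA_CHUNK_SHIFT);
	}

	int main(void)
	{
		printf("order 18 (1GB / 4kB):  %lu slot\n", slots_for_order(18));
		printf("order  9 (2MB / 4kB):  %lu slots\n", slots_for_order(9));
		printf("order 11 (6n - 1):     %lu slots\n", slots_for_order(11));
		return 0;
	}

This prints 1, 8 and 32, matching the 1GB, 2MB and worst-case orders
discussed above.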
Could we do better? Undoubtedly. We could have variable shifts & node
sizes in the tree so that we perhaps had an s18 node that was 8x as large
(4160 bytes), and then each order-9 entry in the tree would occupy one
entry in that special large node. I've been reluctant to introduce such
a beast without strong evidence it would help. Or we could introduce a
small s12 node which could only store 8 entries (again an order-9 entry
would occupy one entry in such a special node).
These are things which would only benefit hugetlbfs, so there's a bit
of a chicken-and-egg problem; no demand for the feature until the work
is done, and the work maybe performs badly until the feature exists.
And then some architectures have other orders for their huge pages.
Order 11 is probably the worst possibility to exist (or in general 6n -
1), but I haven't done a detailed survey to figure out if anyone supports
such a thing.