* [PATCH v3 0/2] Minimize xa_node allocation during xarry split
@ 2025-02-26 21:08 Zi Yan
2025-02-26 21:08 ` [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() Zi Yan
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Zi Yan @ 2025-02-26 21:08 UTC (permalink / raw)
To: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel
Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin,
linux-kernel, Zi Yan
Hi all,
When splitting a multi-index entry in XArray from order-n to order-m,
existing xas_split_alloc()+xas_split() approach requires
2^(n % XA_CHUNK_SHIFT) xa_node allocations. But its callers,
__filemap_add_folio() and shmem_split_large_entry(), use at most 1 xa_node.
To minimize xa_node allocation and remove the limitation of no split from
order-12 (or above) to order-0 (or anything between 0 and 5)[1],
xas_try_split() was added[2], which allocates
(n / XA_CHUNK_SHIFT - m / XA_CHUNK_SHIFT) xa_node. It is used
for non-uniform folio split, but can be used by __filemap_add_folio()
and shmem_split_large_entry().
xas_split_alloc() and xas_split() split an order-9 to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
| | | |
------- --- --- -------
| | ... | |
V V V V
----------- ----------- ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- ----------- ----------- -----------
xas_try_split() splits an order-9 to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
|
|
V
-----------
| xa_node |
-----------
xas_try_split() is designed to be called iteratively with n = m + 1.
xas_try_split_mini_order() is added to minmize the number of calls to
xas_try_split() by telling the caller the next minimal order to split to
instead of n - 1. Splitting order-n to order-m when m= l * XA_CHUNK_SHIFT
does not require xa_node allocation and requires 1 xa_node
when n=l * XA_CHUNK_SHIFT and m = n - 1, so it is OK to use
xas_try_split() with n > m + 1 when no new xa_node is needed.
xfstests quick group test passed on xfs and tmpfs.
It is on top of Buddy allocator like (or non-uniform)
folio split V9[2], which is on top of mm-everything-2025-02-26-03-56.
Changelog
===
From V2[3]:
1. Fixed shmem_split_large_entry() by setting swap offset correct.
(Thank Baolin for the detailed review)
2. Used updated xas_try_split() to avoid a bug when xa_node is allocated
by xas_nomem() instead of xas_try_split() itself.
Let me know your comments.
[1] https://lore.kernel.org/linux-mm/Z6YX3RznGLUD07Ao@casper.infradead.org/
[2] https://lore.kernel.org/linux-mm/20250226210032.2044041-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20250218235444.1543173-1-ziy@nvidia.com/
Zi Yan (2):
mm/filemap: use xas_try_split() in __filemap_add_folio()
mm/shmem: use xas_try_split() in shmem_split_large_entry()
include/linux/xarray.h | 7 +++++
lib/xarray.c | 25 ++++++++++++++++++
mm/filemap.c | 45 +++++++++++++-------------------
mm/shmem.c | 59 ++++++++++++++++++++----------------------
4 files changed, 78 insertions(+), 58 deletions(-)
--
2.47.2
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() 2025-02-26 21:08 [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan @ 2025-02-26 21:08 ` Zi Yan 2025-03-08 18:14 ` SeongJae Park 2025-02-26 21:08 ` [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry() Zi Yan 2025-03-07 20:33 ` [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan 2 siblings, 1 reply; 9+ messages in thread From: Zi Yan @ 2025-02-26 21:08 UTC (permalink / raw) To: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, Zi Yan, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao During __filemap_add_folio(), a shadow entry is covering n slots and a folio covers m slots with m < n is to be added. Instead of splitting all n slots, only the m slots covered by the folio need to be split and the remaining n-m shadow entries can be retained with orders ranging from m to n-1. This method only requires (n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) * ((n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT)) new xa_nodes, compared to the original xas_split_alloc() + xas_split() one. For example, to insert an order-0 folio when an order-9 shadow entry is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of 8. xas_try_split_min_order() is introduced to reduce the number of calls to xas_try_split() during split. Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kairui Song <kasong@tencent.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mattew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> --- include/linux/xarray.h | 7 +++++++ lib/xarray.c | 25 +++++++++++++++++++++++ mm/filemap.c | 45 +++++++++++++++++------------------------- 3 files changed, 50 insertions(+), 27 deletions(-) diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 4010195201c9..78eede109b1a 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -1556,6 +1556,7 @@ int xas_get_order(struct xa_state *xas); void xas_split(struct xa_state *, void *entry, unsigned int order); void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); void xas_try_split(struct xa_state *xas, void *entry, unsigned int order); +unsigned int xas_try_split_min_order(unsigned int order); #else static inline int xa_get_order(struct xarray *xa, unsigned long index) { @@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry, unsigned int order) { } + +static inline unsigned int xas_try_split_min_order(unsigned int order) +{ + return 0; +} + #endif /** diff --git a/lib/xarray.c b/lib/xarray.c index bc197c96d171..8067182d3e43 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -1133,6 +1133,28 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) } EXPORT_SYMBOL_GPL(xas_split); +/** + * xas_try_split_min_order() - Minimal split order xas_try_split() can accept + * @order: Current entry order. + * + * xas_try_split() can split a multi-index entry to smaller than @order - 1 if + * no new xa_node is needed. This function provides the minimal order + * xas_try_split() supports. + * + * Return: the minimal order xas_try_split() supports + * + * Context: Any context. + * + */ +unsigned int xas_try_split_min_order(unsigned int order) +{ + if (order % XA_CHUNK_SHIFT == 0) + return order == 0 ? 0 : order - 1; + + return order - (order % XA_CHUNK_SHIFT); +} +EXPORT_SYMBOL_GPL(xas_try_split_min_order); + /** * xas_try_split() - Try to split a multi-index entry. * @xas: XArray operation state. @@ -1144,6 +1166,9 @@ EXPORT_SYMBOL_GPL(xas_split); * needed, the function will use GFP_NOWAIT to get one if xas->xa_alloc is * NULL. If more new xa_node are needed, the function gives EINVAL error. * + * NOTE: use xas_try_split_min_order() to get next split order instead of + * @order - 1 if you want to minmize xas_try_split() calls. + * * Context: Any context. The caller should hold the xa_lock. */ void xas_try_split(struct xa_state *xas, void *entry, unsigned int order) diff --git a/mm/filemap.c b/mm/filemap.c index 2b860b59a521..cfb49ed659a1 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -857,11 +857,10 @@ EXPORT_SYMBOL_GPL(replace_page_cache_folio); noinline int __filemap_add_folio(struct address_space *mapping, struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp) { - XA_STATE(xas, &mapping->i_pages, index); - void *alloced_shadow = NULL; - int alloced_order = 0; + XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio)); bool huge; long nr; + unsigned int forder = folio_order(folio); VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio); @@ -870,7 +869,6 @@ noinline int __filemap_add_folio(struct address_space *mapping, mapping_set_update(&xas, mapping); VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio); - xas_set_order(&xas, index, folio_order(folio)); huge = folio_test_hugetlb(folio); nr = folio_nr_pages(folio); @@ -880,7 +878,7 @@ noinline int __filemap_add_folio(struct address_space *mapping, folio->index = xas.xa_index; for (;;) { - int order = -1, split_order = 0; + int order = -1; void *entry, *old = NULL; xas_lock_irq(&xas); @@ -898,21 +896,25 @@ noinline int __filemap_add_folio(struct address_space *mapping, order = xas_get_order(&xas); } - /* entry may have changed before we re-acquire the lock */ - if (alloced_order && (old != alloced_shadow || order != alloced_order)) { - xas_destroy(&xas); - alloced_order = 0; - } - if (old) { - if (order > 0 && order > folio_order(folio)) { + if (order > 0 && order > forder) { + unsigned int split_order = max(forder, + xas_try_split_min_order(order)); + /* How to handle large swap entries? */ BUG_ON(shmem_mapping(mapping)); - if (!alloced_order) { - split_order = order; - goto unlock; + + while (order > forder) { + xas_set_order(&xas, index, split_order); + xas_try_split(&xas, old, order); + if (xas_error(&xas)) + goto unlock; + order = split_order; + split_order = + max(xas_try_split_min_order( + split_order), + forder); } - xas_split(&xas, old, order); xas_reset(&xas); } if (shadowp) @@ -936,17 +938,6 @@ noinline int __filemap_add_folio(struct address_space *mapping, unlock: xas_unlock_irq(&xas); - /* split needed, alloc here and retry. */ - if (split_order) { - xas_split_alloc(&xas, old, split_order, gfp); - if (xas_error(&xas)) - goto error; - alloced_shadow = old; - alloced_order = split_order; - xas_reset(&xas); - continue; - } - if (!xas_nomem(&xas, gfp)) break; } -- 2.47.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() 2025-02-26 21:08 ` [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() Zi Yan @ 2025-03-08 18:14 ` SeongJae Park 2025-03-08 18:32 ` Zi Yan 0 siblings, 1 reply; 9+ messages in thread From: SeongJae Park @ 2025-03-08 18:14 UTC (permalink / raw) To: Zi Yan Cc: SeongJae Park, Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao Hello, On Wed, 26 Feb 2025 16:08:53 -0500 Zi Yan <ziy@nvidia.com> wrote: > During __filemap_add_folio(), a shadow entry is covering n slots and a > folio covers m slots with m < n is to be added. Instead of splitting all > n slots, only the m slots covered by the folio need to be split and the > remaining n-m shadow entries can be retained with orders ranging from m to > n-1. This method only requires > > (n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT) > > new xa_nodes instead of > (n % XA_CHUNK_SHIFT) * ((n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT)) > > new xa_nodes, compared to the original xas_split_alloc() + xas_split() > one. For example, to insert an order-0 folio when an order-9 shadow entry > is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of > 8. > > xas_try_split_min_order() is introduced to reduce the number of calls to > xas_try_split() during split. > > Signed-off-by: Zi Yan <ziy@nvidia.com> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: Kairui Song <kasong@tencent.com> > Cc: Miaohe Lin <linmiaohe@huawei.com> > Cc: Mattew Wilcox <willy@infradead.org> > Cc: David Hildenbrand <david@redhat.com> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Kefeng Wang <wangkefeng.wang@huawei.com> > Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> > Cc: Ryan Roberts <ryan.roberts@arm.com> > Cc: Yang Shi <yang@os.amperecomputing.com> > Cc: Yu Zhao <yuzhao@google.com> > --- > include/linux/xarray.h | 7 +++++++ > lib/xarray.c | 25 +++++++++++++++++++++++ > mm/filemap.c | 45 +++++++++++++++++------------------------- > 3 files changed, 50 insertions(+), 27 deletions(-) > > diff --git a/include/linux/xarray.h b/include/linux/xarray.h > index 4010195201c9..78eede109b1a 100644 > --- a/include/linux/xarray.h > +++ b/include/linux/xarray.h > @@ -1556,6 +1556,7 @@ int xas_get_order(struct xa_state *xas); > void xas_split(struct xa_state *, void *entry, unsigned int order); > void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); > void xas_try_split(struct xa_state *xas, void *entry, unsigned int order); > +unsigned int xas_try_split_min_order(unsigned int order); > #else > static inline int xa_get_order(struct xarray *xa, unsigned long index) > { > @@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry, > unsigned int order) > { > } > + > +static inline unsigned int xas_try_split_min_order(unsigned int order) > +{ > + return 0; > +} > + > #endif > > /** > diff --git a/lib/xarray.c b/lib/xarray.c > index bc197c96d171..8067182d3e43 100644 > --- a/lib/xarray.c > +++ b/lib/xarray.c > @@ -1133,6 +1133,28 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) > } > EXPORT_SYMBOL_GPL(xas_split); > > +/** > + * xas_try_split_min_order() - Minimal split order xas_try_split() can accept > + * @order: Current entry order. > + * > + * xas_try_split() can split a multi-index entry to smaller than @order - 1 if > + * no new xa_node is needed. This function provides the minimal order > + * xas_try_split() supports. > + * > + * Return: the minimal order xas_try_split() supports > + * > + * Context: Any context. > + * > + */ > +unsigned int xas_try_split_min_order(unsigned int order) > +{ > + if (order % XA_CHUNK_SHIFT == 0) > + return order == 0 ? 0 : order - 1; > + > + return order - (order % XA_CHUNK_SHIFT); > +} > +EXPORT_SYMBOL_GPL(xas_try_split_min_order); > + I found this makes build fails when CONFIG_XARRAY_MULTI is unset, like below. /linux/lib/xarray.c:1251:14: error: redefinition of ‘xas_try_split_min_order’ 1251 | unsigned int xas_try_split_min_order(unsigned int order) | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from /linux/lib/xarray.c:13: /linux/include/linux/xarray.h:1587:28: note: previous definition of ‘xas_try_split_min_order’ with type ‘unsigned int(unsigned int)’ 1587 | static inline unsigned int xas_try_split_min_order(unsigned int order) | ^~~~~~~~~~~~~~~~~~~~~~~ I think we should have the definition only when CONFIG_XARRAY_MULTI? Thanks, SJ [...] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() 2025-03-08 18:14 ` SeongJae Park @ 2025-03-08 18:32 ` Zi Yan 2025-03-08 18:36 ` Zi Yan 2025-03-08 21:34 ` SeongJae Park 0 siblings, 2 replies; 9+ messages in thread From: Zi Yan @ 2025-03-08 18:32 UTC (permalink / raw) To: SeongJae Park, Andrew Morton Cc: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao On 8 Mar 2025, at 13:14, SeongJae Park wrote: > Hello, > > On Wed, 26 Feb 2025 16:08:53 -0500 Zi Yan <ziy@nvidia.com> wrote: > >> During __filemap_add_folio(), a shadow entry is covering n slots and a >> folio covers m slots with m < n is to be added. Instead of splitting all >> n slots, only the m slots covered by the folio need to be split and the >> remaining n-m shadow entries can be retained with orders ranging from m to >> n-1. This method only requires >> >> (n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT) >> >> new xa_nodes instead of >> (n % XA_CHUNK_SHIFT) * ((n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT)) >> >> new xa_nodes, compared to the original xas_split_alloc() + xas_split() >> one. For example, to insert an order-0 folio when an order-9 shadow entry >> is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of >> 8. >> >> xas_try_split_min_order() is introduced to reduce the number of calls to >> xas_try_split() during split. >> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> >> Cc: Hugh Dickins <hughd@google.com> >> Cc: Kairui Song <kasong@tencent.com> >> Cc: Miaohe Lin <linmiaohe@huawei.com> >> Cc: Mattew Wilcox <willy@infradead.org> >> Cc: David Hildenbrand <david@redhat.com> >> Cc: John Hubbard <jhubbard@nvidia.com> >> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> >> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> >> Cc: Ryan Roberts <ryan.roberts@arm.com> >> Cc: Yang Shi <yang@os.amperecomputing.com> >> Cc: Yu Zhao <yuzhao@google.com> >> --- >> include/linux/xarray.h | 7 +++++++ >> lib/xarray.c | 25 +++++++++++++++++++++++ >> mm/filemap.c | 45 +++++++++++++++++------------------------- >> 3 files changed, 50 insertions(+), 27 deletions(-) >> >> diff --git a/include/linux/xarray.h b/include/linux/xarray.h >> index 4010195201c9..78eede109b1a 100644 >> --- a/include/linux/xarray.h >> +++ b/include/linux/xarray.h >> @@ -1556,6 +1556,7 @@ int xas_get_order(struct xa_state *xas); >> void xas_split(struct xa_state *, void *entry, unsigned int order); >> void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); >> void xas_try_split(struct xa_state *xas, void *entry, unsigned int order); >> +unsigned int xas_try_split_min_order(unsigned int order); >> #else >> static inline int xa_get_order(struct xarray *xa, unsigned long index) >> { >> @@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry, >> unsigned int order) >> { >> } >> + >> +static inline unsigned int xas_try_split_min_order(unsigned int order) >> +{ >> + return 0; >> +} >> + >> #endif >> >> /** >> diff --git a/lib/xarray.c b/lib/xarray.c >> index bc197c96d171..8067182d3e43 100644 >> --- a/lib/xarray.c >> +++ b/lib/xarray.c >> @@ -1133,6 +1133,28 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) >> } >> EXPORT_SYMBOL_GPL(xas_split); >> >> +/** >> + * xas_try_split_min_order() - Minimal split order xas_try_split() can accept >> + * @order: Current entry order. >> + * >> + * xas_try_split() can split a multi-index entry to smaller than @order - 1 if >> + * no new xa_node is needed. This function provides the minimal order >> + * xas_try_split() supports. >> + * >> + * Return: the minimal order xas_try_split() supports >> + * >> + * Context: Any context. >> + * >> + */ >> +unsigned int xas_try_split_min_order(unsigned int order) >> +{ >> + if (order % XA_CHUNK_SHIFT == 0) >> + return order == 0 ? 0 : order - 1; >> + >> + return order - (order % XA_CHUNK_SHIFT); >> +} >> +EXPORT_SYMBOL_GPL(xas_try_split_min_order); >> + > > I found this makes build fails when CONFIG_XARRAY_MULTI is unset, like below. > > /linux/lib/xarray.c:1251:14: error: redefinition of ‘xas_try_split_min_order’ > 1251 | unsigned int xas_try_split_min_order(unsigned int order) > | ^~~~~~~~~~~~~~~~~~~~~~~ > In file included from /linux/lib/xarray.c:13: > /linux/include/linux/xarray.h:1587:28: note: previous definition of ‘xas_try_split_min_order’ with type ‘unsigned int(unsigned int)’ > 1587 | static inline unsigned int xas_try_split_min_order(unsigned int order) > | ^~~~~~~~~~~~~~~~~~~~~~~ > > I think we should have the definition only when CONFIG_XARRAY_MULTI? I think it might be a merge issue, since my original patch[1] places xas_try_split_min_order() above xas_try_split(), both of which are in #ifdef CONFIG_XARRAY_MULTI #endif. But mm-everything-2025-03-08-00-43 seems to move xas_try_split_min_order() below xas_try_split() and out of CONFIG_XARRAY_MULTI guard. [1] https://lore.kernel.org/linux-mm/20250226210854.2045816-2-ziy@nvidia.com/ -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() 2025-03-08 18:32 ` Zi Yan @ 2025-03-08 18:36 ` Zi Yan 2025-03-08 21:34 ` SeongJae Park 1 sibling, 0 replies; 9+ messages in thread From: Zi Yan @ 2025-03-08 18:36 UTC (permalink / raw) To: SeongJae Park, Andrew Morton Cc: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao On 8 Mar 2025, at 13:32, Zi Yan wrote: > On 8 Mar 2025, at 13:14, SeongJae Park wrote: > >> Hello, >> >> On Wed, 26 Feb 2025 16:08:53 -0500 Zi Yan <ziy@nvidia.com> wrote: >> >>> During __filemap_add_folio(), a shadow entry is covering n slots and a >>> folio covers m slots with m < n is to be added. Instead of splitting all >>> n slots, only the m slots covered by the folio need to be split and the >>> remaining n-m shadow entries can be retained with orders ranging from m to >>> n-1. This method only requires >>> >>> (n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT) >>> >>> new xa_nodes instead of >>> (n % XA_CHUNK_SHIFT) * ((n/XA_CHUNK_SHIFT) - (m/XA_CHUNK_SHIFT)) >>> >>> new xa_nodes, compared to the original xas_split_alloc() + xas_split() >>> one. For example, to insert an order-0 folio when an order-9 shadow entry >>> is present (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of >>> 8. >>> >>> xas_try_split_min_order() is introduced to reduce the number of calls to >>> xas_try_split() during split. >>> >>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> >>> Cc: Hugh Dickins <hughd@google.com> >>> Cc: Kairui Song <kasong@tencent.com> >>> Cc: Miaohe Lin <linmiaohe@huawei.com> >>> Cc: Mattew Wilcox <willy@infradead.org> >>> Cc: David Hildenbrand <david@redhat.com> >>> Cc: John Hubbard <jhubbard@nvidia.com> >>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> >>> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> >>> Cc: Ryan Roberts <ryan.roberts@arm.com> >>> Cc: Yang Shi <yang@os.amperecomputing.com> >>> Cc: Yu Zhao <yuzhao@google.com> >>> --- >>> include/linux/xarray.h | 7 +++++++ >>> lib/xarray.c | 25 +++++++++++++++++++++++ >>> mm/filemap.c | 45 +++++++++++++++++------------------------- >>> 3 files changed, 50 insertions(+), 27 deletions(-) >>> >>> diff --git a/include/linux/xarray.h b/include/linux/xarray.h >>> index 4010195201c9..78eede109b1a 100644 >>> --- a/include/linux/xarray.h >>> +++ b/include/linux/xarray.h >>> @@ -1556,6 +1556,7 @@ int xas_get_order(struct xa_state *xas); >>> void xas_split(struct xa_state *, void *entry, unsigned int order); >>> void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); >>> void xas_try_split(struct xa_state *xas, void *entry, unsigned int order); >>> +unsigned int xas_try_split_min_order(unsigned int order); >>> #else >>> static inline int xa_get_order(struct xarray *xa, unsigned long index) >>> { >>> @@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry, >>> unsigned int order) >>> { >>> } >>> + >>> +static inline unsigned int xas_try_split_min_order(unsigned int order) >>> +{ >>> + return 0; >>> +} >>> + >>> #endif >>> >>> /** >>> diff --git a/lib/xarray.c b/lib/xarray.c >>> index bc197c96d171..8067182d3e43 100644 >>> --- a/lib/xarray.c >>> +++ b/lib/xarray.c >>> @@ -1133,6 +1133,28 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) >>> } >>> EXPORT_SYMBOL_GPL(xas_split); >>> >>> +/** >>> + * xas_try_split_min_order() - Minimal split order xas_try_split() can accept >>> + * @order: Current entry order. >>> + * >>> + * xas_try_split() can split a multi-index entry to smaller than @order - 1 if >>> + * no new xa_node is needed. This function provides the minimal order >>> + * xas_try_split() supports. >>> + * >>> + * Return: the minimal order xas_try_split() supports >>> + * >>> + * Context: Any context. >>> + * >>> + */ >>> +unsigned int xas_try_split_min_order(unsigned int order) >>> +{ >>> + if (order % XA_CHUNK_SHIFT == 0) >>> + return order == 0 ? 0 : order - 1; >>> + >>> + return order - (order % XA_CHUNK_SHIFT); >>> +} >>> +EXPORT_SYMBOL_GPL(xas_try_split_min_order); >>> + >> >> I found this makes build fails when CONFIG_XARRAY_MULTI is unset, like below. >> >> /linux/lib/xarray.c:1251:14: error: redefinition of ‘xas_try_split_min_order’ >> 1251 | unsigned int xas_try_split_min_order(unsigned int order) >> | ^~~~~~~~~~~~~~~~~~~~~~~ >> In file included from /linux/lib/xarray.c:13: >> /linux/include/linux/xarray.h:1587:28: note: previous definition of ‘xas_try_split_min_order’ with type ‘unsigned int(unsigned int)’ >> 1587 | static inline unsigned int xas_try_split_min_order(unsigned int order) >> | ^~~~~~~~~~~~~~~~~~~~~~~ >> >> I think we should have the definition only when CONFIG_XARRAY_MULTI? > > I think it might be a merge issue, since my original patch[1] places > xas_try_split_min_order() above xas_try_split(), both of which are > in #ifdef CONFIG_XARRAY_MULTI #endif. But mm-everything-2025-03-08-00-43 > seems to move xas_try_split_min_order() below xas_try_split() and > out of CONFIG_XARRAY_MULTI guard. > > [1] https://lore.kernel.org/linux-mm/20250226210854.2045816-2-ziy@nvidia.com/ In addition, the new comment for xas_try_split() is added to xas_split() comment. See https://web.git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/tree/lib/xarray.c?h=mm-everything-2025-03-08-00-43#n1084 Something went wrong when this patch was applied. -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() 2025-03-08 18:32 ` Zi Yan 2025-03-08 18:36 ` Zi Yan @ 2025-03-08 21:34 ` SeongJae Park 1 sibling, 0 replies; 9+ messages in thread From: SeongJae Park @ 2025-03-08 21:34 UTC (permalink / raw) To: Zi Yan Cc: SeongJae Park, Andrew Morton, Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao On Sat, 08 Mar 2025 13:32:02 -0500 Zi Yan <ziy@nvidia.com> wrote: > On 8 Mar 2025, at 13:14, SeongJae Park wrote: > > > Hello, > > > > On Wed, 26 Feb 2025 16:08:53 -0500 Zi Yan <ziy@nvidia.com> wrote: [...] > >> diff --git a/include/linux/xarray.h b/include/linux/xarray.h > >> index 4010195201c9..78eede109b1a 100644 > >> --- a/include/linux/xarray.h > >> +++ b/include/linux/xarray.h > >> @@ -1556,6 +1556,7 @@ int xas_get_order(struct xa_state *xas); > >> void xas_split(struct xa_state *, void *entry, unsigned int order); > >> void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); > >> void xas_try_split(struct xa_state *xas, void *entry, unsigned int order); > >> +unsigned int xas_try_split_min_order(unsigned int order); > >> #else > >> static inline int xa_get_order(struct xarray *xa, unsigned long index) > >> { > >> @@ -1582,6 +1583,12 @@ static inline void xas_try_split(struct xa_state *xas, void *entry, > >> unsigned int order) > >> { > >> } > >> + > >> +static inline unsigned int xas_try_split_min_order(unsigned int order) > >> +{ > >> + return 0; > >> +} > >> + > >> #endif > >> > >> /** > >> diff --git a/lib/xarray.c b/lib/xarray.c > >> index bc197c96d171..8067182d3e43 100644 > >> --- a/lib/xarray.c > >> +++ b/lib/xarray.c > >> @@ -1133,6 +1133,28 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) > >> } > >> EXPORT_SYMBOL_GPL(xas_split); > >> > >> +/** > >> + * xas_try_split_min_order() - Minimal split order xas_try_split() can accept > >> + * @order: Current entry order. > >> + * > >> + * xas_try_split() can split a multi-index entry to smaller than @order - 1 if > >> + * no new xa_node is needed. This function provides the minimal order > >> + * xas_try_split() supports. > >> + * > >> + * Return: the minimal order xas_try_split() supports > >> + * > >> + * Context: Any context. > >> + * > >> + */ > >> +unsigned int xas_try_split_min_order(unsigned int order) > >> +{ > >> + if (order % XA_CHUNK_SHIFT = 0) > >> + return order = 0 ? 0 : order - 1; > >> + > >> + return order - (order % XA_CHUNK_SHIFT); > >> +} > >> +EXPORT_SYMBOL_GPL(xas_try_split_min_order); > >> + > > > > I found this makes build fails when CONFIG_XARRAY_MULTI is unset, like below. > > > > /linux/lib/xarray.c:1251:14: error: redefinition of ‘xas_try_split_min_order’ > > 1251 | unsigned int xas_try_split_min_order(unsigned int order) > > | ^~~~~~~~~~~~~~~~~~~~~~~ > > In file included from /linux/lib/xarray.c:13: > > /linux/include/linux/xarray.h:1587:28: note: previous definition of ‘xas_try_split_min_order’ with type ‘unsigned int(unsigned int)’ > > 1587 | static inline unsigned int xas_try_split_min_order(unsigned int order) > > | ^~~~~~~~~~~~~~~~~~~~~~~ > > > > I think we should have the definition only when CONFIG_XARRAY_MULTI? > > I think it might be a merge issue, since my original patch[1] places > xas_try_split_min_order() above xas_try_split(), both of which are > in #ifdef CONFIG_XARRAY_MULTI #endif. But mm-everything-2025-03-08-00-43 > seems to move xas_try_split_min_order() below xas_try_split() and > out of CONFIG_XARRAY_MULTI guard. You're right. I was testing this on the mm-unstable tree, more specifically, commit 2f0c87542d97. I confirmed the build failure goes away after moving the definition to the original place. > > [1] https://lore.kernel.org/linux-mm/20250226210854.2045816-2-ziy@nvidia.com/ > > -- > Best Regards, > Yan, Zi Thanks, SJ ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry() 2025-02-26 21:08 [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan 2025-02-26 21:08 ` [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() Zi Yan @ 2025-02-26 21:08 ` Zi Yan 2025-02-27 3:43 ` Baolin Wang 2025-03-07 20:33 ` [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan 2 siblings, 1 reply; 9+ messages in thread From: Zi Yan @ 2025-02-26 21:08 UTC (permalink / raw) To: Baolin Wang, Matthew Wilcox, linux-mm, linux-fsdevel Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, Zi Yan, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao During shmem_split_large_entry(), large swap entries are covering n slots and an order-0 folio needs to be inserted. Instead of splitting all n slots, only the 1 slot covered by the folio need to be split and the remaining n-1 shadow entries can be retained with orders ranging from 0 to n-1. This method only requires (n/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) * (n/XA_CHUNK_SHIFT) new xa_nodes, compared to the original xas_split_alloc() + xas_split() one. For example, to split an order-9 large swap entry (assuming XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of 8. xas_try_split_min_order() is used to reduce the number of calls to xas_try_split() during split. Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kairui Song <kasong@tencent.com> Cc: Mattew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> --- mm/shmem.c | 59 ++++++++++++++++++++++++++---------------------------- 1 file changed, 28 insertions(+), 31 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index b959edf15073..f08ba9718013 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2153,15 +2153,16 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index, { struct address_space *mapping = inode->i_mapping; XA_STATE_ORDER(xas, &mapping->i_pages, index, 0); - void *alloced_shadow = NULL; - int alloced_order = 0, i; + int split_order = 0, entry_order; + int i; /* Convert user data gfp flags to xarray node gfp flags */ gfp &= GFP_RECLAIM_MASK; for (;;) { - int order = -1, split_order = 0; void *old = NULL; + int cur_order; + pgoff_t swap_index; xas_lock_irq(&xas); old = xas_load(&xas); @@ -2170,60 +2171,56 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index, goto unlock; } - order = xas_get_order(&xas); + entry_order = xas_get_order(&xas); - /* Swap entry may have changed before we re-acquire the lock */ - if (alloced_order && - (old != alloced_shadow || order != alloced_order)) { - xas_destroy(&xas); - alloced_order = 0; - } + if (!entry_order) + goto unlock; /* Try to split large swap entry in pagecache */ - if (order > 0) { - if (!alloced_order) { - split_order = order; + cur_order = entry_order; + swap_index = round_down(index, 1 << entry_order); + + split_order = xas_try_split_min_order(cur_order); + + while (cur_order > 0) { + pgoff_t aligned_index = + round_down(index, 1 << cur_order); + pgoff_t swap_offset = aligned_index - swap_index; + + xas_set_order(&xas, index, split_order); + xas_try_split(&xas, old, cur_order); + if (xas_error(&xas)) goto unlock; - } - xas_split(&xas, old, order); /* * Re-set the swap entry after splitting, and the swap * offset of the original large entry must be continuous. */ - for (i = 0; i < 1 << order; i++) { - pgoff_t aligned_index = round_down(index, 1 << order); + for (i = 0; i < 1 << cur_order; + i += (1 << split_order)) { swp_entry_t tmp; - tmp = swp_entry(swp_type(swap), swp_offset(swap) + i); + tmp = swp_entry(swp_type(swap), + swp_offset(swap) + swap_offset + + i); __xa_store(&mapping->i_pages, aligned_index + i, swp_to_radix_entry(tmp), 0); } + cur_order = split_order; + split_order = xas_try_split_min_order(split_order); } unlock: xas_unlock_irq(&xas); - /* split needed, alloc here and retry. */ - if (split_order) { - xas_split_alloc(&xas, old, split_order, gfp); - if (xas_error(&xas)) - goto error; - alloced_shadow = old; - alloced_order = split_order; - xas_reset(&xas); - continue; - } - if (!xas_nomem(&xas, gfp)) break; } -error: if (xas_error(&xas)) return xas_error(&xas); - return alloced_order; + return entry_order; } /* -- 2.47.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry() 2025-02-26 21:08 ` [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry() Zi Yan @ 2025-02-27 3:43 ` Baolin Wang 0 siblings, 0 replies; 9+ messages in thread From: Baolin Wang @ 2025-02-27 3:43 UTC (permalink / raw) To: Zi Yan, Matthew Wilcox, linux-mm, linux-fsdevel Cc: Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, David Hildenbrand, John Hubbard, Kefeng Wang, Kirill A. Shuemov, Ryan Roberts, Yang Shi, Yu Zhao On 2025/2/27 05:08, Zi Yan wrote: > During shmem_split_large_entry(), large swap entries are covering n slots > and an order-0 folio needs to be inserted. > > Instead of splitting all n slots, only the 1 slot covered by the folio > need to be split and the remaining n-1 shadow entries can be retained with > orders ranging from 0 to n-1. This method only requires > (n/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) * > (n/XA_CHUNK_SHIFT) new xa_nodes, compared to the original > xas_split_alloc() + xas_split() one. > > For example, to split an order-9 large swap entry (assuming XA_CHUNK_SHIFT > is 6), 1 xa_node is needed instead of 8. > > xas_try_split_min_order() is used to reduce the number of calls to > xas_try_split() during split. > > Signed-off-by: Zi Yan <ziy@nvidia.com> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: Kairui Song <kasong@tencent.com> > Cc: Mattew Wilcox <willy@infradead.org> > Cc: Miaohe Lin <linmiaohe@huawei.com> > Cc: David Hildenbrand <david@redhat.com> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Kefeng Wang <wangkefeng.wang@huawei.com> > Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> > Cc: Ryan Roberts <ryan.roberts@arm.com> > Cc: Yang Shi <yang@os.amperecomputing.com> > Cc: Yu Zhao <yuzhao@google.com> > --- LGTM. Feel free to add: Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 0/2] Minimize xa_node allocation during xarry split 2025-02-26 21:08 [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan 2025-02-26 21:08 ` [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() Zi Yan 2025-02-26 21:08 ` [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry() Zi Yan @ 2025-03-07 20:33 ` Zi Yan 2 siblings, 0 replies; 9+ messages in thread From: Zi Yan @ 2025-03-07 20:33 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel, Zi Yan, Baolin Wang, Matthew Wilcox, linux-fsdevel, linux-mm On 26 Feb 2025, at 16:08, Zi Yan wrote: > Hi all, > > When splitting a multi-index entry in XArray from order-n to order-m, > existing xas_split_alloc()+xas_split() approach requires > 2^(n % XA_CHUNK_SHIFT) xa_node allocations. But its callers, > __filemap_add_folio() and shmem_split_large_entry(), use at most 1 xa_node. > To minimize xa_node allocation and remove the limitation of no split from > order-12 (or above) to order-0 (or anything between 0 and 5)[1], > xas_try_split() was added[2], which allocates > (n / XA_CHUNK_SHIFT - m / XA_CHUNK_SHIFT) xa_node. It is used > for non-uniform folio split, but can be used by __filemap_add_folio() > and shmem_split_large_entry(). > > xas_split_alloc() and xas_split() split an order-9 to order-0: > > --------------------------------- > | | | | | | | | | > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | > | | | | | | | | | > --------------------------------- > | | | | > ------- --- --- ------- > | | ... | | > V V V V > ----------- ----------- ----------- ----------- > | xa_node | | xa_node | ... | xa_node | | xa_node | > ----------- ----------- ----------- ----------- > > xas_try_split() splits an order-9 to order-0: > --------------------------------- > | | | | | | | | | > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | > | | | | | | | | | > --------------------------------- > | > | > V > ----------- > | xa_node | > ----------- > > xas_try_split() is designed to be called iteratively with n = m + 1. > xas_try_split_mini_order() is added to minmize the number of calls to > xas_try_split() by telling the caller the next minimal order to split to > instead of n - 1. Splitting order-n to order-m when m= l * XA_CHUNK_SHIFT > does not require xa_node allocation and requires 1 xa_node > when n=l * XA_CHUNK_SHIFT and m = n - 1, so it is OK to use > xas_try_split() with n > m + 1 when no new xa_node is needed. > > xfstests quick group test passed on xfs and tmpfs. > > It is on top of Buddy allocator like (or non-uniform) > folio split V9[2], which is on top of mm-everything-2025-02-26-03-56. > > Changelog > === > From V2[3]: > 1. Fixed shmem_split_large_entry() by setting swap offset correct. > (Thank Baolin for the detailed review) > 2. Used updated xas_try_split() to avoid a bug when xa_node is allocated > by xas_nomem() instead of xas_try_split() itself. > > Let me know your comments. > > > [1] https://lore.kernel.org/linux-mm/Z6YX3RznGLUD07Ao@casper.infradead.org/ > [2] https://lore.kernel.org/linux-mm/20250226210032.2044041-1-ziy@nvidia.com/ > [3] https://lore.kernel.org/linux-mm/20250218235444.1543173-1-ziy@nvidia.com/ Hi Andrew, Do you want me to resend this? This still applies cleanly on mm-everything-2025-03-07-07-55 plus V10. Thanks. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-03-08 21:34 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-26 21:08 [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan 2025-02-26 21:08 ` [PATCH v3 1/2] mm/filemap: use xas_try_split() in __filemap_add_folio() Zi Yan 2025-03-08 18:14 ` SeongJae Park 2025-03-08 18:32 ` Zi Yan 2025-03-08 18:36 ` Zi Yan 2025-03-08 21:34 ` SeongJae Park 2025-02-26 21:08 ` [PATCH v3 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry() Zi Yan 2025-02-27 3:43 ` Baolin Wang 2025-03-07 20:33 ` [PATCH v3 0/2] Minimize xa_node allocation during xarry split Zi Yan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.