* [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 15:53 ` David Hildenbrand
2024-08-07 7:31 ` [PATCH v4 02/10] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting Baolin Wang
` (8 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
Page reclaim does not scan the anon LRU when there is no swap space;
however, MADV_PAGEOUT can still split shmem large folios even without a
swap device. Thus, check whether any swap space is available before
splitting a shmem large folio, to avoid a pointless split.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/vmscan.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 31d13462571e..796f65781f4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			}
 		} else if (folio_test_swapbacked(folio) &&
 			   folio_test_large(folio)) {
+
+			/*
+			 * Do not split shmem folio if no swap memory
+			 * available.
+			 */
+			if (!total_swap_pages)
+				goto activate_locked;
+
 			/* Split shmem folio */
 			if (split_folio_to_list(folio, folio_list))
 				goto keep_locked;
--
2.39.3
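As a reading aid, here is a user-space model of the decision this patch adds to the shmem branch of shrink_folio_list(). It is a sketch under simplifying assumptions, not kernel code. struct folio_model, enum reclaim_action and shmem_large_folio_action() are hypothetical names. total_swap_pages is passed in as a plain number, and the model does not distinguish MADV_PAGEOUT from other reclaim callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, loosely named after labels in shrink_folio_list(). */
enum reclaim_action {
	ACTION_OTHER_BRANCH,	/* not a large shmem folio: handled elsewhere */
	ACTION_ACTIVATE,	/* like "goto activate_locked": keep it intact */
	ACTION_SPLIT,		/* split into base pages, then keep reclaiming */
};

/* Hypothetical stand-in for the folio properties this branch looks at. */
struct folio_model {
	bool anon;		/* folio_test_anon() */
	bool swapbacked;	/* folio_test_swapbacked() */
	unsigned int nr_pages;	/* 1 for a small folio */
};

/*
 * Model of the patched "else if" branch: a swap-backed folio that is not
 * anonymous (i.e. shmem) and is large is split only when some swap space
 * exists. With no swap at all it could not be swapped out anyway.
 */
static enum reclaim_action shmem_large_folio_action(const struct folio_model *f,
						    unsigned long total_swap_pages)
{
	assert(f->nr_pages >= 1);

	if (f->anon || !f->swapbacked || f->nr_pages == 1)
		return ACTION_OTHER_BRANCH;

	if (!total_swap_pages)
		return ACTION_ACTIVATE;

	return ACTION_SPLIT;
}
```

In this model, a 512-page shmem folio is activated rather than split when total_swap_pages is 0. It is split only once some swap space exists.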
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-07 7:31 ` [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio Baolin Wang
@ 2024-08-07 15:53 ` David Hildenbrand
2024-08-08 2:36 ` Baolin Wang
0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2024-08-07 15:53 UTC
To: Baolin Wang, akpm, hughd
Cc: willy, wangkefeng.wang, chrisl, ying.huang, 21cnbao, ryan.roberts,
shy828301, ziy, ioworker0, da.gomez, p.raghav, linux-mm,
linux-kernel, Christian Brauner, Luis Chamberlain
On 07.08.24 09:31, Baolin Wang wrote:
> Page reclaim will not scan anon LRU if no swap space, however MADV_PAGEOUT
> can still split shmem large folios even without a swap device. Thus add
> swap available space validation before spliting shmem large folio to
> avoid redundant split.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/vmscan.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 31d13462571e..796f65781f4f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> }
> } else if (folio_test_swapbacked(folio) &&
> folio_test_large(folio)) {
> +
> + /*
> + * Do not split shmem folio if no swap memory
> + * available.
> + */
> + if (!total_swap_pages)
> + goto activate_locked;
> +
> /* Split shmem folio */
> if (split_folio_to_list(folio, folio_list))
> goto keep_locked;
Reminds me of

commit 9a976f0c847b67d22ed694556a3626ed92da0422
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Thu Mar 9 15:05:43 2023 -0800

    shmem: skip page split if we're not reclaiming

    In theory when info->flags & VM_LOCKED we should not be getting
    shem_writepage() called so we should be verifying this with a
    WARN_ON_ONCE(). Since we should not be swapping then best to ensure we
    also don't do the folio split earlier too. So just move the check early
    to avoid folio splits in case its a dubious call.

    We also have a similar early bail when !total_swap_pages so just move that
    earlier to avoid the possible folio split in the same situation.

But indeed, pageout() -> writepage() is called *after* the split in the
vmscan path.

In that "noswap" context, I wonder if we also want to skip shmem folios
for which swapping is disabled?

But now I am wondering under which circumstances we end up calling
shmem_writepage() with a large folio at all. I think the answer is in the
comment above the folio_test_large() check in shmem_writepage(): via
drivers/gpu/drm/i915/gem/i915_gem_shmem.c.

... so if shmem_writepage() handles and checks that, could we do

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a332cb80e928..7dfa3d6e8ba7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 						goto activate_locked_split;
 				}
 			}
-		} else if (folio_test_swapbacked(folio) &&
-			   folio_test_large(folio)) {
-			/* Split shmem folio */
-			if (split_folio_to_list(folio, folio_list))
-				goto keep_locked;
 		}
 
 		/*

instead?
--
Cheers,
David / dhildenb
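The ordering point above can be modelled in a few lines. This is a hypothetical user-space sketch: struct swap_env, struct outcome and the three functions are illustrative names. writepage_can_swap() only stands in for the early "goto redirty" checks that, per the quoted commit, shmem_writepage() performs before its own split (such as !total_swap_pages).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state seen on the reclaim path. */
struct swap_env {
	unsigned long total_swap_pages;
	bool swap_disabled;	/* e.g. a shmem file for which swapping is off */
};

/* What happened to one shmem folio in this model. */
struct outcome {
	int splits;		/* times the large folio was split */
	bool written;		/* whether it went on to be written to swap */
};

/* Stands in for the early bail-outs at the top of shmem_writepage(). */
static bool writepage_can_swap(const struct swap_env *env)
{
	return env->total_swap_pages != 0 && !env->swap_disabled;
}

/* Current order: vmscan splits first, writepage checks afterwards. */
static struct outcome split_then_check(const struct swap_env *env, bool large)
{
	struct outcome o = { .splits = large ? 1 : 0, .written = false };

	o.written = writepage_can_swap(env);
	return o;
}

/* Proposed order: writepage checks first and only then splits. */
static struct outcome check_then_split(const struct swap_env *env, bool large)
{
	struct outcome o = { .splits = 0, .written = false };

	if (!writepage_can_swap(env))
		return o;	/* bail out before any split */
	o.splits = large ? 1 : 0;
	o.written = true;
	return o;
}
```

In this model, with no swap configured (or swapping disabled), the current order performs one wasted split and the proposed order performs none. With swap available, both orders split once and write the folio.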
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-07 15:53 ` David Hildenbrand
@ 2024-08-08 2:36 ` Baolin Wang
2024-08-08 8:51 ` David Hildenbrand
2024-08-08 12:35 ` Matthew Wilcox
0 siblings, 2 replies; 21+ messages in thread
From: Baolin Wang @ 2024-08-08 2:36 UTC
To: David Hildenbrand, akpm, hughd
Cc: willy, wangkefeng.wang, chrisl, ying.huang, 21cnbao, ryan.roberts,
shy828301, ziy, ioworker0, da.gomez, p.raghav, linux-mm,
linux-kernel, Christian Brauner, Luis Chamberlain
On 2024/8/7 23:53, David Hildenbrand wrote:
> On 07.08.24 09:31, Baolin Wang wrote:
>> [...]
>
> [...]
>
> In that "noswap" context, I wonder if we also want to skip folios part
> of shmem with disabled swapping?
Yes, I think so.
>
> But now I am wondering under which circumstances we end up calling
> shmem_writepage() with a large folio. And I think the answer is the
> comment of folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>
>
> ... so if shmem_writepage() handles+checks that, could we do
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a332cb80e928..7dfa3d6e8ba7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> goto activate_locked_split;
> }
> }
> - } else if (folio_test_swapbacked(folio) &&
> - folio_test_large(folio)) {
> - /* Split shmem folio */
> - if (split_folio_to_list(folio, folio_list))
> - goto keep_locked;
> }
>
> /*
>
> instead?
Seems reasonable to me. But we would need to pass 'folio_list' to
shmem_writepage() so that the subpages of the split large folio can be
added to it. Let me try.

Thanks.
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 2:36 ` Baolin Wang
@ 2024-08-08 8:51 ` David Hildenbrand
2024-08-08 9:34 ` Baolin Wang
2024-08-08 12:35 ` Matthew Wilcox
1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2024-08-08 8:51 UTC
To: Baolin Wang, akpm, hughd
Cc: willy, wangkefeng.wang, chrisl, ying.huang, 21cnbao, ryan.roberts,
shy828301, ziy, ioworker0, da.gomez, p.raghav, linux-mm,
linux-kernel, Christian Brauner, Luis Chamberlain
On 08.08.24 04:36, Baolin Wang wrote:
> [...]
>
> Seems reasonable to me. But we should pass the 'folio_list' to
> shmem_writepage() to list the subpages of the large folio. Let me try.
Ah, yes, good point. Alternatively, we'd have to split the folio and try
writing all subpages in shmem_writepage() itself. I wonder what to do if
writing some of them fails, and whether we could handle that transparently,
without the folio_list.
--
Cheers,
David / dhildenb
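For comparison, the alternative mentioned above (split inside ->writepage() and write every subpage there) needs per-subpage bookkeeping. The following hypothetical sketch shows only that bookkeeping. struct subpage, write_fn, write_all_subpages() and the demo writer fail_odd_subpages() are illustrative names, not kernel APIs. A failed write simply leaves the subpage dirty, in the spirit of "goto redirty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subpage state after a large folio has been split. */
struct subpage {
	bool dirty;
	bool under_writeback;
};

/* Hypothetical writer: returns true if the subpage was queued for swap I/O. */
typedef bool (*write_fn)(size_t index, void *ctx);

/* Demo writer that pretends swap allocation fails for odd subpages. */
static bool fail_odd_subpages(size_t index, void *ctx)
{
	(void)ctx;
	return index % 2 == 0;
}

/*
 * Try to write every subpage. A failed subpage stays dirty so that a later
 * reclaim pass can retry it. Returns how many subpages were written; the
 * caller still has to deal with the ones that were not.
 */
static size_t write_all_subpages(struct subpage *pages, size_t nr,
				 write_fn write_one, void *ctx)
{
	size_t written = 0;

	for (size_t i = 0; i < nr; i++) {
		assert(pages[i].dirty);	/* the split left every subpage dirty */
		if (!write_one(i, ctx))
			continue;	/* like "goto redirty" for this subpage */
		pages[i].dirty = false;
		pages[i].under_writeback = true;
		written++;
	}
	return written;
}
```

Even in this toy form, the caller is left with a mix of written and still-dirty subpages. That is the partial-failure state the question above is about.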
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 8:51 ` David Hildenbrand
@ 2024-08-08 9:34 ` Baolin Wang
2024-08-08 10:48 ` Daniel Gomez
0 siblings, 1 reply; 21+ messages in thread
From: Baolin Wang @ 2024-08-08 9:34 UTC
To: David Hildenbrand, akpm, hughd
Cc: willy, wangkefeng.wang, chrisl, ying.huang, 21cnbao, ryan.roberts,
shy828301, ziy, ioworker0, da.gomez, p.raghav, linux-mm,
linux-kernel, Christian Brauner, Luis Chamberlain
On 2024/8/8 16:51, David Hildenbrand wrote:
> On 08.08.24 04:36, Baolin Wang wrote:
>> [...]
>>
>> Seems reasonable to me. But we should pass the 'folio_list' to
>> shmem_writepage() to list the subpages of the large folio. Let me try.
>
> Ah, yes, good point. Alternatively, we'd have to split and try writing
> all subpages in there. I wonder what to do if we fail to write some, and
> if we could handle that transparently, without the folio_list.
After some investigation, I prefer to pass 'folio_list' to
shmem_writepage() via 'struct writeback_control', which simplifies the
logic a lot. Otherwise, we would need to handle each subpage's
writeback/reclaim/dirty state, as well as track each subpage's write
result, which seems more complicated.

I made a quick change that passes 'folio_list'; it looks simple and works
as expected.

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 75196b0f894f..10100e22d5c6 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -80,6 +80,9 @@ struct writeback_control {
 	 */
 	struct swap_iocb **swap_plug;
 
+	/* Target list for splitting a large folio */
+	struct list_head *list;
+
 	/* internal fields used by the ->writepages implementation: */
 	struct folio_batch fbatch;
 	pgoff_t index;
diff --git a/mm/shmem.c b/mm/shmem.c
index 05525e9e7423..0a5a68f7d0a0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	 * and its shmem_writeback() needs them to be split when swapping.
 	 */
 	if (wbc->split_large_folio && folio_test_large(folio)) {
+try_split:
 		/* Ensure the subpages are still dirty */
 		folio_test_set_dirty(folio);
-		if (split_huge_page(page) < 0)
+		if (split_huge_page_to_list_to_order(page, wbc->list, 0))
 			goto redirty;
 		folio = page_folio(page);
 		folio_clear_dirty(folio);
@@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	}
 
 	swap = folio_alloc_swap(folio);
-	if (!swap.val)
+	if (!swap.val) {
+		if (nr_pages > 1)
+			goto try_split;
+
 		goto redirty;
+	}
 
 	/*
 	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 277571815789..cf982cf2454f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -628,7 +628,7 @@ typedef enum {
  * Calls ->writepage().
  */
 static pageout_t pageout(struct folio *folio, struct address_space *mapping,
-			 struct swap_iocb **plug)
+			 struct swap_iocb **plug, struct list_head *folio_list)
 {
 	/*
 	 * If the folio is dirty, only perform writeback if that write
@@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
 			.swap_plug = plug,
 		};
 
+		if (shmem_mapping(mapping)) {
+			wbc.list = folio_list;
+			wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
+		}
+
 		folio_set_reclaim(folio);
 		res = mapping->a_ops->writepage(&folio->page, &wbc);
 		if (res < 0)
@@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 						goto activate_locked_split;
 				}
 			}
-		} else if (folio_test_swapbacked(folio) &&
-			   folio_test_large(folio)) {
-
-			/*
-			 * Do not split shmem folio if no swap memory
-			 * available.
-			 */
-			if (!total_swap_pages)
-				goto activate_locked;
-
-			/*
-			 * Only split shmem folio when CONFIG_THP_SWAP
-			 * is not enabled.
-			 */
-			if (!IS_ENABLED(CONFIG_THP_SWAP) &&
-			    split_folio_to_list(folio, folio_list))
-				goto keep_locked;
 		}
 
 		/*
@@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			 * starts and then write it out here.
 			 */
 			try_to_unmap_flush_dirty();
-try_pageout:
-			switch (pageout(folio, mapping, &plug)) {
+			switch (pageout(folio, mapping, &plug, folio_list)) {
 			case PAGE_KEEP:
 				goto keep_locked;
 			case PAGE_ACTIVATE:
-				if (shmem_mapping(mapping) && folio_test_large(folio) &&
-				    !split_folio_to_list(folio, folio_list)) {
+				/* Shmem can be split when writeback to swap */
+				if ((nr_pages > 1) && !folio_test_large(folio)) {
+					sc->nr_scanned -= (nr_pages - 1);
 					nr_pages = 1;
-					goto try_pageout;
 				}
 				goto activate_locked;
 			case PAGE_SUCCESS:
+				if ((nr_pages > 1) && !folio_test_large(folio)) {
+					sc->nr_scanned -= (nr_pages - 1);
+					nr_pages = 1;
+				}
 				stat->nr_pageout += nr_pages;
 
 				if (folio_test_writeback(folio)) {
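A rough model of the two ideas in the change above may help. First, writeback_control carries the caller's list, so a split done inside ->writepage() hands the new base pages back to shrink_folio_list(). Second, the nr_scanned adjustment avoids counting those pages twice. Everything here is hypothetical and simplified. It uses a singly linked list with a tail pointer instead of struct list_head, and the split always succeeds. The premise that the split-off pages are counted again when the reclaim loop takes them off folio_list is an assumption based on the loop adding each folio's nr_pages to nr_scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal folio: just a size and a link. */
struct mfolio {
	unsigned int nr_pages;
	struct mfolio *next;
};

/* Hypothetical stand-in for the new writeback_control list field. */
struct mwbc {
	struct mfolio **list_tail;	/* where split-off folios are appended */
};

/*
 * Model of a split to order 0 inside ->writepage(): the folio keeps its
 * first page and the other nr_pages - 1 pages become separate folios that
 * are appended to the caller's list. storage must hold nr_pages - 1 items.
 */
static void split_to_caller_list(struct mfolio *folio, struct mfolio *storage,
				 struct mwbc *wbc)
{
	unsigned int tails = folio->nr_pages - 1;

	for (unsigned int i = 0; i < tails; i++) {
		storage[i].nr_pages = 1;
		storage[i].next = NULL;
		*wbc->list_tail = &storage[i];
		wbc->list_tail = &storage[i].next;
	}
	folio->nr_pages = 1;
}

/*
 * Model of the PAGE_ACTIVATE/PAGE_SUCCESS adjustment: nr_scanned was bumped
 * by the full nr_pages when the large folio was taken off the list, but if
 * writeback split it, only one page is left here and the other pages are
 * assumed to be counted again when the loop reaches them on the list.
 */
static unsigned int adjust_after_pageout(unsigned int nr_pages,
					 const struct mfolio *folio,
					 unsigned long *nr_scanned)
{
	if (nr_pages > 1 && folio->nr_pages == 1) {	/* !folio_test_large() */
		*nr_scanned -= nr_pages - 1;
		nr_pages = 1;
	}
	return nr_pages;
}
```

For a 4-page folio, nr_scanned drops from 4 to 1 after the split. The three pages appended to the list would each add 1 when processed, which brings the total back to 4.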
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 9:34 ` Baolin Wang
@ 2024-08-08 10:48 ` Daniel Gomez
2024-08-08 10:57 ` Daniel Gomez
0 siblings, 1 reply; 21+ messages in thread
From: Daniel Gomez @ 2024-08-08 10:48 UTC
To: Baolin Wang
Cc: David Hildenbrand, akpm@linux-foundation.org, hughd@google.com,
willy@infradead.org, wangkefeng.wang@huawei.com,
chrisl@kernel.org, ying.huang@intel.com, 21cnbao@gmail.com,
ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com,
ioworker0@gmail.com, Pankaj Raghav, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Christian Brauner, Luis Chamberlain
On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> [...]
>
> After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> via 'struct writeback_control', which could simplify the logic a lot.
> Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
> as well as tracking each subpage's write result, which seems more
> complicated.
>
> I made a quick change by passing 'folio_list', and it looks simple and works
> as expected.
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 75196b0f894f..10100e22d5c6 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -80,6 +80,9 @@ struct writeback_control {
> */
> struct swap_iocb **swap_plug;
>
> + /* Target list for splitting a large folio */
> + struct list_head *list;
> +
> /* internal fields used by the ->writepages implementation: */
> struct folio_batch fbatch;
> pgoff_t index;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 05525e9e7423..0a5a68f7d0a0 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> * and its shmem_writeback() needs them to be split when swapping.
> */
> if (wbc->split_large_folio && folio_test_large(folio)) {
> +try_split:
> /* Ensure the subpages are still dirty */
> folio_test_set_dirty(folio);
> - if (split_huge_page(page) < 0)
> + if (split_huge_page_to_list_to_order(page, wbc->list, 0))
We check split_large_folio, but we still pass wbc->list in the i915
writepage() case, where we previously passed a NULL list. Shouldn't we
address that case too?
> [...]
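The question above hinges on what the split does with a NULL versus a non-NULL list. This hypothetical model assumes (not verified here) that split_huge_page_to_list_to_order() puts tail pages back on the LRU when given a NULL list, and hands them to the caller when given a non-NULL one. It also shows one C fact worth keeping in mind: members omitted from a designated initializer are zero-initialized. So a caller that builds its own writeback_control without naming the new field passes a NULL list. Whether the i915 path actually builds it that way is not established here. struct wbc_model and the helper functions are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_node {
	struct list_node *next;
};

/* Cut-down, hypothetical copy of the writeback_control fields discussed. */
struct wbc_model {
	bool for_reclaim;
	bool split_large_folio;
	struct list_node *list;		/* the proposed "target list" field */
};

enum tail_dest { TAILS_TO_LRU, TAILS_TO_CALLER_LIST };

/* Assumed behaviour: NULL list -> LRU, non-NULL list -> caller's list. */
static enum tail_dest tail_destination(const struct wbc_model *wbc)
{
	return wbc->list ? TAILS_TO_CALLER_LIST : TAILS_TO_LRU;
}

/* A caller unaware of the new field: omitted members are zero, so list == NULL. */
static struct wbc_model driver_style_wbc(void)
{
	struct wbc_model wbc = {
		.for_reclaim = true,
		.split_large_folio = true,
	};
	return wbc;
}

/* The reclaim caller in the proposal sets list to its own folio_list. */
static struct wbc_model reclaim_style_wbc(struct list_node *folio_list)
{
	struct wbc_model wbc = {
		.for_reclaim = true,
		.list = folio_list,
	};
	return wbc;
}
```

If the driver-style initializer matches the real caller, that caller keeps the old NULL-list behaviour. Only pageout() opts in to the caller-list behaviour.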
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 10:48 ` Daniel Gomez
@ 2024-08-08 10:57 ` Daniel Gomez
2024-08-08 11:51 ` Baolin Wang
0 siblings, 1 reply; 21+ messages in thread
From: Daniel Gomez @ 2024-08-08 10:57 UTC
To: Baolin Wang
Cc: David Hildenbrand, akpm@linux-foundation.org, hughd@google.com,
willy@infradead.org, wangkefeng.wang@huawei.com,
chrisl@kernel.org, ying.huang@intel.com, 21cnbao@gmail.com,
ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com,
ioworker0@gmail.com, Pankaj Raghav, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Christian Brauner, Luis Chamberlain
On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> >
> >
> > On 2024/8/8 16:51, David Hildenbrand wrote:
> > > On 08.08.24 04:36, Baolin Wang wrote:
> > > >
> > > >
> > > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > > Page reclaim will not scan anon LRU if no swap space, however
> > > > > > MADV_PAGEOUT
> > > > > > can still split shmem large folios even without a swap device. Thus add
> > > > > > swap available space validation before spliting shmem large folio to
> > > > > > avoid redundant split.
> > > > > >
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > > ---
> > > > > > mm/vmscan.c | 8 ++++++++
> > > > > > 1 file changed, 8 insertions(+)
> > > > > >
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 31d13462571e..796f65781f4f 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct
> > > > > > list_head *folio_list,
> > > > > > }
> > > > > > } else if (folio_test_swapbacked(folio) &&
> > > > > > folio_test_large(folio)) {
> > > > > > +
> > > > > > + /*
> > > > > > + * Do not split shmem folio if no swap memory
> > > > > > + * available.
> > > > > > + */
> > > > > > + if (!total_swap_pages)
> > > > > > + goto activate_locked;
> > > > > > +
> > > > > > /* Split shmem folio */
> > > > > > if (split_folio_to_list(folio, folio_list))
> > > > > > goto keep_locked;
> > > > >
> > > > > Reminds me of
> > > > >
> > > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > > Author: Luis Chamberlain <mcgrof@kernel.org>
> > > > > Date: Thu Mar 9 15:05:43 2023 -0800
> > > > >
> > > > > shmem: skip page split if we're not reclaiming
> > > > > In theory when info->flags & VM_LOCKED we should not be getting
> > > > > shem_writepage() called so we should be verifying this with a
> > > > > WARN_ON_ONCE(). Since we should not be swapping then best to
> > > > > ensure we
> > > > > also don't do the folio split earlier too. So just move the check
> > > > > early
> > > > > to avoid folio splits in case its a dubious call.
> > > > > We also have a similar early bail when !total_swap_pages so just
> > > > > move that
> > > > > earlier to avoid the possible folio split in the same situation.
> > > > >
> > > > >
> > > > > But indeed, pageout() -> writepage() is called *after* the split in the
> > > > > vmscan path.
> > > > >
> > > > > In that "noswap" context, I wonder if we also want to skip folios part
> > > > > of shmem
> > > > > with disabled swapping?
> > > >
> > > > Yes, I think so.
> > > >
> > > > >
> > > > > But now I am wondering under which circumstances we end up calling
> > > > > shmem_writepage() with a large folio. And I think the answer is the
> > > > > comment of
> > > > > folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > > >
> > > > >
> > > > > ... so if shmem_writepage() handles+checks that, could we do
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > > goto activate_locked_split;
> > > > > }
> > > > > }
> > > > > - } else if (folio_test_swapbacked(folio) &&
> > > > > - folio_test_large(folio)) {
> > > > > - /* Split shmem folio */
> > > > > - if (split_folio_to_list(folio, folio_list))
> > > > > - goto keep_locked;
> > > > > }
> > > > >
> > > > > /*
> > > > >
> > > > > instead?
> > > >
> > > > Seems reasonable to me. But we should pass the 'folio_list' to
> > > > shmem_writepage() to list the subpages of the large folio. Let me try.
> > >
> > > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > > all subpages in there. I wonder what to do if we fail to write some, and
> > > if we could handle that transparently, without the folio_list.
> >
> > After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> > via 'struct writeback_control', which could simplify the logic a lot.
> > Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
> > as well as tracking each subpage's write result, which seems more
> > complicated.
> >
> > I made a quick change by passing 'folio_list', and it looks simple and works
> > as expected.
> >
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 75196b0f894f..10100e22d5c6 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -80,6 +80,9 @@ struct writeback_control {
> > */
> > struct swap_iocb **swap_plug;
> >
> > + /* Target list for splitting a large folio */
> > + struct list_head *list;
> > +
> > /* internal fields used by the ->writepages implementation: */
> > struct folio_batch fbatch;
> > pgoff_t index;
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 05525e9e7423..0a5a68f7d0a0 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > * and its shmem_writeback() needs them to be split when swapping.
> > */
> > if (wbc->split_large_folio && folio_test_large(folio)) {
> > +try_split:
> > /* Ensure the subpages are still dirty */
> > folio_test_set_dirty(folio);
> > - if (split_huge_page(page) < 0)
> > + if (split_huge_page_to_list_to_order(page, wbc->list, 0))
>
> We check for split_large_folio, but we still send the wbc->list for i915
> writepage() case. Previously, we were sending a NULL list. Shouldn't we address
> that case too?
I guess I was missing the wbc initialization snippet:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..174b95a9a988 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
.range_start = 0,
.range_end = LLONG_MAX,
.for_reclaim = 1,
+ .list = NULL,
};
unsigned long i;
> > goto redirty;
> > folio = page_folio(page);
> > folio_clear_dirty(folio);
> > @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > }
> >
> > swap = folio_alloc_swap(folio);
> > - if (!swap.val)
> > + if (!swap.val) {
> > + if (nr_pages > 1)
> > + goto try_split;
> > +
> > goto redirty;
> > + }
> >
> > /*
> > * Add inode to shmem_unuse()'s list of swapped-out inodes,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 277571815789..cf982cf2454f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -628,7 +628,7 @@ typedef enum {
> > * Calls ->writepage().
> > */
> > static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > - struct swap_iocb **plug)
> > + struct swap_iocb **plug, struct list_head *folio_list)
> > {
> > /*
> > * If the folio is dirty, only perform writeback if that write
> > @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > .swap_plug = plug,
> > };
> >
> > + if (shmem_mapping(mapping)) {
> > + wbc.list = folio_list;
> > + wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> > + }
> > +
> > folio_set_reclaim(folio);
> > res = mapping->a_ops->writepage(&folio->page, &wbc);
> > if (res < 0)
> > @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > goto activate_locked_split;
> > }
> > }
> > - } else if (folio_test_swapbacked(folio) &&
> > - folio_test_large(folio)) {
> > -
> > - /*
> > - * Do not split shmem folio if no swap memory
> > - * available.
> > - */
> > - if (!total_swap_pages)
> > - goto activate_locked;
> > -
> > - /*
> > - * Only split shmem folio when CONFIG_THP_SWAP
> > - * is not enabled.
> > - */
> > - if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> > - split_folio_to_list(folio, folio_list))
> > - goto keep_locked;
> > }
> >
> > /*
> > @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > * starts and then write it out here.
> > */
> > try_to_unmap_flush_dirty();
> > -try_pageout:
> > - switch (pageout(folio, mapping, &plug)) {
> > + switch (pageout(folio, mapping, &plug, folio_list)) {
> > case PAGE_KEEP:
> > goto keep_locked;
> > case PAGE_ACTIVATE:
> > - if (shmem_mapping(mapping) && folio_test_large(folio) &&
> > - !split_folio_to_list(folio, folio_list)) {
> > + /* Shmem can be split when writeback to swap */
> > + if ((nr_pages > 1) && !folio_test_large(folio)) {
> > + sc->nr_scanned -= (nr_pages - 1);
> > nr_pages = 1;
> > - goto try_pageout;
> > }
> > goto activate_locked;
> > case PAGE_SUCCESS:
> > + if ((nr_pages > 1) && !folio_test_large(folio)) {
> > + sc->nr_scanned -= (nr_pages - 1);
> > + nr_pages = 1;
> > + }
> > stat->nr_pageout += nr_pages;
> >
> > if (folio_test_writeback(folio)) {
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 10:57 ` Daniel Gomez
@ 2024-08-08 11:51 ` Baolin Wang
From: Baolin Wang @ 2024-08-08 11:51 UTC
To: Daniel Gomez
Cc: David Hildenbrand, akpm@linux-foundation.org, hughd@google.com,
willy@infradead.org, wangkefeng.wang@huawei.com,
chrisl@kernel.org, ying.huang@intel.com, 21cnbao@gmail.com,
ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com,
ioworker0@gmail.com, Pankaj Raghav, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Christian Brauner, Luis Chamberlain
On 2024/8/8 18:57, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
>> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
>>>
>>>
>>> On 2024/8/8 16:51, David Hildenbrand wrote:
>>>> On 08.08.24 04:36, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/8/7 23:53, David Hildenbrand wrote:
>>>>>> On 07.08.24 09:31, Baolin Wang wrote:
>>>>>>> Page reclaim will not scan anon LRU if no swap space, however MADV_PAGEOUT
>>>>>>> can still split shmem large folios even without a swap device. Thus add
>>>>>>> swap available space validation before spliting shmem large folio to
>>>>>>> avoid redundant split.
>>>>>>>
>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>> ---
>>>>>>> mm/vmscan.c | 8 ++++++++
>>>>>>> 1 file changed, 8 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>> index 31d13462571e..796f65781f4f 100644
>>>>>>> --- a/mm/vmscan.c
>>>>>>> +++ b/mm/vmscan.c
>>>>>>> @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>>>> }
>>>>>>> } else if (folio_test_swapbacked(folio) &&
>>>>>>> folio_test_large(folio)) {
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Do not split shmem folio if no swap memory
>>>>>>> + * available.
>>>>>>> + */
>>>>>>> + if (!total_swap_pages)
>>>>>>> + goto activate_locked;
>>>>>>> +
>>>>>>> /* Split shmem folio */
>>>>>>> if (split_folio_to_list(folio, folio_list))
>>>>>>> goto keep_locked;
>>>>>>
>>>>>> Reminds me of
>>>>>>
>>>>>> commit 9a976f0c847b67d22ed694556a3626ed92da0422
>>>>>> Author: Luis Chamberlain <mcgrof@kernel.org>
>>>>>> Date: Thu Mar 9 15:05:43 2023 -0800
>>>>>>
>>>>>> shmem: skip page split if we're not reclaiming
>>>>>> In theory when info->flags & VM_LOCKED we should not be getting
>>>>>> shem_writepage() called so we should be verifying this with a
>>>>>> WARN_ON_ONCE(). Since we should not be swapping then best to ensure we
>>>>>> also don't do the folio split earlier too. So just move the check early
>>>>>> to avoid folio splits in case its a dubious call.
>>>>>> We also have a similar early bail when !total_swap_pages so just move that
>>>>>> earlier to avoid the possible folio split in the same situation.
>>>>>>
>>>>>>
>>>>>> But indeed, pageout() -> writepage() is called *after* the split in the
>>>>>> vmscan path.
>>>>>>
>>>>>> In that "noswap" context, I wonder if we also want to skip folios part
>>>>>> of shmem with disabled swapping?
>>>>>
>>>>> Yes, I think so.
>>>>>
>>>>>>
>>>>>> But now I am wondering under which circumstances we end up calling
>>>>>> shmem_writepage() with a large folio. And I think the answer is the
>>>>>> comment of folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>>>>>>
>>>>>>
>>>>>> ... so if shmem_writepage() handles+checks that, could we do
>>>>>>
>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>> index a332cb80e928..7dfa3d6e8ba7 100644
>>>>>> --- a/mm/vmscan.c
>>>>>> +++ b/mm/vmscan.c
>>>>>> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>>> goto activate_locked_split;
>>>>>> }
>>>>>> }
>>>>>> - } else if (folio_test_swapbacked(folio) &&
>>>>>> - folio_test_large(folio)) {
>>>>>> - /* Split shmem folio */
>>>>>> - if (split_folio_to_list(folio, folio_list))
>>>>>> - goto keep_locked;
>>>>>> }
>>>>>>
>>>>>> /*
>>>>>>
>>>>>> instead?
>>>>>
>>>>> Seems reasonable to me. But we should pass the 'folio_list' to
>>>>> shmem_writepage() to list the subpages of the large folio. Let me try.
>>>>
>>>> Ah, yes, good point. Alternatively, we'd have to split and try writing
>>>> all subpages in there. I wonder what to do if we fail to write some, and
>>>> if we could handle that transparently, without the folio_list.
>>>
>>> After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
>>> via 'struct writeback_control', which could simplify the logic a lot.
>>> Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
>>> as well as tracking each subpage's write result, which seems more
>>> complicated.
>>>
>>> I made a quick change by passing 'folio_list', and it looks simple and works
>>> as expected.
>>>
>>> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
>>> index 75196b0f894f..10100e22d5c6 100644
>>> --- a/include/linux/writeback.h
>>> +++ b/include/linux/writeback.h
>>> @@ -80,6 +80,9 @@ struct writeback_control {
>>> */
>>> struct swap_iocb **swap_plug;
>>>
>>> + /* Target list for splitting a large folio */
>>> + struct list_head *list;
>>> +
>>> /* internal fields used by the ->writepages implementation: */
>>> struct folio_batch fbatch;
>>> pgoff_t index;
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 05525e9e7423..0a5a68f7d0a0 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>>> * and its shmem_writeback() needs them to be split when swapping.
>>> */
>>> if (wbc->split_large_folio && folio_test_large(folio)) {
>>> +try_split:
>>> /* Ensure the subpages are still dirty */
>>> folio_test_set_dirty(folio);
>>> - if (split_huge_page(page) < 0)
>>> + if (split_huge_page_to_list_to_order(page, wbc->list, 0))
>>
>> We check for split_large_folio, but we still send the wbc->list for i915
>> writepage() case. Previously, we were sending a NULL list. Shouldn't we address
>> that case too?
>
> I guess I was missing wbc initialization snippet:
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index fe69f2c8527d..174b95a9a988 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
> .range_start = 0,
> .range_end = LLONG_MAX,
> .for_reclaim = 1,
> + .list = NULL,
> };
> unsigned long i;
>
IMO, we don't need an explicit initialization: members that are not named
in a designated initializer are implicitly zero-initialized, so 'list'
will be NULL. Please see:
https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
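A minimal sketch of the C rule referenced above. struct wbc_model and make_wbc() are hypothetical stand-ins, not the kernel's struct writeback_control; the initializer has the same shape as the i915 one but leaves out '.list', and the assertion holds because the C standard requires members omitted from a designated initializer to be initialized as if they had static storage duration, which for a pointer means NULL.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct list_head_model {
	struct list_head_model *next, *prev;
};

/*
 * Hypothetical, cut-down stand-in for struct writeback_control: only a few
 * fields discussed in this thread are mirrored, with simplified types.
 */
struct wbc_model {
	long long range_start;
	long long range_end;
	unsigned int for_reclaim : 1;
	struct list_head_model *list;	/* deliberately not named below */
};

/* Same shape as the i915 __shmem_writeback() initializer, minus ".list = NULL". */
static struct wbc_model make_wbc(void)
{
	struct wbc_model wbc = {
		.range_start = 0,
		.range_end = LLONG_MAX,
		.for_reclaim = 1,
	};

	/*
	 * Members omitted from a designated initializer are zero-initialized,
	 * so the pointer member starts out as a null pointer.
	 */
	assert(wbc.list == NULL);
	return wbc;
}
```

Given this rule, adding '.list = NULL' to the i915 initializer would be harmless but redundant.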
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 2:36 ` Baolin Wang
2024-08-08 8:51 ` David Hildenbrand
@ 2024-08-08 12:35 ` Matthew Wilcox
2024-08-09 3:21 ` Baolin Wang
From: Matthew Wilcox @ 2024-08-08 12:35 UTC
To: Baolin Wang
Cc: David Hildenbrand, akpm, hughd, wangkefeng.wang, chrisl,
ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, ioworker0,
da.gomez, p.raghav, linux-mm, linux-kernel, Christian Brauner,
Luis Chamberlain
On Thu, Aug 08, 2024 at 10:36:23AM +0800, Baolin Wang wrote:
> On 2024/8/7 23:53, David Hildenbrand wrote:
> > But now I am wondering under which circumstances we end up calling
> > shmem_writepage() with a large folio. And I think the answer is the
> > comment of folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> >
> >
> > ... so if shmem_writepage() handles+checks that, could we do
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a332cb80e928..7dfa3d6e8ba7 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > goto activate_locked_split;
> > }
> > }
> > - } else if (folio_test_swapbacked(folio) &&
> > - folio_test_large(folio)) {
> > - /* Split shmem folio */
> > - if (split_folio_to_list(folio, folio_list))
> > - goto keep_locked;
> > }
> >
> > /*
> >
> > instead?
>
> Seems reasonable to me. But we should pass the 'folio_list' to
> shmem_writepage() to list the subpages of the large folio. Let me try.
> Thanks.
We should be trying to remove shmem_writepage(), not make it more
complex. We're making good progress removing instances of ->writepage;
just ceph, ecryptfs, f2fs, gfs2, hostfs, nilfs2, orangefs, vboxsf, shmem
& swap are left. gfs2 patches are out for review.
As you can see from previous patches, the approach is to use
->writepages instead of ->writepage. There should be no need to
handle a folio split list as splitting a folio leaves the folios in the
page cache and they'll naturally be found by subsequent iterations.
* Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio
2024-08-08 12:35 ` Matthew Wilcox
@ 2024-08-09 3:21 ` Baolin Wang
From: Baolin Wang @ 2024-08-09 3:21 UTC
To: Matthew Wilcox
Cc: David Hildenbrand, akpm, hughd, wangkefeng.wang, chrisl,
ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, ioworker0,
da.gomez, p.raghav, linux-mm, linux-kernel, Christian Brauner,
Luis Chamberlain
On 2024/8/8 20:35, Matthew Wilcox wrote:
> On Thu, Aug 08, 2024 at 10:36:23AM +0800, Baolin Wang wrote:
>> On 2024/8/7 23:53, David Hildenbrand wrote:
>>> But now I am wondering under which circumstances we end up calling
>>> shmem_writepage() with a large folio. And I think the answer is the
>>> comment of folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
>>>
>>>
>>> ... so if shmem_writepage() handles+checks that, could we do
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a332cb80e928..7dfa3d6e8ba7 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>> goto activate_locked_split;
>>> }
>>> }
>>> - } else if (folio_test_swapbacked(folio) &&
>>> - folio_test_large(folio)) {
>>> - /* Split shmem folio */
>>> - if (split_folio_to_list(folio, folio_list))
>>> - goto keep_locked;
>>> }
>>>
>>> /*
>>>
>>> instead?
>>
>> Seems reasonable to me. But we should pass the 'folio_list' to
>> shmem_writepage() to list the subpages of the large folio. Let me try.
>> Thanks.
>
> We should be trying to remove shmem_writepage(), not make it more
> complex. We're making good progress removing instances of ->writepage;
> just ceph, ecryptfs, f2fs, gfs2, hostfs, nilfs2, orangefs, vboxsf, shmem
> & swap are left. gfs2 patches are out for review.
I am afraid shmem is a bit special. IIUC, ->writepages() is used to
write back some dirty pages from the mapping by the writeback flusher
thread, but shmem cannot be written back (mapping_can_writeback() will
return false). Therefore, shmem can only be reclaimed through direct
reclaim or kswapd if a swap device is set up (if no swap, shmem should
always be kept in memory). So currently, we should still keep
shmem_writepage() to reclaim shmem pages.
> As you can see from previous patches, the approach is to use
> ->writepages instead of ->writepage. There should be no need to
> handle a folio split list as splitting a folio leaves the folios in the
> page cache and they'll naturally be found by subsequent iterations.
Again, shmem is special. If a shmem folio is reclaimable (when a swap
device is set up), we need to allocate contiguous swap entries for a large
folio. However, if swap entries are significantly fragmented (there is
already a separate discussion about this issue), we will not be able to
allocate contiguous swap entries for large shmem folios. In that case, it
is necessary to split the large shmem folio and try to allocate a single
swap entry per page in order to reclaim shmem.
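The fragmentation argument can be illustrated with a toy model (a sketch under stated assumptions, not the kernel's swap allocator): a 16-slot swap map, a search for an nr-aligned run of free slots (the alignment is an assumption of this model), and a fallback that "splits" the folio and allocates one slot per page. slot_used, alloc_contig() and swap_out_folio() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 16

/* Toy swap map (hypothetical): true means the slot is in use. */
static bool slot_used[NR_SLOTS];

/*
 * Find nr free slots starting at an nr-aligned offset and mark them used.
 * nr must be a power of two. Returns the first slot, or -1 if no such
 * run exists.
 */
static long alloc_contig(size_t nr)
{
	assert(nr > 0 && (nr & (nr - 1)) == 0);

	for (size_t base = 0; base + nr <= NR_SLOTS; base += nr) {
		size_t i = 0;

		while (i < nr && !slot_used[base + i])
			i++;
		if (i < nr)
			continue;	/* run interrupted by a used slot */
		for (i = 0; i < nr; i++)
			slot_used[base + i] = true;
		return (long)base;
	}
	return -1;
}

/* Outcome of trying to swap out one folio of nr pages in this model. */
struct swapout_result {
	bool split;		/* the folio had to be split */
	size_t pages_swapped;	/* pages that obtained a swap slot */
};

static struct swapout_result swap_out_folio(size_t nr)
{
	struct swapout_result res = { .split = false, .pages_swapped = 0 };

	if (alloc_contig(nr) >= 0) {
		res.pages_swapped = nr;
		return res;
	}

	/* Swap space too fragmented: split and try one slot per page. */
	res.split = true;
	for (size_t i = 0; i < nr; i++)
		if (alloc_contig(1) >= 0)
			res.pages_swapped++;
	return res;
}
```

With an empty map, a 4-page folio gets one aligned run. Once every other slot in the remaining range is taken, no aligned run of 4 exists and only the split path can make progress, which is the situation described above.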
* [PATCH v4 02/10] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
2024-08-07 7:31 ` [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 8:02 ` Barry Song
2024-08-07 7:31 ` [PATCH v4 03/10] mm: shmem: extend shmem_partial_swap_usage() to support large folio swap Baolin Wang
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
To support shmem large folio swap operations, add a new parameter to
swap_shmem_alloc() that allows batch SWAP_MAP_SHMEM flag setting for
shmem swap entries.
While we are at it, use folio_nr_pages() to get the number of pages in the
folio, as preparation for large folio support.
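A simplified model of the batched call, assuming a flat array in place of the real swap map: with nr = folio_nr_pages(folio), a single call marks every entry backing the folio, instead of the one entry the old signature could mark. model_swap_map, MODEL_SWAP_MAP_SHMEM and model_swap_shmem_alloc() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Marker value chosen for this model only. */
#define MODEL_SWAP_MAP_SHMEM 0xbf
#define MODEL_SWAP_ENTRIES 32

/* Toy per-entry usage map (hypothetical); 0 means unused. */
static unsigned char model_swap_map[MODEL_SWAP_ENTRIES];

/*
 * Model of the batched call: mark nr consecutive entries starting at
 * offset as belonging to shmem in a single call. The real
 * __swap_duplicate() does considerably more bookkeeping per entry.
 */
static void model_swap_shmem_alloc(size_t offset, int nr)
{
	assert(nr > 0);
	assert(offset + (size_t)nr <= MODEL_SWAP_ENTRIES);

	for (int i = 0; i < nr; i++)
		model_swap_map[offset + (size_t)i] = MODEL_SWAP_MAP_SHMEM;
}
```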
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/swap.h | 4 ++--
mm/shmem.c | 6 ++++--
mm/swapfile.c | 4 ++--
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1c8f844a9f0f..248db1dd7812 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -481,7 +481,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry);
extern swp_entry_t get_swap_page_of_type(int);
extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order);
extern int add_swap_count_continuation(swp_entry_t, gfp_t);
-extern void swap_shmem_alloc(swp_entry_t);
+extern void swap_shmem_alloc(swp_entry_t, int);
extern int swap_duplicate(swp_entry_t);
extern int swapcache_prepare(swp_entry_t entry, int nr);
extern void swap_free_nr(swp_entry_t entry, int nr_pages);
@@ -548,7 +548,7 @@ static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
return 0;
}
-static inline void swap_shmem_alloc(swp_entry_t swp)
+static inline void swap_shmem_alloc(swp_entry_t swp, int nr)
{
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 4a5254bfd610..22cdc10f27ea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1452,6 +1452,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
swp_entry_t swap;
pgoff_t index;
+ int nr_pages;
/*
* Our capabilities prevent regular writeback or sync from ever calling
@@ -1484,6 +1485,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
}
index = folio->index;
+ nr_pages = folio_nr_pages(folio);
/*
* This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
@@ -1536,8 +1538,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
if (add_to_swap_cache(folio, swap,
__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
NULL) == 0) {
- shmem_recalc_inode(inode, 0, 1);
- swap_shmem_alloc(swap);
+ shmem_recalc_inode(inode, 0, nr_pages);
+ swap_shmem_alloc(swap, nr_pages);
shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
mutex_unlock(&shmem_swaplist_mutex);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ea023fc25d08..88d73880aada 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3604,9 +3604,9 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
* Help swapoff by noting that swap entry belongs to shmem/tmpfs
* (in which case its reference count is never incremented).
*/
-void swap_shmem_alloc(swp_entry_t entry)
+void swap_shmem_alloc(swp_entry_t entry, int nr)
{
- __swap_duplicate(entry, SWAP_MAP_SHMEM, 1);
+ __swap_duplicate(entry, SWAP_MAP_SHMEM, nr);
}
/*
--
2.39.3
* Re: [PATCH v4 02/10] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting
2024-08-07 7:31 ` [PATCH v4 02/10] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting Baolin Wang
@ 2024-08-07 8:02 ` Barry Song
From: Barry Song @ 2024-08-07 8:02 UTC
To: Baolin Wang
Cc: akpm, hughd, willy, david, wangkefeng.wang, chrisl, ying.huang,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
linux-mm, linux-kernel
On Wed, Aug 7, 2024 at 7:31 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> To support shmem large folio swap operations, add a new parameter to
> swap_shmem_alloc() that allows batch SWAP_MAP_SHMEM flag setting for
> shmem swap entries.
>
> While we are at it, using folio_nr_pages() to get the number of pages
> of the folio as a preparation.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Barry Song <baohua@kernel.org>
> ---
> include/linux/swap.h | 4 ++--
> mm/shmem.c | 6 ++++--
> mm/swapfile.c | 4 ++--
> 3 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 1c8f844a9f0f..248db1dd7812 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -481,7 +481,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry);
> extern swp_entry_t get_swap_page_of_type(int);
> extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order);
> extern int add_swap_count_continuation(swp_entry_t, gfp_t);
> -extern void swap_shmem_alloc(swp_entry_t);
> +extern void swap_shmem_alloc(swp_entry_t, int);
> extern int swap_duplicate(swp_entry_t);
> extern int swapcache_prepare(swp_entry_t entry, int nr);
> extern void swap_free_nr(swp_entry_t entry, int nr_pages);
> @@ -548,7 +548,7 @@ static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> return 0;
> }
>
> -static inline void swap_shmem_alloc(swp_entry_t swp)
> +static inline void swap_shmem_alloc(swp_entry_t swp, int nr)
> {
> }
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4a5254bfd610..22cdc10f27ea 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1452,6 +1452,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> swp_entry_t swap;
> pgoff_t index;
> + int nr_pages;
>
> /*
> * Our capabilities prevent regular writeback or sync from ever calling
> @@ -1484,6 +1485,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> }
>
> index = folio->index;
> + nr_pages = folio_nr_pages(folio);
>
> /*
> * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
> @@ -1536,8 +1538,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> if (add_to_swap_cache(folio, swap,
> __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
> NULL) == 0) {
> - shmem_recalc_inode(inode, 0, 1);
> - swap_shmem_alloc(swap);
> + shmem_recalc_inode(inode, 0, nr_pages);
> + swap_shmem_alloc(swap, nr_pages);
> shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
>
> mutex_unlock(&shmem_swaplist_mutex);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index ea023fc25d08..88d73880aada 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3604,9 +3604,9 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> * Help swapoff by noting that swap entry belongs to shmem/tmpfs
> * (in which case its reference count is never incremented).
> */
> -void swap_shmem_alloc(swp_entry_t entry)
> +void swap_shmem_alloc(swp_entry_t entry, int nr)
> {
> - __swap_duplicate(entry, SWAP_MAP_SHMEM, 1);
> + __swap_duplicate(entry, SWAP_MAP_SHMEM, nr);
> }
>
> /*
> --
> 2.39.3
>
* [PATCH v4 03/10] mm: shmem: extend shmem_partial_swap_usage() to support large folio swap
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
2024-08-07 7:31 ` [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio Baolin Wang
2024-08-07 7:31 ` [PATCH v4 02/10] mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM flag setting Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 04/10] mm: shmem: return number of pages beeing freed in shmem_free_swap Baolin Wang
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
To support shmem large folio swap-out in the following patches, use
xa_get_order() to get the order of each swap entry when calculating the
swap usage of shmem.
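A toy illustration of why the count changes, under the assumption (implied by the patch) that an iteration over the mapping visits a multi-index swap entry once even though it covers 1 << order page indices. struct entry_model and partial_swap_usage() are hypothetical stand-ins for the XArray walk in shmem_partial_swap_usage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry as seen by a (hypothetical) walk over a shmem mapping. */
struct entry_model {
	bool is_swap;		/* stands in for xa_is_value() */
	unsigned int order;	/* stands in for xa_get_order() */
};

/* Sum the pages covered by swap entries, weighting each by its order. */
static unsigned long partial_swap_usage(const struct entry_model *e, size_t n)
{
	unsigned long swapped = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].is_swap)
			swapped += 1UL << e[i].order;
	return swapped;
}
```

For entries of order 0, 4 and 2 (plus one resident folio), this reports 1 + 16 + 4 = 21 pages; the old swapped++ would report 3.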
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 22cdc10f27ea..02fb188d627f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -890,7 +890,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
if (xas_retry(&xas, page))
continue;
if (xa_is_value(page))
- swapped++;
+ swapped += 1 << xa_get_order(xas.xa, xas.xa_index);
if (xas.xa_index == max)
break;
if (need_resched()) {
--
2.39.3
* [PATCH v4 04/10] mm: shmem: return number of pages beeing freed in shmem_free_swap
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
2024-08-07 7:31 ` [PATCH v4 03/10] mm: shmem: extend shmem_partial_swap_usage() to support large folio swap Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 05/10] mm: filemap: use xa_get_order() to get the swap entry order Baolin Wang
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
From: Daniel Gomez <da.gomez@samsung.com>
Both shmem_free_swap() callers expect the number of pages being freed. In
the large folio context, this needs to support values other than 0 (used
to mean 1 page freed) and -ENOENT (used to mean 0 pages freed). In
preparation for large folio adoption, make shmem_free_swap() return the
number of pages being freed, so a return value of 0 now means that no
pages were freed.
While we are at it, switch to free_swap_and_cache_nr() to free large
order swap entries (change by Baolin Wang).
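The new return contract can be sketched with a single toy slot: the helper returns 1 << order pages when the expected swap entry is still present and 0 when it is not, so callers can simply add the result, and the second shmem_undo_range() pass treats 0 as "retry". struct slot_model and free_swap_model() are hypothetical; the compare-and-clear stands in for xa_cmpxchg_irq().

```c
#include <assert.h>

/* Toy mapping slot: val is the stored entry (0 = empty), order its size. */
struct slot_model {
	unsigned long val;
	unsigned int order;
};

/*
 * Return the number of pages freed, or 0 if the slot no longer holds the
 * expected swap entry (for example, it was replaced by a page).
 */
static long free_swap_model(struct slot_model *s, unsigned long expected)
{
	unsigned int order = s->order;	/* read before the entry is cleared */

	if (s->val != expected)
		return 0;
	s->val = 0;
	s->order = 0;
	return 1L << order;
}
```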
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 02fb188d627f..d0d54939da48 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -856,18 +856,22 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
}
/*
- * Remove swap entry from page cache, free the swap and its page cache.
+ * Remove swap entry from page cache, free the swap and its page cache. Returns
+ * the number of pages being freed. 0 means entry not found in XArray (0 pages
+ * being freed).
*/
-static int shmem_free_swap(struct address_space *mapping,
- pgoff_t index, void *radswap)
+static long shmem_free_swap(struct address_space *mapping,
+ pgoff_t index, void *radswap)
{
+ int order = xa_get_order(&mapping->i_pages, index);
void *old;
old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
if (old != radswap)
- return -ENOENT;
- free_swap_and_cache(radix_to_swp_entry(radswap));
- return 0;
+ return 0;
+ free_swap_and_cache_nr(radix_to_swp_entry(radswap), 1 << order);
+
+ return 1 << order;
}
/*
@@ -1019,7 +1023,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (xa_is_value(folio)) {
if (unfalloc)
continue;
- nr_swaps_freed += !shmem_free_swap(mapping,
+ nr_swaps_freed += shmem_free_swap(mapping,
indices[i], folio);
continue;
}
@@ -1086,14 +1090,17 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
folio = fbatch.folios[i];
if (xa_is_value(folio)) {
+ long swaps_freed;
+
if (unfalloc)
continue;
- if (shmem_free_swap(mapping, indices[i], folio)) {
+ swaps_freed = shmem_free_swap(mapping, indices[i], folio);
+ if (!swaps_freed) {
/* Swap was replaced by page: retry */
index = indices[i];
break;
}
- nr_swaps_freed++;
+ nr_swaps_freed += swaps_freed;
continue;
}
--
2.39.3
* [PATCH v4 05/10] mm: filemap: use xa_get_order() to get the swap entry order
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
2024-08-07 7:31 ` [PATCH v4 04/10] mm: shmem: return number of pages beeing freed in shmem_free_swap Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 06/10] mm: shmem: use swap_free_nr() to free shmem swap entries Baolin Wang
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
In the following patches, shmem will support swapping out large folios,
which means shmem mappings may contain large order swap entries. So use
xa_get_order() to get the order of the shmem swap entry and update
'*start' correctly.
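A small model of the '*start' update, assuming each batch element carries either a folio's page count or a value entry's order (struct batch_elem and next_start() are hypothetical). With an order-9 swap entry at index 512, the next lookup must begin at 1024; treating the entry as a single page would restart at 513, inside the range the entry already covers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the last element returned in a batch. */
struct batch_elem {
	unsigned long index;		/* index the entry was found at */
	bool is_value;			/* stands in for xa_is_value() */
	unsigned long folio_pages;	/* folio_nr_pages(), if a folio */
	unsigned int order;		/* xa_get_order(), if a value entry */
};

/* Where the next lookup should start after this element. */
static unsigned long next_start(const struct batch_elem *last)
{
	unsigned long nr;

	if (!last->is_value)
		nr = last->folio_pages;
	else
		nr = 1UL << last->order;
	return last->index + nr;
}
```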
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/filemap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 4130be74f6fd..4c312aab8b1f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2056,6 +2056,8 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
folio = fbatch->folios[idx];
if (!xa_is_value(folio))
nr = folio_nr_pages(folio);
+ else
+ nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
*start = indices[idx] + nr;
}
return folio_batch_count(fbatch);
@@ -2120,6 +2122,8 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
folio = fbatch->folios[idx];
if (!xa_is_value(folio))
nr = folio_nr_pages(folio);
+ else
+ nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
*start = indices[idx] + nr;
}
return folio_batch_count(fbatch);
--
2.39.3
* [PATCH v4 06/10] mm: shmem: use swap_free_nr() to free shmem swap entries
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
` (4 preceding siblings ...)
2024-08-07 7:31 ` [PATCH v4 05/10] mm: filemap: use xa_get_order() to get the swap entry order Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 07/10] mm: shmem: support large folio allocation for shmem_replace_folio() Baolin Wang
` (3 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
As preparation for supporting shmem large folio swap-out, use swap_free_nr()
to free the contiguous swap entries of a shmem large folio when the large
folio is swapped in from the swap cache. In addition, the index should be
rounded down to a multiple of the number of pages when adding the swapped-in
folio to the pagecache.
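The index and range arithmetic can be sketched in userspace as follows. round_down_pow2() computes the same value as round_down() for a power-of-two divisor such as nr_pages; swap_count[] and model_swap_free_nr() are hypothetical stand-ins for the swap map, assuming the folio's swap slots are contiguous, and are not the kernel's implementation. For example, a fault at index 37 inside a 16-page folio maps to the folio's first index, 32.

```c
#include <assert.h>

/* Same result as round_down(x, y) when y is a power of two. */
static unsigned long round_down_pow2(unsigned long x, unsigned long y)
{
	assert(y != 0 && (y & (y - 1)) == 0);
	return x & ~(y - 1);
}

/* Hypothetical per-slot swap reference counts. */
#define MODEL_SWAP_SLOTS 64
static unsigned char swap_count[MODEL_SWAP_SLOTS];

/* Drop one reference on each of nr contiguous slots starting at offset,
 * i.e. the batched equivalent of nr single-slot frees. */
static void model_swap_free_nr(unsigned long offset, int nr)
{
	for (int i = 0; i < nr; i++) {
		assert(offset + i < MODEL_SWAP_SLOTS);
		assert(swap_count[offset + i] > 0);
		swap_count[offset + i]--;
	}
}
```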
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d0d54939da48..f6bab42180ea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1961,6 +1961,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
struct address_space *mapping = inode->i_mapping;
swp_entry_t swapin_error;
void *old;
+ int nr_pages;
swapin_error = make_poisoned_swp_entry();
old = xa_cmpxchg_irq(&mapping->i_pages, index,
@@ -1969,6 +1970,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
if (old != swp_to_radix_entry(swap))
return;
+ nr_pages = folio_nr_pages(folio);
folio_wait_writeback(folio);
delete_from_swap_cache(folio);
/*
@@ -1976,8 +1978,8 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
* won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
* in shmem_evict_inode().
*/
- shmem_recalc_inode(inode, -1, -1);
- swap_free(swap);
+ shmem_recalc_inode(inode, -nr_pages, -nr_pages);
+ swap_free_nr(swap, nr_pages);
}
/*
@@ -1996,7 +1998,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
struct swap_info_struct *si;
struct folio *folio = NULL;
swp_entry_t swap;
- int error;
+ int error, nr_pages;
VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
swap = radix_to_swp_entry(*foliop);
@@ -2043,6 +2045,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
goto failed;
}
folio_wait_writeback(folio);
+ nr_pages = folio_nr_pages(folio);
/*
* Some architectures may have to restore extra metadata to the
@@ -2056,19 +2059,20 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
goto failed;
}
- error = shmem_add_to_page_cache(folio, mapping, index,
+ error = shmem_add_to_page_cache(folio, mapping,
+ round_down(index, nr_pages),
swp_to_radix_entry(swap), gfp);
if (error)
goto failed;
- shmem_recalc_inode(inode, 0, -1);
+ shmem_recalc_inode(inode, 0, -nr_pages);
if (sgp == SGP_WRITE)
folio_mark_accessed(folio);
delete_from_swap_cache(folio);
folio_mark_dirty(folio);
- swap_free(swap);
+ swap_free_nr(swap, nr_pages);
put_swap_device(si);
*foliop = folio;
--
2.39.3
* [PATCH v4 07/10] mm: shmem: support large folio allocation for shmem_replace_folio()
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
` (5 preceding siblings ...)
2024-08-07 7:31 ` [PATCH v4 06/10] mm: shmem: use swap_free_nr() to free shmem swap entries Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 08/10] mm: shmem: drop folio reference count using 'nr_pages' in shmem_delete_from_page_cache() Baolin Wang
` (2 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
To support large folio swap-in for shmem in the following patches, add
large folio allocation for the new replacement folio in shmem_replace_folio().
Moreover, large folios occupy N consecutive entries in the swap cache
instead of using multi-index entries like the page cache, so we should
replace each of the consecutive entries in the swap cache instead of using
shmem_replace_entry().
Also update the statistics and the folio reference count using the number
of pages in the folio.
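A flat-array sketch of the per-entry replacement follows. It assumes the swap cache can be modeled as one pointer slot per page (the patch itself walks an XA_STATE with xas_load(), xas_store() and xas_next()); replace_consecutive() is a hypothetical name. Like the patch's loop, it stops at the first slot that does not hold the old folio and returns -ENOENT, leaving earlier slots already switched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Replace nr_pages consecutive slots holding `from` with `to`.
 * Returns 0 on success or -ENOENT at the first mismatching slot; as in
 * the loop it models, slots before the mismatch have already been updated.
 */
static int replace_consecutive(void **slots, size_t first, int nr_pages,
			       void *from, void *to)
{
	assert(nr_pages > 0);
	for (int i = 0; i < nr_pages; i++) {
		if (slots[first + i] != from)
			return -ENOENT;
		slots[first + i] = to;
	}
	return 0;
}
```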
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 54 +++++++++++++++++++++++++++++++-----------------------
1 file changed, 31 insertions(+), 23 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index f6bab42180ea..d94f02ad7bd1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1889,28 +1889,24 @@ static bool shmem_should_replace_folio(struct folio *folio, gfp_t gfp)
static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
- struct folio *old, *new;
- struct address_space *swap_mapping;
- swp_entry_t entry;
- pgoff_t swap_index;
- int error;
-
- old = *foliop;
- entry = old->swap;
- swap_index = swap_cache_index(entry);
- swap_mapping = swap_address_space(entry);
+ struct folio *new, *old = *foliop;
+ swp_entry_t entry = old->swap;
+ struct address_space *swap_mapping = swap_address_space(entry);
+ pgoff_t swap_index = swap_cache_index(entry);
+ XA_STATE(xas, &swap_mapping->i_pages, swap_index);
+ int nr_pages = folio_nr_pages(old);
+ int error = 0, i;
/*
* We have arrived here because our zones are constrained, so don't
* limit chance of success by further cpuset and node constraints.
*/
gfp &= ~GFP_CONSTRAINT_MASK;
- VM_BUG_ON_FOLIO(folio_test_large(old), old);
- new = shmem_alloc_folio(gfp, 0, info, index);
+ new = shmem_alloc_folio(gfp, folio_order(old), info, index);
if (!new)
return -ENOMEM;
- folio_get(new);
+ folio_ref_add(new, nr_pages);
folio_copy(new, old);
flush_dcache_folio(new);
@@ -1920,18 +1916,25 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
new->swap = entry;
folio_set_swapcache(new);
- /*
- * Our caller will very soon move newpage out of swapcache, but it's
- * a nice clean interface for us to replace oldpage by newpage there.
- */
+ /* Swap cache still stores N entries instead of a high-order entry */
xa_lock_irq(&swap_mapping->i_pages);
- error = shmem_replace_entry(swap_mapping, swap_index, old, new);
+ for (i = 0; i < nr_pages; i++) {
+ void *item = xas_load(&xas);
+
+ if (item != old) {
+ error = -ENOENT;
+ break;
+ }
+
+ xas_store(&xas, new);
+ xas_next(&xas);
+ }
if (!error) {
mem_cgroup_replace_folio(old, new);
- __lruvec_stat_mod_folio(new, NR_FILE_PAGES, 1);
- __lruvec_stat_mod_folio(new, NR_SHMEM, 1);
- __lruvec_stat_mod_folio(old, NR_FILE_PAGES, -1);
- __lruvec_stat_mod_folio(old, NR_SHMEM, -1);
+ __lruvec_stat_mod_folio(new, NR_FILE_PAGES, nr_pages);
+ __lruvec_stat_mod_folio(new, NR_SHMEM, nr_pages);
+ __lruvec_stat_mod_folio(old, NR_FILE_PAGES, -nr_pages);
+ __lruvec_stat_mod_folio(old, NR_SHMEM, -nr_pages);
}
xa_unlock_irq(&swap_mapping->i_pages);
@@ -1951,7 +1954,12 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
old->private = NULL;
folio_unlock(old);
- folio_put_refs(old, 2);
+ /*
+ * The old folio has been removed from the swap cache; drop the 'nr_pages'
+ * references, as well as one temporary reference taken from the swap
+ * cache.
+ */
+ folio_put_refs(old, nr_pages + 1);
return error;
}
--
2.39.3
* [PATCH v4 08/10] mm: shmem: drop folio reference count using 'nr_pages' in shmem_delete_from_page_cache()
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
` (6 preceding siblings ...)
2024-08-07 7:31 ` [PATCH v4 07/10] mm: shmem: support large folio allocation for shmem_replace_folio() Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 09/10] mm: shmem: split large entry if the swapin folio is not large Baolin Wang
2024-08-07 7:31 ` [PATCH v4 10/10] mm: shmem: support large folio swap out Baolin Wang
9 siblings, 0 replies; 21+ messages in thread
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
To support large folio swap-in/swap-out for shmem in the following patches,
drop the folio's reference count by the number of pages contained in the
folio when a shmem folio is deleted from the shmem pagecache after being
added to the swap cache.
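The reference-count balance can be sketched with a toy counter. The model assumes, as shmem_add_to_page_cache() does with folio_ref_add(folio, nr), that the pagecache holds one reference per page; struct model_folio and its helpers are hypothetical. Dropping only one reference on deletion would leave nr - 1 references behind.

```c
#include <assert.h>

/* Hypothetical folio with just a reference count and a size in pages. */
struct model_folio {
	int refcount;
	int nr_pages;
};

/* Insertion takes one reference per page, like folio_ref_add(folio, nr). */
static void model_add_to_cache(struct model_folio *f)
{
	f->refcount += f->nr_pages;
}

/* Deletion must drop the same amount, like folio_put_refs(folio, nr). */
static void model_delete_from_cache(struct model_folio *f)
{
	f->refcount -= f->nr_pages;
	assert(f->refcount >= 0);
}
```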
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d94f02ad7bd1..345e25425e37 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -851,7 +851,7 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
__lruvec_stat_mod_folio(folio, NR_SHMEM, -nr);
xa_unlock_irq(&mapping->i_pages);
- folio_put(folio);
+ folio_put_refs(folio, nr);
BUG_ON(error);
}
--
2.39.3
* [PATCH v4 09/10] mm: shmem: split large entry if the swapin folio is not large
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
` (7 preceding siblings ...)
2024-08-07 7:31 ` [PATCH v4 08/10] mm: shmem: drop folio reference count using 'nr_pages' in shmem_delete_from_page_cache() Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
2024-08-07 7:31 ` [PATCH v4 10/10] mm: shmem: support large folio swap out Baolin Wang
9 siblings, 0 replies; 21+ messages in thread
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
Currently the swap device can only swap in order-0 folios, even if a large
folio was swapped out. This requires splitting the large entry previously
stored in the shmem pagecache so that small folios can be swapped in.
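The offset bookkeeping can be modeled in userspace. The sketch assumes a swap entry is just a (type, offset) pair and that the large entry's swap offsets are contiguous starting from the stored entry, as the patch's re-set loop relies on; model_split(), entry_for_index() and align_down() are hypothetical helpers, not kernel APIs. Splitting an order-4 entry around faulting index 21 fills indices 16 to 31 with offsets base to base + 15, and index 21 gets base + 5.

```c
#include <assert.h>

/* Hypothetical swap entry: a swap device type and a slot offset. */
struct model_swp {
	unsigned int type;
	unsigned long offset;
};

static unsigned long align_down(unsigned long index, int order)
{
	return index & ~((1UL << order) - 1);
}

/*
 * Store one entry per (1 << new_order) slots of the order-`order` range
 * that contains index, with offsets increasing from swap.offset, as in
 * the re-set loop of shmem_split_large_entry().
 */
static void model_split(struct model_swp *slots, unsigned long index,
			struct model_swp swap, int order, int new_order)
{
	unsigned long aligned = align_down(index, order);

	assert(new_order < order);
	for (unsigned long i = 0; i < (1UL << order); i += 1UL << new_order) {
		slots[aligned + i].type = swap.type;
		slots[aligned + i].offset = swap.offset + i;
	}
}

/* Swap entry for index once an entry of order split_order was split. */
static struct model_swp entry_for_index(struct model_swp swap,
					unsigned long index, int split_order)
{
	struct model_swp e = {
		.type = swap.type,
		.offset = swap.offset + (index - align_down(index, split_order)),
	};
	return e;
}
```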
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 100 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index 345e25425e37..996062dc196b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1990,6 +1990,81 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
swap_free_nr(swap, nr_pages);
}
+static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
+ swp_entry_t swap, int new_order, gfp_t gfp)
+{
+ struct address_space *mapping = inode->i_mapping;
+ XA_STATE_ORDER(xas, &mapping->i_pages, index, new_order);
+ void *alloced_shadow = NULL;
+ int alloced_order = 0, i;
+
+ for (;;) {
+ int order = -1, split_order = 0;
+ void *old = NULL;
+
+ xas_lock_irq(&xas);
+ old = xas_load(&xas);
+ if (!xa_is_value(old) || swp_to_radix_entry(swap) != old) {
+ xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
+
+ order = xas_get_order(&xas);
+
+ /* Swap entry may have changed before we re-acquire the lock */
+ if (alloced_order &&
+ (old != alloced_shadow || order != alloced_order)) {
+ xas_destroy(&xas);
+ alloced_order = 0;
+ }
+
+ /* Try to split large swap entry in pagecache */
+ if (order > 0 && order > new_order) {
+ if (!alloced_order) {
+ split_order = order;
+ goto unlock;
+ }
+ xas_split(&xas, old, order);
+
+ /*
+ * Re-set the swap entry after splitting, and the swap
+ * offset of the original large entry must be continuous.
+ */
+ for (i = 0; i < 1 << order; i += (1 << new_order)) {
+ pgoff_t aligned_index = round_down(index, 1 << order);
+ swp_entry_t tmp;
+
+ tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
+ __xa_store(&mapping->i_pages, aligned_index + i,
+ swp_to_radix_entry(tmp), 0);
+ }
+ }
+
+unlock:
+ xas_unlock_irq(&xas);
+
+ /* split needed, alloc here and retry. */
+ if (split_order) {
+ xas_split_alloc(&xas, old, split_order, gfp);
+ if (xas_error(&xas))
+ goto error;
+ alloced_shadow = old;
+ alloced_order = split_order;
+ xas_reset(&xas);
+ continue;
+ }
+
+ if (!xas_nomem(&xas, gfp))
+ break;
+ }
+
+error:
+ if (xas_error(&xas))
+ return xas_error(&xas);
+
+ return alloced_order;
+}
+
/*
* Swap in the folio pointed to by *foliop.
* Caller has to make sure that *foliop contains a valid swapped folio.
@@ -2026,12 +2101,37 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
/* Look it up and read it in.. */
folio = swap_cache_get_folio(swap, NULL, 0);
if (!folio) {
+ int split_order;
+
/* Or update major stats only when swapin succeeds?? */
if (fault_type) {
*fault_type |= VM_FAULT_MAJOR;
count_vm_event(PGMAJFAULT);
count_memcg_event_mm(fault_mm, PGMAJFAULT);
}
+
+ /*
+ * Currently the swap device can only swap in order-0 folios, so we
+ * should split the large swap entry stored in the pagecache
+ * if necessary.
+ */
+ split_order = shmem_split_large_entry(inode, index, swap, 0, gfp);
+ if (split_order < 0) {
+ error = split_order;
+ goto failed;
+ }
+
+ /*
+ * If the large swap entry has already been split, it is
+ * necessary to recalculate the new swap entry based on
+ * the old order alignment.
+ */
+ if (split_order > 0) {
+ pgoff_t offset = index - round_down(index, 1 << split_order);
+
+ swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+ }
+
/* Here we actually start the io */
folio = shmem_swapin_cluster(swap, gfp, info, index);
if (!folio) {
--
2.39.3
* [PATCH v4 10/10] mm: shmem: support large folio swap out
2024-08-07 7:31 [PATCH v4 00/10] support large folio swap-out and swap-in for shmem Baolin Wang
` (8 preceding siblings ...)
2024-08-07 7:31 ` [PATCH v4 09/10] mm: shmem: split large entry if the swapin folio is not large Baolin Wang
@ 2024-08-07 7:31 ` Baolin Wang
9 siblings, 0 replies; 21+ messages in thread
From: Baolin Wang @ 2024-08-07 7:31 UTC
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, chrisl, ying.huang, 21cnbao,
ryan.roberts, shy828301, ziy, ioworker0, da.gomez, p.raghav,
baolin.wang, linux-mm, linux-kernel
Shmem will support large folio allocation [1] [2] to get better performance.
However, memory reclaim still splits the precious large folios when trying
to swap out shmem, which may lead to memory fragmentation and prevents shmem
from taking advantage of large folios.
Moreover, the swap code already supports swapping out large folios without
splitting them, hence this patch set supports large folio swap-out for shmem.
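The reclaim-side retry added to shrink_folio_list() can be modeled as below. This is a sketch under stated assumptions, not the vmscan code: the enum mirrors pageout()'s PAGE_KEEP/PAGE_ACTIVATE/PAGE_SUCCESS outcomes, splitting is modeled as the folio shrinking to one page and always succeeding, and no_contig_swap() is a hypothetical writeout that fails for large folios (standing in for, say, a failure to allocate contiguous swap space). A shmem large folio whose writeout asks for activation is split and retried; anything else keeps the first result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes named after pageout()'s return values. */
enum model_pageout { MODEL_KEEP, MODEL_ACTIVATE, MODEL_SUCCESS };

struct model_folio {
	bool is_shmem;
	int nr_pages;
};

typedef enum model_pageout (*model_writeout_fn)(struct model_folio *f);

/*
 * Try to write the folio out whole; if a large shmem folio comes back
 * as "activate", split it (modeled as shrinking to one page) and retry.
 */
static enum model_pageout reclaim_one(struct model_folio *f,
				      model_writeout_fn writeout)
{
	for (;;) {
		enum model_pageout ret = writeout(f);

		assert(f->nr_pages >= 1);
		if (ret == MODEL_ACTIVATE && f->is_shmem && f->nr_pages > 1) {
			f->nr_pages = 1;	/* split succeeded: retry */
			continue;
		}
		return ret;
	}
}

/* Hypothetical writeout that cannot place large folios in swap. */
static enum model_pageout no_contig_swap(struct model_folio *f)
{
	return f->nr_pages > 1 ? MODEL_ACTIVATE : MODEL_SUCCESS;
}
```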
Note that the i915_gem_shmem driver still needs large folios to be split
when swapping, so add a new flag 'split_large_folio' to writeback_control to
indicate that the large folio should be split.
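The effect of the new bit on shmem_writepage() reduces to the condition sketched below. struct model_wbc is a hypothetical cut-down stand-in for writeback_control that keeps only the two bits involved, and must_split() mirrors 'wbc->split_large_folio && folio_test_large(folio)', with a page count standing in for folio_test_large(). Since fields omitted from a designated initializer are zeroed, only callers that set the bit explicitly, as __shmem_writeback() in i915 does, keep the split-on-writeback behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down writeback_control with only the bits used here. */
struct model_wbc {
	unsigned for_reclaim:1;
	unsigned split_large_folio:1;	/* set only by callers that need it */
};

/* Mirrors: if (wbc->split_large_folio && folio_test_large(folio)) */
static bool must_split(const struct model_wbc *wbc, int nr_pages)
{
	assert(nr_pages >= 1);
	return wbc->split_large_folio && nr_pages > 1;
}
```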
[1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
[2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 1 +
include/linux/writeback.h | 1 +
mm/shmem.c | 3 +--
mm/vmscan.c | 14 ++++++++++++--
4 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index c5e1c718a6d2..c66cb9c585e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
.range_start = 0,
.range_end = LLONG_MAX,
.for_reclaim = 1,
+ .split_large_folio = 1,
};
unsigned long i;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 1a54676d843a..75196b0f894f 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -63,6 +63,7 @@ struct writeback_control {
unsigned range_cyclic:1; /* range_start is cyclic */
unsigned for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
unsigned unpinned_netfs_wb:1; /* Cleared I_PINNING_NETFS_WB */
+ unsigned split_large_folio:1; /* Split large folio for shmem writeback */
/*
* When writeback IOs are bounced through async layers, only the
diff --git a/mm/shmem.c b/mm/shmem.c
index 996062dc196b..68c9a31bc763 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -795,7 +795,6 @@ static int shmem_add_to_page_cache(struct folio *folio,
VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
- VM_BUG_ON(expected && folio_test_large(folio));
folio_ref_add(folio, nr);
folio->mapping = mapping;
@@ -1482,7 +1481,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
* "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
* and its shmem_writeback() needs them to be split when swapping.
*/
- if (folio_test_large(folio)) {
+ if (wbc->split_large_folio && folio_test_large(folio)) {
/* Ensure the subpages are still dirty */
folio_test_set_dirty(folio);
if (split_huge_page(page) < 0)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 796f65781f4f..21acd6c2fbab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1267,8 +1267,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
if (!total_swap_pages)
goto activate_locked;
- /* Split shmem folio */
- if (split_folio_to_list(folio, folio_list))
+ /*
+ * Only split shmem folio when CONFIG_THP_SWAP
+ * is not enabled.
+ */
+ if (!IS_ENABLED(CONFIG_THP_SWAP) &&
+ split_folio_to_list(folio, folio_list))
goto keep_locked;
}
@@ -1370,10 +1374,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
* starts and then write it out here.
*/
try_to_unmap_flush_dirty();
+try_pageout:
switch (pageout(folio, mapping, &plug)) {
case PAGE_KEEP:
goto keep_locked;
case PAGE_ACTIVATE:
+ if (shmem_mapping(mapping) && folio_test_large(folio) &&
+ !split_folio_to_list(folio, folio_list)) {
+ nr_pages = 1;
+ goto try_pageout;
+ }
goto activate_locked;
case PAGE_SUCCESS:
stat->nr_pageout += nr_pages;
--
2.39.3