From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Baoquan He <bhe@redhat.com>, Barry Song <21cnbao@gmail.com>,
Chris Li <chrisl@kernel.org>,
David Rientjes <rientjes@google.com>,
Kairui Song <ryncsn@gmail.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH mm-new 2/2] mm/shmem: writeout free swap if swap_writeout() reactivates
Date: Sat, 19 Jul 2025 12:32:41 +0800 [thread overview]
Message-ID: <f8bb53a3-c328-4dde-a19b-520d3fbd0fd0@linux.alibaba.com> (raw)
In-Reply-To: <a6f83e63-c5af-cd27-5379-0e8fcb347e18@google.com>
On 2025/7/19 08:51, Hugh Dickins wrote:
> On Thu, 17 Jul 2025, Baolin Wang wrote:
>
>> Hi Hugh,
>>
>> On 2025/7/16 16:08, Hugh Dickins wrote:
>>> If swap_writeout() returns AOP_WRITEPAGE_ACTIVATE (for example, because
>>> zswap cannot compress and memcg disables writeback), there is no virtue
>>> in keeping that folio in swap cache and holding the swap allocation:
>>> shmem_writeout() switch it back to shmem page cache before returning.
>>>
>>> Folio lock is held, and folio->memcg_data remains set throughout, so
>>> there is no need to get into any memcg or memsw charge complications:
>>> swap_free_nr() and delete_from_swap_cache() do as much as is needed (but
>>> beware the race with shmem_free_swap() when inode truncated or evicted).
>>>
>>> Doing the same for an anonymous folio is harder, since it will usually
>>> have been unmapped, with references to the swap left in the page tables.
>>> Adding a function to remap the folio would be fun, but not worthwhile
>>> unless it has other uses, or an urgent bug with anon is demonstrated.
>>>
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>> mm/shmem.c | 33 ++++++++++++++++++++++++++++++++-
>>> 1 file changed, 32 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 33675361031b..5a7ce4c8bad6 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1655,6 +1655,7 @@ int shmem_writeout(struct folio *folio, struct
>>> swap_iocb **plug,
>>>
>>> if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC |
>>> __GFP_NOWARN)) {
>>> bool first_swapped = shmem_recalc_inode(inode, 0, nr_pages);
>>> + int error;
>>>
>>> /*
>>> * Add inode to shmem_unuse()'s list of swapped-out inodes,
>>> @@ -1675,7 +1676,37 @@ int shmem_writeout(struct folio *folio, struct
>>> swap_iocb **plug,
>>> shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
>>>
>>> BUG_ON(folio_mapped(folio));
>>> - return swap_writeout(folio, plug);
>>> + error = swap_writeout(folio, plug);
>>> + if (error != AOP_WRITEPAGE_ACTIVATE) {
>>> + /* folio has been unlocked */
>>> + return error;
>>> + }
>>> +
>>> + /*
>>> + * The intention here is to avoid holding on to the swap when
>>> + * zswap was unable to compress and unable to writeback; but
>>> + * it will be appropriate if other reactivate cases are added.
>>> + */
>>> + error = shmem_add_to_page_cache(folio, mapping, index,
>>> + swp_to_radix_entry(folio->swap),
>>> + __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>> + /* Swap entry might be erased by racing shmem_free_swap() */
>>> + if (!error) {
>>> + spin_lock(&info->lock);
>>> + info->swapped -= nr_pages;
>>> + spin_unlock(&info->lock);
>>
>> Using the helper 'shmem_recalc_inode(inode, 0, -nr_pages)' seems more
>> readable?
>
> Yes, that's better, thanks: I don't know if I'd say "more readable",
> but it is much more in the spirit of shmem_recalc_inode(), bringing
> the counts into balance sooner rather than later.
>
> I'll follow up with a "fix" patch to Andrew.
>
>>
>>> + swap_free_nr(folio->swap, nr_pages);
>>> + }
>>> +
>>> + /*
>>> + * The delete_from_swap_cache() below could be left for
>>> + * shrink_folio_list()'s folio_free_swap() to dispose of;
>>> + * but I'm a little nervous about letting this folio out of
>>> + * shmem_writeout() in a hybrid half-tmpfs-half-swap state
>>> + * e.g. folio_mapping(folio) might give an unexpected answer.
>>> + */
>>> + delete_from_swap_cache(folio);
>>
>> IIUC, Should the delete_from_swap_cache() also be moved into the 'if (!error)'
>> branch? Since if shmem_free_swap() has freed the swap entry, it would also
>> reclaim the swap cache, no?
>
> No, but it was a good point to raise, and led into more research than
> I had anticipated.
>
> No: because shmem_free_swap->free_swap_and_cache_nr->__try_to_reclaim_swap
> has to return after doing nothing if its folio_trylock fails: it cannot do
> the delete_from_swap_cache() part of the job, which we do here - on this
> AOP_WRITEPAGE_ACTIVATE path, we hold the folio_lock throughout.
I missed the 'folio_trylock', yes, you are right. Thanks for explanation.
> But it led into more research, because I wanted to point you to the
> equivalent coding in shmem_swapin_folio(): but, to my initial alarm,
> the equivalent is not there; but used to be.
>
> See 5.8 commit 14235ab36019 ("mm: shmem: remove rare optimization when
> swapin races with hole punching"). There (in the deleted lines) you can
> see the helpful comment on this case, with its delete_from_swap_cache()
> when shmem_add_to_page_cache() fails. But for memcg-charging reasons,
> 5.8 found it simpler to drop that, and just let shrink_page_list()
> clear up the debris later.
>
> Here in shmem_writeout(), holding folio_lock throughout, we have no
> memcg complications, and can go ahead with delete_from_swap_cache(),
> both when successfully added back to page cache, and when that fails.
OK. Thanks for pointing out the change history here, and I have no
further questions.
With your following changes, feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
next prev parent reply other threads:[~2025-07-19 4:32 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-16 8:05 [PATCH mm-new 1/2] mm/shmem: hold shmem_swaplist spinlock (not mutex) much less Hugh Dickins
2025-07-16 8:08 ` [PATCH mm-new 2/2] mm/shmem: writeout free swap if swap_writeout() reactivates Hugh Dickins
2025-07-17 9:44 ` Baolin Wang
2025-07-19 0:51 ` Hugh Dickins
2025-07-19 4:32 ` Baolin Wang [this message]
2025-07-19 0:56 ` [PATCH mm-unstable] mm/shmem: writeout free swap if swap_writeout() reactivates fix Hugh Dickins
2025-07-20 7:07 ` [PATCH mm-new 2/2] mm/shmem: writeout free swap if swap_writeout() reactivates David Rientjes
2025-07-17 8:46 ` [PATCH mm-new 1/2] mm/shmem: hold shmem_swaplist spinlock (not mutex) much less Baolin Wang
2025-07-20 7:07 ` David Rientjes
2025-07-21 17:54 ` Kairui Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f8bb53a3-c328-4dde-a19b-520d3fbd0fd0@linux.alibaba.com \
--to=baolin.wang@linux.alibaba.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=chrisl@kernel.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=ryncsn@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=shikemeng@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).