From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>, Chris Li <chrisl@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
Chengming Zhou <chengming.zhou@linux.dev>,
Youngjun Park <youngjun.park@lge.com>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [PATCH] Revert "mm, swap: avoid redundant swap device pinning"
Date: Mon, 10 Nov 2025 18:50:01 +0800
Message-ID: <877bvymaau.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <CAMgjq7CTdtjMUUk2YvanL_PMZxS_7+pQhHDP-DjkhDaUhDRjDw@mail.gmail.com> (Kairui Song's message of "Mon, 10 Nov 2025 13:32:52 +0800")
Kairui Song <ryncsn@gmail.com> writes:
> On Mon, Nov 10, 2025 at 9:56 AM Huang, Ying
> <ying.huang@linux.alibaba.com> wrote:
>>
>> Hi, Kairui,
>>
>> Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org> writes:
>>
>> > From: Kairui Song <kasong@tencent.com>
>> >
>> > This reverts commit 78524b05f1a3e16a5d00cc9c6259c41a9d6003ce.
>> >
>> > While reviewing recent leaf entry changes, I noticed that commit
>> > 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning") isn't
>> > correct. It's true that almost all callers of __read_swap_cache_async()
>> > already hold a swap entry reference, so pinning the same swap device
>> > again isn't needed. However, when there are multiple swap devices, VMA
>> > readahead (swap_vma_readahead()) may encounter swap entries from a
>> > different swap device and call __read_swap_cache_async() without
>> > holding a reference to that device.
>> >
>> > So a use-after-free (UAF) is possible if swapoff of device A races
>> > with swapin on device B while VMA readahead tries to read swap entries
>> > from device A. It's not easy to trigger, but in theory it can cause
>> > real issues. Besides, that commit made swap more vulnerable to issues
>> > like corrupted page tables.
>> >
>> > Just revert it. __read_swap_cache_async() isn't that performance
>> > sensitive after all, as it's mostly used for SSD/HDD swap devices with
>> > readahead. SYNCHRONOUS_IO devices may fall back to it for entries with
>> > swap count > 1, but very soon we will have a new helper and routine for
>> > such devices, so they will never touch this helper or pay the redundant
>> > swap device reference overhead.
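To make the race above concrete, here is a minimal userspace model in C (not kernel code). The struct model_swap_dev and the helpers model_tryget(), model_put() and model_swapoff_begin() are hypothetical stand-ins that only loosely mirror the get_swap_device()/put_swap_device() contract; the real kernel uses per-CPU reference counting and is considerably more involved. The point it illustrates: a pin held on device B says nothing about device A.

```c
/* Hypothetical userspace model of swap device pinning; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_swap_dev {
	bool live;   /* false once swapoff has started */
	int users;   /* number of outstanding pins */
};

/* Pin @dev only if it is still live; returns NULL once swapoff has begun. */
static struct model_swap_dev *model_tryget(struct model_swap_dev *dev)
{
	if (!dev->live)
		return NULL;
	dev->users++;
	return dev;
}

static void model_put(struct model_swap_dev *dev)
{
	assert(dev->users > 0);
	dev->users--;
}

/*
 * Swapoff: refuse new pins, then report whether teardown may proceed.
 * Freeing the device is only safe once no pins remain.
 */
static bool model_swapoff_begin(struct model_swap_dev *dev)
{
	dev->live = false;
	return dev->users == 0;
}
```

Under this model, a readahead path that holds only B's pin and then touches an entry of A is exactly the window in which A may be torn down.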
>>
>> Would it be better to add get_swap_device() in swap_vma_readahead()?
>> Whenever we get a swap entry, the first thing we need to do is call
>> get_swap_device() to check the validity of the swap entry and prevent
>> the backing swap device from going away under us. This helps us avoid
>> checking the validity of the swap entry in every swap function. Does
>> this sound reasonable?
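A sketch of the pattern suggested above, where one call both validates an entry and pins its device. This is a hypothetical userspace model: the entry encoding (device index in the top bits, offset below), the model_devs table and the function names are assumptions for illustration and do not reflect the kernel's actual swp_entry_t layout or swap_info array.

```c
/* Hypothetical userspace model: validate-and-pin in one call; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TYPE_SHIFT 58   /* assumed: top bits hold the device index */
#define MODEL_MAX_DEVS   4

struct model_swap_dev {
	bool live;   /* false once swapoff has started */
	int users;   /* number of outstanding pins */
};

/* Assumed device table; NULL slots are unused. */
static struct model_swap_dev *model_devs[MODEL_MAX_DEVS];

static uint64_t model_entry(unsigned type, uint64_t offset)
{
	return ((uint64_t)type << MODEL_TYPE_SHIFT) | offset;
}

/*
 * Validate the entry and pin its device in one step. Returns NULL for a
 * corrupted device index, an empty slot, or a device under swapoff.
 */
static struct model_swap_dev *model_get_swap_device(uint64_t entry)
{
	uint64_t type = entry >> MODEL_TYPE_SHIFT;
	struct model_swap_dev *dev;

	if (type >= MODEL_MAX_DEVS)
		return NULL;          /* reject before indexing the table */
	dev = model_devs[type];
	if (!dev || !dev->live)
		return NULL;
	dev->users++;
	return dev;
}

static void model_put_swap_device(struct model_swap_dev *dev)
{
	assert(dev->users > 0);
	dev->users--;
}
```

Because the index is bounds-checked before the table is indexed, a corrupted entry yields NULL rather than an out-of-bounds read.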
>
> Hi Ying, thanks for the suggestion!
>
> Yes, that's also a feasible approach.
>
> What I was thinking is that currently, except for the readahead path,
> every swapin entry goes through the get_swap_device() helper. That
> helper also helps mitigate swap entry corruption that may cause an
> out-of-bounds access or NULL dereference. I don't think it's that
> helpful for mitigating page table corruption from the kernel side, but
> it doesn't seem like a bad idea to have.
>
> And the code is simpler this way, which seems more suitable for a stable
> & mainline fix. If we want to add get_swap_device() in
> swap_vma_readahead(), we need to do that for every entry that doesn't
> match the target entry's swap device. The reference overhead is
> trivial compared to readahead and the bio layer, and only
> non-SYNCHRONOUS_IO devices use this helper (madvise is a special case; we
> may optimize that later). ZRAM may fall back to the readahead path, but
> this fallback will be eliminated very soon in swap table p2.
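Pinning inside the readahead loop, as described above, could look roughly like this. In this hypothetical model each window entry is represented only by its device, the caller is assumed to already hold a pin on the target device, and model_readahead() stands in for the per-entry loop of swap_vma_readahead(); the actual read is reduced to a counter.

```c
/* Hypothetical userspace model of per-entry pinning in readahead; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_swap_dev {
	bool live;   /* false once swapoff has started */
	int users;   /* number of outstanding pins */
};

struct model_ra_stats {
	int read;        /* entries that would be read */
	int extra_pins;  /* pins taken beyond the caller's own */
	int skipped;     /* entries whose device is going away */
};

/*
 * Walk the readahead window. @devs[i] is the device of entry i and @target
 * is the device the caller already holds a pin on. Entries on other
 * devices are pinned around the (modelled) read.
 */
static struct model_ra_stats model_readahead(struct model_swap_dev **devs,
					     size_t n,
					     const struct model_swap_dev *target)
{
	struct model_ra_stats st = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		struct model_swap_dev *dev = devs[i];

		if (dev == target) {
			st.read++;           /* covered by the caller's pin */
			continue;
		}
		if (!dev->live) {
			st.skipped++;        /* pin attempt fails: swapoff in progress */
			continue;
		}
		dev->users++;                /* pin the foreign device */
		st.extra_pins++;
		st.read++;                   /* the real read would happen here */
		dev->users--;                /* drop the pin after the modelled read */
	}
	return st;
}
```

Only entries on a different device pay for an extra pin, which is the per-entry cost weighed above against the simplicity of the revert.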

We have 2 choices in general.

1. Add get/put_swap_device() in every swap function.
2. Add get/put_swap_device() in every caller of the swap functions.

Personally, I prefer 2. It works better when a caller calls multiple
swap functions: it avoids duplicated references, and it makes the code
easier to reason about and read.

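The two choices differ mainly in how many references a sequence of swap operations takes. In this hypothetical model, op_pins_itself() follows choice 1 (each swap function pins and unpins internally) and op_assumes_pinned() follows choice 2 (the caller pins once and each callee relies on that pin). The names and the pins_taken counter are illustrative only.

```c
/* Hypothetical userspace model comparing the two pinning styles; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_swap_dev {
	bool live;       /* false once swapoff has started */
	int users;       /* number of outstanding pins */
	int pins_taken;  /* total pins ever taken, to compare the two styles */
};

static bool model_get(struct model_swap_dev *dev)
{
	if (!dev->live)
		return false;
	dev->users++;
	dev->pins_taken++;
	return true;
}

static void model_put(struct model_swap_dev *dev)
{
	assert(dev->users > 0);
	dev->users--;
}

/* Choice 1: every swap function pins the device itself. */
static bool op_pins_itself(struct model_swap_dev *dev)
{
	if (!model_get(dev))
		return false;
	/* ... operate on the entry ... */
	model_put(dev);
	return true;
}

/* Choice 2: the callee requires the caller to hold a pin already. */
static void op_assumes_pinned(struct model_swap_dev *dev)
{
	assert(dev->users > 0);  /* documents the calling convention */
	/* ... operate on the entry ... */
}

static void run_choice1(struct model_swap_dev *dev, int nops)
{
	for (int i = 0; i < nops; i++)
		op_pins_itself(dev);
}

static bool run_choice2(struct model_swap_dev *dev, int nops)
{
	if (!model_get(dev))
		return false;
	for (int i = 0; i < nops; i++)
		op_assumes_pinned(dev);
	model_put(dev);
	return true;
}
```

With three operations, choice 1 takes three pins while choice 2 takes one; the assert in op_assumes_pinned() documents the calling convention that choice 2 relies on.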
> Another approach I thought about is that we might want readahead to
> stop when it sees entries from a different swap device. That swap
> device might be ZRAM, where VMA readahead is not helpful.
>
> What do you think?

One possible solution is to skip, or stop at, swap entries from a
SYNCHRONOUS_IO swap device.
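Both options for foreign entries in the readahead window, stopping at the first entry from a different device or skipping entries from a SYNCHRONOUS_IO device, reduce to a per-entry check. In this hypothetical sketch, sync_io stands in for the device's SWP_SYNCHRONOUS_IO flag, and model_ra_window() only counts how many entries would be read; the enum and names are assumptions.

```c
/* Hypothetical userspace model of readahead window policies; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_swap_dev {
	bool sync_io;   /* stands in for SWP_SYNCHRONOUS_IO */
};

enum model_ra_policy {
	RA_STOP_ON_FOREIGN,   /* stop at the first entry on another device */
	RA_SKIP_SYNC_IO,      /* skip entries on SYNCHRONOUS_IO devices */
};

/*
 * Count how many entries of the window would be read.
 * @devs[i] is the device of entry i; @target is the faulting entry's device.
 */
static size_t model_ra_window(const struct model_swap_dev *const *devs,
			      size_t n, const struct model_swap_dev *target,
			      enum model_ra_policy policy)
{
	size_t nr_read = 0;

	for (size_t i = 0; i < n; i++) {
		const struct model_swap_dev *dev = devs[i];

		if (policy == RA_STOP_ON_FOREIGN && dev != target)
			break;
		if (policy == RA_SKIP_SYNC_IO && dev->sync_io)
			continue;
		nr_read++;
	}
	return nr_read;
}
```

For a window {hdd, hdd, zram, hdd} targeting the HDD, stopping reads 2 entries while skipping reads 3, because skipping still reaches the HDD entry after the ZRAM one.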
---
Best Regards,
Huang, Ying