From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out30-119.freemail.mail.aliyun.com (out30-119.freemail.mail.aliyun.com [115.124.30.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E9E323B60A; Mon, 10 Nov 2025 10:55:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.119 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762772131; cv=none; b=TmrzwRjqiP46Ih22zTSrdXmm5zn8Zyk71ENMg+ll/xaOp8zkh3ndFqitgDjB0VIMWOJTZiqlfQqjNgnX9yqvQNV8po/dkI+k2xlW2701bDaJEu9E6aduGLeSUFc+a+sucznVmkvJ9lSjgyD7E2KytQG3TYIQZspw+NWhR6O3T1I= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762772131; c=relaxed/simple; bh=Q8IujptSu2LeCXU8JhQKUxaH7FsIZEOd9utsCw4OaiA=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=CD9lk6tPoYcSw28LO894WEm0xjzpZ0OILJIqUdtlCx+/Iocv2Lp/pLrrC5t7Ab266eH++BTh53AlRvNG9o43YAXEgicDzPcMhtpkBAyjch3WNlsPfNKF8I/ZOty0wqZlkmcXz11f5ul9az1slyWB2tj/MhZ9Fto0MdVR92+62E4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=S0Xsly9W; arc=none smtp.client-ip=115.124.30.119 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="S0Xsly9W" DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1762772124; h=From:To:Subject:Date:Message-ID:MIME-Version:Content-Type; bh=IYGpWxlfS7Ghtlne/3hBnEGQd5/vaVez4+R8iM/BRvc=; b=S0Xsly9W6abOUcCOpwfQjpjm12cwEqHwLF0vC24JAfMXWy1cA4AQNyqV8mq+NyusVZA+hAUmL3Ur/9oODTxOqU7AwCsrAIa5B9EJ8QyvC/nXG2l2fO54ao8XkngaeJXCkxdIl7AUP8b/Ki0g1ykuOnmEvm1Lh+ur2EbW37KQn6A= Received: from DESKTOP-5N7EMDA(mailfrom:ying.huang@linux.alibaba.com fp:SMTPD_---0Ws1z-Vy_1762771801 cluster:ay36) by smtp.aliyun-inc.com; Mon, 10 Nov 2025 18:50:02 +0800 From: "Huang, Ying" To: Kairui Song Cc: Kairui Song via B4 Relay , linux-mm@kvack.org, Andrew Morton , Kemeng Shi , Nhat Pham , Baoquan He , Barry Song , Chris Li , Johannes Weiner , Yosry Ahmed , Chengming Zhou , Youngjun Park , linux-kernel@vger.kernel.org, stable@vger.kernel.org, Lorenzo Stoakes Subject: Re: [PATCH] Revert "mm, swap: avoid redundant swap device pinning" In-Reply-To: (Kairui Song's message of "Mon, 10 Nov 2025 13:32:52 +0800") References: <20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com> <875xbiodl2.fsf@DESKTOP-5N7EMDA> Date: Mon, 10 Nov 2025 18:50:01 +0800 Message-ID: <877bvymaau.fsf@DESKTOP-5N7EMDA> User-Agent: Gnus/5.13 (Gnus v5.13) Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Kairui Song writes: > On Mon, Nov 10, 2025 at 9:56=E2=80=AFAM Huang, Ying > wrote: >> >> Hi, Kairui, >> >> Kairui Song via B4 Relay writes: >> >> > From: Kairui Song >> > >> > This reverts commit 78524b05f1a3e16a5d00cc9c6259c41a9d6003ce. >> > >> > While reviewing recent leaf entry changes, I noticed that commit >> > 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning") isn't >> > correct. It's true that most all callers of __read_swap_cache_async are >> > already holding a swap entry reference, so the repeated swap device >> > pinning isn't needed on the same swap device, but it is possible that >> > VMA readahead (swap_vma_readahead()) may encounter swap entries from a >> > different swap device when there are multiple swap devices, and call >> > __read_swap_cache_async without holding a reference to that swap devic= e. >> > >> > So it is possible to cause a UAF if swapoff of device A raced with >> > swapin on device B, and VMA readahead tries to read swap entries from >> > device A. It's not easy to trigger but in theory possible to cause real >> > issues. And besides, that commit made swap more vulnerable to issues >> > like corrupted page tables. >> > >> > Just revert it. __read_swap_cache_async isn't that sensitive to >> > performance after all, as it's mostly used for SSD/HDD swap devices wi= th >> > readahead. SYNCHRONOUS_IO devices may fallback onto it for swap count > >> > 1 entries, but very soon we will have a new helper and routine for >> > such devices, so they will never touch this helper or have redundant >> > swap device reference overhead. >> >> Is it better to add get_swap_device() in swap_vma_readahead()? Whenever >> we get a swap entry, the first thing we need to do is call >> get_swap_device() to check the validity of the swap entry and prevent >> the backing swap device from going under us. This helps us to avoid >> checking the validity of the swap entry in every swap function. Does >> this sound reasonable? > > Hi Ying, thanks for the suggestion! > > Yes, that's also a feasible approach. > > What I was thinking is that, currently except the readahead path, all > swapin entry goes through the get_swap_device() helper, that helper > also helps to mitigate swap entry corruption that may causes OOB or > NULL deref. Although I think it's really not that helpful at all to > mitigate page table corruption from the kernel side, but seems not a > really bad idea to have. > > And the code is simpler this way, and seems more suitable for a stable > & mainline fix. If we want to add get_swap_device() in > swap_vma_readahead(), we need to do that for every entry that doesn't > match the target entry's swap device. The reference overhead is > trivial compared to readhead and bio layer, and only non > SYNCHRONOUS_IO devices use this helper (madvise is a special case, we > may optimize that later). ZRAM may fallback to the readahead path but > this fallback will be eliminated very soon in swap table p2. We have 2 choices in general. 1. Add get/put_swap_device() in every swap function. 2. Add get/put_swap_device() in every caller of the swap functions. Personally, I prefer 2. It works better in situations like calling multiple swap functions. It can reduce duplicated references. It helps improve code reasoning and readability. > Another approach I thought about is that we might want readahead to > stop when it sees entries from a different swap device. That swap > device might be ZRAM where VMA readahead is not helpful. > > How do you think? One possible solution is to skip or stop for a swap entry from the SYNCHRONOUS_IO swap device. --- Best Regards, Huang, Ying