Date: Tue, 7 Jan 2025 22:42:25 -0800
From: Andrew Morton
To: Yosry Ahmed
Cc: Nhat Pham, Johannes Weiner, Chengming Zhou, Vitaly Wool, Barry Song, Sam Sun, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH RESEND 2/2] mm: zswap: use SRCU to synchronize with CPU hotunplug
Message-Id: <20250107224225.ca41ec2f0340b6b768f44a6a@linux-foundation.org>
References: <20250107074724.1756696-1-yosryahmed@google.com> <20250107074724.1756696-2-yosryahmed@google.com> <20250107180345.GD37530@cmpxchg.org>

> Andrew, could you please pick up patch 1 (the revert) while we figure
> out the alternative fix? It's important that it lands in v6.13 to
> avoid the possibility of deadlock. Figuring out an alternative fix is
> less important.

I have the below patch in mm-hotfixes-unstable.

I also have
https://lkml.kernel.org/r/20250107222236.2715883-2-yosryahmed@google.com
in mm-hotfixes-unstable.  Don't know what to do with it.

I have no patch "mm: zswap: use SRCU to synchronize with CPU hotunplug"
in mm-unstable.


From: Yosry Ahmed
Subject: Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"
Date: Tue, 7 Jan 2025 22:22:34 +0000

This reverts commit eaebeb93922ca6ab0dd92027b73d0112701706ef.

Commit eaebeb93922c ("mm: zswap: fix race between [de]compression and CPU
hotunplug") used the CPU hotplug lock in zswap compress/decompress
operations to protect against a race with CPU hotunplug making some
per-CPU resources go away.

However, zswap compress/decompress can be reached through reclaim while
the lock is held, resulting in a potential deadlock as reported by syzbot:

======================================================
WARNING: possible circular locking dependency detected
6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0 Not tainted
------------------------------------------------------
kswapd0/89 is trying to acquire lock:
 ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: acomp_ctx_get_cpu mm/zswap.c:886 [inline]
 ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_compress mm/zswap.c:908 [inline]
 ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store_page mm/zswap.c:1439 [inline]
 ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store+0xa74/0x1ba0 mm/zswap.c:1546

but task is already holding lock:
 ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline]
 ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
       __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
       fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
       might_alloc include/linux/sched/mm.h:318 [inline]
       slab_pre_alloc_hook mm/slub.c:4070 [inline]
       slab_alloc_node mm/slub.c:4148 [inline]
       __kmalloc_cache_node_noprof+0x40/0x3a0 mm/slub.c:4337
       kmalloc_node_noprof include/linux/slab.h:924 [inline]
       alloc_worker kernel/workqueue.c:2638 [inline]
       create_worker+0x11b/0x720 kernel/workqueue.c:2781
       workqueue_prepare_cpu+0xe3/0x170 kernel/workqueue.c:6628
       cpuhp_invoke_callback+0x48d/0x830 kernel/cpu.c:194
       __cpuhp_invoke_callback_range kernel/cpu.c:965 [inline]
       cpuhp_invoke_callback_range kernel/cpu.c:989 [inline]
       cpuhp_up_callbacks kernel/cpu.c:1020 [inline]
       _cpu_up+0x2b3/0x580 kernel/cpu.c:1690
       cpu_up+0x184/0x230 kernel/cpu.c:1722
       cpuhp_bringup_mask+0xdf/0x260 kernel/cpu.c:1788
       cpuhp_bringup_cpus_parallel+0xf9/0x160 kernel/cpu.c:1878
       bringup_nonboot_cpus+0x2b/0x50 kernel/cpu.c:1892
       smp_init+0x34/0x150 kernel/smp.c:1009
       kernel_init_freeable+0x417/0x5d0 init/main.c:1569
       kernel_init+0x1d/0x2b0 init/main.c:1466
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

-> #0 (cpu_hotplug_lock){++++}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3161 [inline]
       check_prevs_add kernel/locking/lockdep.c:3280 [inline]
       validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
       __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       cpus_read_lock+0x42/0x150 kernel/cpu.c:490
       acomp_ctx_get_cpu mm/zswap.c:886 [inline]
       zswap_compress mm/zswap.c:908 [inline]
       zswap_store_page mm/zswap.c:1439 [inline]
       zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
       swap_writepage+0x647/0xce0 mm/page_io.c:279
       shmem_writepage+0x1248/0x1610 mm/shmem.c:1579
       pageout mm/vmscan.c:696 [inline]
       shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374
       shrink_inactive_list mm/vmscan.c:1967 [inline]
       shrink_list mm/vmscan.c:2205 [inline]
       shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734
       mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575
       mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline]
       memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362
       balance_pgdat mm/vmscan.c:6975 [inline]
       kswapd+0x17b3/0x2f30 mm/vmscan.c:7253
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(cpu_hotplug_lock);
                               lock(fs_reclaim);
  rlock(cpu_hotplug_lock);

 *** DEADLOCK ***

1 lock held by kswapd0/89:
 #0: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline]
 #0: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253

stack backtrace:
CPU: 0 UID: 0 PID: 89 Comm: kswapd0 Not tainted 6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
 check_prev_add kernel/locking/lockdep.c:3161 [inline]
 check_prevs_add kernel/locking/lockdep.c:3280 [inline]
 validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
 percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
 cpus_read_lock+0x42/0x150 kernel/cpu.c:490
 acomp_ctx_get_cpu mm/zswap.c:886 [inline]
 zswap_compress mm/zswap.c:908 [inline]
 zswap_store_page mm/zswap.c:1439 [inline]
 zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
 swap_writepage+0x647/0xce0 mm/page_io.c:279
 shmem_writepage+0x1248/0x1610 mm/shmem.c:1579
 pageout mm/vmscan.c:696 [inline]
 shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374
 shrink_inactive_list mm/vmscan.c:1967 [inline]
 shrink_list mm/vmscan.c:2205 [inline]
 shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734
 mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575
 mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline]
 memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362
 balance_pgdat mm/vmscan.c:6975 [inline]
 kswapd+0x17b3/0x2f30 mm/vmscan.c:7253
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Revert the change.  A different fix for the race with CPU hotunplug will
follow.

Link: https://lkml.kernel.org/r/20250107222236.2715883-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed
Reported-by: syzbot
Cc: Barry Song
Cc: Chengming Zhou
Cc: Johannes Weiner
Cc: Kanchana P Sridhar
Cc: Nhat Pham
Cc: Sam Sun
Cc: Vitaly Wool
Cc:
Signed-off-by: Andrew Morton
---

 mm/zswap.c |   19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

--- a/mm/zswap.c~revert-mm-zswap-fix-race-between-compression-and-cpu-hotunplug
+++ a/mm/zswap.c
@@ -880,18 +880,6 @@ static int zswap_cpu_comp_dead(unsigned
 	return 0;
 }
 
-/* Prevent CPU hotplug from freeing up the per-CPU acomp_ctx resources */
-static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
-{
-	cpus_read_lock();
-	return raw_cpu_ptr(acomp_ctx);
-}
-
-static void acomp_ctx_put_cpu(void)
-{
-	cpus_read_unlock();
-}
-
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -905,7 +893,8 @@ static bool zswap_compress(struct page *
 	gfp_t gfp;
 	u8 *dst;
 
-	acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx);
+	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+
 	mutex_lock(&acomp_ctx->mutex);
 
 	dst = acomp_ctx->buffer;
@@ -961,7 +950,6 @@ unlock:
 		zswap_reject_alloc_fail++;
 
 	mutex_unlock(&acomp_ctx->mutex);
-	acomp_ctx_put_cpu();
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
@@ -972,7 +960,7 @@ static void zswap_decompress(struct zswa
 	struct crypto_acomp_ctx *acomp_ctx;
 	u8 *src;
 
-	acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx);
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
 	mutex_lock(&acomp_ctx->mutex);
 
 	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
@@ -1002,7 +990,6 @@ static void zswap_decompress(struct zswa
 
 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
-	acomp_ctx_put_cpu();
 }
 
 /*********************************
_
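
For readers following the splat in the changelog: the report boils down to two
lock-ordering edges that point at each other.  CPU bringup allocates memory
(and so may enter fs_reclaim) while holding cpu_hotplug_lock, and kswapd takes
cpu_hotplug_lock via cpus_read_lock() while already in fs_reclaim.  Below is a
minimal userspace sketch of that kind of cycle check.  It is not kernel code
and not how lockdep is implemented; lock_graph, lock_graph_init(),
reachable() and record_dependency() are hypothetical names made up for
illustration, and only the two lock names come from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks in the report. */
enum lock_class { CPU_HOTPLUG_LOCK, FS_RECLAIM, NR_LOCK_CLASSES };

/* dep[a][b] is true once some context acquired b while holding a. */
struct lock_graph {
	bool dep[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
};

void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquired was taken while held was held".  Returns false, and
 * records nothing, if the new edge would close a cycle in the graph.
 */
bool record_dependency(struct lock_graph *g, enum lock_class held,
		       enum lock_class acquired)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	assert(held < NR_LOCK_CLASSES && acquired < NR_LOCK_CLASSES);

	/* If 'acquired' already leads back to 'held', this edge is circular. */
	if (reachable(g, acquired, held, seen))
		return false;

	g->dep[held][acquired] = true;
	return true;
}
```

With a freshly initialised graph, record_dependency(&g, CPU_HOTPLUG_LOCK,
FS_RECLAIM) succeeds (chain entry #1, the CPU bringup path), and a following
record_dependency(&g, FS_RECLAIM, CPU_HOTPLUG_LOCK) returns false (chain
entry #0, the kswapd path through zswap_compress()).  The revert removes the
second edge by no longer calling cpus_read_lock() from the zswap paths.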