From: Yosry Ahmed
Date: Thu, 22 Aug 2024 11:12:26 -0700
Subject: Re: [syzbot] [mm?] WARNING in zswap_swapoff
To: Kairui Song
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, chengming.zhou@linux.dev, chrisl@kernel.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, nphamcs@gmail.com, ryan.roberts@arm.com, syzbot+ce6029250d7fd4d0476d@syzkaller.appspotmail.com, syzkaller-bugs@googlegroups.com, ying.huang@intel.com

On Tue, Aug 20, 2024 at 11:42 PM Kairui Song wrote:
>
> On Wed, Aug 21, 2024 at 1:49 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Tue, Aug 20, 2024 at 9:02 PM Kairui Song wrote:
> > >
> > > On Tue, Aug 20, 2024 at 4:47 PM Kairui Song wrote:
> > > >
> > > > On Tue, Aug 20, 2024 at 4:13 AM Yosry Ahmed wrote:
> > > > > On Fri, Aug 16, 2024 at 12:52 PM syzbot wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following issue on:
> > > > > >
> > > > > > HEAD commit:    367b5c3d53e5 Add linux-next specific files for 20240816
> > > >
> > > > I can't find this commit, seems this commit is not in linux-next any more?
> > > >
> > > > > > git tree:       linux-next
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12489105980000
> > > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=61ba6f3b22ee5467
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce6029250d7fd4d0476d
> > > > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > > > >
> > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > >
> > > > > > Downloadable assets:
> > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0b1b4e3cad3c/disk-367b5c3d.raw.xz
> > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/5bb090f7813c/vmlinux-367b5c3d.xz
> > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/6674cb0709b1/bzImage-367b5c3d.xz
> > > > > >
> > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > Reported-by: syzbot+ce6029250d7fd4d0476d@syzkaller.appspotmail.com
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: CPU: 0 PID: 11298 at mm/zswap.c:1700 zswap_swapoff+0x11b/0x2b0 mm/zswap.c:1700
> > > > > > Modules linked in:
> > > > > > CPU: 0 UID: 0 PID: 11298 Comm: swapoff Not tainted 6.11.0-rc3-next-20240816-syzkaller #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
> > > > > > RIP: 0010:zswap_swapoff+0x11b/0x2b0 mm/zswap.c:1700
> > > > > > Code: 74 05 e8 78 73 07 00 4b 83 7c 35 00 00 75 15 e8 1b bd 9e ff 48 ff c5 49 83 c6 50 83 7c 24 0c 17 76 9b eb 24 e8 06 bd 9e ff 90 <0f> 0b 90 eb e5 48 8b 0c 24 80 e1 07 80 c1 03 38 c1 7c 90 48 8b 3c
> > > > > > RSP: 0018:ffffc9000302fa38 EFLAGS: 00010293
> > > > > > RAX: ffffffff81f4d66a RBX: dffffc0000000000 RCX: ffff88802c19bc00
> > > > > > RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888015986248
> > > > > > RBP: 0000000000000000 R08: ffffffff81f4d620 R09: 1ffffffff1d476ac
> > > > > > R10: dffffc0000000000 R11: fffffbfff1d476ad R12: dffffc0000000000
> > > > > > R13: ffff888015986200 R14: 0000000000000048 R15: 0000000000000002
> > > > > > FS:  00007f9e628a5380(0000) GS:ffff8880b9000000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000001b30f15ff8 CR3: 000000006c5f0000 CR4: 00000000003506f0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > Call Trace:
> > > > > >  <TASK>
> > > > > >  __do_sys_swapoff mm/swapfile.c:2837 [inline]
> > > > > >  __se_sys_swapoff+0x4653/0x4cf0 mm/swapfile.c:2706
> > > > > >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > > >  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> > > > > >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > > > RIP: 0033:0x7f9e629feb37
> > > > > > Code: 73 01 c3 48 8b 0d f1 52 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 a8 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c1 52 0d 00 f7 d8 64 89 01 48
> > > > > > RSP: 002b:00007fff17734f68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
> > > > > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9e629feb37
> > > > > > RDX: 00007f9e62a9e7e8 RSI: 00007f9e62b9beed RDI: 0000563090942a20
> > > > > > RBP: 0000563090942a20 R08: 0000000000000000 R09: 77872e07ed164f94
> > > > > > R10: 000000000000001f R11: 0000000000000246 R12: 00007fff17735188
> > > > > > R13: 00005630909422a0 R14: 0000563073724169 R15: 00007f9e62bdda80
> > > > > >  </TASK>
> > > > >
> > > > > I am hoping syzbot would find a reproducer and bisect this for us.
> > > > > Meanwhile, from a high-level it looks to me like we are missing a
> > > > > zswap_invalidate() call in some paths.
> > > > >
> > > > > If I have to guess, I would say it's related to the latest mTHP swap
> > > > > changes, but I am not following closely. Perhaps one of the following
> > > > > things happened:
> > > > >
> > > > > (1) We are not calling zswap_invalidate() in some invalidation paths.
> > > > > It used to not be called for the cluster freeing path, so maybe we end
> > > > > up with some order-0 swap entries in a cluster? or maybe there is an
> > > > > entirely new invalidation path that does not go through
> > > > > free_swap_slot() for order-0 entries?
> > > > >
> > > > > (2) Some higher order swap entries (i.e. a cluster) end up in zswap
> > > > > somehow. zswap_store() has a warning to cover that though. Maybe
> > > > > somehow some swap entries are allocated as a cluster, but then pages
> > > > > are swapped out one-by-one as order-0 (which can go to zswap), but
> > > > > then we still free the swap entries as a cluster?
> > > >
> > > > Hi Yosry, thanks for the report.
> > > >
> > > > There are many mTHP related optimizations recently, for this problem I
> > > > can reproduce this locally. Can confirm the problem is gone for me
> > > > after reverting:
> > > >
> > > > "mm: attempt to batch free swap entries for zap_pte_range()"
> > > >
> > > > Hi Barry,
> > > >
> > > > If a set of continuous slots are having the same value, they are
> > > > considered a mTHP and freed, bypassing the slot cache, and causing
> > > > zswap leak.
> > > > This didn't happen in put_swap_folio because that function is
> > > > expecting an actual mTHP folio behind the slots but
> > > > free_swap_and_cache_nr is simply walking the slots.
> > > >
> > > > For the testing, I actually have to disable mTHP, because linux-next
> > > > will panic with mTHP due to lack of following fixes:
> > > > https://lore.kernel.org/linux-mm/a4b1b34f-0d8c-490d-ab00-eaedbf3fe780@gmail.com/
> > > > https://lore.kernel.org/linux-mm/403b7f3c-6e5b-4030-ab1c-3198f36e3f73@gmail.com/
> > > >
> > > > >
> > > > > I am not closely following the latest changes so I am not sure. CCing
> > > > > folks who have done work in that area recently.
> > > > >
> > > > > I am starting to think maybe it would be more reliable to just call
> > > > > zswap_invalidate() for all freed swap entries anyway. Would that be
> > > > > too expensive? We used to do that before the zswap_invalidate() call
> > > > > was moved by commit 0827a1fb143f ("mm/zswap: invalidate zswap entry
> > > > > when swap entry free"), and that was before we started using the
> > > > > xarray (so it was arguably worse than it would be now).
> > > > >
> > > >
> > > > That might be a good idea, I suggest moving zswap_invalidate to
> > > > swap_range_free and call it for every freed slot.
> > > >
> > > > Below patch can be squashed into or put before "mm: attempt to batch
> > > > free swap entries for zap_pte_range()".
> > >
> > > Hmm, on second thought, the commit message in the attachment commit
> > > might be not suitable, current zswap_invalidate is also designed to
> > > only work for order 0 ZSWAP, so things are not clean even after this.
> >
> > Kairui, what about the below? we don't touch the path of __try_to_reclaim_swap() where
> > you have one folio backed?
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index c1638a009113..8ff58be40544 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1514,6 +1514,8 @@ static bool __swap_entries_free(struct swap_info_struct *si,
> >  	unlock_cluster_or_swap_info(si, ci);
> >
> >  	if (!has_cache) {
> > +		for (i = 0; i < nr; i++)
> > +			zswap_invalidate(swp_entry(si->type, offset + i));
> >  		spin_lock(&si->lock);
> >  		swap_entry_range_free(si, entry, nr);
> >  		spin_unlock(&si->lock);
> >
>
> Hi Barry,
>
> Thanks for updating this thread, I'm thinking maybe something will
> better be done at the zswap side?
>
> The concern of using zswap_invalidate is that it calls xa_erase which
> requires the xa spin lock. But if we are calling zswap_invalidate in
> swap_entry_range_free, and ensure the slot is HAS_CACHE pinned, doing
> a lockless read first with xa_load should be OK for checking if the
> slot needs a ZSWAP invalidation. The performance cost will be minimal
> and we only need to call zswap_invalidate in one place, something like
> this (haven't tested, comments are welcome). Also ZSWAP mthp will
> still store entries in order 0 so this should be OK for future.

While I do agree with this change on a high level, it's essentially
reverting commit 0827a1fb143f ("mm/zswap: invalidate zswap entry when
swap entry free"), which fixed a small problem with zswap writeback.
I'd prefer that we don't if possible.

One thing I have always wanted to do is to pull some of the work done
in swap_entry_range_free() and swap_range_free() before the slots
caching layer: memcg uncharging, clearing shadow entries from the swap
cache, arch invalidation, zswap invalidation, etc. If we had a hook for
these pre-free callbacks, we could call it for single entries before
we add them to the slots cache, and call it for clusters as we do
today. This should also reduce the amount of work done under the lock,
and move more work to where the freeing is actually happening vs.
the cache draining. I remember discussing this briefly with Ying
before. Does anyone have any thoughts?

>
> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> index 13ab3b771409..d7bb3caa9d4e 100644
> --- a/mm/swap_slots.c
> +++ b/mm/swap_slots.c
> @@ -273,9 +273,6 @@ void free_swap_slot(swp_entry_t entry)
>  {
>  	struct swap_slots_cache *cache;
>
> -	/* Large folio swap slot is not covered. */
> -	zswap_invalidate(entry);
> -
>  	cache = raw_cpu_ptr(&swp_slots);
>  	if (likely(use_swap_slot_cache && cache->slots_ret)) {
>  		spin_lock_irq(&cache->free_lock);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f947f4dd31a9..fbc25d38a27e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -242,9 +242,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>  	folio_set_dirty(folio);
>
>  	spin_lock(&si->lock);
> -	/* Only sinple page folio can be backed by zswap */
> -	if (nr_pages == 1)
> -		zswap_invalidate(entry);
>  	swap_entry_range_free(si, entry, nr_pages);
>  	spin_unlock(&si->lock);
>  	ret = nr_pages;
> @@ -1545,6 +1542,10 @@ static void swap_entry_range_free(struct swap_info_struct *si, swp_entry_t entry
>  	unsigned char *map_end = map + nr_pages;
>  	struct swap_cluster_info *ci;
>
> +	/* Slots are pinned with SWAP_HAS_CACHE, safe to invalidate */
> +	for (int i = 0; i < nr_pages; ++i)
> +		zswap_invalidate(swp_entry(si->type, offset + i));
> +
>  	ci = lock_cluster(si, offset);
>  	do {
>  		VM_BUG_ON(*map != SWAP_HAS_CACHE);
> diff --git a/mm/zswap.c b/mm/zswap.c
> index df66ab102d27..100ad04397fe 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1656,15 +1656,18 @@ bool zswap_load(struct folio *folio)
>  	return true;
>  }
>
> +/* Caller need to pin the slot to prevent parallel store */
>  void zswap_invalidate(swp_entry_t swp)
>  {
>  	pgoff_t offset = swp_offset(swp);
>  	struct xarray *tree = swap_zswap_tree(swp);
>  	struct zswap_entry *entry;
>
> -	entry = xa_erase(tree, offset);
> -	if (entry)
> -		zswap_entry_free(entry);
> +	if (xa_load(tree, offset)) {
> +		entry = xa_erase(tree, offset);
> +		if (entry)
> +			zswap_entry_free(entry);
> +	}
>  }
>
>  int zswap_swapon(int type, unsigned long nr_pages)
> --
> 2.45.2
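
For what it's worth, the leak Kairui describes (contiguous order-0 slots with the same count being freed as one batch, bypassing the slots cache and therefore zswap_invalidate()) can be illustrated with a toy userspace model. This is a sketch under loud assumptions, not kernel code: every toy_* name is hypothetical, a fixed-size pointer array stands in for the per-swapfile zswap xarray, and locking and SWAP_HAS_CACHE pinning are ignored entirely, so it says nothing about whether the lockless xa_load() pre-check is race-free. Four order-0 pages are swapped out to contiguous slots, each with its own zswap entry. Freeing them one at a time through a free_swap_slot()-like path drops the entries; freeing them as one range without invalidation leaves them behind, which is roughly the non-empty-tree state the warning in zswap_swapoff() trips on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Toy userspace model (NOT kernel code); every name here is hypothetical.
 * Each swap slot has a count in map[] and, optionally, a "compressed copy"
 * in zswap_tree[], a plain array standing in for the zswap xarray.
 */
#define TOY_NR_SLOTS 16

struct toy_swap {
	unsigned char map[TOY_NR_SLOTS];	/* 0 means the slot is free */
	void *zswap_tree[TOY_NR_SLOTS];		/* non-NULL means a zswap entry exists */
};

/* Stand-in for zswap_invalidate(): cheap read first, erase only on a hit. */
static void toy_zswap_invalidate(struct toy_swap *s, size_t off)
{
	if (s->zswap_tree[off] == NULL)	/* models the xa_load() pre-check */
		return;
	free(s->zswap_tree[off]);	/* models xa_erase() + zswap_entry_free() */
	s->zswap_tree[off] = NULL;
}

/* Order-0 free through the slots cache: invalidates before freeing. */
static void toy_free_swap_slot(struct toy_swap *s, size_t off)
{
	toy_zswap_invalidate(s, off);
	s->map[off] = 0;
}

/*
 * Range free used by the batched path. invalidate == false models the leak;
 * invalidate == true models invalidating every freed slot in the range.
 */
static void toy_range_free(struct toy_swap *s, size_t off, size_t nr,
			   bool invalidate)
{
	for (size_t i = 0; i < nr; i++) {
		if (invalidate)
			toy_zswap_invalidate(s, off + i);
		s->map[off + i] = 0;
	}
}

/* Count entries still in the tree, i.e. what would trip the swapoff check. */
static size_t toy_leaked_entries(const struct toy_swap *s)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_SLOTS; i++)
		if (s->zswap_tree[i] != NULL)
			n++;
	return n;
}

/*
 * Swap out 4 order-0 pages into contiguous slots, each stored in zswap,
 * then free them one by one or as a single batch. Returns leaked entries.
 */
static size_t toy_scenario(bool batch, bool invalidate_in_range_free)
{
	struct toy_swap s = {0};
	size_t leaked;

	for (size_t i = 0; i < 4; i++) {
		s.map[i] = 1;			/* same count on every slot */
		s.zswap_tree[i] = malloc(1);	/* stand-in compressed copy */
		assert(s.zswap_tree[i] != NULL);
	}

	if (batch)
		toy_range_free(&s, 0, 4, invalidate_in_range_free);
	else
		for (size_t i = 0; i < 4; i++)
			toy_free_swap_slot(&s, i);

	leaked = toy_leaked_entries(&s);
	for (size_t i = 0; i < TOY_NR_SLOTS; i++)
		free(s.zswap_tree[i]);	/* free(NULL) is a no-op */
	return leaked;
}
```

In this model, toy_scenario(false, false) and toy_scenario(true, true) both return 0, while toy_scenario(true, false) returns 4: the batched path only stays clean if invalidation happens somewhere on the range-free path, which is the property any of the options above (per-slot invalidation in swap_entry_range_free(), or a pre-free hook) has to preserve.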