linux-mm.kvack.org archive mirror
From: Kairui Song <ryncsn@gmail.com>
To: Yosry Ahmed <yosryahmed@google.com>, Barry Song <21cnbao@gmail.com>
Cc: syzbot <syzbot+ce6029250d7fd4d0476d@syzkaller.appspotmail.com>,
	 akpm@linux-foundation.org, chengming.zhou@linux.dev,
	hannes@cmpxchg.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, nphamcs@gmail.com,
	 syzkaller-bugs@googlegroups.com, Chris Li <chrisl@kernel.org>,
	 Ying <ying.huang@intel.com>, Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [syzbot] [mm?] WARNING in zswap_swapoff
Date: Tue, 20 Aug 2024 16:47:40 +0800
Message-ID: <CAMgjq7CaCEZN2hf5pPR4N6BLzUEiMAA7Ax+G_nv4CyHVukxCNw@mail.gmail.com>
In-Reply-To: <CAJD7tkYWMkcFeXKA2S71PoZAubS+0R29G5qbhTSLLCcd7DfqkQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6339 bytes --]

On Tue, Aug 20, 2024 at 4:13 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> On Fri, Aug 16, 2024 at 12:52 PM syzbot
> <syzbot+ce6029250d7fd4d0476d@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    367b5c3d53e5 Add linux-next specific files for 20240816

I can't find this commit; it seems it is no longer in linux-next?

> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12489105980000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=61ba6f3b22ee5467
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ce6029250d7fd4d0476d
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/0b1b4e3cad3c/disk-367b5c3d.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/5bb090f7813c/vmlinux-367b5c3d.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/6674cb0709b1/bzImage-367b5c3d.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+ce6029250d7fd4d0476d@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 11298 at mm/zswap.c:1700 zswap_swapoff+0x11b/0x2b0 mm/zswap.c:1700
> > Modules linked in:
> > CPU: 0 UID: 0 PID: 11298 Comm: swapoff Not tainted 6.11.0-rc3-next-20240816-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
> > RIP: 0010:zswap_swapoff+0x11b/0x2b0 mm/zswap.c:1700
> > Code: 74 05 e8 78 73 07 00 4b 83 7c 35 00 00 75 15 e8 1b bd 9e ff 48 ff c5 49 83 c6 50 83 7c 24 0c 17 76 9b eb 24 e8 06 bd 9e ff 90 <0f> 0b 90 eb e5 48 8b 0c 24 80 e1 07 80 c1 03 38 c1 7c 90 48 8b 3c
> > RSP: 0018:ffffc9000302fa38 EFLAGS: 00010293
> > RAX: ffffffff81f4d66a RBX: dffffc0000000000 RCX: ffff88802c19bc00
> > RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888015986248
> > RBP: 0000000000000000 R08: ffffffff81f4d620 R09: 1ffffffff1d476ac
> > R10: dffffc0000000000 R11: fffffbfff1d476ad R12: dffffc0000000000
> > R13: ffff888015986200 R14: 0000000000000048 R15: 0000000000000002
> > FS:  00007f9e628a5380(0000) GS:ffff8880b9000000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000001b30f15ff8 CR3: 000000006c5f0000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  __do_sys_swapoff mm/swapfile.c:2837 [inline]
> >  __se_sys_swapoff+0x4653/0x4cf0 mm/swapfile.c:2706
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f9e629feb37
> > Code: 73 01 c3 48 8b 0d f1 52 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 a8 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c1 52 0d 00 f7 d8 64 89 01 48
> > RSP: 002b:00007fff17734f68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9e629feb37
> > RDX: 00007f9e62a9e7e8 RSI: 00007f9e62b9beed RDI: 0000563090942a20
> > RBP: 0000563090942a20 R08: 0000000000000000 R09: 77872e07ed164f94
> > R10: 000000000000001f R11: 0000000000000246 R12: 00007fff17735188
> > R13: 00005630909422a0 R14: 0000563073724169 R15: 00007f9e62bdda80
> >  </TASK>
>
> I am hoping syzbot would find a reproducer and bisect this for us.
> Meanwhile, from a high-level it looks to me like we are missing a
> zswap_invalidate() call in some paths.
>
> If I have to guess, I would say it's related to the latest mTHP swap
> changes, but I am not following closely. Perhaps one of the following
> things happened:
>
> (1) We are not calling zswap_invalidate() in some invalidation paths.
> It used to not be called for the cluster freeing path, so maybe we end
> up with some order-0 swap entries in a cluster? or maybe there is an
> entirely new invalidation path that does not go through
> free_swap_slot() for order-0 entries?
>
> (2) Some higher order swap entries (i.e. a cluster) end up in zswap
> somehow. zswap_store() has a warning to cover that though. Maybe
> somehow some swap entries are allocated as a cluster, but then pages
> are swapped out one-by-one as order-0 (which can go to zswap), but
> then we still free the swap entries as a cluster?

Hi Yosry, thanks for the report.

There have been many mTHP-related optimizations recently. I can
reproduce this problem locally, and I can confirm it is gone for me
after reverting:

"mm: attempt to batch free swap entries for zap_pte_range()"

Hi Barry,

If a set of contiguous slots all have the same value, they are treated
as an mTHP and freed as a batch. That bypasses the slot cache, and with
it the zswap_invalidate() call in free_swap_slot(), so the zswap
entries leak.
This doesn't happen in put_swap_folio(), because that function is only
called with an actual mTHP folio behind the slots, and zswap never
stores large folios. free_swap_and_cache_nr(), by contrast, simply
walks the slots.
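To make the mechanism concrete, here is a minimal toy model in C. It assumes a simplified swap device of eight slots and is not kernel code. swap_map[] and in_zswap[] stand in for the per-slot swap counts and the zswap tree. The hypothetical toy_* functions only play the roles of free_swap_slot(), swap_range_free() (before the fix), the batched free_swap_and_cache_nr(), and the zswap_swapoff() sanity check. "Same value" is simplified to a count of 1.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code: every name is hypothetical. swap_map[]
 * stands in for the per-slot swap counts and in_zswap[] for the zswap
 * tree of one swap device.
 */
#define NR_SLOTS 8

static unsigned char swap_map[NR_SLOTS]; /* per-slot swap count */
static bool in_zswap[NR_SLOTS];          /* true if the slot has a zswap entry */

static void toy_zswap_invalidate(unsigned long off)
{
	in_zswap[off] = false;
}

/* Plays the role of swap_range_free() before the fix: no zswap handling. */
static void toy_swap_range_free(unsigned long off, unsigned int nr)
{
	assert(off + nr <= NR_SLOTS);
	for (unsigned int i = 0; i < nr; i++)
		swap_map[off + i] = 0;
}

/* Plays the role of free_swap_slot(): the order-0 path invalidates first. */
static void toy_free_swap_slot(unsigned long off)
{
	toy_zswap_invalidate(off);
	toy_swap_range_free(off, 1);
}

/*
 * Plays the role of the batched free_swap_and_cache_nr(): if every slot in
 * the range holds the same count (simplified here to 1), the range is freed
 * in one call as if it were an mTHP, skipping toy_free_swap_slot().
 */
static void toy_free_swap_and_cache_nr(unsigned long off, unsigned int nr)
{
	bool batch = true;

	for (unsigned int i = 0; i < nr; i++)
		if (swap_map[off + i] != 1)
			batch = false;

	if (batch) {
		toy_swap_range_free(off, nr);
		return;
	}
	for (unsigned int i = 0; i < nr; i++)
		if (swap_map[off + i] && --swap_map[off + i] == 0)
			toy_free_swap_slot(off + i);
}

/* Plays the role of the zswap_swapoff() check: count leftover entries. */
static unsigned int toy_zswap_leftovers(void)
{
	unsigned int n = 0;

	for (unsigned long off = 0; off < NR_SLOTS; off++)
		n += in_zswap[off];
	return n;
}

/* Swap out nr contiguous order-0 pages to zswap, then zap all of them. */
unsigned int toy_demo_leaks(unsigned int nr)
{
	assert(nr <= NR_SLOTS);
	for (unsigned long off = 0; off < NR_SLOTS; off++) {
		swap_map[off] = 0;
		in_zswap[off] = false;
	}
	for (unsigned long off = 0; off < nr; off++) {
		swap_map[off] = 1;
		in_zswap[off] = true;
	}
	toy_free_swap_and_cache_nr(0, nr);
	return toy_zswap_leftovers();
}
```

With four contiguous order-0 slots stored in zswap, toy_demo_leaks(4) takes the batched branch and returns 4. All four zswap entries are left behind, which is the kind of leftover state the zswap_swapoff() warning appears to catch.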

For testing, I actually had to disable mTHP, because linux-next panics
with mTHP enabled due to the lack of the following fixes:
https://lore.kernel.org/linux-mm/a4b1b34f-0d8c-490d-ab00-eaedbf3fe780@gmail.com/
https://lore.kernel.org/linux-mm/403b7f3c-6e5b-4030-ab1c-3198f36e3f73@gmail.com/

>
> I am not closely following the latest changes so I am not sure. CCing
> folks who have done work in that area recently.
>
> I am starting to think maybe it would be more reliable to just call
> zswap_invalidate() for all freed swap entries anyway. Would that be
> too expensive? We used to do that before the zswap_invalidate() call
> was moved by commit 0827a1fb143f ("mm/zswap: invalidate zswap entry
> when swap entry free"), and that was before we started using the
> xarray (so it was arguably worse than it would be now).
>

That might be a good idea. I suggest moving zswap_invalidate() into
swap_range_free() and calling it for every freed slot.
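A minimal toy sketch of this suggestion in C follows. All names are hypothetical and this is not kernel code. swap_map[] and in_zswap[] stand in for the swap counts and the zswap tree, and toy_swap_range_free() plays the role of swap_range_free(). Because the invalidation now sits in the per-slot loop, two paths both leave zswap empty: the order-0 path, and a batched path that skips the free_swap_slot() equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the suggested fix, not kernel code: all names are
 * hypothetical. What matters is where the invalidation lives.
 */
#define NR_SLOTS 8

static unsigned char swap_map[NR_SLOTS]; /* per-slot swap count */
static bool in_zswap[NR_SLOTS];          /* true if the slot has a zswap entry */

/* Harmless for slots that never went to zswap. */
static void toy_zswap_invalidate(unsigned long off)
{
	in_zswap[off] = false;
}

/*
 * Plays the role of swap_range_free() after the fix: every slot released
 * here is invalidated, whichever path freed it and whatever its order.
 */
static void toy_swap_range_free(unsigned long off, unsigned int nr)
{
	assert(off + nr <= NR_SLOTS);
	for (unsigned int i = 0; i < nr; i++) {
		toy_zswap_invalidate(off + i);
		swap_map[off + i] = 0;
	}
}

/* Order-0 path: no separate invalidation call needed any more. */
static void toy_free_swap_slot(unsigned long off)
{
	toy_swap_range_free(off, 1);
}

/* Batched path: still skips toy_free_swap_slot(), but no longer leaks. */
static void toy_free_batch(unsigned long off, unsigned int nr)
{
	toy_swap_range_free(off, nr);
}

/* Count zswap entries left behind, like a swapoff-time sanity check. */
static unsigned int toy_zswap_leftovers(void)
{
	unsigned int n = 0;

	for (unsigned long off = 0; off < NR_SLOTS; off++)
		n += in_zswap[off];
	return n;
}

/* Store nr order-0 pages in zswap, then free them batched or one by one. */
unsigned int toy_demo_fixed(unsigned int nr, bool batched)
{
	assert(nr <= NR_SLOTS);
	for (unsigned long off = 0; off < NR_SLOTS; off++) {
		swap_map[off] = off < nr;
		in_zswap[off] = off < nr;
	}
	if (batched)
		toy_free_batch(0, nr);
	else
		for (unsigned long off = 0; off < nr; off++)
			toy_free_swap_slot(off);
	return toy_zswap_leftovers();
}
```

Both toy_demo_fixed(4, true) and toy_demo_fixed(4, false) return 0. The trade-off is roughly one extra zswap lookup per freed slot, including slots that never went to zswap. That is the cost question raised above.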

The patch below can be squashed into, or applied before, "mm: attempt
to batch free swap entries for zap_pte_range()".

[-- Attachment #2: 0001-mm-swap-sanitize-zswap-invalidating.patch --]
[-- Type: application/octet-stream, Size: 1840 bytes --]

From 7e07736deb955b6fe1390e3c67751da77796b660 Mon Sep 17 00:00:00 2001
From: Kairui Song <kasong@tencent.com>
Date: Tue, 20 Aug 2024 16:19:45 +0800
Subject: [PATCH] mm: swap: sanitize zswap invalidating

From: Kairui Song <ryncsn@gmail.com>

zswap doesn't support mTHP/THP yet, so we only invalidate order-0
entries. But this will change soon, so call the invalidation for every
slot that is freed.

Signed-off-by: Kairui Song <ryncsn@gmail.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swap_slots.c | 3 ---
 mm/swapfile.c   | 4 +---
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 13ab3b771409..d7bb3caa9d4e 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -273,9 +273,6 @@ void free_swap_slot(swp_entry_t entry)
 {
 	struct swap_slots_cache *cache;
 
-	/* Large folio swap slot is not covered. */
-	zswap_invalidate(entry);
-
 	cache = raw_cpu_ptr(&swp_slots);
 	if (likely(use_swap_slot_cache && cache->slots_ret)) {
 		spin_lock_irq(&cache->free_lock);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f947f4dd31a9..31ca8b15a8da 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -242,9 +242,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	folio_set_dirty(folio);
 
 	spin_lock(&si->lock);
-	/* Only sinple page folio can be backed by zswap */
-	if (nr_pages == 1)
-		zswap_invalidate(entry);
 	swap_entry_range_free(si, entry, nr_pages);
 	spin_unlock(&si->lock);
 	ret = nr_pages;
@@ -956,6 +953,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	else
 		swap_slot_free_notify = NULL;
 	while (offset <= end) {
+		zswap_invalidate(swp_entry(si->type, offset));
 		arch_swap_invalidate_page(si->type, offset);
 		if (swap_slot_free_notify)
 			swap_slot_free_notify(si->bdev, offset);
-- 
2.45.2


Thread overview: 19+ messages
2024-08-16 19:52 [syzbot] [mm?] WARNING in zswap_swapoff syzbot
2024-08-19 20:12 ` Yosry Ahmed
2024-08-20  8:47   ` Kairui Song [this message]
2024-08-20  9:02     ` Kairui Song
2024-08-21  5:49       ` Barry Song
2024-08-21  6:42         ` Kairui Song
2024-08-21  7:38           ` Barry Song
2024-08-21 17:33             ` Kairui Song
2024-08-21 20:59               ` Barry Song
2024-08-22 18:12           ` Yosry Ahmed
2024-08-22 20:16             ` Chris Li
2024-08-22 20:20               ` Yosry Ahmed
2024-08-22 21:44                 ` Chris Li
2024-09-03 18:21             ` Yosry Ahmed
2024-09-03 18:50               ` Kairui Song
2024-09-03 19:41                 ` Yosry Ahmed
2024-08-20  9:22     ` Barry Song
2024-08-20  9:29       ` Kairui Song
2024-08-20  9:53         ` Barry Song
