linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Li <chrisl@kernel.org>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Matthew Wilcox <willy@infradead.org>,
	Hugh Dickins <hughd@google.com>, Barry Song <baohua@kernel.org>,
	 Baoquan He <bhe@redhat.com>, Nhat Pham <nphamcs@gmail.com>,
	 Kemeng Shi <shikemeng@huaweicloud.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	 Ying Huang <ying.huang@linux.alibaba.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 David Hildenbrand <david@redhat.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Zi Yan <ziy@nvidia.com>,  LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/9] mm, swap: always lock and check the swap cache folio before use
Date: Fri, 29 Aug 2025 18:42:20 -0700	[thread overview]
Message-ID: <CACePvbUYDfne6mZ32c3YzxcwH16u-G3YXoYoGaXL5EtxjDH8mw@mail.gmail.com> (raw)
In-Reply-To: <CAMgjq7D3EiLOyEQiR+Uq6mTdXajMCg4+A+KRZajE91kbTw=tHg@mail.gmail.com>

Hi Kairui,

Sorry for the late reply. I have been super busy this week.

On Wed, Aug 27, 2025 at 6:44 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Wed, Aug 27, 2025 at 4:06 PM Chris Li <chrisl@kernel.org> wrote:
> >
> > On Fri, Aug 22, 2025 at 12:21 PM Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 10ef528a5f44..9ca8e1873c6e 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -4661,12 +4661,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >                 goto out;
> > >
> > >         folio = swap_cache_get_folio(entry);
> > > -       if (folio) {
> > > -               swap_update_readahead(folio, vma, vmf->address);
> > > -               page = folio_file_page(folio, swp_offset(entry));
> > > -       }
> > >         swapcache = folio;
> > > -
> >
> > Can simplify as:
> >            folio = swapcache = swap_cache_get_folio(entry);
>
> checkpatch.pl is unhappy about it:
>
> checkpatch: multiple assignments should be avoided

Ah, never mind then. I actually like multiple assignments but I can
see checkpatch wants to ban it.

>
> >
> > >         if (!folio) {
> > >                 if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> > >                     __swap_count(entry) == 1) {
> > > @@ -4735,20 +4730,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >                 ret = VM_FAULT_MAJOR;
> > >                 count_vm_event(PGMAJFAULT);
> > >                 count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> > > -               page = folio_file_page(folio, swp_offset(entry));
> > > -       } else if (PageHWPoison(page)) {
> > > -               /*
> > > -                * hwpoisoned dirty swapcache pages are kept for killing
> > > -                * owner processes (which may be unknown at hwpoison time)
> > > -                */
> > > -               ret = VM_FAULT_HWPOISON;
> > > -               goto out_release;
> >
> > Here you move the HWPosion(page) bail out from before taking the page
> > lock to after the page lock. The HWPosion page should be able to bail
> > out without taking the lock.
> >
> >
> > >         }
> > >
> > >         ret |= folio_lock_or_retry(folio, vmf);
> > >         if (ret & VM_FAULT_RETRY)
> > >                 goto out_release;
> > >
> > > +       page = folio_file_page(folio, swp_offset(entry));
> > >         if (swapcache) {
> > >                 /*
> > >                  * Make sure folio_free_swap() or swapoff did not release the
> > > @@ -4757,10 +4745,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >                  * swapcache, we need to check that the page's swap has not
> > >                  * changed.
> > >                  */
> > > -               if (unlikely(!folio_test_swapcache(folio) ||
> > > -                            page_swap_entry(page).val != entry.val))
> > > +               if (!folio_contains_swap(folio, entry))
> > >                         goto out_page;
> > >
> > > +               if (PageHWPoison(page)) {
> > > +                       /*
> > > +                        * hwpoisoned dirty swapcache pages are kept for killing
> > > +                        * owner processes (which may be unknown at hwpoison time)
> > > +                        */
> > > +                       ret = VM_FAULT_HWPOISON;
> > > +                       goto out_page;
> >
> > It seems you bail out with the page still locked, that seems like a bug to me.
>
> I think it's the original behaviour that is kind of fragile. The
> returned folio of swap_cache_get_folio is unstable unless locked, so
> the folio could have been removed from swap cache or marked by some
> other threads as Poisoned. So the PageHWPoison here could be tested
> against an unrelated page, which leads to false positive PageHWPoison
> results. We have encountered several similar bugs due to using folios
> returned by swap cache lookup without lock & checking first.
>
> So checking HWPoison after locking is actually safer.

How do I verify the code can handle HWPoison from unlock to lock page?
I haven't followed the HWPoison path very much. I am still wondering
how does the HWPoison code handle the page previously unlocked, now
become locked and without any additional code change?

If you want this behavior, I strongly suggest you split this portion
of the change out, ideally outside this 9 series if you can afford to
do so.

This patch description has nothing mentioning this kind of subtle
behavior change, as a reviewer I am caught off guard by this. At the
very least the commit should mention this. Talk explains why this
change is needed and why it is safe to do so.

Having very subtle behavior changes buried in a large code restructure
change is the worst. Splitting it out will reduce the reviewer's
workload to reason this behavior change, and makes your series easier
to review.

Does it make sense?

Chris

  reply	other threads:[~2025-08-30  1:42 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-22 19:20 [PATCH 0/9] mm, swap: introduce swap table as swap cache (phase I) Kairui Song
2025-08-22 19:20 ` [PATCH 1/9] mm, swap: use unified helper for swap cache look up Kairui Song
2025-08-27  2:47   ` Chris Li
2025-08-27  3:50     ` Chris Li
2025-08-27 13:45     ` Kairui Song
2025-08-27  3:52   ` Baoquan He
2025-08-27 13:46     ` Kairui Song
2025-08-28  3:20   ` Baolin Wang
2025-09-01 23:50   ` Barry Song
2025-09-02  6:12     ` Kairui Song
2025-09-02  6:52       ` Chris Li
2025-09-02 10:06   ` David Hildenbrand
2025-09-02 12:32     ` Chris Li
2025-09-02 13:18       ` David Hildenbrand
2025-09-02 16:38     ` Kairui Song
2025-09-02 10:10   ` David Hildenbrand
2025-09-02 17:13     ` Kairui Song
2025-09-03  8:00       ` David Hildenbrand
2025-09-03 17:41   ` Nhat Pham
2025-09-04 16:05     ` Kairui Song
2025-08-22 19:20 ` [PATCH 2/9] mm, swap: always lock and check the swap cache folio before use Kairui Song
2025-08-27  6:13   ` Chris Li
2025-08-27 13:44     ` Kairui Song
2025-08-30  1:42       ` Chris Li [this message]
2025-08-27  7:03   ` Chris Li
2025-08-27 14:35     ` Kairui Song
2025-08-28  3:41       ` Baolin Wang
2025-08-28 18:05         ` Kairui Song
2025-08-30  1:53       ` Chris Li
2025-08-30 15:15         ` Kairui Song
2025-08-30 17:17           ` Chris Li
2025-09-01 18:17         ` Kairui Song
2025-09-01 21:10           ` Chris Li
2025-09-02  5:40   ` Barry Song
2025-09-02 10:18   ` David Hildenbrand
2025-09-02 10:21     ` David Hildenbrand
2025-09-02 12:46     ` Chris Li
2025-09-02 13:27       ` Kairui Song
2025-08-22 19:20 ` [PATCH 3/9] mm, swap: rename and move some swap cluster definition and helpers Kairui Song
2025-08-30  2:31   ` Chris Li
2025-09-02  5:53   ` Barry Song
2025-09-02 10:20   ` David Hildenbrand
2025-09-02 12:50     ` Chris Li
2025-08-22 19:20 ` [PATCH 4/9] mm, swap: tidy up swap device and cluster info helpers Kairui Song
2025-08-27  3:47   ` Baoquan He
2025-08-27 17:44     ` Chris Li
2025-08-27 23:46       ` Baoquan He
2025-08-30  2:38         ` Chris Li
2025-09-02  6:01       ` Barry Song
2025-09-03  9:28       ` David Hildenbrand
2025-09-02  6:02   ` Barry Song
2025-09-02 13:33   ` David Hildenbrand
2025-09-02 15:03     ` Kairui Song
2025-09-03  8:11       ` David Hildenbrand
2025-08-22 19:20 ` [PATCH 5/9] mm/shmem, swap: remove redundant error handling for replacing folio Kairui Song
2025-08-25  3:02   ` Baolin Wang
2025-08-25  9:45     ` Kairui Song
2025-08-30  2:41       ` Chris Li
2025-09-03  8:25   ` David Hildenbrand
2025-08-22 19:20 ` [PATCH 6/9] mm, swap: use the swap table for the swap cache and switch API Kairui Song
2025-08-30  1:54   ` Baoquan He
2025-08-30  3:40     ` Chris Li
2025-08-30  3:34   ` Chris Li
2025-08-30 16:52     ` Kairui Song
2025-08-31  1:00       ` Chris Li
2025-09-02 11:51         ` Kairui Song
2025-09-02  9:55   ` Barry Song
2025-09-02 11:58     ` Kairui Song
2025-09-02 23:44       ` Barry Song
2025-09-03  2:12         ` Kairui Song
2025-09-03  2:31           ` Barry Song
2025-09-03 11:41   ` David Hildenbrand
2025-09-03 12:54     ` Kairui Song
2025-09-04  9:28       ` David Hildenbrand
2025-08-22 19:20 ` [PATCH 7/9] mm, swap: remove contention workaround for swap cache Kairui Song
2025-08-30  4:07   ` Chris Li
2025-08-30 15:24     ` Kairui Song
2025-08-31 15:54       ` Kairui Song
2025-08-31 20:06         ` Chris Li
2025-08-31 20:04       ` Chris Li
2025-09-02 10:06   ` Barry Song
2025-08-22 19:20 ` [PATCH 8/9] mm, swap: implement dynamic allocation of swap table Kairui Song
2025-08-30  4:17   ` Chris Li
2025-09-02 11:15   ` Barry Song
2025-09-02 13:17     ` Chris Li
2025-09-02 16:57       ` Kairui Song
2025-09-02 23:31       ` Barry Song
2025-09-03  2:13         ` Kairui Song
2025-09-03 12:35         ` Chris Li
2025-09-03 20:52           ` Barry Song
2025-09-04  6:50             ` Chris Li
2025-08-22 19:20 ` [PATCH 9/9] mm, swap: use a single page for swap table when the size fits Kairui Song
2025-08-30  4:23   ` Chris Li
2025-08-26 22:00 ` [PATCH 0/9] mm, swap: introduce swap table as swap cache (phase I) Chris Li
2025-08-30  5:44 ` Chris Li
2025-09-04 16:36   ` Kairui Song
2025-09-04 18:50     ` Chris Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACePvbUYDfne6mZ32c3YzxcwH16u-G3YXoYoGaXL5EtxjDH8mw@mail.gmail.com \
    --to=chrisl@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bhe@redhat.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=nphamcs@gmail.com \
    --cc=ryncsn@gmail.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yosryahmed@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).