All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	Zi Yan <ziy@nvidia.com>, Yang Shi <shy828301@gmail.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Oscar Salvador <osalvador@suse.de>,
	Matthew Wilcox <willy@infradead.org>,
	Bharata B Rao <bharata@amd.com>,
	Alistair Popple <apopple@nvidia.com>,
	haoxin <xhao@linux.alibaba.com>, Minchan Kim <minchan@kernel.org>
Subject: Re: [BISECTED] first bad commit is c203c6d5b3f0597 ("migrate_pages: batch _unmap and _move")
Date: Sun, 5 Feb 2023 14:38:44 +0000	[thread overview]
Message-ID: <Y9+/dDEmxHvuy76V@localhost> (raw)
In-Reply-To: <874js2n65l.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Fri, Feb 03, 2023 at 11:02:46PM +0800, Huang, Ying wrote:
> Hyeonggon Yoo <42.hyeyoo@gmail.com> writes:
> 
> > On Fri, Feb 03, 2023 at 07:17:14AM +0800, Huang, Ying wrote:
> >> "Huang, Ying" <ying.huang@intel.com> writes:
> >> 
> >> > Hi, Hyeonggon,
> >> >
> >> > Hyeonggon Yoo <42.hyeyoo@gmail.com> writes:
> >> >
> >> >> On Wed, Feb 01, 2023 at 01:09:10AM +0900, Hyeonggon Yoo wrote:
> >> >>> I've observed random list_del corruption on mm-unstable,
> >> >>> where HEAD is commit d69862e693c069f4
> >> >>> ("mm/migrate: convert putback_movable_pages() to use folios").
> >> >>> 
> >> >>> The issue can be easily reproduced by stressing MM multiple times:
> >> >>> 	stress-ng --bigheap 0 --timeout 300
> >> >>> 
> >> >>> The compiler is gcc 12.2.1 and config, dmesg are included as attachment.
> >> >>> I will try to bisect but can't promise quick resolution :)
> >> >>
> >> >>
> >> >> The first bad commits appears to be:
> >> >> c203c6d5b3f0597 ("migrate_pages: batch _unmap and _move")
> >> >>
> >> >> the first bad commit _probably_ be earlier, but this is quite
> >> >> easy to reproduce so at this point I think above is the real bad commit.
> >> >
> >> > Thank you very much for reporting the bug.  I'm in travel now but I will
> >> > try to find some time to reproduce and debug it.
> >> 
> >> Still haven't reproduced the issue.  But after reviewing the code, I
> >> found a bug in the code, which may cause list corruption.  Can you try
> >> the debug patch below?
> >
> > Unfortunately my home server seems to be broken again :(
> > That means I only have access to VMs and not a real machine now.
> >
> > FYI it was not reproduced on KVM but reproduced on real machine.
> > Could you try checking on your machine with the config I attached? [1]
> 
> Thank you very much for testing!
>
> > Sorry to bother your travel!
> 
> Never mind.  Your report helps me very much!
> 
> > [1] https://marc.info/?l=linux-mm&m=167518135116956
> 
> I have reproduced the bug successfully!  And I can reproduce the bug
> with the previous debug patch too, although the reproduction rate isn't
> high.
> 
> And in my test, the following patch can fix the bug.
> 
> It appears that zswap code will touch dst->lru during moving page.

After setting swap space I was able to reproduce even on VM.

> -------------------------8<----------------------------------
> From b2e3f4aab16d8af0033286fde669b46e7467c7ec Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@intel.com>
> Date: Fri, 3 Feb 2023 22:03:24 +0800
> Subject: [PATCH] dbg,migrate_pages: restore destination folio state before
>  move
> 
> ---
>  mm/migrate.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)


This fixes the bug on my test:

Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Thanks for such a quick fix!

> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 143d96775b4d..fa7212330cb6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1225,13 +1225,19 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>  	int page_was_mapped = 0;
>  	struct anon_vma *anon_vma = NULL;
>  	bool is_lru = !__PageMovable(&src->page);
> +	struct list_head *prev;
>  
>  	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
> +	prev = dst->lru.prev;
> +	list_del(&dst->lru);

BTW may be silly questions,
 
- How can zswap touch dst->lru during moving page, is there no lock
  that prevents this to happen?

- Does this race (?) happen only during moving page?
  (I mean, why is it safe to perform list_del()/list_add() before and
  after moving page?)

>  
>  	rc = move_to_new_folio(dst, src, mode);
>  
> -	if (rc != -EAGAIN)
> -		list_del(&dst->lru);
> +	if (rc == -EAGAIN) {
> +		list_add(&dst->lru, prev);
> +		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> +		return rc;
> +	}
>
>  
>  	if (unlikely(!is_lru))
>  		goto out_unlock_both;
> @@ -1251,11 +1257,6 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>  			lru_add_drain();
>  	}
>  
> -	if (rc == -EAGAIN) {
> -		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> -		return rc;
> -	}
> -
>  	if (page_was_mapped)
>  		remove_migration_ptes(src,
>  			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
> -- 
> 2.35.1


  reply	other threads:[~2023-02-05 14:38 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Y9k9Jl9wIaUFZS30@hyeyoo>
2023-01-31 17:35 ` [BISECTED] first bad commit is c203c6d5b3f0597 ("migrate_pages: batch _unmap and _move") Hyeonggon Yoo
2023-01-31 22:45   ` Andrew Morton
2023-02-01 23:28   ` Huang, Ying
2023-02-02 23:17     ` Huang, Ying
2023-02-03 14:17       ` Hyeonggon Yoo
2023-02-03 15:02         ` Huang, Ying
2023-02-05 14:38           ` Hyeonggon Yoo [this message]
2023-02-06  6:25             ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9+/dDEmxHvuy76V@localhost \
    --to=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bharata@amd.com \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=osalvador@suse.de \
    --cc=shy828301@gmail.com \
    --cc=willy@infradead.org \
    --cc=xhao@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.