From: Wang Yugui <wangyugui@e16-tech.com>
To: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux MM <linux-mm@kvack.org>
Subject: Re: kernel BUG at mm/huge_memory.c:2736(linux 5.10.29)
Date: Fri, 23 Apr 2021 10:16:55 +0800
Message-ID: <20210423101654.1242.409509F4@e16-tech.com>
In-Reply-To: <CAHbLzkpiVjDs9qPL=sX7PRMTweyi9TForGB3B4yGhqR575p_Xg@mail.gmail.com>

Hi,

> On Sat, Apr 17, 2021 at 1:33 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
> >
> > Hi,
> >
> > > On Mon, Apr 12, 2021 at 3:07 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > kernel BUG at mm/huge_memory.c:2736 (linux 5.10.29) is triggered
> > > > by a file write test.
> > > >
> > > > mm/huge_memory.c:
> > > >         if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
> > > >             pr_alert("total_mapcount: %u, page_count(): %u\n",
> > > >                     mapcount, count);
> > > >             if (PageTail(page))
> > > >                 dump_page(head, NULL);
> > > >             dump_page(page, "total_mapcount(head) > 0");
> > > > L2736:           BUG();
> > > >         }
> > >
> > > From the current log we can only tell that the mapcount of the page is
> > > not zero, which might mean the unmap_page() call failed. It seems you have
> > > CONFIG_DEBUG_VM enabled, could you please paste more of the log? There is
> > > "VM_BUG_ON_PAGE(!unmap_success, page)" in unmap_page(). It should
> > > tell us whether unmap_page() failed or something else
> > > happened.
> >
> > This is the full dmesg output
> >
> > [63080.331513] huge_memory: total_mapcount: 511, page_count(): 512
> > [63080.332167] page:00000000d2e1a982 refcount:512 mapcount:0 mapping:0000000000000000 index:0x7fe260582 pfn:0x676a00
> > [63080.332167] head:00000000d2e1a982 order:9 compound_mapcount:0 compound_pincount:0
> > [63080.332167] anon flags: 0x17ffffc009001d(locked|uptodate|dirty|lru|head|swapbacked)
> > [63080.332167] raw: 0017ffffc009001d ffffc93cda0d0008 ffffc93cd9ab0008 ffff8f21be9f0cb9
> > [63080.332167] raw: 00000007fe260582 0000000000000000 00000200ffffffff ffff8f1021810000
> > [63080.332167] page->mem_cgroup:ffff8f1021810000
> > [63080.332167] page:00000000bc78ac24 refcount:512 mapcount:1 mapping:0000000000000000 index:0x7fe260584 pfn:0x676a02
> > [63080.332167] head:00000000d2e1a982 order:9 compound_mapcount:0 compound_pincount:0
> > [63080.332167] anon flags: 0x17ffffc009001d(locked|uptodate|dirty|lru|head|swapbacked)
> > [63080.332167] raw: 0017ffffc0000000 ffffc93cd9da8001 dead000000000000 ffffc93d428d0098
> > [63080.332167] raw: ffffa002cd183bf0 0000000000000000 0000000000000000 0000000000000000
> > [63080.332167] head: 0017ffffc009001d ffffc93cda0d0008 ffffc93cd9ab0008 ffff8f21be9f0cb9
> > [63080.332167] head: 00000007fe260582 0000000000000000 00000200ffffffff ffff8f1021810000
> > [63080.332167] page dumped because: total_mapcount(head) > 0
> 
> Added Kirill in this loop too, he may have some insights.
> 
> Thanks a lot for pasting the full log. It seems the BUG_ON in
> unmap_page() and VM_BUG_ON_PAGE(compound_mapcount(head), head) were
> not triggered. But the dumped page shows its total_mapcount is 511. It
> means 511 subpages of the huge page are PTE mapped. It seems all tail
> pages are PTE mapped. This may be because unmap_page() failed or because
> they were mapped again after unmap_page().
> 
> But the VM_BUG_ON_PAGE only checks compound_mapcount, and it seems the
> page_mapcount() call in unmap_page() also only checks
> compound_mapcount and the mapcount of the head page. If the mapcount
> of the head page is 0 and compound_mapcount is also 0, try_to_unmap()
> considers the unmap successful.
> 
> So we can't tell which case it is, although I can't think of how
> unmap_page() could fail in this case.  I think we should check the
> total mapcount in try_to_unmap() instead.
> 
> Can you please try the below debug patch (untested) to help narrow
> down the problem?
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ae907a9c2050..c10e89be1c99 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2726,7 +2726,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>         }
> 
>         unmap_page(head);
> -       VM_BUG_ON_PAGE(compound_mapcount(head), head);
> +       VM_BUG_ON_PAGE(total_mapcount(head), head);
> 
>         /* block interrupt reentry in xa_lock and spinlock */
>         local_irq_disable();
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b0fc27e77d6d..537dfc557744 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1777,7 +1777,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
>         else
>                 rmap_walk(page, &rwc);
> 
> -       return !page_mapcount(page) ? true : false;
> +       return !total_mapcount(page) ? true : false;
>  }
> 
>  /**
> 
> 

With this patch applied, the problem has not reproduced yet after 4 test
runs (5.10.x).

By the way, the problem does not happen on 5.4.x (more than about 120 test
runs). Does that match what you would expect from the code in that version?
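
For reference, the difference between the two checks in the debug patch can
be illustrated with a small userspace model. This is only a sketch: struct
toy_thp and every toy_* name are hypothetical and are not kernel APIs, and
the real page_mapcount() and total_mapcount() handle more cases than this
model does. It assumes the state suggested by the dump above: an order-9
page whose head has no mapping and whose 511 tail pages are each PTE mapped
once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HPAGE_NR 512 /* subpages in an order-9 compound page */

/* Toy stand-in for a huge page; NOT the kernel's struct page. */
struct toy_thp {
	int compound_mapcount;          /* mappings of the page as a whole */
	int pte_mapcount[TOY_HPAGE_NR]; /* per-subpage PTE mappings */
};

/* Models a check that looks only at the head page and compound_mapcount. */
static int toy_head_mapcount(const struct toy_thp *p)
{
	return p->compound_mapcount + p->pte_mapcount[0];
}

/* Models a check that sums the mappings of every subpage. */
static int toy_total_mapcount(const struct toy_thp *p)
{
	int total = p->compound_mapcount;

	for (size_t i = 0; i < TOY_HPAGE_NR; i++)
		total += p->pte_mapcount[i];
	return total;
}

/* Assumed state: head unmapped, every tail page PTE mapped once. */
static void toy_fill_like_dump(struct toy_thp *p)
{
	assert(p != NULL);
	p->compound_mapcount = 0;
	p->pte_mapcount[0] = 0;
	for (size_t i = 1; i < TOY_HPAGE_NR; i++)
		p->pte_mapcount[i] = 1;
}

/* Head-only check: reports success even if tail pages are still mapped. */
static bool toy_unmap_ok_head_only(const struct toy_thp *p)
{
	return toy_head_mapcount(p) == 0;
}

/* Total check: reports success only when no subpage is mapped. */
static bool toy_unmap_ok_total(const struct toy_thp *p)
{
	return toy_total_mapcount(p) == 0;
}
```

Calling toy_fill_like_dump() on a struct toy_thp and then both checks shows
the head-only check reporting success while the total check still sees 511
mappings, which is the situation the debug patch is meant to catch earlier.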

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/04/23