From: Alistair Popple <apopple@nvidia.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: BUG_ON() in pfn_swap_entry_to_page()
Date: Wed, 23 Mar 2022 11:29:12 +1100 [thread overview]
Message-ID: <871qyt4g4a.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <YjoKZ3ghBaq6y3jN@casper.infradead.org>
[-- Attachment #1: Type: text/plain, Size: 3831 bytes --]
Matthew Wilcox <willy@infradead.org> writes:
> On Tue, Mar 22, 2022 at 06:25:02PM +0100, Sebastian Andrzej Siewior wrote:
>> Run into
>>
>> |kernel BUG at include/linux/swapops.h:258!
>> |RIP: 0010:migration_entry_wait_on_locked+0x233/0x2c0
>> |Call Trace:
>> | <TASK>
>> | __handle_mm_fault+0x2bf/0x1810
>> | handle_mm_fault+0x136/0x3e0
>> | exc_page_fault+0x1d2/0x850
>> | asm_exc_page_fault+0x22/0x30
>>
>> This is the BUG_ON() in pfn_swap_entry_to_page(). The box itself has no
>> swapspace configured. Is this something known?
>
> It's not been reported before, but it seems obvious why it triggers.
> I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
> to not take a pageref".
It's obvious the BUG_ON() is because the page isn't locked, but it's less
obvious to me at least why we have a migration entry pointing to an unlocked
page.
> There's a lot of inlining going on, so let's write this out:
>
> __handle_mm_fault
> handle_pte_fault
> do_swap_page():
>
> entry = pte_to_swp_entry(vmf->orig_pte);
> if (unlikely(non_swap_entry(entry))) {
> if (is_migration_entry(entry)) {
> migration_entry_wait(vma->vm_mm, vmf->pmd,
> vmf->address);
>
> We don't (and can't) have the underlying page locked here. We're just
> waiting for it to finish migration.
Why can't the page be locked here? Waiting for migration to finish is equivalent
to waiting for the page to be unlocked. __unmap_and_move() follows this rough
sequence:
__unmap_and_move()
if (!trylock_page(page)) {
if (!force || mode == MIGRATE_ASYNC)
goto out;
/*
* It's not safe for direct compaction to call lock_page.
* For example, during page readahead pages are added locked
* to the LRU. Later, when the IO completes the pages are
* marked uptodate and unlocked. However, the queueing
* could be merging multiple pages for one bio (e.g.
* mpage_readahead). If an allocation happens for the
* second or third page, the process can end up locking
* the same page twice and deadlocking. Rather than
* trying to be clever about what pages can be locked,
* avoid the use of lock_page for direct compaction
* altogether.
*/
if (current->flags & PF_MEMALLOC)
goto out;
lock_page(page);
}
[...]
if (unlikely(!trylock_page(newpage)))
goto out_unlock;
[...]
} else if (page_mapped(page)) {
/* Establish migration ptes */
VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
page);
try_to_migrate(page, 0);
page_was_mapped = true;
}
if (!page_mapped(page))
rc = move_to_new_page(newpage, page, mode);
if (page_was_mapped)
remove_migration_ptes(page,
rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
Both the source and destination pages (page and newpage respectively) are locked
prior to installing migration entries in try_to_migrate() and unlocked after
migration entries are removed. In order to remove a migration entry the PTL is
required. Therefore while holding the PTL for a migration entry it should be
impossible to observe it pointing to an unlocked page.
> migration_entry_wait
> migration_entry_wait_on_locked():
>
> struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));
>
> pfn_swap_entry_to_page():
>
> struct page *p = pfn_to_page(swp_offset(entry));
>
> /*
> * Any use of migration entries may only occur while the
> * corresponding page is locked
> */
> BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>
> So why did we not hit this earlier? Surely somebody's testing page
> migration? I'm not familiar with the migration code, so maybe this
> assertion is obsolete and it's now legitimate to use a migration entry
> while the page is unlocked.
next prev parent reply other threads:[~2022-03-23 1:22 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-22 17:25 BUG_ON() in pfn_swap_entry_to_page() Sebastian Andrzej Siewior
2022-03-22 17:41 ` Matthew Wilcox
2022-03-23 0:29 ` Alistair Popple [this message]
2022-03-24 3:24 ` Matthew Wilcox
2022-03-24 3:51 ` Alistair Popple
2024-04-24 19:45 ` Felix Kuehling
2024-04-25 9:32 ` David Hildenbrand
2024-04-25 14:33 ` Felix Kuehling
2024-04-26 8:49 ` David Hildenbrand
2024-04-26 14:56 ` Felix Kuehling
2022-03-22 18:53 ` Ritesh Harjani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qyt4g4a.fsf@nvdebian.thelocal \
--to=apopple@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=bigeasy@linutronix.de \
--cc=linux-mm@kvack.org \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.