From: Larry Woodman <lwoodman@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
"H. Peter Anvin" <hpa@zytor.com>,
the arch/x86 maintainers <x86@kernel.org>,
"Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Ian Campbell <Ian.Campbell@citrix.com>,
Jan Beulich <JBeulich@novell.com>
Subject: Re: [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync
Date: Thu, 03 Feb 2011 15:59:13 -0500 [thread overview]
Message-ID: <4D4B1721.9010100@redhat.com> (raw)
In-Reply-To: <20110203024838.GI5843@random.random>
On 02/02/2011 09:48 PM, Andrea Arcangeli wrote:
> Hello,
>
> Larry (CC'ed) found a problem with the patch in subject. When
> USE_SPLIT_PTLOCKS is not defined (NR_CPUS == 2) it will deadlock in
> ptep_clear_flush_notify in rmap.c because it's sending IPIs with the
> page_table_lock already held, and the other CPUs now spins on the
> page_table_lock with irq disabled, so the IPI never runs. With
> CONFIG_TRANSPARENT_HUGEPAGE=y this deadlocks happens even with
> USE_SPLIT_PTLOCKS defined so it become visible but it needs to be
> fixed regardless (for NR_CPUS == 2).
>
> I'd like to understand why the pgd_lock needs irq disabled, it sounds
> too easy that I can just remove the _irqsave, doesn't it?
>
> A pgd_free comment says it can run from irq. page_table_lock having to
> be taken there is for Xen only, but other archs also uses
> spin_lock_irqsave(pgd_lock) so I guess it's either common code, or
> it's superfluous and not another Xen special requirement.
>
> If we could remove that _irqsave like below it'd solve it... But
> clearly something must be taking the pgd_lock from irq. (using a
> rwlock would also be possible as long as nobody takes it in write mode
> during irq, but if it's pgd_free that really runs in irq, that would
> need the write_lock so it wouldn't be a solution).
>
> I'm trying this fix and the VM_BUG_ON never triggered yet.
>
> In short: who takes the pgd_lock from an irq? (__mmdrop shouldn't but
> maybe I'm overlooking some aio bit?)
>
>
This is the specifics:
The problem is with THP. The page reclaim code calls page_referenced_one()
which takes the mm->page_table_lock on one CPU before sending an IPI to other
CPU(s):
On CPU1 we take the mm->page_table_lock, send IPIs and wait for a response:
page_referenced_one(...)
if (unlikely(PageTransHuge(page))) {
pmd_t *pmd;
spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
PAGE_CHECK_ADDRESS_PMD_FLAG);
if (pmd&& !pmd_trans_splitting(*pmd)&&
pmdp_clear_flush_young_notify(vma, address, pmd))
referenced++;
spin_unlock(&mm->page_table_lock);
} else {
CPU2 can race in vmalloc_sync_all() because it disables interrupt(preventing a
response to the IPI from CPU1) and takes the pgd_lock then spins in the
mm->page_table_lock which is already held on CPU1.
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page,&pgd_list, lru) {
pgd_t *pgd;
spinlock_t *pgt_lock;
pgd = (pgd_t *)page_address(page) + pgd_index(address);
pgt_lock =&pgd_page_get_mm(page)->page_table_lock;
spin_lock(pgt_lock);
At this point the system is deadlocked. The pmdp_clear_flush_young_notify
needs to do its PDG business with the page_table_lock held then release that
lock before sending the IPIs to the other CPUs.
prev parent reply other threads:[~2011-02-03 20:59 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-14 20:56 [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-15 17:07 ` [Xen-devel] " Jeremy Fitzhardinge
2010-10-19 22:17 ` [tip:x86/mm] x86, mm: Hold " tip-bot for Jeremy Fitzhardinge
2010-10-20 10:36 ` Borislav Petkov
2010-10-20 19:31 ` [tip:x86/mm] x86, mm: Fix incorrect data type in vmalloc_sync_all() tip-bot for tip-bot for Jeremy Fitzhardinge
2010-10-20 19:50 ` Borislav Petkov
2010-10-20 19:53 ` H. Peter Anvin
2010-10-20 20:10 ` Borislav Petkov
2010-10-20 20:13 ` H. Peter Anvin
2010-10-20 22:11 ` Borislav Petkov
2010-10-20 21:26 ` Ben Pfaff
2010-10-20 19:58 ` tip-bot for Borislav Petkov
2010-10-21 21:06 ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-21 21:26 ` H. Peter Anvin
2010-10-21 21:34 ` Jeremy Fitzhardinge
2011-02-03 2:48 ` Andrea Arcangeli
2011-02-03 20:44 ` Jeremy Fitzhardinge
2011-02-04 1:21 ` Andrea Arcangeli
2011-02-04 21:27 ` Jeremy Fitzhardinge
2011-02-07 23:20 ` Andrea Arcangeli
2011-02-15 19:07 ` [PATCH] fix pgd_lock deadlock Andrea Arcangeli
2011-02-15 19:26 ` Thomas Gleixner
2011-02-15 19:54 ` Andrea Arcangeli
2011-02-15 20:05 ` Thomas Gleixner
2011-02-15 20:26 ` Thomas Gleixner
2011-02-15 22:52 ` Andrea Arcangeli
2011-02-15 23:03 ` Thomas Gleixner
2011-02-15 23:17 ` Andrea Arcangeli
2011-02-16 9:58 ` Peter Zijlstra
2011-02-16 10:15 ` Andrea Arcangeli
2011-02-16 10:28 ` Ingo Molnar
2011-02-16 14:49 ` Andrea Arcangeli
2011-02-16 16:26 ` Rik van Riel
2011-02-16 20:15 ` Ingo Molnar
2012-04-23 9:07 ` [2.6.32.y][PATCH] " Philipp Hahn
2012-04-23 19:09 ` Willy Tarreau
2011-02-16 18:33 ` [PATCH] " Andrea Arcangeli
2011-02-16 21:34 ` Konrad Rzeszutek Wilk
2011-02-17 10:19 ` Johannes Weiner
2011-02-21 14:30 ` Andrea Arcangeli
2011-02-21 14:53 ` Johannes Weiner
2011-02-22 7:48 ` Jan Beulich
2011-02-22 13:49 ` Andrea Arcangeli
2011-02-22 14:22 ` Jan Beulich
2011-02-22 14:34 ` Andrea Arcangeli
2011-02-22 17:08 ` Jeremy Fitzhardinge
2011-02-22 17:13 ` Andrea Arcangeli
2011-02-24 4:22 ` Andrea Arcangeli
2011-02-24 8:23 ` Jan Beulich
2011-02-24 14:11 ` Andrea Arcangeli
2011-02-21 17:40 ` Jeremy Fitzhardinge
2011-02-03 20:59 ` Larry Woodman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D4B1721.9010100@redhat.com \
--to=lwoodman@redhat.com \
--cc=Ian.Campbell@citrix.com \
--cc=JBeulich@novell.com \
--cc=Xen-devel@lists.xensource.com \
--cc=aarcange@redhat.com \
--cc=hpa@zytor.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.