From: Johannes Weiner <jweiner@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Jeremy Fitzhardinge <jeremy@goop.org>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
"H. Peter Anvin" <hpa@zytor.com>,
the arch/x86 maintainers <x86@kernel.org>,
"Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Ian Campbell <Ian.Campbell@citrix.com>,
Jan Beulich <JBeulich@novell.com>,
Larry Woodman <lwoodman@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andi Kleen <andi@firstfloor.org>, Hugh Dickins <hughd@google.com>,
Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH] fix pgd_lock deadlock
Date: Mon, 21 Feb 2011 15:53:50 +0100 [thread overview]
Message-ID: <20110221145350.GH25382@redhat.com> (raw)
In-Reply-To: <20110221143023.GF13092@random.random>
On Mon, Feb 21, 2011 at 03:30:23PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 17, 2011 at 11:19:41AM +0100, Johannes Weiner wrote:
> > So Xen needs all page tables protected when pinning/unpinning and
> > extended page_table_lock to cover kernel range, which it does nowhere
> > else AFAICS. But the places it extended are also taking the pgd_lock,
> > so I wonder if Xen could just take the pgd_lock itself in these paths
> > and we could revert page_table_lock back to cover user va only?
> > Jeremy, could this work? Untested.
>
> If this works for Xen, I definitely prefer this.
Below is real submission, with changelog and sign-off and all (except
testing on Xen itself, sorry). I moved pgd_lock acquisition in this
version to make the lock ordering perfectly clear. Xen people, could
you have a look at this?
> Still there's no point to insist on _irqsave if nothing takes the
> pgd_lock from irq, so my patch probably should be applied anyway or
> it's confusing and there's even a comment saying pgd_dtor is called
> from irq, if it's not it should be corrected. But then it becomes a
> cleanup notafix.
Absolutely agreed, I like your clean-up but would prefer it not being
a requirement for this fix.
---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch] x86, mm: fix mm->page_table_lock deadlock
'617d34d x86, mm: Hold mm->page_table_lock while doing vmalloc_sync'
made two paths grab mm->page_table_lock while having IRQs disabled.
This is not safe as rmap waits for IPI responses with this lock held
when clearing young bits and flushing the TLB.
What 617d34d wanted was to exclude any changes to the page tables,
including demand kernel page table propagation from init_mm, so that
Xen could pin and unpin page tables atomically.
Kernel page table propagation takes the pgd_lock to protect the list
of page directories, however, so instead of adding mm->page_table_lock
to this side, this patch has Xen exclude it by taking pgd_lock itself.
The pgd_lock will then nest within mm->page_table_lock to exclude rmap
before disabling IRQs, thus fixing the deadlock.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
arch/x86/include/asm/pgtable.h | 2 --
arch/x86/mm/fault.c | 14 ++------------
arch/x86/mm/init_64.c | 6 ------
arch/x86/mm/pgtable.c | 20 +++-----------------
arch/x86/xen/mmu.c | 12 ++++++++++++
5 files changed, 17 insertions(+), 37 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 18601c8..8c0335a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,8 +28,6 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
extern spinlock_t pgd_lock;
extern struct list_head pgd_list;
-extern struct mm_struct *pgd_page_get_mm(struct page *page);
-
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7d90ceb..5da4155 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -234,19 +234,9 @@ void vmalloc_sync_all(void)
struct page *page;
spin_lock_irqsave(&pgd_lock, flags);
- list_for_each_entry(page, &pgd_list, lru) {
- spinlock_t *pgt_lock;
- pmd_t *ret;
-
- pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
-
- spin_lock(pgt_lock);
- ret = vmalloc_sync_one(page_address(page), address);
- spin_unlock(pgt_lock);
-
- if (!ret)
+ list_for_each_entry(page, &pgd_list, lru)
+ if (!vmalloc_sync_one(page_address(page), address))
break;
- }
spin_unlock_irqrestore(&pgd_lock, flags);
}
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..9332f21 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -114,19 +114,13 @@ void sync_global_pgds(unsigned long start, unsigned long end)
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
- spinlock_t *pgt_lock;
pgd = (pgd_t *)page_address(page) + pgd_index(address);
- pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
- spin_lock(pgt_lock);
-
if (pgd_none(*pgd))
set_pgd(pgd, *pgd_ref);
else
BUG_ON(pgd_page_vaddr(*pgd)
!= pgd_page_vaddr(*pgd_ref));
-
- spin_unlock(pgt_lock);
}
spin_unlock_irqrestore(&pgd_lock, flags);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 500242d..72107ab 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -87,19 +87,7 @@ static inline void pgd_list_del(pgd_t *pgd)
#define UNSHARED_PTRS_PER_PGD \
(SHARED_KERNEL_PMD ? KERNEL_PGD_BOUNDARY : PTRS_PER_PGD)
-
-static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
-{
- BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
- virt_to_page(pgd)->index = (pgoff_t)mm;
-}
-
-struct mm_struct *pgd_page_get_mm(struct page *page)
-{
- return (struct mm_struct *)page->index;
-}
-
-static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
+static void pgd_ctor(pgd_t *pgd)
{
/* If the pgd points to a shared pagetable level (either the
ptes in non-PAE, or shared PMD in PAE), then just copy the
@@ -113,10 +101,8 @@ static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
}
/* list required to sync kernel mapping updates */
- if (!SHARED_KERNEL_PMD) {
- pgd_set_mm(pgd, mm);
+ if (!SHARED_KERNEL_PMD)
pgd_list_add(pgd);
- }
}
static void pgd_dtor(pgd_t *pgd)
@@ -282,7 +268,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
*/
spin_lock_irqsave(&pgd_lock, flags);
- pgd_ctor(mm, pgd);
+ pgd_ctor(pgd);
pgd_prepopulate_pmd(mm, pgd, pmds);
spin_unlock_irqrestore(&pgd_lock, flags);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..498e6ae 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1117,15 +1117,23 @@ void xen_mm_unpin_all(void)
void xen_activate_mm(struct mm_struct *prev, struct mm_struct *next)
{
+ unsigned long flags;
+
spin_lock(&next->page_table_lock);
+ spin_lock_irqsave(&pgd_lock, flags);
xen_pgd_pin(next);
+ spin_unlock_irqrestore(&pgd_lock, flags);
spin_unlock(&next->page_table_lock);
}
void xen_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
+ unsigned long flags;
+
spin_lock(&mm->page_table_lock);
+ spin_lock_irqsave(&pgd_lock, flags);
xen_pgd_pin(mm);
+ spin_unlock_irqrestore(&pgd_lock, flags);
spin_unlock(&mm->page_table_lock);
}
@@ -1211,16 +1219,20 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
*/
void xen_exit_mmap(struct mm_struct *mm)
{
+ unsigned long flags;
+
get_cpu(); /* make sure we don't move around */
xen_drop_mm_ref(mm);
put_cpu();
spin_lock(&mm->page_table_lock);
+ spin_lock_irqsave(&pgd_lock, flags);
/* pgd may not be pinned in the error exit path of execve */
if (xen_page_pinned(mm->pgd))
xen_pgd_unpin(mm);
+ spin_unlock_irqrestore(&pgd_lock, flags);
spin_unlock(&mm->page_table_lock);
}
--
1.7.4
next prev parent reply other threads:[~2011-02-21 14:54 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-14 20:56 [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-14 20:56 ` Jeremy Fitzhardinge
2010-10-15 17:07 ` [Xen-devel] " Jeremy Fitzhardinge
2010-10-15 17:07 ` Jeremy Fitzhardinge
2010-10-19 22:17 ` [tip:x86/mm] x86, mm: Hold " tip-bot for Jeremy Fitzhardinge
2010-10-20 10:36 ` Borislav Petkov
2010-10-20 19:31 ` [tip:x86/mm] x86, mm: Fix incorrect data type in vmalloc_sync_all() tip-bot for tip-bot for Jeremy Fitzhardinge
2010-10-20 19:50 ` Borislav Petkov
2010-10-20 19:53 ` H. Peter Anvin
2010-10-20 20:10 ` Borislav Petkov
2010-10-20 20:13 ` H. Peter Anvin
2010-10-20 22:11 ` Borislav Petkov
2010-10-20 21:26 ` Ben Pfaff
2010-10-20 19:58 ` tip-bot for Borislav Petkov
2010-10-21 21:06 ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-21 21:06 ` Jeremy Fitzhardinge
2010-10-21 21:26 ` H. Peter Anvin
2010-10-21 21:26 ` H. Peter Anvin
2010-10-21 21:34 ` Jeremy Fitzhardinge
2011-02-03 2:48 ` Andrea Arcangeli
2011-02-03 20:44 ` Jeremy Fitzhardinge
2011-02-03 20:44 ` Jeremy Fitzhardinge
2011-02-04 1:21 ` Andrea Arcangeli
2011-02-04 21:27 ` Jeremy Fitzhardinge
2011-02-07 23:20 ` Andrea Arcangeli
2011-02-15 19:07 ` [PATCH] fix pgd_lock deadlock Andrea Arcangeli
2011-02-15 19:26 ` Thomas Gleixner
2011-02-15 19:31 ` Larry Woodman
2011-02-15 19:54 ` Andrea Arcangeli
2011-02-15 19:54 ` Andrea Arcangeli
2011-02-15 20:05 ` Thomas Gleixner
2011-02-15 20:05 ` Thomas Gleixner
2011-02-15 20:26 ` Thomas Gleixner
2011-02-15 20:26 ` Thomas Gleixner
2011-02-15 22:52 ` Andrea Arcangeli
2011-02-15 23:03 ` Thomas Gleixner
2011-02-15 23:03 ` Thomas Gleixner
2011-02-15 23:17 ` Andrea Arcangeli
2011-02-16 9:58 ` Peter Zijlstra
2011-02-16 10:15 ` Andrea Arcangeli
2011-02-16 10:15 ` Andrea Arcangeli
2011-02-16 10:28 ` Ingo Molnar
2011-02-16 10:28 ` Ingo Molnar
2011-02-16 14:49 ` Andrea Arcangeli
2011-02-16 16:26 ` Rik van Riel
2011-02-16 16:26 ` Rik van Riel
2011-02-16 20:15 ` Ingo Molnar
2012-04-23 9:07 ` [2.6.32.y][PATCH] " Philipp Hahn
2012-04-23 19:09 ` Willy Tarreau
2011-02-16 18:33 ` [PATCH] " Andrea Arcangeli
2011-02-16 21:34 ` Konrad Rzeszutek Wilk
2011-02-17 10:19 ` Johannes Weiner
2011-02-21 14:30 ` Andrea Arcangeli
2011-02-21 14:53 ` Johannes Weiner [this message]
2011-02-22 7:48 ` Jan Beulich
2011-02-22 13:49 ` Andrea Arcangeli
2011-02-22 14:22 ` Jan Beulich
2011-02-22 14:22 ` Jan Beulich
2011-02-22 14:34 ` Andrea Arcangeli
2011-02-22 17:08 ` Jeremy Fitzhardinge
2011-02-22 17:08 ` Jeremy Fitzhardinge
2011-02-22 17:13 ` Andrea Arcangeli
2011-02-22 17:13 ` Andrea Arcangeli
2011-02-24 4:22 ` Andrea Arcangeli
2011-02-24 8:23 ` Jan Beulich
2011-02-24 14:11 ` Andrea Arcangeli
2011-02-21 17:40 ` Jeremy Fitzhardinge
2011-02-03 20:59 ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Larry Woodman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110221145350.GH25382@redhat.com \
--to=jweiner@redhat.com \
--cc=Ian.Campbell@citrix.com \
--cc=JBeulich@novell.com \
--cc=Xen-devel@lists.xensource.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jeremy@goop.org \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lwoodman@redhat.com \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.