From: Jeremy Fitzhardinge <jeremy@goop.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: the arch/x86 maintainers <x86@kernel.org>,
"Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Ian Campbell <Ian.Campbell@citrix.com>,
Jan Beulich <JBeulich@novell.com>
Subject: [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync
Date: Thu, 14 Oct 2010 13:56:43 -0700 [thread overview]
Message-ID: <4CB76E8B.2090309@goop.org> (raw)
[-- Attachment #1: Type: text/plain, Size: 3767 bytes --]
Take mm->page_table_lock while syncing the vmalloc region. This prevents
a race with the Xen pagetable pin/unpin code, which expects that the
page_table_lock is already held. If this race occurs, then Xen can see
an inconsistent page type (a page can either be read/write or a pagetable
page, and pin/unpin converts it between them), which will cause either
the pin or the set_p[gm]d to fail; either will crash the kernel.
vmalloc_sync_all() should be called rarely, so this extra use of
page_table_lock should not interfere with its normal users.
The mm pointer is stashed in the pgd page's index field, as that won't
be otherwise used for pgd pages.
Bug reported by Ian Campbell <ian.cambell@eu.citrix.com>
Derived from a patch by Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a34c785..422b363 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,6 +28,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
extern spinlock_t pgd_lock;
extern struct list_head pgd_list;
+extern struct mm_struct *pgd_page_get_mm(struct page *page);
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4c4508e..b7f9ae1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -229,7 +229,16 @@ void vmalloc_sync_all(void)
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page, &pgd_list, lru) {
- if (!vmalloc_sync_one(page_address(page), address))
+ spinlock_t *pgt_lock;
+ int ret;
+
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+
+ spin_lock(pgt_lock);
+ ret = vmalloc_sync_one(page_address(page), address);
+ spin_unlock(pgt_lock);
+
+ if (!ret)
break;
}
spin_unlock_irqrestore(&pgd_lock, flags);
@@ -341,11 +350,19 @@ void vmalloc_sync_all(void)
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
+ spinlock_t *pgt_lock;
+
pgd = (pgd_t *)page_address(page) + pgd_index(address);
+
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+ spin_lock(pgt_lock);
+
if (pgd_none(*pgd))
set_pgd(pgd, *pgd_ref);
else
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+
+ spin_unlock(pgt_lock);
}
spin_unlock_irqrestore(&pgd_lock, flags);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5c4ee42..c70e57d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -87,7 +87,19 @@ static inline void pgd_list_del(pgd_t *pgd)
#define UNSHARED_PTRS_PER_PGD \
(SHARED_KERNEL_PMD ? KERNEL_PGD_BOUNDARY : PTRS_PER_PGD)
-static void pgd_ctor(pgd_t *pgd)
+
+static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
+{
+ BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
+ virt_to_page(pgd)->index = (pgoff_t)mm;
+}
+
+struct mm_struct *pgd_page_get_mm(struct page *page)
+{
+ return (struct mm_struct *)page->index;
+}
+
+static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
{
/* If the pgd points to a shared pagetable level (either the
ptes in non-PAE, or shared PMD in PAE), then just copy the
@@ -105,8 +117,10 @@ static void pgd_ctor(pgd_t *pgd)
}
/* list required to sync kernel mapping updates */
- if (!SHARED_KERNEL_PMD)
+ if (!SHARED_KERNEL_PMD) {
+ pgd_set_mm(pgd, mm);
pgd_list_add(pgd);
+ }
}
static void pgd_dtor(pgd_t *pgd)
@@ -272,7 +286,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
*/
spin_lock_irqsave(&pgd_lock, flags);
- pgd_ctor(pgd);
+ pgd_ctor(mm, pgd);
pgd_prepopulate_pmd(mm, pgd, pmds);
spin_unlock_irqrestore(&pgd_lock, flags);
[-- Attachment #2: x86-lock-page_table_lock-in-vmalloc_sync.patch --]
[-- Type: text/plain, Size: 3999 bytes --]
From 6ddec918473deed20c2b42a19530608ef96b7afc Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Tue, 21 Sep 2010 12:01:51 -0700
Subject: [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync
Take mm->page_table_lock while syncing the vmalloc region. This prevents
a race with the Xen pagetable pin/unpin code, which expects that the
page_table_lock is already held. If this race occurs, then Xen can see
an inconsistent page type (a page can either be read/write or a pagetable
page, and pin/unpin converts it between them), which will cause either
the pin or the set_p[gm]d to fail; either will crash the kernel.
vmalloc_sync_all() should be called rarely, so this extra use of
page_table_lock should not interfere with its normal users.
The mm pointer is stashed in the pgd page's index field, as that won't
be otherwise used for pgds.
Bug reported by Ian Campbell <ian.cambell@eu.citrix.com>
Derived from a patch by Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a34c785..422b363 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,6 +28,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
extern spinlock_t pgd_lock;
extern struct list_head pgd_list;
+extern struct mm_struct *pgd_page_get_mm(struct page *page);
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4c4508e..b7f9ae1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -229,7 +229,16 @@ void vmalloc_sync_all(void)
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page, &pgd_list, lru) {
- if (!vmalloc_sync_one(page_address(page), address))
+ spinlock_t *pgt_lock;
+ int ret;
+
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+
+ spin_lock(pgt_lock);
+ ret = vmalloc_sync_one(page_address(page), address);
+ spin_unlock(pgt_lock);
+
+ if (!ret)
break;
}
spin_unlock_irqrestore(&pgd_lock, flags);
@@ -341,11 +350,19 @@ void vmalloc_sync_all(void)
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
+ spinlock_t *pgt_lock;
+
pgd = (pgd_t *)page_address(page) + pgd_index(address);
+
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+ spin_lock(pgt_lock);
+
if (pgd_none(*pgd))
set_pgd(pgd, *pgd_ref);
else
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+
+ spin_unlock(pgt_lock);
}
spin_unlock_irqrestore(&pgd_lock, flags);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 5c4ee42..c70e57d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -87,7 +87,19 @@ static inline void pgd_list_del(pgd_t *pgd)
#define UNSHARED_PTRS_PER_PGD \
(SHARED_KERNEL_PMD ? KERNEL_PGD_BOUNDARY : PTRS_PER_PGD)
-static void pgd_ctor(pgd_t *pgd)
+
+static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
+{
+ BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
+ virt_to_page(pgd)->index = (pgoff_t)mm;
+}
+
+struct mm_struct *pgd_page_get_mm(struct page *page)
+{
+ return (struct mm_struct *)page->index;
+}
+
+static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
{
/* If the pgd points to a shared pagetable level (either the
ptes in non-PAE, or shared PMD in PAE), then just copy the
@@ -105,8 +117,10 @@ static void pgd_ctor(pgd_t *pgd)
}
/* list required to sync kernel mapping updates */
- if (!SHARED_KERNEL_PMD)
+ if (!SHARED_KERNEL_PMD) {
+ pgd_set_mm(pgd, mm);
pgd_list_add(pgd);
+ }
}
static void pgd_dtor(pgd_t *pgd)
@@ -272,7 +286,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
*/
spin_lock_irqsave(&pgd_lock, flags);
- pgd_ctor(pgd);
+ pgd_ctor(mm, pgd);
pgd_prepopulate_pmd(mm, pgd, pmds);
spin_unlock_irqrestore(&pgd_lock, flags);
next reply other threads:[~2010-10-14 20:56 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-14 20:56 Jeremy Fitzhardinge [this message]
2010-10-15 17:07 ` [Xen-devel] [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-19 22:17 ` [tip:x86/mm] x86, mm: Hold " tip-bot for Jeremy Fitzhardinge
2010-10-20 10:36 ` Borislav Petkov
2010-10-20 19:31 ` [tip:x86/mm] x86, mm: Fix incorrect data type in vmalloc_sync_all() tip-bot for tip-bot for Jeremy Fitzhardinge
2010-10-20 19:50 ` Borislav Petkov
2010-10-20 19:53 ` H. Peter Anvin
2010-10-20 20:10 ` Borislav Petkov
2010-10-20 20:13 ` H. Peter Anvin
2010-10-20 22:11 ` Borislav Petkov
2010-10-20 21:26 ` Ben Pfaff
2010-10-20 19:58 ` tip-bot for Borislav Petkov
2010-10-21 21:06 ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-21 21:26 ` H. Peter Anvin
2010-10-21 21:34 ` Jeremy Fitzhardinge
2011-02-03 2:48 ` Andrea Arcangeli
2011-02-03 20:44 ` Jeremy Fitzhardinge
2011-02-04 1:21 ` Andrea Arcangeli
2011-02-04 21:27 ` Jeremy Fitzhardinge
2011-02-07 23:20 ` Andrea Arcangeli
2011-02-15 19:07 ` [PATCH] fix pgd_lock deadlock Andrea Arcangeli
2011-02-15 19:26 ` Thomas Gleixner
2011-02-15 19:54 ` Andrea Arcangeli
2011-02-15 20:05 ` Thomas Gleixner
2011-02-15 20:26 ` Thomas Gleixner
2011-02-15 22:52 ` Andrea Arcangeli
2011-02-15 23:03 ` Thomas Gleixner
2011-02-15 23:17 ` Andrea Arcangeli
2011-02-16 9:58 ` Peter Zijlstra
2011-02-16 10:15 ` Andrea Arcangeli
2011-02-16 10:28 ` Ingo Molnar
2011-02-16 14:49 ` Andrea Arcangeli
2011-02-16 16:26 ` Rik van Riel
2011-02-16 20:15 ` Ingo Molnar
2012-04-23 9:07 ` [2.6.32.y][PATCH] " Philipp Hahn
2012-04-23 19:09 ` Willy Tarreau
2011-02-16 18:33 ` [PATCH] " Andrea Arcangeli
2011-02-16 21:34 ` Konrad Rzeszutek Wilk
2011-02-17 10:19 ` Johannes Weiner
2011-02-21 14:30 ` Andrea Arcangeli
2011-02-21 14:53 ` Johannes Weiner
2011-02-22 7:48 ` Jan Beulich
2011-02-22 13:49 ` Andrea Arcangeli
2011-02-22 14:22 ` Jan Beulich
2011-02-22 14:34 ` Andrea Arcangeli
2011-02-22 17:08 ` Jeremy Fitzhardinge
2011-02-22 17:13 ` Andrea Arcangeli
2011-02-24 4:22 ` Andrea Arcangeli
2011-02-24 8:23 ` Jan Beulich
2011-02-24 14:11 ` Andrea Arcangeli
2011-02-21 17:40 ` Jeremy Fitzhardinge
2011-02-03 20:59 ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Larry Woodman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CB76E8B.2090309@goop.org \
--to=jeremy@goop.org \
--cc=Ian.Campbell@citrix.com \
--cc=JBeulich@novell.com \
--cc=Xen-devel@lists.xensource.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).