[2.6.32.y][PATCH] fix pgd

Linux kernel -stable discussions
 help / color / mirror / Atom feed

* [2.6.32.y][PATCH] fix pgd_lock deadlock
       [not found]   ` <20110216144947.GA5935@random.random>
@ 2012-04-23  9:07     ` Philipp Hahn
  2012-04-23 19:09       ` Willy Tarreau
  0 siblings, 1 reply; 2+ messages in thread
From: Philipp Hahn @ 2012-04-23  9:07 UTC (permalink / raw)
  To: stable
  Cc: Andrea Arcangeli, Ingo Molnar, Jeremy Fitzhardinge,
	Peter Zijlstra, the arch/x86 maintainers, Hugh Dickins,
	Linux Kernel Mailing List, Jan Beulich, Andi Kleen, Andrew Morton,
	Johannes Weiner, H. Peter Anvin, Thomas Gleixner, Larry Woodman,
	Rik van Riel, Konrad Rzeszutek Wilk, Linus Torvalds, 669335


[-- Attachment #1.1: Type: text/plain, Size: 2790 bytes --]

Hello,

On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote:
> Subject: fix pgd_lock deadlock
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> It's forbidden to take the page_table_lock with the irq disabled or if
> there's contention the IPIs (for tlb flushes) sent with the page_table_lock
> held will never run leading to a deadlock.
>
> Apparently nobody takes the pgd_lock from irq so the _irqsave can be
> removed.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

This patch (original commit Id for 2.6.38 
a79e53d85683c6dd9f99c90511028adc2043031f) needs to be back-ported to 2.6.32.x 
as well.
I observed a dead-lock problem when running a PAE enabled Debian 2.6.32.46+ 
kernel with 6 VCPUs as a KVM on (2.6.32, 3.2, 3.3) kernel, which showed the 
following behaviour:

1 VCPU is stuck in
  pgd_alloc() → pgd_prepopulate_pmb() →... →  flush_tlb_others_ipi()
while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
    cpu_relax();
(gdb) print f->flush_cpumask
$5 = {1}

while all other VCPUs are stuck in
  pgd_alloc() → spin_lock_irqsave(pgd_lock)

I tracked it down to the commit
 2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b
 2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33
 x86: Flush TLB if PGD entry is changed in i386 PAE mode
which when reverted made the bug disappear.

Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock'-patch went into 
2.6.38, that is before the 'PAE correctness'-patch, so the problem was 
probably never observed in the main development branch.
But for 2.6.32 the 'pgd-deadlock' patch is still missing, so the 'PAE 
corretness'-patch made the problem worse with 2.6.32.

The Patch was also back-ported to the OpenSUSE Kernel
<http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ac27c01aa880c65d17043ab87249c613ac4c3635>,
Since the patch didn't apply cleanly on the current Debian kernel, I had to 
backport it for us and Debian. The patch is also available from our (German) 
Bugzilla <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661> or 
from the Debian BTS at 
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=669335>.

I have no easy test case, but running multiple parallel builds inside the VM 
normally triggers the bug within seconds to minutes. With the patch applied 
the VM survived a night building packages without any problem.

Signed-off-by: Philipp Hahn <hahn@univention.de>

Sincerely
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        be open.                       fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                   http://www.univention.de/

[-- Attachment #1.2: x86-mm-Fix-pgd_lock-deadlock.patch --]
[-- Type: text/x-diff, Size: 5882 bytes --]

It's forbidden to take the page_table_lock with the irq disabled
or if there's contention the IPIs (for tlb flushes) sent with
the page_table_lock held will never run leading to a deadlock.

Nobody takes the pgd_lock from irq context so the _irqsave can be
removed.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Git-commit: a79e53d85683c6dd9f99c90511028adc2043031f
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -223,15 +223,14 @@ void vmalloc_sync_all(void)
 	     address >= TASK_SIZE && address < FIXADDR_TOP;
 	     address += PMD_SIZE) {
 
-		unsigned long flags;
 		struct page *page;
 
-		spin_lock_irqsave(&pgd_lock, flags);
+		spin_lock(&pgd_lock);
 		list_for_each_entry(page, &pgd_list, lru) {
 			if (!vmalloc_sync_one(page_address(page), address))
 				break;
 		}
-		spin_unlock_irqrestore(&pgd_lock, flags);
+		spin_unlock(&pgd_lock);
 	}
 }
 
@@ -331,13 +330,12 @@ void vmalloc_sync_all(void)
 	     address += PGDIR_SIZE) {
 
 		const pgd_t *pgd_ref = pgd_offset_k(address);
-		unsigned long flags;
 		struct page *page;
 
 		if (pgd_none(*pgd_ref))
 			continue;
 
-		spin_lock_irqsave(&pgd_lock, flags);
+		spin_lock(&pgd_lock);
 		list_for_each_entry(page, &pgd_list, lru) {
 			pgd_t *pgd;
 			pgd = (pgd_t *)page_address(page) + pgd_index(address);
@@ -346,7 +344,7 @@ void vmalloc_sync_all(void)
 			else
 				BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
 		}
-		spin_unlock_irqrestore(&pgd_lock, flags);
+		spin_unlock(&pgd_lock);
 	}
 }
 
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -56,12 +56,10 @@ static unsigned long direct_pages_count[
 
 void update_page_count(int level, unsigned long pages)
 {
-	unsigned long flags;
-
 	/* Protect against CPA */
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	direct_pages_count[level] += pages;
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 static void split_page_count(int level)
@@ -354,7 +352,7 @@ static int
 try_preserve_large_page(pte_t *kpte, unsigned long address,
 			struct cpa_data *cpa)
 {
-	unsigned long nextpage_addr, numpages, pmask, psize, flags, addr, pfn;
+	unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn;
 	pte_t new_pte, old_pte, *tmp;
 	pgprot_t old_prot, new_prot;
 	int i, do_split = 1;
@@ -363,7 +361,7 @@ try_preserve_large_page(pte_t *kpte, uns
 	if (cpa->force_split)
 		return 1;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up already:
@@ -458,14 +456,14 @@ try_preserve_large_page(pte_t *kpte, uns
 	}
 
 out_unlock:
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 
 	return do_split;
 }
 
 static int split_large_page(pte_t *kpte, unsigned long address)
 {
-	unsigned long flags, pfn, pfninc = 1;
+	unsigned long pfn, pfninc = 1;
 	unsigned int i, level;
 	pte_t *pbase, *tmp;
 	pgprot_t ref_prot;
@@ -479,7 +477,7 @@ static int split_large_page(pte_t *kpte,
 	if (!base)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up for us already:
@@ -551,7 +549,7 @@ out_unlock:
 	 */
 	if (base)
 		__free_page(base);
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 
 	return 0;
 }
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -110,14 +110,12 @@ static void pgd_ctor(pgd_t *pgd)
 
 static void pgd_dtor(pgd_t *pgd)
 {
-	unsigned long flags; /* can be called from interrupt context */
-
 	if (SHARED_KERNEL_PMD)
 		return;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 	pgd_list_del(pgd);
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 /*
@@ -248,7 +246,6 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
 	pmd_t *pmds[PREALLOCATED_PMDS];
-	unsigned long flags;
 
 	pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
 
@@ -268,12 +265,12 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	 * respect to anything walking the pgd_list, so that they
 	 * never see a partially populated pgd.
 	 */
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 
 	pgd_ctor(pgd);
 	pgd_prepopulate_pmd(mm, pgd, pmds);
 
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 
 	return pgd;
 
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -988,10 +988,9 @@ static void xen_pgd_pin(struct mm_struct
  */
 void xen_mm_pin_all(void)
 {
-	unsigned long flags;
 	struct page *page;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 
 	list_for_each_entry(page, &pgd_list, lru) {
 		if (!PagePinned(page)) {
@@ -1000,7 +999,7 @@ void xen_mm_pin_all(void)
 		}
 	}
 
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 /*
@@ -1101,10 +1100,9 @@ static void xen_pgd_unpin(struct mm_stru
  */
 void xen_mm_unpin_all(void)
 {
-	unsigned long flags;
 	struct page *page;
 
-	spin_lock_irqsave(&pgd_lock, flags);
+	spin_lock(&pgd_lock);
 
 	list_for_each_entry(page, &pgd_list, lru) {
 		if (PageSavePinned(page)) {
@@ -1114,7 +1112,7 @@ void xen_mm_unpin_all(void)
 		}
 	}
 
-	spin_unlock_irqrestore(&pgd_lock, flags);
+	spin_unlock(&pgd_lock);
 }
 
 void xen_activate_mm(struct mm_struct *prev, struct mm_struct *next)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [2.6.32.y][PATCH] fix pgd_lock deadlock
  2012-04-23  9:07     ` [2.6.32.y][PATCH] fix pgd_lock deadlock Philipp Hahn
@ 2012-04-23 19:09       ` Willy Tarreau
  0 siblings, 0 replies; 2+ messages in thread
From: Willy Tarreau @ 2012-04-23 19:09 UTC (permalink / raw)
  To: Philipp Hahn
  Cc: stable, Andrea Arcangeli, Ingo Molnar, Jeremy Fitzhardinge,
	Peter Zijlstra, the arch/x86 maintainers, Hugh Dickins,
	Linux Kernel Mailing List, Jan Beulich, Andi Kleen, Andrew Morton,
	Johannes Weiner, H. Peter Anvin, Thomas Gleixner, Larry Woodman,
	Rik van Riel, Konrad Rzeszutek Wilk, Linus Torvalds, 669335

Hello Philipp,

On Mon, Apr 23, 2012 at 11:07:53AM +0200, Philipp Hahn wrote:
> Hello,
> 
> On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote:
> > Subject: fix pgd_lock deadlock
> >
> > From: Andrea Arcangeli <aarcange@redhat.com>
> >
> > It's forbidden to take the page_table_lock with the irq disabled or if
> > there's contention the IPIs (for tlb flushes) sent with the page_table_lock
> > held will never run leading to a deadlock.
> >
> > Apparently nobody takes the pgd_lock from irq so the _irqsave can be
> > removed.
> >
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> 
> This patch (original commit Id for 2.6.38 
> a79e53d85683c6dd9f99c90511028adc2043031f) needs to be back-ported to 2.6.32.x 
> as well.
> I observed a dead-lock problem when running a PAE enabled Debian 2.6.32.46+ 
> kernel with 6 VCPUs as a KVM on (2.6.32, 3.2, 3.3) kernel, which showed the 
> following behaviour:
> 
> 1 VCPU is stuck in
>   pgd_alloc() ??? pgd_prepopulate_pmb() ???... ???  flush_tlb_others_ipi()
> while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
>     cpu_relax();
> (gdb) print f->flush_cpumask
> $5 = {1}
> 
> while all other VCPUs are stuck in
>   pgd_alloc() ??? spin_lock_irqsave(pgd_lock)
> 
> I tracked it down to the commit
>  2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b
>  2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33
>  x86: Flush TLB if PGD entry is changed in i386 PAE mode
> which when reverted made the bug disappear.
> 
> Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock'-patch went into 
> 2.6.38, that is before the 'PAE correctness'-patch, so the problem was 
> probably never observed in the main development branch.
> But for 2.6.32 the 'pgd-deadlock' patch is still missing, so the 'PAE 
> corretness'-patch made the problem worse with 2.6.32.
> 
> The Patch was also back-ported to the OpenSUSE Kernel
> <http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ac27c01aa880c65d17043ab87249c613ac4c3635>,
> Since the patch didn't apply cleanly on the current Debian kernel, I had to 
> backport it for us and Debian. The patch is also available from our (German) 
> Bugzilla <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661> or 
> from the Debian BTS at 
> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=669335>.
> 
> I have no easy test case, but running multiple parallel builds inside the VM 
> normally triggers the bug within seconds to minutes. With the patch applied 
> the VM survived a night building packages without any problem.
> 
> Signed-off-by: Philipp Hahn <hahn@univention.de>
> 
> Sincerely
> Philipp

Thank you, I'm queuing it for next 32-stable.

Regards,
Willy


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2012-04-23 19:09 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.LFD.2.00.1102152020590.26192@localhost6.localdomain6>
     [not found] ` <20110216102801.GA23082@elte.hu>
     [not found]   ` <20110216144947.GA5935@random.random>
2012-04-23  9:07     ` [2.6.32.y][PATCH] fix pgd_lock deadlock Philipp Hahn
2012-04-23 19:09       ` Willy Tarreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox