linux-mm.kvack.org archive mirror
* [PATCH] x86,mm: fix pte_free()
@ 2009-01-23 16:37 Peter Zijlstra
  2009-01-23 17:34 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Peter Zijlstra @ 2009-01-23 16:37 UTC (permalink / raw)
  To: Linus Torvalds, Nick Piggin, Hugh Dickins, Thomas Gleixner,
	Ingo Molnar, Andrew Morton
  Cc: L-K, linux-mm, David Howells

On -rt we were seeing spurious bad page states like:

Bad page state in process 'firefox'
page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
[<c043d0f3>] ? printk+0x14/0x19
[<c0272d4e>] bad_page+0x4e/0x79
[<c0273831>] free_hot_cold_page+0x5b/0x1d3
[<c02739f6>] free_hot_page+0xf/0x11
[<c0273a18>] __free_pages+0x20/0x2b
[<c027d170>] __pte_alloc+0x87/0x91
[<c027d25e>] handle_mm_fault+0xe4/0x733
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c0218875>] do_page_fault+0x36f/0x88a

This is the case where a concurrent fault already installed the PTE and
we get to free the newly allocated one.
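
The losing side of that race can be sketched in userspace C. This is only a model, assuming simplified stand-in types: demo_pmd, demo_pte_alloc() and demo_pte_free() are hypothetical names, not kernel APIs, and the lock that the real code takes around the check is reduced to a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not kernel types or APIs. */
struct demo_pmd {
	void *table;		/* NULL until a PTE page is installed */
};

int demo_freed;			/* how many spare PTE pages were freed */

/* Models pte_free(): release a PTE page that lost the race. */
void demo_pte_free(void *table)
{
	demo_freed++;
	free(table);
}

/* Models the shape of __pte_alloc(): allocate first, then check. */
int demo_pte_alloc(struct demo_pmd *pmd)
{
	void *new_table;

	assert(pmd != NULL);
	new_table = calloc(1, 64);
	if (!new_table)
		return -1;

	/* The real code holds a page table lock around this check. */
	if (!pmd->table) {
		pmd->table = new_table;	/* we won: install our page */
		new_table = NULL;
	}

	if (new_table)
		demo_pte_free(new_table);	/* someone else won: drop ours */
	return 0;
}
```

Calling demo_pte_alloc() twice on the same demo_pmd models two faults racing: the second call finds the entry populated and takes the demo_pte_free() path, which corresponds to the path in the backtrace above.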

This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
which is overlaid with the {private, mapping} struct.

union {
    struct {
        unsigned long private;
        struct address_space *mapping;
    };
#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
    spinlock_t ptl;
#endif
    struct kmem_cache *slab;
    struct page *first_page;
};

Normally the spinlock is small enough to not stomp on page->mapping, but
PREEMPT_RT=y has huge 'spin'locks.

But lockdep kernels should also be able to trigger this splat, as the
lock tracking code grows the spinlock to cover page->mapping.
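
The size argument can be checked with a small userspace sketch. The lock layouts below are assumptions made up for illustration (demo_small_lock and demo_big_lock are hypothetical, not the kernel's spinlock_t under any config); only the union shape follows the excerpt above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, as a debug or -rt style lock might embed. */
struct demo_list {
	struct demo_list *next, *prev;
};

/* Assumed "plain" lock: a single word. */
struct demo_small_lock {
	unsigned int slock;
};

/* Assumed "huge" lock: extra tracking state after the lock word. */
struct demo_big_lock {
	unsigned int slock;
	void *owner;
	struct demo_list wait_list;
};

/* Same union shape as above, with the big lock as ->ptl. */
struct demo_page {
	union {
		struct {
			unsigned long private;
			void *mapping;
		};
		struct demo_big_lock ptl;
	};
};

/* A lock is harmless only if it ends before ->mapping begins. */
int demo_lock_fits_before_mapping(size_t lock_size)
{
	assert(lock_size > 0);
	return lock_size <= offsetof(struct demo_page, mapping);
}
```

With these assumed layouts the small lock ends at or before the start of ->mapping, while the big lock extends past it, so initialising the big lock writes into the bytes that the free path later reads as ->mapping.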

The obvious fix is calling pgtable_page_dtor() like the regular pte free
path __pte_free_tlb() does.
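
A minimal sketch of why the destructor matters, again in userspace C with hypothetical names (demo_ctor(), demo_dtor() and demo_free_check() only model pgtable_page_ctor(), pgtable_page_dtor() and the free-time check; the lock layout is an assumption chosen so that a list head lands on ->mapping):

```c
#include <assert.h>
#include <stddef.h>

struct demo_list {
	struct demo_list *next, *prev;
};

/* Hypothetical page: an oversized lock overlays {private, mapping}. */
struct demo_page {
	union {
		struct {
			unsigned long private;
			void *mapping;
		};
		struct {
			unsigned long raw;
			struct demo_list waiters;	/* overlays ->mapping */
		} ptl;
	};
};

/* Models the constructor: initialising the lock dirties ->mapping. */
void demo_ctor(struct demo_page *page)
{
	assert(page != NULL);
	page->ptl.raw = 0;
	page->ptl.waiters.next = &page->ptl.waiters;
	page->ptl.waiters.prev = &page->ptl.waiters;
}

/* Models the destructor: reset ->mapping before the page is freed. */
void demo_dtor(struct demo_page *page)
{
	page->mapping = NULL;
}

/* Models the free-time check: returns 1 for a "bad page". */
int demo_free_check(const struct demo_page *page)
{
	return page->mapping != NULL;
}

/* Free path without the destructor: the check trips. */
int demo_pte_free_old(struct demo_page *page)
{
	return demo_free_check(page);
}

/* Free path with the destructor: the check passes. */
int demo_pte_free_fixed(struct demo_page *page)
{
	demo_dtor(page);
	return demo_free_check(page);
}
```

In this model the stale ->mapping value is the address of the list head inside the page itself, which would be consistent with the splat above, where mapping (c1bc2390) points just past page (c1bc2380).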

It seems all architectures except x86 and nm10300 already do this, and
nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
doesn't do SMP or simply doesn't do MMU at all or something.

Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
CC: stable@kernel.org
---
 arch/x86/include/asm/pgalloc.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index cb7c151..b99023c 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -42,6 +42,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static inline void pte_free(struct mm_struct *mm, struct page *pte)
 {
+	pgtable_page_dtor();
 	__free_page(pte);
 }
 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 16:37 [PATCH] x86,mm: fix pte_free() Peter Zijlstra
@ 2009-01-23 17:34 ` Ingo Molnar
  2009-01-23 17:39   ` Peter Zijlstra
  2009-01-23 17:34 ` Peter Zijlstra
  2009-01-23 20:15 ` David Howells
  2 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2009-01-23 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Nick Piggin, Hugh Dickins, Thomas Gleixner,
	Andrew Morton, L-K, linux-mm, David Howells


* Peter Zijlstra <peterz@infradead.org> wrote:

> On -rt we were seeing spurious bad page states like:
> 
> Bad page state in process 'firefox'
> page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
> [<c043d0f3>] ? printk+0x14/0x19
> [<c0272d4e>] bad_page+0x4e/0x79
> [<c0273831>] free_hot_cold_page+0x5b/0x1d3
> [<c02739f6>] free_hot_page+0xf/0x11
> [<c0273a18>] __free_pages+0x20/0x2b
> [<c027d170>] __pte_alloc+0x87/0x91
> [<c027d25e>] handle_mm_fault+0xe4/0x733
> [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
> [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
> [<c0218875>] do_page_fault+0x36f/0x88a
> 
> This is the case where a concurrent fault already installed the PTE and
> we get to free the newly allocated one.
> 
> This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
> which is overlaid with the {private, mapping} struct.
> 
> union {
>     struct {
>         unsigned long private;
>         struct address_space *mapping;
>     };
> #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
>     spinlock_t ptl;
> #endif
>     struct kmem_cache *slab;
>     struct page *first_page;
> };
> 
> Normally the spinlock is small enough to not stomp on page->mapping, but
> PREEMPT_RT=y has huge 'spin'locks.
> 
> But lockdep kernels should also be able to trigger this splat, as the
> lock tracking code grows the spinlock to cover page->mapping.
> 
> The obvious fix is calling pgtable_page_dtor() like the regular pte free
> path __pte_free_tlb() does.
> 
> It seems all architectures except x86 and nm10300 already do this, and
> nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
> doesn't do SMP or simply doesnt do MMU at all or something.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
> CC: stable@kernel.org
> ---
>  arch/x86/include/asm/pgalloc.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
> index cb7c151..b99023c 100644
> --- a/arch/x86/include/asm/pgalloc.h
> +++ b/arch/x86/include/asm/pgalloc.h
> @@ -42,6 +42,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  
>  static inline void pte_free(struct mm_struct *mm, struct page *pte)
>  {
> +	pgtable_page_dtor();

I suspect that with lockdep we don't see this in practice because it
initializes things to NULL, which hides the issue. On -rt we initialize
list heads there, which triggers the bad-page warning in the page free
logic.

So I agree with the fix, but the patch does not look right: shouldn't that
be pgtable_page_dtor(pte), so that we get ->mapping cleared via
pte_lock_deinit()? (I guess that was your intention here; as posted, this
probably won't even build.)

	Ingo


* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 16:37 [PATCH] x86,mm: fix pte_free() Peter Zijlstra
  2009-01-23 17:34 ` Ingo Molnar
@ 2009-01-23 17:34 ` Peter Zijlstra
  2009-01-23 18:42   ` Hugh Dickins
  2009-01-23 20:15 ` David Howells
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2009-01-23 17:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Hugh Dickins, Thomas Gleixner, Ingo Molnar,
	Andrew Morton, L-K, linux-mm, David Howells

On Fri, 2009-01-23 at 17:37 +0100, Peter Zijlstra wrote:
> On -rt we were seeing spurious bad page states like:
> 
> Bad page state in process 'firefox'
> page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
> [<c043d0f3>] ? printk+0x14/0x19
> [<c0272d4e>] bad_page+0x4e/0x79
> [<c0273831>] free_hot_cold_page+0x5b/0x1d3
> [<c02739f6>] free_hot_page+0xf/0x11
> [<c0273a18>] __free_pages+0x20/0x2b
> [<c027d170>] __pte_alloc+0x87/0x91
> [<c027d25e>] handle_mm_fault+0xe4/0x733
> [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
> [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
> [<c0218875>] do_page_fault+0x36f/0x88a
> 
> This is the case where a concurrent fault already installed the PTE and
> we get to free the newly allocated one.
> 
> This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
> which is overlaid with the {private, mapping} struct.
> 
> union {
>     struct {
>         unsigned long private;
>         struct address_space *mapping;
>     };
> #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
>     spinlock_t ptl;
> #endif
>     struct kmem_cache *slab;
>     struct page *first_page;
> };
> 
> Normally the spinlock is small enough to not stomp on page->mapping, but
> PREEMPT_RT=y has huge 'spin'locks.
> 
> But lockdep kernels should also be able to trigger this splat, as the
> lock tracking code grows the spinlock to cover page->mapping.
> 
> The obvious fix is calling pgtable_page_dtor() like the regular pte free
> path __pte_free_tlb() does.
> 
> It seems all architectures except x86 and nm10300 already do this, and
> nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
> doesn't do SMP or simply doesnt do MMU at all or something.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
> CC: stable@kernel.org

Now one that's not obviously borken,..

---
 arch/x86/include/asm/pgalloc.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index cb7c151..dd14c54 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -42,6 +42,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static inline void pte_free(struct mm_struct *mm, struct page *pte)
 {
+	pgtable_page_dtor(pte);
 	__free_page(pte);
 }
 



* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 17:34 ` Ingo Molnar
@ 2009-01-23 17:39   ` Peter Zijlstra
  2009-01-23 17:45     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2009-01-23 17:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Nick Piggin, Hugh Dickins, Thomas Gleixner,
	Andrew Morton, L-K, linux-mm, David Howells

On Fri, 2009-01-23 at 18:34 +0100, Ingo Molnar wrote:

> So i agree with the fix, but the patch does not look right: shouldnt that 
> be pgtable_page_dtor(pte), so that we get ->mapping cleared via 
> pte_lock_deinit()? (which i guess your intention was here - this probably 
> wont even build)

Yeah, I somehow fudged it; already sent out a better one. One of them
days I guess :-(


* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 17:39   ` Peter Zijlstra
@ 2009-01-23 17:45     ` Ingo Molnar
  2009-01-26  3:09       ` KOSAKI Motohiro
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2009-01-23 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Nick Piggin, Hugh Dickins, Thomas Gleixner,
	Andrew Morton, L-K, linux-mm, David Howells


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2009-01-23 at 18:34 +0100, Ingo Molnar wrote:
> 
> > So i agree with the fix, but the patch does not look right: shouldnt that 
> > be pgtable_page_dtor(pte), so that we get ->mapping cleared via 
> > pte_lock_deinit()? (which i guess your intention was here - this probably 
> > wont even build)
> 
> Yeah, I somehow fudged it, already send out a better one. -- One of them
> days I guess :-(

no problem - applied to tip/x86/urgent, thanks Peter!

	Ingo


* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 17:34 ` Peter Zijlstra
@ 2009-01-23 18:42   ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2009-01-23 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Nick Piggin, Thomas Gleixner, Ingo Molnar,
	Andrew Morton, L-K, linux-mm, David Howells

On Fri, 23 Jan 2009, Peter Zijlstra wrote:
> On Fri, 2009-01-23 at 17:37 +0100, Peter Zijlstra wrote:
> > On -rt we were seeing spurious bad page states like:
> > 
> > Bad page state in process 'firefox'
> > page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
> > Trying to fix it up, but a reboot is needed
> > Backtrace:
> > Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
> > [<c043d0f3>] ? printk+0x14/0x19
> > [<c0272d4e>] bad_page+0x4e/0x79
> > [<c0273831>] free_hot_cold_page+0x5b/0x1d3
> > [<c02739f6>] free_hot_page+0xf/0x11
> > [<c0273a18>] __free_pages+0x20/0x2b
> > [<c027d170>] __pte_alloc+0x87/0x91
> > [<c027d25e>] handle_mm_fault+0xe4/0x733
> > [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
> > [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
> > [<c0218875>] do_page_fault+0x36f/0x88a
> > 
> > This is the case where a concurrent fault already installed the PTE and
> > we get to free the newly allocated one.
> > 
> > This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
> > which is overlaid with the {private, mapping} struct.
> > 
> > union {
> >     struct {
> >         unsigned long private;
> >         struct address_space *mapping;
> >     };
> > #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
> >     spinlock_t ptl;
> > #endif
> >     struct kmem_cache *slab;
> >     struct page *first_page;
> > };
> > 
> > Normally the spinlock is small enough to not stomp on page->mapping, but
> > PREEMPT_RT=y has huge 'spin'locks.
> > 
> > But lockdep kernels should also be able to trigger this splat, as the
> > lock tracking code grows the spinlock to cover page->mapping.
> > 
> > The obvious fix is calling pgtable_page_dtor() like the regular pte free
> > path __pte_free_tlb() does.
> > 
> > It seems all architectures except x86 and nm10300 already do this, and
> > nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
> > doesn't do SMP or simply doesnt do MMU at all or something.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
> > CC: stable@kernel.org

Thanks, Peter: good catch.  That pgtable_page_dtor() had long been there
in pte_free(), then somehow got lost in one of 2.6.26's rearrangements.

Acked-by: Hugh Dickins <hugh@veritas.com>

> 
> Now one that's not obviously borken,..

And I can quite see why you voided the first version:
your mind rightly stalled on that foul "struct page *pte".
Oh well, clean that up some other time.

Hugh

> 
> ---
>  arch/x86/include/asm/pgalloc.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
> index cb7c151..dd14c54 100644
> --- a/arch/x86/include/asm/pgalloc.h
> +++ b/arch/x86/include/asm/pgalloc.h
> @@ -42,6 +42,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  
>  static inline void pte_free(struct mm_struct *mm, struct page *pte)
>  {
> +	pgtable_page_dtor(pte);
>  	__free_page(pte);
>  }


* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 16:37 [PATCH] x86,mm: fix pte_free() Peter Zijlstra
  2009-01-23 17:34 ` Ingo Molnar
  2009-01-23 17:34 ` Peter Zijlstra
@ 2009-01-23 20:15 ` David Howells
  2 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2009-01-23 20:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, Linus Torvalds, Nick Piggin, Hugh Dickins,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, L-K, linux-mm

Peter Zijlstra <peterz@infradead.org> wrote:

> It seems all architectures except x86 and nm10300 already do this, and
> nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
> doesn't do SMP or simply doesnt do MMU at all or something.

MN10300 does not, as yet, do SMP.

David


* Re: [PATCH] x86,mm: fix pte_free()
  2009-01-23 17:45     ` Ingo Molnar
@ 2009-01-26  3:09       ` KOSAKI Motohiro
  0 siblings, 0 replies; 8+ messages in thread
From: KOSAKI Motohiro @ 2009-01-26  3:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: kosaki.motohiro, Peter Zijlstra, Linus Torvalds, Nick Piggin,
	Hugh Dickins, Thomas Gleixner, Andrew Morton, L-K, linux-mm,
	David Howells

> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, 2009-01-23 at 18:34 +0100, Ingo Molnar wrote:
> > 
> > > So i agree with the fix, but the patch does not look right: shouldnt that 
> > > be pgtable_page_dtor(pte), so that we get ->mapping cleared via 
> > > pte_lock_deinit()? (which i guess your intention was here - this probably 
> > > wont even build)
> > 
> > Yeah, I somehow fudged it, already send out a better one. -- One of them
> > days I guess :-(
> 
> no problem - applied to tip/x86/urgent, thanks Peter!

Please fix the typo: s/nm10300/MN10300/ :)
At first glance, I did not understand what he meant.



