* mmu_hash_lock deadlock causes kernel stuck at 2.6.21 SMP powerpc 32bit
@ 2008-03-30 11:47 Gaash Hazan
0 siblings, 0 replies; 4+ messages in thread
From: Gaash Hazan @ 2008-03-30 11:47 UTC (permalink / raw)
To: linuxppc-dev; +Cc: gaash-ppclnx, gilad
Hello PPC SMP MM experts,
mmu_hash_lock (arch/powerpc/mm/hash_low_32.S) is a
(non-standard) spin lock that protects the CPU MMU
hashing table. It exists and is used only with SMP
configurations.
In some scenarios, the spin lock is taken while
interrupts are *enabled*, causing a kernel deadlock at
the next attempt to take it on the same CPU.
The deadlock happened on a 2.6.21 kernel, PowerPC 32-bit,
with SMP enabled. At that moment the system had one active
CPU. The sequence I saw was:
do_exit (program termination)
exit_mm
mmput
exit_mmap
free_pgtables
free_pgd_range
unmap_vmas
pte_free
hash_page_sync (takes mmu_hash_lock. Note: interrupts
are enabled)
timer_interrupt (timer interrupt occurs during
hash_page_sync, while the lock is held)
irq_exit
do_softirq
__do_softirq
net_rx_action (packet received from network)
( ... omitted ... )
xdr_skb_read_bits
skb_copy_bits
memcpy - memcpy causes a DSI exception (0x300). This is
OK.
DSI exception handler calls hash_page
hash_page waits for mmu_hash_lock. It waits forever
since the lock is already taken.
Deadlock! With interrupts disabled, the kernel is dead.
I think the root cause of the problem is
hash_page_sync() taking the mmu_hash_lock spin lock
without disabling interrupts. This leads to the
deadlock.
To verify the theory, I wrapped hash_page_sync() in
code that disables interrupts, and the problem never
occurred again. Of course this is only a temporary
workaround, as there are several places that need to be fixed.
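The sequence above can be reduced to a toy model. The following is only a sketch in plain user-space C, not kernel code: the struct and every function name in it (struct cpu, hash_page_sync_model, run_scenario, and so on) are made up for illustration, and one boolean stands in for MSR[EE] while another stands in for mmu_hash_lock. It assumes a single CPU and a timer interrupt that becomes pending while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. All names here are hypothetical, not kernel symbols. */
struct cpu {
    bool irqs_enabled;  /* stands in for MSR[EE] */
    bool lock_held;     /* stands in for mmu_hash_lock being non-zero */
    bool irq_pending;   /* a timer interrupt is waiting to be delivered */
    bool deadlocked;    /* set when a handler would spin forever */
};

/* Handler path: softirq -> memcpy -> DSI -> hash_page wants the lock. */
static void interrupt_handler(struct cpu *c)
{
    if (c->lock_held) {
        /* Real code would spin here forever: the lock holder is the
         * context this handler interrupted, so it can never release. */
        c->deadlocked = true;
        return;
    }
    c->lock_held = true;   /* hash_page takes the lock ... */
    c->lock_held = false;  /* ... and releases it */
}

/* Deliver the pending interrupt only if interrupts are enabled. */
static void maybe_deliver_irq(struct cpu *c)
{
    if (c->irq_pending && c->irqs_enabled) {
        bool saved = c->irqs_enabled;
        c->irq_pending = false;
        c->irqs_enabled = false;  /* interrupts are masked on handler entry */
        interrupt_handler(c);
        c->irqs_enabled = saved;
    }
}

/* Model of hash_page_sync(): take the lock, then drop it again. */
static void hash_page_sync_model(struct cpu *c, bool mask_irqs)
{
    bool saved = c->irqs_enabled;

    if (mask_irqs)
        c->irqs_enabled = false;  /* the workaround: mask before locking */

    assert(!c->lock_held);
    c->lock_held = true;
    maybe_deliver_irq(c);         /* timer fires inside the critical section */
    if (!c->deadlocked)
        c->lock_held = false;

    c->irqs_enabled = saved;
    maybe_deliver_irq(c);         /* a masked interrupt is taken only now */
}

/* Returns true if the modelled CPU deadlocks. */
static bool run_scenario(bool mask_irqs)
{
    struct cpu c = { .irqs_enabled = true, .irq_pending = true };

    hash_page_sync_model(&c, mask_irqs);
    return c.deadlocked;
}
```

In this model run_scenario(false) reports a deadlock, because the handler finds the lock held by the code it interrupted, while run_scenario(true) does not, because the interrupt is only delivered after the lock has been dropped.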
What do you think?
Thanks,
Gaash
* mmu_hash_lock deadlock causes kernel stuck at 2.6.21 SMP powerpc 32bit
@ 2008-03-30 21:28 Gaash Hazan
2008-03-30 21:40 ` Benjamin Herrenschmidt
2008-03-30 21:49 ` [PATCH] powerpc: Fix deadlock with mmu_hash_lock in hash_page_sync Benjamin Herrenschmidt
0 siblings, 2 replies; 4+ messages in thread
From: Gaash Hazan @ 2008-03-30 21:28 UTC (permalink / raw)
To: linuxppc-dev; +Cc: gaash-ppclnx, Gilad Ben-Yossef
(reposting)
Hello PPC SMP MM experts,
mmu_hash_lock (arch/powerpc/mm/hash_low_32.S) is a
(non-standard) spin lock that protects the CPU MMU
hashing table. It exists and is used only with SMP
configurations.
In some scenarios, the spin lock is taken while
interrupts are *enabled*, causing a kernel deadlock at
the next attempt to take it on the same CPU.
The deadlock happened on a 2.6.21 kernel, PowerPC 32-bit,
with SMP enabled. At that moment the system had one active
CPU. The sequence I saw was:
do_exit (program termination)
exit_mm
mmput
exit_mmap
free_pgtables
free_pgd_range
unmap_vmas
pte_free
hash_page_sync (takes mmu_hash_lock. Note: interrupts
are enabled)
timer_interrupt (timer interrupt occurs during
hash_page_sync, while the lock is held)
irq_exit
do_softirq
__do_softirq
net_rx_action (packet received from network)
( ... omitted ... )
xdr_skb_read_bits
skb_copy_bits
memcpy - memcpy causes a DSI exception (0x300). This is
OK.
DSI exception handler calls hash_page
hash_page waits for mmu_hash_lock. It waits forever
since the lock is already taken.
Deadlock! With interrupts disabled, the kernel is dead.
I think the root cause of the problem is
hash_page_sync() taking the mmu_hash_lock spin lock
without disabling interrupts. This leads to the
deadlock.
To verify the theory, I wrapped hash_page_sync() in
code that disables interrupts, and the problem never
occurred again. Of course this is only a temporary
workaround, as there are several places that need to be fixed.
What do you think?
Thanks,
Gaash
* Re: mmu_hash_lock deadlock causes kernel stuck at 2.6.21 SMP powerpc 32bit
2008-03-30 21:28 mmu_hash_lock deadlock causes kernel stuck at 2.6.21 SMP powerpc 32bit Gaash Hazan
@ 2008-03-30 21:40 ` Benjamin Herrenschmidt
2008-03-30 21:49 ` [PATCH] powerpc: Fix deadlock with mmu_hash_lock in hash_page_sync Benjamin Herrenschmidt
1 sibling, 0 replies; 4+ messages in thread
From: Benjamin Herrenschmidt @ 2008-03-30 21:40 UTC (permalink / raw)
To: gaash-ppclnx; +Cc: linuxppc-dev, Gilad Ben-Yossef
On Sun, 2008-03-30 at 14:28 -0700, Gaash Hazan wrote:
> To verify the theory, hash_page_sync() was wrapped
> with interrupts disabled code and problem never
> occurred again. Of course this is temporary workaround
> as there are several places needed to be fixed.
It is definitely a bug in hash_page_sync() which should
clear MSR:EE while holding the lock.
I'll do a patch.
Thanks for finding that out !
Cheers,
Ben.
* [PATCH] powerpc: Fix deadlock with mmu_hash_lock in hash_page_sync
2008-03-30 21:28 mmu_hash_lock deadlock causes kernel stuck at 2.6.21 SMP powerpc 32bit Gaash Hazan
2008-03-30 21:40 ` Benjamin Herrenschmidt
@ 2008-03-30 21:49 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 4+ messages in thread
From: Benjamin Herrenschmidt @ 2008-03-30 21:49 UTC (permalink / raw)
To: gaash-ppclnx; +Cc: linuxppc-dev, Paul Mackerras, Gilad Ben-Yossef
hash_page_sync() takes and releases the low level mmu hash
lock in order to sync with other processors disposing of page
tables. Because that lock can be needed to service hash misses
triggered by interrupt handlers, taking it must be done with
interrupts off. However, hash_page_sync() appears to be called
with interrupts enabled, thus causing occasional deadlocks.
We fix it by making sure hash_page_sync() masks interrupts while
holding the lock.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Please test and report asap. Though it's probably too late for
2.6.25, it can still go into stable later on.
Index: linux-work/arch/powerpc/mm/hash_low_32.S
===================================================================
--- linux-work.orig/arch/powerpc/mm/hash_low_32.S 2008-03-31 08:42:56.000000000 +1100
+++ linux-work/arch/powerpc/mm/hash_low_32.S 2008-03-31 08:45:05.000000000 +1100
@@ -44,6 +44,9 @@ mmu_hash_lock:
#ifdef CONFIG_SMP
.text
_GLOBAL(hash_page_sync)
+ mfmsr r10
+ rlwinm r0,r10,0,17,15 /* clear bit 16 (MSR_EE) */
+ mtmsr r0
lis r8,mmu_hash_lock@h
ori r8,r8,mmu_hash_lock@l
lis r0,0x0fff
@@ -60,8 +63,9 @@ _GLOBAL(hash_page_sync)
eieio
li r0,0
stw r0,0(r8)
- blr
-#endif
+ mtmsr r10
+ blr
+#endif /* CONFIG_SMP */
/*
* Load a PTE into the hash table, if possible.
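For readers unfamiliar with the wrapping mask in "rlwinm r0,r10,0,17,15" in the patch above, here is a small C model of how it ends up clearing only MSR_EE. This is an illustrative sketch, not kernel code: the helper names (ppc_mask, rotl32, rlwinm) are made up here, and it assumes the usual 32-bit convention where IBM bit 0 is the most significant bit, so MSR_EE (IBM bit 16) is 0x8000.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_EE 0x8000u  /* external interrupt enable: IBM bit 16 of a 32-bit MSR */

/* Mask with ones in IBM-numbered bits mb..me (bit 0 = MSB).
 * When mb > me the mask wraps around: bits mb..31 plus bits 0..me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* ones in bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);  /* ones in bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* 32-bit rotate left; sh == 0 is handled separately to avoid a shift by 32. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Model of "rlwinm rA,rS,SH,MB,ME": rotate rS left by SH, AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

With MB=17 and ME=15 the mask wraps and covers every bit except IBM bit 16, so ppc_mask(17, 15) is 0xFFFF7FFF and rlwinm(msr, 0, 17, 15) is the same as msr & ~MSR_EE. The original MSR value stays in r10, which is why the patch can restore it with "mtmsr r10" before returning.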