* [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock
@ 2009-05-22 20:13 Helge Deller
2009-05-23 15:26 ` John David Anglin
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Helge Deller @ 2009-05-22 20:13 UTC (permalink / raw)
To: linux-parisc, John David Anglin
Since many kernel versions I regularly faced reproducibly kernel hangs when
compiling some bigger files. The machine suddenly just seemed to hang.
With spinlock debugging turned on I found this:
BUG: spinlock recursion on CPU#0, tool/7263
lock: 10644000, .magic: dead4ead, .owner: tool/7263, .owner_cpu: 0
Backtrace:
[<10113a94>] show_stack+0x18/0x28
BUG: spinlock lockup on CPU#0, tool/7263, 10644000
Backtrace:
[<10113a94>] show_stack+0x18/0x28
BUG: soft lockup - CPU#0 stuck for 61s! [tool:7263]
IASQ: 00000000 00000000 IAOQ: 102d55dc 102d557c
IIR: 03c008b3 ISR: 00000000 IOR: 00000000
CPU: 0 CR30: 7d1a4000 CR31: 11111111
ORIG_R28: 00000000
IAOQ[0]: _raw_spin_lock+0x15c/0x1c0
IAOQ[1]: _raw_spin_lock+0xfc/0x1c0
RP(r2): _raw_spin_lock+0x18c/0x1c0
Backtrace:
[<102d560c>] _raw_spin_lock+0x18c/0x1c0
Kernel panic - not syncing: softlockup: hung tasks
Backtrace:
[<10113a94>] show_stack+0x18/0x28
The lock at "10644000" references pa_tlb_lock, which is used to lock the PxTLB
broadcasts (see comment in cache.c).
So, I think we need to use the irq-blocking functions for purge_tlb_start() and
purge_tlb_end(). I did tested the patch below, and since then I didn't faced one
single hang any longer.
The patch below is just a hack. It will need a cleanup and is just to open
discussions here on the list.
Helge
diff --git a/arch/parisc/include/asm/tlbflush.h b/arch/parisc/include/asm/tlbflush.h
index 1f6fd4f..9b92fbe 100644
--- a/arch/parisc/include/asm/tlbflush.h
+++ b/arch/parisc/include/asm/tlbflush.h
@@ -18,8 +18,8 @@
*/
extern spinlock_t pa_tlb_lock;
-#define purge_tlb_start(x) spin_lock(&pa_tlb_lock)
-#define purge_tlb_end(x) spin_unlock(&pa_tlb_lock)
+#define purge_tlb_start(x) do { unsigned long flags; spin_lock_irqsave(&pa_tlb_lock, flags)
+#define purge_tlb_end(x) spin_unlock_irqrestore(&pa_tlb_lock, flags); } while (0)
extern void flush_tlb_all(void);
extern void flush_tlb_all_local(void *);
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-22 20:13 [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock Helge Deller @ 2009-05-23 15:26 ` John David Anglin 2009-05-23 15:34 ` John David Anglin 2009-05-23 21:53 ` John David Anglin 2009-05-28 1:50 ` John David Anglin 2 siblings, 1 reply; 13+ messages in thread From: John David Anglin @ 2009-05-23 15:26 UTC (permalink / raw) To: Helge Deller; +Cc: linux-parisc > -#define purge_tlb_start(x) spin_lock(&pa_tlb_lock) > -#define purge_tlb_end(x) spin_unlock(&pa_tlb_lock) > +#define purge_tlb_start(x) do { unsigned long flags; spin_lock_irqsave(&pa_tlb_lock, flags) > +#define purge_tlb_end(x) spin_unlock_irqrestore(&pa_tlb_lock, flags); } while (0) To my taste, the handling of the "flags" variable is ugly and potentially error prone. I think it would be better to add another argument to the macros, and declare the variable flags in the routines that use the macros (as is done for the current users of spin_lock_irqsave macro). Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-23 15:26 ` John David Anglin @ 2009-05-23 15:34 ` John David Anglin 2009-05-23 19:34 ` Helge Deller 0 siblings, 1 reply; 13+ messages in thread From: John David Anglin @ 2009-05-23 15:34 UTC (permalink / raw) To: John David Anglin; +Cc: deller, linux-parisc > > -#define purge_tlb_start(x) spin_lock(&pa_tlb_lock) > > -#define purge_tlb_end(x) spin_unlock(&pa_tlb_lock) > > +#define purge_tlb_start(x) do { unsigned long flags; spin_lock_irqsave(&pa_tlb_lock, flags) > > +#define purge_tlb_end(x) spin_unlock_irqrestore(&pa_tlb_lock, flags); } while (0) > > To my taste, the handling of the "flags" variable is ugly and potentially > error prone. > > I think it would be better to add another argument to the macros, and declare > the variable flags in the routines that use the macros (as is done for the > current users of spin_lock_irqsave macro). Actually, the 'x' argument could be used for this. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-23 15:34 ` John David Anglin @ 2009-05-23 19:34 ` Helge Deller 2009-05-23 20:03 ` John David Anglin 0 siblings, 1 reply; 13+ messages in thread From: Helge Deller @ 2009-05-23 19:34 UTC (permalink / raw) To: John David Anglin; +Cc: linux-parisc John David Anglin wrote: >>> -#define purge_tlb_start(x) spin_lock(&pa_tlb_lock) >>> -#define purge_tlb_end(x) spin_unlock(&pa_tlb_lock) >>> +#define purge_tlb_start(x) do { unsigned long flags; spin_lock_irqsave(&pa_tlb_lock, flags) >>> +#define purge_tlb_end(x) spin_unlock_irqrestore(&pa_tlb_lock, flags); } while (0) >> To my taste, the handling of the "flags" variable is ugly and potentially >> error prone. >> >> I think it would be better to add another argument to the macros, and declare >> the variable flags in the routines that use the macros (as is done for the >> current users of spin_lock_irqsave macro). > > Actually, the 'x' argument could be used for this. I mentioned in my original mail that this is not the final version: > The patch below is just a hack. It will need a cleanup and is just to open > discussions here on the list. But instead of cleaning it up right now, I would be very much interested if this patch fixes the kernel stalls you see as well? If it does, then I agree that we should add a "flags" argument to purge_tlb_xxx() functions, e.g. - purge_tlb_start(lock, flags) - purge_tlb_end(lock, flags) Helge ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-23 19:34 ` Helge Deller @ 2009-05-23 20:03 ` John David Anglin 0 siblings, 0 replies; 13+ messages in thread From: John David Anglin @ 2009-05-23 20:03 UTC (permalink / raw) To: Helge Deller; +Cc: linux-parisc > I mentioned in my original mail that this is not the final version: > > The patch below is just a hack. It will need a cleanup and is just to open > > discussions here on the list. > > But instead of cleaning it up right now, I would be very much interested if > this patch fixes the kernel stalls you see as well? I can't say that I see kernel stalls on any of my machines. However, I do see memory corruption that I have reported a number of times. I also see a user/group id bug that also could be a memory corruption bug (pam cache). I think there's a good possibility that your patch may help. I have a version installed on top of 2.6.30-rc6 on my rp3340. I haven't done a lot of testing with it, but it boots and I have done one kernel build with it. Didn't include lock in macro as you show below. I was looking a couple of the warnings: arch/parisc/kernel/time.c:184: warning: initialization from incompatible pointer type arch/parisc/kernel/irq.c:123: warning: passing argument 1 of 'cpumask_setall' from incompatible pointer type arch/parisc/kernel/irq.c:141: warning: passing argument 1 of 'cpumask_copy' from incompatible pointer type arch/parisc/kernel/irq.c:300: warning: passing argument 1 of 'cpumask_copy' from incompatible pointer type arch/parisc/kernel/irq.c:357: warning: passing argument 2 of 'cpumask_copy' from incompatible pointer type The first is due to the missing argument in read_cr16. I believe a fix was posted to the list. The latter warnings seem due to a change in the type of the affinity field of the irq_desc struct. > If it does, then I agree that we should add a "flags" argument to purge_tlb_xxx() functions, e.g. > - purge_tlb_start(lock, flags) > - purge_tlb_end(lock, flags) Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-22 20:13 [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock Helge Deller 2009-05-23 15:26 ` John David Anglin @ 2009-05-23 21:53 ` John David Anglin 2009-05-28 1:50 ` John David Anglin 2 siblings, 0 replies; 13+ messages in thread From: John David Anglin @ 2009-05-23 21:53 UTC (permalink / raw) To: Helge Deller; +Cc: linux-parisc > So, I think we need to use the irq-blocking functions for purge_tlb_start() and > purge_tlb_end(). I did tested the patch below, and since then I didn't faced one > single hang any longer. The preceeding comment says that disabling preemption should be sufficient. Disabling interrupts may be overkill. I have CONFIG_PREEMPT_NONE set both on my c3750 and rp3440. I'm starting to wonder if there is a preemption bug. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-22 20:13 [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock Helge Deller 2009-05-23 15:26 ` John David Anglin 2009-05-23 21:53 ` John David Anglin @ 2009-05-28 1:50 ` John David Anglin 2009-06-06 21:25 ` Helge Deller 2 siblings, 1 reply; 13+ messages in thread From: John David Anglin @ 2009-05-28 1:50 UTC (permalink / raw) To: Helge Deller; +Cc: linux-parisc > Since many kernel versions I regularly faced reproducibly kernel hangs when > compiling some bigger files. The machine suddenly just seemed to hang. > > With spinlock debugging turned on I found this: > > BUG: spinlock recursion on CPU#0, tool/7263 > lock: 10644000, .magic: dead4ead, .owner: tool/7263, .owner_cpu: 0 > Backtrace: > [<10113a94>] show_stack+0x18/0x28 > > BUG: spinlock lockup on CPU#0, tool/7263, 10644000 > Backtrace: > [<10113a94>] show_stack+0x18/0x28 > > BUG: soft lockup - CPU#0 stuck for 61s! [tool:7263] > IASQ: 00000000 00000000 IAOQ: 102d55dc 102d557c > IIR: 03c008b3 ISR: 00000000 IOR: 00000000 > CPU: 0 CR30: 7d1a4000 CR31: 11111111 > ORIG_R28: 00000000 > IAOQ[0]: _raw_spin_lock+0x15c/0x1c0 > IAOQ[1]: _raw_spin_lock+0xfc/0x1c0 > RP(r2): _raw_spin_lock+0x18c/0x1c0 > Backtrace: > [<102d560c>] _raw_spin_lock+0x18c/0x1c0 > > Kernel panic - not syncing: softlockup: hung tasks > Backtrace: > [<10113a94>] show_stack+0x18/0x28 I have tested the proposed change using 2.6.30-rc6 and a modified version of 2.6.22.10. I tested using SMP kernels running on a rp3440 and UP kernels on a c3750. I also tested changing the macros to just use preempt_disable/preempt_enable. The patch doesn't cause any new problems as far as I can tell. However, it doesn't fix any of the problems that I currently see on these two machines. In particular, I see the occasional gcc testsuite timeout using SMP kernels. Programs that usually take a few seconds to run timeout after three minutes. These timeouts don't occur with UP kernels. On the rp3440, the spinlock is definitely needed. With just preempt_disable/preempt_enable, a crash occurs during bootstrap at the point unused memory is recovered. Thus, the tlb purge issue referred to in the preceeding comment affects more than just N class. On the otherhand, it doesn't seem necessary to disable interrupts during the purge with UP kernels. With SMP kernels, it would be nice to know if the lockup was caused by an interruption during the tlb purge, or a preemption issue. I have the sense that disabling interrupts is wrong. That is any given CPU can only generate one PxTLB inter processor broadcast at a time. Disabling interrupts could cause a deadlock if a TLB purge was needed while the purge code was executing. The other alternative is to allow the processor that holds the lock to enter the flush code. This would fix the deadlock. Don't know how to code this (atomic compare and exchange?). The preempt_disable/preempt_enable crash on the rp3440 made me wonder if all tlb purge operations are properly protected with the tlb spinlock. I think we need to look at flush_tlb_all_local and copy_user_page_asm. They don't seem protected. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-05-28 1:50 ` John David Anglin @ 2009-06-06 21:25 ` Helge Deller 2009-06-14 22:05 ` Helge Deller 0 siblings, 1 reply; 13+ messages in thread From: Helge Deller @ 2009-06-06 21:25 UTC (permalink / raw) To: John David Anglin; +Cc: linux-parisc John David Anglin wrote: >> Since many kernel versions I regularly faced reproducibly kernel hangs when >> compiling some bigger files. The machine suddenly just seemed to hang. >> >> With spinlock debugging turned on I found this: >> >> BUG: spinlock recursion on CPU#0, tool/7263 >> lock: 10644000, .magic: dead4ead, .owner: tool/7263, .owner_cpu: 0 >> Backtrace: >> [<10113a94>] show_stack+0x18/0x28 >> >> BUG: spinlock lockup on CPU#0, tool/7263, 10644000 >> Backtrace: >> [<10113a94>] show_stack+0x18/0x28 >> >> BUG: soft lockup - CPU#0 stuck for 61s! [tool:7263] >> IASQ: 00000000 00000000 IAOQ: 102d55dc 102d557c >> IIR: 03c008b3 ISR: 00000000 IOR: 00000000 >> CPU: 0 CR30: 7d1a4000 CR31: 11111111 >> ORIG_R28: 00000000 >> IAOQ[0]: _raw_spin_lock+0x15c/0x1c0 >> IAOQ[1]: _raw_spin_lock+0xfc/0x1c0 >> RP(r2): _raw_spin_lock+0x18c/0x1c0 >> Backtrace: >> [<102d560c>] _raw_spin_lock+0x18c/0x1c0 >> >> Kernel panic - not syncing: softlockup: hung tasks >> Backtrace: >> [<10113a94>] show_stack+0x18/0x28 > > I have tested the proposed change using 2.6.30-rc6 and a modified version > of 2.6.22.10. I tested using SMP kernels running on a rp3440 and UP > kernels on a c3750. I also tested changing the macros to just use > preempt_disable/preempt_enable. > > The patch doesn't cause any new problems as far as I can tell. That's good. Thanks for testing ! > However, > it doesn't fix any of the problems that I currently see on these two > machines. In particular, I see the occasional gcc testsuite timeout > using SMP kernels. Programs that usually take a few seconds to run > timeout after three minutes. These timeouts don't occur with UP kernels. Ok, it fixes at least a kernel hang I faced regularily with my compilations... > On the rp3440, the spinlock is definitely needed. With just > preempt_disable/preempt_enable, a crash occurs during bootstrap > at the point unused memory is recovered. Thus, the tlb purge > issue referred to in the preceeding comment affects more than > just N class. > > On the otherhand, it doesn't seem necessary to disable interrupts > during the purge with UP kernels. My kernel crashes happened with UP-kernels on a B2000 (1 CPU). So, I still have the feeling that it's necessary to disable interrupts on UP kernels as well. The new cleaned-up attached patch below shows, that arch/parisc/kernel/pci-dma.c is affected. My kernel crashes always happened when the machine was pretty much loaded, during memory pressure when the system still wanted to load something from disk. In pci-dma.c the TLB flushing code is called from the DMA handlers, so it does indicate that IRQ would need to be disabled. One example I could imagine (with a UP-kernel): - Process A runs some code, gets into kernel and executes clear_user_page() and locks the pa_tlb_lock spinlock (inside the critical section). - Suddenly Process B gets data from disk via DMA handlers in pci-dma.c - DMA-Interrupt kicks in, the DMA handler tries to get the pa_tlb_lock and fails since process A still holds the lock -> machine deadlocks and kernel starts reporting about "hung tasks after 61seconds". (just search the mailing list for such reports..) So, even a UP-kernel is affected. > With SMP kernels, it would be nice > to know if the lockup was caused by an interruption during the tlb > purge, or a preemption issue. > > I have the sense that disabling interrupts is wrong. That is any > given CPU can only generate one PxTLB inter processor broadcast > at a time. Disabling interrupts could cause a deadlock if a TLB > purge was needed while the purge code was executing. > > The other alternative is to allow the processor that holds the lock > to enter the flush code. This would fix the deadlock. Don't know how > to code this (atomic compare and exchange?). > > The preempt_disable/preempt_enable crash on the rp3440 made me wonder > if all tlb purge operations are properly protected with the tlb spinlock. > I think we need to look at flush_tlb_all_local and copy_user_page_asm. > They don't seem protected. New patch attached. Helge diff --git a/arch/parisc/include/asm/tlbflush.h b/arch/parisc/include/asm/tlbflush.h index 1f6fd4f..217588e 100644 --- a/arch/parisc/include/asm/tlbflush.h +++ b/arch/parisc/include/asm/tlbflush.h @@ -18,8 +18,8 @@ */ extern spinlock_t pa_tlb_lock; -#define purge_tlb_start(x) spin_lock(&pa_tlb_lock) -#define purge_tlb_end(x) spin_unlock(&pa_tlb_lock) +#define purge_tlb_start(flags) spin_lock_irqsave(&pa_tlb_lock, flags) +#define purge_tlb_end(flags) spin_unlock_irqrestore(&pa_tlb_lock, flags) extern void flush_tlb_all(void); extern void flush_tlb_all_local(void *); @@ -64,13 +64,14 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr) { /* For one page, it's not worth testing the split_tlb variable */ + unsigned long flags; mb(); mtsp(vma->vm_mm->context,1); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb(addr); pitlb(addr); - purge_tlb_end(); + purge_tlb_end(flags); } void __flush_tlb_range(unsigned long sid, diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c index 837530e..e3c55f7 100644 --- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -400,10 +400,11 @@ void clear_user_page_asm(void *page, unsigned long vaddr) { /* This function is implemented in assembly in pacache.S */ extern void __clear_user_page_asm(void *page, unsigned long vaddr); + unsigned long flags; - purge_tlb_start(); + purge_tlb_start(flags); __clear_user_page_asm(page, vaddr); - purge_tlb_end(); + purge_tlb_end(flags); } #define FLUSH_THRESHOLD 0x80000 /* 0.5MB */ @@ -444,20 +445,22 @@ extern void clear_user_page_asm(void *page, unsigned long vaddr); void clear_user_page(void *page, unsigned long vaddr, struct page *pg) { + unsigned long flags; purge_kernel_dcache_page((unsigned long)page); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(page); - purge_tlb_end(); + purge_tlb_end(flags); clear_user_page_asm(page, vaddr); } EXPORT_SYMBOL(clear_user_page); void flush_kernel_dcache_page_addr(void *addr) { + unsigned long flags; flush_kernel_dcache_page_asm(addr); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(addr); - purge_tlb_end(); + purge_tlb_end(flags); } EXPORT_SYMBOL(flush_kernel_dcache_page_addr); @@ -490,8 +493,9 @@ void __flush_tlb_range(unsigned long sid, unsigned long start, if (npages >= 512) /* 2MB of space: arbitrary, should be tuned */ flush_tlb_all(); else { + unsigned long flags; mtsp(sid, 1); - purge_tlb_start(); + purge_tlb_start(flags); if (split_tlb) { while (npages--) { pdtlb(start); @@ -504,7 +508,7 @@ void __flush_tlb_range(unsigned long sid, unsigned long start, start += PAGE_SIZE; } } - purge_tlb_end(); + purge_tlb_end(flags); } } diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 7d927ea..b8ec5a0 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -90,12 +90,13 @@ static inline int map_pte_uncached(pte_t * pte, if (end > PMD_SIZE) end = PMD_SIZE; do { + unsigned long flags; if (!pte_none(*pte)) printk(KERN_ERR "map_pte_uncached: page already exists\n"); set_pte(pte, __mk_pte(*paddr_ptr, PAGE_KERNEL_UNC)); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(orig_vaddr); - purge_tlb_end(); + purge_tlb_end(flags); vaddr += PAGE_SIZE; orig_vaddr += PAGE_SIZE; (*paddr_ptr) += PAGE_SIZE; @@ -168,11 +169,12 @@ static inline void unmap_uncached_pte(pmd_t * pmd, unsigned long vaddr, if (end > PMD_SIZE) end = PMD_SIZE; do { + unsigned long flags; pte_t page = *pte; pte_clear(&init_mm, vaddr, pte); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(orig_vaddr); - purge_tlb_end(); + purge_tlb_end(flags); vaddr += PAGE_SIZE; orig_vaddr += PAGE_SIZE; pte++; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-06-06 21:25 ` Helge Deller @ 2009-06-14 22:05 ` Helge Deller 2009-06-14 23:20 ` John David Anglin 0 siblings, 1 reply; 13+ messages in thread From: Helge Deller @ 2009-06-14 22:05 UTC (permalink / raw) To: John David Anglin; +Cc: linux-parisc On 06/06/2009 11:25 PM, Helge Deller wrote: > John David Anglin wrote: >> On the rp3440, the spinlock is definitely needed. With just >> preempt_disable/preempt_enable, a crash occurs during bootstrap >> at the point unused memory is recovered. Thus, the tlb purge >> issue referred to in the preceeding comment affects more than >> just N class. >> >> On the otherhand, it doesn't seem necessary to disable interrupts >> during the purge with UP kernels. > > My kernel crashes happened with UP-kernels on a B2000 (1 CPU). > So, I still have the feeling that it's necessary to disable interrupts > on UP kernels as well. Hi Dave, I did further testing. Especially I wanted to clarify if disabling interrupts are necessary or not. To test it, I added a WARN_ON(in_interrupt()) to the purge_tlb_start() macro like this: +#define purge_tlb_start(flags) WARN_ON(in_interrupt()); spin_lock_irqsave(&pa_tlb_lock, flags) and did ran the compile-test which usually hang my system. The result is that we really need to disable interrupts. The WARN_ON() did triggered for me again as it always did hang the system. This is with a UP-kernel (2.6.30-rc8) on a UP-machine (B2000): Badness at arch/parisc/kernel/cache.c:461 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00000000000001101111100100001111 Tainted: G W r00-03 0006f90f 10640000 10112760 106f4c40 r04-07 5eadc000 2418e30c 0004eadc 113336f0 r08-11 2418e31c 0000007c 106f4a88 7c5b3c7c r12-15 104c45ec 106c9da0 00001000 10000000 r16-19 00000fff 1067f13c 105e0560 00000100 r20-23 0000ff00 5eadc10a 5eadc100 00000040 r24-27 00000000 5eadcfc0 5eadd000 10640680 r28-31 106f4000 0000000a 106f4c80 471249e8 sr00-03 00000000 00000000 00000000 00000025 sr04-07 00000000 00000000 00000000 00000000 IASQ: 00000000 00000000 IAOQ: 10112774 10112778 IIR: 03ffe01f ISR: 0424013a IOR: b72dcfc0 CPU: 0 CR30: 106f4000 CR31: 11111111 ORIG_R28: 106f4dc0 IAOQ[0]: flush_kernel_dcache_page_addr+0x30/0xa0 IAOQ[1]: flush_kernel_dcache_page_addr+0x34/0xa0 RP(r2): flush_kernel_dcache_page_addr+0x1c/0xa0 Backtrace: [<10112760>] flush_kernel_dcache_page_addr+0x1c/0xa0 Sadly I didn't got a full backtrace. The WARN_ON() triggered around 20 times during the phase where my machine was pretty much loaded. Interestingly it was always inflush_kernel_dcache_page_addr(). So, I still think my patch at http://patchwork.kernel.org/patch/28458/ should be applied to all kernels. Helge ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock 2009-06-14 22:05 ` Helge Deller @ 2009-06-14 23:20 ` John David Anglin 2009-06-16 20:51 ` [PATCH] " Helge Deller 0 siblings, 1 reply; 13+ messages in thread From: John David Anglin @ 2009-06-14 23:20 UTC (permalink / raw) To: Helge Deller; +Cc: linux-parisc On Mon, 15 Jun 2009, Helge Deller wrote: > On 06/06/2009 11:25 PM, Helge Deller wrote: >> John David Anglin wrote: >>> On the rp3440, the spinlock is definitely needed. With just >>> preempt_disable/preempt_enable, a crash occurs during bootstrap >>> at the point unused memory is recovered. Thus, the tlb purge >>> issue referred to in the preceeding comment affects more than >>> just N class. >>> >>> On the otherhand, it doesn't seem necessary to disable interrupts >>> during the purge with UP kernels. >> >> My kernel crashes happened with UP-kernels on a B2000 (1 CPU). >> So, I still have the feeling that it's necessary to disable interrupts >> on UP kernels as well. > > Hi Dave, > > I did further testing. Especially I wanted to clarify if disabling > interrupts are necessary or not. > > To test it, I added a WARN_ON(in_interrupt()) to the purge_tlb_start() > macro like this: > +#define purge_tlb_start(flags) WARN_ON(in_interrupt()); > spin_lock_irqsave(&pa_tlb_lock, flags) > and did ran the compile-test which usually hang my system. > > The result is that we really need to disable interrupts. > The WARN_ON() did triggered for me again as it always did hang the system. > This is with a UP-kernel (2.6.30-rc8) on a UP-machine (B2000): > > > Badness at arch/parisc/kernel/cache.c:461 > > > > YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI > > PSW: 00000000000001101111100100001111 Tainted: G W > > r00-03 0006f90f 10640000 10112760 106f4c40 > > r04-07 5eadc000 2418e30c 0004eadc 113336f0 > > r08-11 2418e31c 0000007c 106f4a88 7c5b3c7c > > r12-15 104c45ec 106c9da0 00001000 10000000 > > r16-19 00000fff 1067f13c 105e0560 00000100 > > r20-23 0000ff00 5eadc10a 5eadc100 00000040 > > r24-27 00000000 5eadcfc0 5eadd000 10640680 > > r28-31 106f4000 0000000a 106f4c80 471249e8 > > sr00-03 00000000 00000000 00000000 00000025 > > sr04-07 00000000 00000000 00000000 00000000 > > > > IASQ: 00000000 00000000 IAOQ: 10112774 10112778 > > IIR: 03ffe01f ISR: 0424013a IOR: b72dcfc0 > > CPU: 0 CR30: 106f4000 CR31: 11111111 > > ORIG_R28: 106f4dc0 > > IAOQ[0]: flush_kernel_dcache_page_addr+0x30/0xa0 > > IAOQ[1]: flush_kernel_dcache_page_addr+0x34/0xa0 > > RP(r2): flush_kernel_dcache_page_addr+0x1c/0xa0 > > Backtrace: > > [<10112760>] flush_kernel_dcache_page_addr+0x1c/0xa0 > > > > Sadly I didn't got a full backtrace. > The WARN_ON() triggered around 20 times during the phase where my machine > was > pretty much loaded. > Interestingly it was always inflush_kernel_dcache_page_addr(). > > So, I still think my patch at http://patchwork.kernel.org/patch/28458/ > should > be applied to all kernels. I am convinced that interrupts need to be disabled on SMP kernels to prevent deadlock. On UP kernels, I am not convinced that anything bad happens if we do a tlb purge while handling an interrupt since we don't have to worry about preventing bus conflicts. The UP code can't deadlock. I'm thinking that we can stay with disabling preemption. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] fix parisc runtime hangs wrt pa_tlb_lock 2009-06-14 23:20 ` John David Anglin @ 2009-06-16 20:51 ` Helge Deller 2009-06-17 2:55 ` Kyle McMartin 0 siblings, 1 reply; 13+ messages in thread From: Helge Deller @ 2009-06-16 20:51 UTC (permalink / raw) To: John David Anglin, Kyle McMartin; +Cc: linux-parisc On 06/15/2009 01:20 AM, John David Anglin wrote: > I am convinced that interrupts need to be disabled on SMP kernels to > prevent deadlock. On UP kernels, I am not convinced that anything bad > happens if we do a tlb purge while handling an interrupt since we don't > have to worry about preventing bus conflicts. The UP code can't > deadlock. I'm thinking that we can stay with disabling preemption. Dave is right and it took me long to understand his point... I continued testing and we can simply just disable preemption for UP kernels and use an irq-safe spinlock for SMP kernels. Below is the latest and greatest patch which fixes this bug. Run-tested onUP kernels and compile-tested on SMP kernels. Kyle, it would be nice if you could apply it as soon as possible. The bug is existing since a long time, so it might make sense to backport it to older kernels as well... ----------- Patch: [parisc] fix system hangs due to unsafe locking of the TLB-protection spinlock Importance: high, fixes complete kernel stalls The TLB flushing functions on hppa, which causes PxTLB broadcasts on the system bus, needs to be protected by irq-safe spinlocks to avoid irq handlers to deadlock the kernel. The deadlocks only happened during I/O intensive loads and triggered pretty seldom, which is why this bug went so long unnoticed. Fix this bug by using spin_lock_irqsafe() for SMP kernels and preempt_disable() for UP kernels. Signed-off-by: Helge Deller <deller@gmx.de> diff --git a/arch/parisc/include/asm/tlbflush.h b/arch/parisc/include/asm/tlbflush.h index 1f6fd4f..3d82fc8 100644 --- a/arch/parisc/include/asm/tlbflush.h +++ b/arch/parisc/include/asm/tlbflush.h @@ -10,16 +10,21 @@ /* This is for the serialisation of PxTLB broadcasts. At least on the * N class systems, only one PxTLB inter processor broadcast can be - * active at any one time on the Merced bus. This tlb purge - * synchronisation is fairly lightweight and harmless so we activate - * it on all SMP systems not just the N class. We also need to have - * preemption disabled on uniprocessor machines, and spin_lock does that - * nicely. + * active at any one time on the Merced bus. To be on the safe side, we + * unconditionally protect those broadcasts for all SMP kernels with a + * spinlock. + * On uniprocessor machines we save this overhead by just disabling + * preemption. */ -extern spinlock_t pa_tlb_lock; -#define purge_tlb_start(x) spin_lock(&pa_tlb_lock) -#define purge_tlb_end(x) spin_unlock(&pa_tlb_lock) +#ifdef CONFIG_SMP +extern spinlock_t pa_tlb_lock; +# define purge_tlb_start(flags) spin_lock_irqsave(&pa_tlb_lock, flags) +# define purge_tlb_end(flags) spin_unlock_irqrestore(&pa_tlb_lock, flags) +#else +# define purge_tlb_start(flags) preempt_disable() +# define purge_tlb_end(flags) preempt_enable() +#endif extern void flush_tlb_all(void); extern void flush_tlb_all_local(void *); @@ -64,13 +69,14 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr) { /* For one page, it's not worth testing the split_tlb variable */ + unsigned long flags __maybe_unused; mb(); mtsp(vma->vm_mm->context,1); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb(addr); pitlb(addr); - purge_tlb_end(); + purge_tlb_end(flags); } void __flush_tlb_range(unsigned long sid, diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c index 837530e..9c85e4f 100644 --- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -35,12 +35,14 @@ int icache_stride __read_mostly; EXPORT_SYMBOL(dcache_stride); +#ifdef CONFIG_SMP /* On some machines (e.g. ones with the Merced bus), there can be * only a single PxTLB broadcast at a time; this must be guaranteed - * by software. We put a spinlock around all TLB flushes to - * ensure this. + * by software. We put a spinlock around all TLB flushes on SMP + * kernels to ensure this. */ DEFINE_SPINLOCK(pa_tlb_lock); +#endif struct pdc_cache_info cache_info __read_mostly; #ifndef CONFIG_PA20 @@ -400,10 +402,11 @@ void clear_user_page_asm(void *page, unsigned long vaddr) { /* This function is implemented in assembly in pacache.S */ extern void __clear_user_page_asm(void *page, unsigned long vaddr); + unsigned long flags __maybe_unused; - purge_tlb_start(); + purge_tlb_start(flags); __clear_user_page_asm(page, vaddr); - purge_tlb_end(); + purge_tlb_end(flags); } #define FLUSH_THRESHOLD 0x80000 /* 0.5MB */ @@ -444,20 +447,22 @@ extern void clear_user_page_asm(void *page, unsigned long vaddr); void clear_user_page(void *page, unsigned long vaddr, struct page *pg) { + unsigned long flags __maybe_unused; purge_kernel_dcache_page((unsigned long)page); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(page); - purge_tlb_end(); + purge_tlb_end(flags); clear_user_page_asm(page, vaddr); } EXPORT_SYMBOL(clear_user_page); void flush_kernel_dcache_page_addr(void *addr) { + unsigned long flags __maybe_unused; flush_kernel_dcache_page_asm(addr); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(addr); - purge_tlb_end(); + purge_tlb_end(flags); } EXPORT_SYMBOL(flush_kernel_dcache_page_addr); @@ -490,8 +495,9 @@ void __flush_tlb_range(unsigned long sid, unsigned long start, if (npages >= 512) /* 2MB of space: arbitrary, should be tuned */ flush_tlb_all(); else { + unsigned long flags __maybe_unused; mtsp(sid, 1); - purge_tlb_start(); + purge_tlb_start(flags); if (split_tlb) { while (npages--) { pdtlb(start); @@ -504,7 +510,7 @@ void __flush_tlb_range(unsigned long sid, unsigned long start, start += PAGE_SIZE; } } - purge_tlb_end(); + purge_tlb_end(flags); } } diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 7d927ea..dbe6c8e 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -90,12 +90,13 @@ static inline int map_pte_uncached(pte_t * pte, if (end > PMD_SIZE) end = PMD_SIZE; do { + unsigned long flags __maybe_unused; if (!pte_none(*pte)) printk(KERN_ERR "map_pte_uncached: page already exists\n"); set_pte(pte, __mk_pte(*paddr_ptr, PAGE_KERNEL_UNC)); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(orig_vaddr); - purge_tlb_end(); + purge_tlb_end(flags); vaddr += PAGE_SIZE; orig_vaddr += PAGE_SIZE; (*paddr_ptr) += PAGE_SIZE; @@ -168,11 +169,12 @@ static inline void unmap_uncached_pte(pmd_t * pmd, unsigned long vaddr, if (end > PMD_SIZE) end = PMD_SIZE; do { + unsigned long flags __maybe_unused; pte_t page = *pte; pte_clear(&init_mm, vaddr, pte); - purge_tlb_start(); + purge_tlb_start(flags); pdtlb_kernel(orig_vaddr); - purge_tlb_end(); + purge_tlb_end(flags); vaddr += PAGE_SIZE; orig_vaddr += PAGE_SIZE; pte++; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] fix parisc runtime hangs wrt pa_tlb_lock 2009-06-16 20:51 ` [PATCH] " Helge Deller @ 2009-06-17 2:55 ` Kyle McMartin 2009-06-17 7:26 ` Helge Deller 0 siblings, 1 reply; 13+ messages in thread From: Kyle McMartin @ 2009-06-17 2:55 UTC (permalink / raw) To: Helge Deller; +Cc: John David Anglin, Kyle McMartin, linux-parisc On Tue, Jun 16, 2009 at 10:51:48PM +0200, Helge Deller wrote: > On 06/15/2009 01:20 AM, John David Anglin wrote: >> I am convinced that interrupts need to be disabled on SMP kernels to >> prevent deadlock. On UP kernels, I am not convinced that anything bad >> happens if we do a tlb purge while handling an interrupt since we don't >> have to worry about preventing bus conflicts. The UP code can't >> deadlock. I'm thinking that we can stay with disabling preemption. > > Dave is right and it took me long to understand his point... > I continued testing and we can simply just disable preemption for UP > kernels and use an irq-safe spinlock for SMP kernels. > > Below is the latest and greatest patch which fixes this bug. > Run-tested onUP kernels and compile-tested on SMP kernels. > Ugh, I'd really prefer it if we just took the irq disable and enable hit on UP by making the codepaths the same... Is that ok with you or? Kyle > Kyle, it would be nice if you could apply it as soon as possible. > The bug is existing since a long time, so it might make sense to backport > it to older kernels as well... > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] fix parisc runtime hangs wrt pa_tlb_lock 2009-06-17 2:55 ` Kyle McMartin @ 2009-06-17 7:26 ` Helge Deller 0 siblings, 0 replies; 13+ messages in thread From: Helge Deller @ 2009-06-17 7:26 UTC (permalink / raw) To: Kyle McMartin; +Cc: John David Anglin, linux-parisc On 06/17/2009 04:55 AM, Kyle McMartin wrote: > On Tue, Jun 16, 2009 at 10:51:48PM +0200, Helge Deller wrote: >> On 06/15/2009 01:20 AM, John David Anglin wrote: >>> I am convinced that interrupts need to be disabled on SMP kernels to >>> prevent deadlock. On UP kernels, I am not convinced that anything bad >>> happens if we do a tlb purge while handling an interrupt since we don't >>> have to worry about preventing bus conflicts. The UP code can't >>> deadlock. I'm thinking that we can stay with disabling preemption. >> Dave is right and it took me long to understand his point... >> I continued testing and we can simply just disable preemption for UP >> kernels and use an irq-safe spinlock for SMP kernels. >> >> Below is the latest and greatest patch which fixes this bug. >> Run-tested onUP kernels and compile-tested on SMP kernels. >> > > Ugh, I'd really prefer it if we just took the irq disable and enable hit > on UP by making the codepaths the same... Is that ok with you or? You mean to use the irq-safe spinlocking path for SMP _and_ UP? Yes, that would be ok for me too. We just thought why we should try to save some cycles on UP if possible, as the UP-kernel often is used on slower machines where it would be beneficial for them... If yes, do you want me to send an updated patch? Helge ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-06-17 7:26 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-22 20:13 [PATCH, RFC] fix parisc runtime hangs wrt pa_tlb_lock Helge Deller 2009-05-23 15:26 ` John David Anglin 2009-05-23 15:34 ` John David Anglin 2009-05-23 19:34 ` Helge Deller 2009-05-23 20:03 ` John David Anglin 2009-05-23 21:53 ` John David Anglin 2009-05-28 1:50 ` John David Anglin 2009-06-06 21:25 ` Helge Deller 2009-06-14 22:05 ` Helge Deller 2009-06-14 23:20 ` John David Anglin 2009-06-16 20:51 ` [PATCH] " Helge Deller 2009-06-17 2:55 ` Kyle McMartin 2009-06-17 7:26 ` Helge Deller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox