* [PATCH -rt] Preemption problem in kernel RT Patch
@ 2007-06-21 19:39 Beauchemin, Mark
2007-06-22 8:03 ` Thomas Gleixner
2009-10-14 12:55 ` dtslinux
0 siblings, 2 replies; 15+ messages in thread
From: Beauchemin, Mark @ 2007-06-21 19:39 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, tglx
Hi,
I've found a preemption problem in kernel/rtmutex.c:649. The BUG_ON listed in the patch below makes sure a preemption event hasn't occurred since the thread last checked the owner of the lock. If it did happen and the current task is now the owner, it asserts with BUG_ON. With the RT-PATCH applied, however, interrupts are not disabled and preemption is possible. The following patch removes the BUG_ON as it is an incorrect check in the rt kernel. I've checked the rtmutex code and it appears to handle this case just fine.
Thanks,
Mark Beauchemin
Here's the patch:
--- linux-2.6.21.3-rt9/kernel/rtmutex.c 2007-06-01 15:21:12.000000000 -0400
+++ linux-2.6.21.3-rt9_new/kernel/rtmutex.c 2007-06-20 12:15:44.000000000 -0400
@@ -646,7 +646,7 @@
 		return;
 	}
 
-	BUG_ON(rt_mutex_owner(lock) == current);
+/* BUG_ON(rt_mutex_owner(lock) == current); */
 
 	/*
 	 * Here we save whatever state the task was in originally,
Here's the bug assertion:
: ------------[ cut here ]------------
Kernel BUG at c01c7bc4 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
NIP: C01C7BC4 LR: C01C7BA8 CTR: C01531CC
REGS: d010ba90 TRAP: 0700 Tainted: P
MSR: 00021000 <ME> CR: 24002082 XER: 00000000
TASK = d0100920[5] 'softirq-timer/0' THREAD: d010a000
GPR00: 00000001 D010BB40 D0100920 00000000 00000030 00000002 C0260000 00029000
GPR08: D0100920 00000000 D2C07970 D0100920 9F472D4B 00121868 C0220000 00000000
GPR16: 00000000 000F422C 29000000 0000003B 9ACA0000 C026562C D010A028 00004000
GPR24: D191F898 00000000 D2C1CAC8 D2C1CB60 D2C07960 CEA752C0 D010A000 00029000
NIP [C01C7BC4] rt_spin_lock_slowlock+0x60/0x1f8
LR [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8
Call Trace:
[D010BB40] [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8 (unreliable)
[D010BB90] [C0153464] dev_queue_xmit+0x298/0x2a0 Tunnel2
[D010BBB0] [C0176398] ip_output+0x288/0x2dc
[D010BBE0] [C01AC078] ipip_tunnel_xmit+0x508/0x698
[D010BC60] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4
[D010BC80] [C0153430] dev_queue_xmit+0x264/0x2a0 Tunnel4
[D010BCA0] [C0176398] ip_output+0x288/0x2dc
[D010BCD0] [C01AC078] ipip_tunnel_xmit+0x508/0x698
[D010BD50] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4
[D010BD70] [C0153430] dev_queue_xmit+0x264/0x2a0 Tunnel2
[D010BD90] [C0176398] ip_output+0x288/0x2dc
[D010BDC0] [C017685C] ip_queue_xmit+0x1ac/0x4e4
[D010BE30] [C018762C] tcp_transmit_skb+0x390/0x810
[D010BE70] [C018882C] tcp_retransmit_skb+0x160/0x638
[D010BEA0] [C018BA5C] tcp_write_timer+0x274/0x6c0
[D010BED0] [C0024314] run_timer_softirq+0x2d0/0xedc
[D010BF80] [C001F1C4] ksoftirqd+0xf8/0x1b0
[D010BFC0] [C0031588] kthread+0xc0/0xfc
[D010BFF0] [C000471C] kernel_thread+0x44/0x60
Instruction dump:
913e000c 80030004 2f800000 419e0188 4be73599 2f830000 409e0144 801c0010
5400003a 7c001278 7c000034 5400d97e <0f000000> 39200004 7f401028 7d20112d
note: softirq-timer/0[5] exited with preempt_count 1
^ permalink raw reply [flat|nested] 15+ messages in thread

* Re: [PATCH -rt] Preemption problem in kernel RT Patch
From: Thomas Gleixner @ 2007-06-22 8:03 UTC
To: Beauchemin, Mark; +Cc: linux-kernel, mingo, David Miller

Mark,

please fix your mail client to do proper line wraps at column 78.

On Thu, 2007-06-21 at 15:39 -0400, Beauchemin, Mark wrote:
> Hi,
> I've found a preemption problem in kernel/rtmutex.c:649. The BUG_ON
> listed in the patch below makes sure a preemption event hasn't
> occurred since the thread last checked the owner of the lock. If it
> did happen and the current task is now the owner, it asserts with
> BUG_ON. With the RT-PATCH applied, however, interrupts are not
> disabled and preemption is possible. The following patch removes the
> BUG_ON as it is an incorrect check in the rt kernel. I've checked the
> rtmutex code and it appears to handle this case just fine..

Nice, but nevertheless wrong theory.

This check is part of the RT-Patch and it _is_ entirely correct:
Something tries to do a spin_lock() on a lock, which the same task has
already locked before. That's what the BUG_ON is catching.

There is nothing which can make a task magically the owner of a lock,
whether preemption is enabled or not.
> Call Trace:
> [D010BB40] [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8 (unreliable)
> [D010BB90] [C0153464] dev_queue_xmit+0x298/0x2a0 Tunnel2
> [D010BBB0] [C0176398] ip_output+0x288/0x2dc
> [D010BBE0] [C01AC078] ipip_tunnel_xmit+0x508/0x698
> [D010BC60] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4
> [D010BC80] [C0153430] dev_queue_xmit+0x264/0x2a0 Tunnel4
> [D010BCA0] [C0176398] ip_output+0x288/0x2dc
> [D010BCD0] [C01AC078] ipip_tunnel_xmit+0x508/0x698
> [D010BD50] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4
> [D010BD70] [C0153430] dev_queue_xmit+0x264/0x2a0 Tunnel2
> [D010BD90] [C0176398] ip_output+0x288/0x2dc
> [D010BDC0] [C017685C] ip_queue_xmit+0x1ac/0x4e4
> [D010BE30] [C018762C] tcp_transmit_skb+0x390/0x810
> [D010BE70] [C018882C] tcp_retransmit_skb+0x160/0x638
> [D010BEA0] [C018BA5C] tcp_write_timer+0x274/0x6c0
> [D010BED0] [C0024314] run_timer_softirq+0x2d0/0xedc
> [D010BF80] [C001F1C4] ksoftirqd+0xf8/0x1b0
> [D010BFC0] [C0031588] kthread+0xc0/0xfc
> [D010BFF0] [C000471C] kernel_thread+0x44/0x60

Looks like the tunnel code is doing a nasty recursive thing. Dave, any
idea ?

Mark, can you please turn on CONFIG_PROVE_LOCKING. This should produce
more detailed information about the problem.

	tglx
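Thomas's point (a task trying to take a lock it already holds) can be reproduced outside the kernel. The sketch below is a user-space analogue, not the rtmutex code: it assumes a POSIX threads implementation and uses an error-checking mutex, which, like the BUG_ON, knows its owner and reports a second acquisition by the same thread instead of blocking on itself. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Lock the same mutex twice from one thread and return the result of the
 * second attempt. An error-checking mutex records its owner, so the relock
 * is reported as EDEADLK. A PTHREAD_MUTEX_NORMAL mutex would deadlock here
 * instead, because it never checks who holds it.
 */
static int relock_same_thread(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int first, second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	first = pthread_mutex_lock(&m);
	assert(first == 0);			/* uncontended: we own it now */
	second = pthread_mutex_lock(&m);	/* same owner tries again */

	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	return second;
}
```

Calling relock_same_thread() returns EDEADLK: the second lock attempt is refused because the caller is already the owner, which is the condition `rt_mutex_owner(lock) == current` tests for.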
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Beauchemin, Mark @ 2007-06-23 14:08 UTC
To: Thomas Gleixner; +Cc: linux-kernel, mingo, David Miller

Thomas,

> please fix your mail client to do proper line wraps at column 78.

Outlook sucks. I'll install thunderbird this weekend. sorry.

> Nice, but nevertheless wrong theory.
>
> This check is part of the RT-Patch and it _is_ entirely correct:
> Something tries to do a spin_lock() on a lock, which the same task has
> already locked before. That's what the BUG_ON is catching.
>
> There is nothing which can make a task magically the owner of a lock,
> whether preemption is enabled or not.

Thanks for straightening me out. I was reading the function
try_to_take_rt_mutex wrong... The problem makes more sense now. The tunnel
code encapsulates the current packet in a new packet and calls ip_output
to get it to the destination. If the routing table is changing(which
I'm doing when this happens) it could be called recursively. The tunnel
code tries to handle recursion at the top of ipip_tunnel_xmit:

	if (tunnel->recursion++) {
		tunnel->stat.collisions++;
		goto tx_error;
	}

The problem is it tries to take dev->lock which it already owns in
dev_queue_xmit before the check for recursion.

Unfortunately, every time I put in debug to see the routing
changes which cause the bug, it doesn't happen. I'll certainly try to
reproduce it with CONFIG_PROVE_LOCKING on, but it won't be till end of next
week as we have a release going out.

Thanks for your help,
Mark Beauchemin
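Why the tunnel's guard fires too late can be shown with a deliberately simplified model. All names below are hypothetical, and the two tunnels in the trace are collapsed onto a single device: the outer transmit path takes the device lock before the tunnel's recursion counter is consulted, so the nested call runs into the lock first.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the call chain in the oops: fake_dev_queue_xmit()
 * takes the device lock and calls fake_tunnel_xmit(), which re-enters
 * fake_dev_queue_xmit() when the (simulated) route points back at the
 * same device. None of these are kernel APIs.
 */
struct fake_dev {
	bool locked;		/* stands in for the device xmit lock */
	int recursion;		/* like tunnel->recursion */
	unsigned collisions;	/* drops caught by the tunnel's own guard */
	unsigned self_locks;	/* attempts to take a lock already held */
};

static int fake_dev_queue_xmit(struct fake_dev *d, int loops);

static int fake_tunnel_xmit(struct fake_dev *d, int loops)
{
	int ret = 0;

	if (d->recursion++) {		/* ipip-style recursion guard */
		d->collisions++;
		d->recursion--;
		return -1;
	}
	if (loops > 0)			/* route leads back into this device */
		ret = fake_dev_queue_xmit(d, loops - 1);
	d->recursion--;
	return ret;
}

static int fake_dev_queue_xmit(struct fake_dev *d, int loops)
{
	int ret;

	if (d->locked) {		/* where the -rt kernel hits BUG_ON */
		d->self_locks++;
		return -2;
	}
	d->locked = true;
	ret = fake_tunnel_xmit(d, loops);
	d->locked = false;
	return ret;
}
```

With one routing loop, fake_dev_queue_xmit() returns -2 and self_locks becomes 1, while collisions stays 0: the nested call is stopped by the lock, and the tunnel's counter never gets a chance to act.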
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Thomas Gleixner @ 2007-06-23 14:26 UTC
To: Beauchemin, Mark; +Cc: linux-kernel, mingo, David Miller

Mark,

On Sat, 2007-06-23 at 10:08 -0400, Beauchemin, Mark wrote:
> Thanks for straightening me out. I was reading the function
> try_to_take_rt_mutex wrong... The problem makes more sense now. The tunnel
> code encapsulates the current packet in a new packet and calls ip_output
> to get it to the destination. If the routing table is changing(which
> I'm doing when this happens) it could be called recursively. The tunnel
> code tries to handle recursion at the top of ipip_tunnel_xmit:
>
> 	if (tunnel->recursion++) {
> 		tunnel->stat.collisions++;
> 		goto tx_error;
> 	}
>
> The problem is it tries to take dev->lock which it already owns in
> dev_queue_xmit before the check for recursion.

Hmm, this sounds scary. On a vanilla kernel (with debugging disabled),
this code will simply deadlock. Do you have a test case?

If you need more help, please contact the netdev folks
(netdev@vger.kernel.org).

> Unfortunately, every time I put in debug to see the routing
> changes which cause the bug, it doesn't happen. I'll certainly try to
> reproduce it with CONFIG_PROVE_LOCKING on, but it won't be till end of next
> week as we have a release going out.

Well, you won't see much more than you already debugged. You see the
place where the lock was taken and the call trace of the function in the
same way you have seen it with the BUG_ON().

	tglx
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Beauchemin, Mark @ 2007-07-24 15:48 UTC
To: Thomas Gleixner; +Cc: linux-kernel, mingo, David Miller

Thomas,

I think I've gotten to the heart of the problem. Here's an excerpt from
the latest -rt patch:

net/core/dev.c in the function dev_queue_xmit

@@ -1588,11 +1588,17 @@ gso:
 	   Either shot noqueue qdisc, it is even simpler 8)
 	 */
 	if (dev->flags & IFF_UP) {
-		int cpu = smp_processor_id(); /* ok because BHs are off */
+		/*
+		 * No need to check for recursion with threaded interrupts:
+		 */
+#ifdef CONFIG_PREEMPT_RT
+		if (1) {
+#else
+		int cpu = raw_smp_processor_id(); /* ok because BHs are off */
 
 		if (dev->xmit_lock_owner != cpu) {
-
-			HARD_TX_LOCK(dev, cpu);
+#endif
+			HARD_TX_LOCK(dev);
 
 			if (!netif_queue_stopped(dev) &&
 			    !netif_subqueue_stopped(dev, skb->queue_mapping)) {

I'm not sure why the check for recursion has been removed. In the
backtrace below, I think it would be caught by this check and not
recursively call the spinlock code.
> Call Trace:
> [D010BB40] [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8 (unreliable)
> [D010BB90] [C0153464] dev_queue_xmit+0x298/0x2a0 Tunnel2
> [D010BBB0] [C0176398] ip_output+0x288/0x2dc
> [D010BBE0] [C01AC078] ipip_tunnel_xmit+0x508/0x698
> [D010BC60] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4
> [D010BC80] [C0153430] dev_queue_xmit+0x264/0x2a0 Tunnel4
> [D010BCA0] [C0176398] ip_output+0x288/0x2dc
> [D010BCD0] [C01AC078] ipip_tunnel_xmit+0x508/0x698
> [D010BD50] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4
> [D010BD70] [C0153430] dev_queue_xmit+0x264/0x2a0 Tunnel2
> [D010BD90] [C0176398] ip_output+0x288/0x2dc
> [D010BDC0] [C017685C] ip_queue_xmit+0x1ac/0x4e4
> [D010BE30] [C018762C] tcp_transmit_skb+0x390/0x810
> [D010BE70] [C018882C] tcp_retransmit_skb+0x160/0x638
> [D010BEA0] [C018BA5C] tcp_write_timer+0x274/0x6c0
> [D010BED0] [C0024314] run_timer_softirq+0x2d0/0xedc
> [D010BF80] [C001F1C4] ksoftirqd+0xf8/0x1b0
> [D010BFC0] [C0031588] kthread+0xc0/0xfc
> [D010BFF0] [C000471C] kernel_thread+0x44/0x60

I found one other place in the code which appears to do the same thing.
Although it is written to handle smp collisions, I think it should also
handle the error case above.
Index: linux-rt-rebase.q/net/sched/sch_generic.c
===================================================================
--- linux-rt-rebase.q.orig/net/sched/sch_generic.c
+++ linux-rt-rebase.q/net/sched/sch_generic.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/bitops.h>
+#include <linux/kallsyms.h>
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
@@ -150,16 +151,28 @@ static inline int qdisc_restart(struct n
 	 */
 	lockless = (dev->features & NETIF_F_LLTX);
 
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
+	if (!lockless) {
+#ifdef CONFIG_PREEMPT_RT
+		netif_tx_lock(dev);
+#else
+		if (netif_tx_trylock(dev))
+			/* Another CPU grabbed the driver tx lock */
+			return handle_dev_cpu_collision(skb, dev, q);
+#endif

What do you think?

Thanks,
Mark
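The non-RT branch of qdisc_restart relies on trylock semantics: if the driver tx lock is busy, the packet is handed to handle_dev_cpu_collision() instead of waiting. In the removed line that path runs when the trylock fails (`!netif_tx_trylock(dev)`). A user-space sketch of that contract, assuming POSIX threads; the function names are hypothetical and this is not the qdisc code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t xmit_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical transmit attempt: returns 0 if the lock was free (the packet
 * would go to the driver) or EBUSY if someone else holds it (the packet
 * would be requeued, as the collision handler does).
 */
static int try_xmit(void)
{
	int err = pthread_mutex_trylock(&xmit_lock);

	if (err != 0)
		return err;		/* collision: requeue, don't wait */
	/* ... hand the packet to the driver here ... */
	pthread_mutex_unlock(&xmit_lock);
	return 0;
}

static void *contender(void *arg)
{
	*(int *)arg = try_xmit();
	return NULL;
}

/* Run try_xmit() in a second thread while this thread holds the lock. */
static int try_xmit_while_held(void)
{
	pthread_t t;
	int result = -1;

	pthread_mutex_lock(&xmit_lock);
	pthread_create(&t, NULL, contender, &result);
	pthread_join(t, NULL);		/* join makes 'result' safe to read */
	pthread_mutex_unlock(&xmit_lock);
	return result;
}
```

An uncontended try_xmit() returns 0, and try_xmit_while_held() returns EBUSY. In the CONFIG_PREEMPT_RT branch quoted above, the lock is taken with a blocking netif_tx_lock() instead, so a busy lock means waiting rather than requeueing.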
* Re: [PATCH -rt] Preemption problem in kernel RT Patch
From: Ingo Molnar @ 2007-07-24 19:15 UTC
To: Beauchemin, Mark; +Cc: Thomas Gleixner, linux-kernel, David Miller

* Beauchemin, Mark <Mark.Beauchemin@sycamorenet.com> wrote:

> I'm not sure why the check for recursion has been removed. In the
> backtrace below, I think it would be caught by this check and not
> recursively call the spinlock code.

ah ... i think i did it like that because i didnt realize that there
would be a recursive call sequence, i was concentrating on recursive
locking.

incidentally, this code got cleaned up in .23-rc1-rt0, and now it looks
quite similar to your suggested fix. Could you double-check that it
solves your problem?

	Ingo
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Beauchemin, Mark @ 2007-07-24 19:36 UTC
To: Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

Ingo,

> ah ... i think i did it like that because i didnt realize that there
> would be a recursive call sequence, i was concentrating on recursive
> locking.

That makes sense. It doesn't seem right that it gets called recursive in
the first place. I'm not sure, however, if the tunnel code can help it as
it uses ip_output to send packets.

> incidentally, this code got cleaned up in .23-rc1-rt0, and now it looks
> quite similar to your suggested fix. Could you double-check that it
> solves your problem?

I've downloaded .23-rc1-rt0. It appears the issue remains in both places
as seen below:

net/core/dev.c: dev_queue_xmit

#ifdef CONFIG_PREEMPT_RT
		if (1) {
#else
		int cpu = raw_smp_processor_id(); /* ok because BHs are off */

		if (dev->xmit_lock_owner != cpu) {
#endif

and

net/sched/sch_generic.c: qdisc_restart

#ifdef CONFIG_PREEMPT_RT
		netif_tx_lock(dev);
#else
		if (netif_tx_trylock(dev))
			/* Another CPU grabbed the driver tx lock */
			return handle_dev_cpu_collision(skb, dev, q);
#endif

Mark
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Beauchemin, Mark @ 2007-08-01 14:15 UTC
To: Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

Ingo,

I tried just removing the CONFIG_PREEMPT_RT code, but that drops packets
if another task has the lock. Here's the debug printouts:

<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/

Our quality department has been testing the patch below for a few days
and has not seen any problems. It pretty much preserves the original -rt
patch pieces, but adds recursive checking. I changed xmit_lock_owner to a
void * as it is now a pointer to the task which owns the lock.

What do you think?

Thanks,
Mark

diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h	2007-08-01 09:01:32.000000000 -0400
@@ -468,7 +468,11 @@
 	/* cpu id of processor entered to hard_start_xmit or -1,
 	   if nobody entered there.
 	 */
-	int			xmit_lock_owner;
+#ifdef CONFIG_PREEMPT_RT
+	void *			xmit_lock_owner;
+#else
+	int			xmit_lock_owner;
+#endif
 	void			*priv;	/* pointer to private data */
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
@@ -1041,32 +1045,54 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
 	spin_lock(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)current;
+#else
+	dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)current;
+#else
+	dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
 	int ok = spin_trylock(&dev->_xmit_lock);
 	if (likely(ok))
-		dev->xmit_lock_owner = raw_smp_processor_id();
+	{
+#ifdef CONFIG_PREEMPT_RT
+		dev->xmit_lock_owner = (void *)current;
+#else
+		dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
+	}
 	return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
 	dev->xmit_lock_owner = -1;
+#else
+	dev->xmit_lock_owner = (void *)-1;
+#endif
 	spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
 	dev->xmit_lock_owner = -1;
+#else
+	dev->xmit_lock_owner = (void *)-1;
+#endif
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c	2007-08-01 08:56:02.000000000 -0400
@@ -1592,7 +1592,7 @@
 		 * No need to check for recursion with threaded interrupts:
 		 */
 #ifdef CONFIG_PREEMPT_RT
-		if (1) {
+		if (dev->xmit_lock_owner != (void *)current) {
 #else
 		int cpu = raw_smp_processor_id(); /* ok because BHs are off */
 
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c	2007-08-01 08:57:14.000000000 -0400
@@ -153,7 +153,13 @@
 
 	if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-		netif_tx_lock(dev);
+		if (dev->xmit_lock_owner == (void *)current) {
+			kfree_skb(skb);
+			if (net_ratelimit())
+				printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+			return -1;
+		}
+		netif_tx_lock(dev);
 #else
 		if (netif_tx_trylock(dev))
 			/* Another CPU grabbed the driver tx lock */
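The debug output above ("owned by sigd not softirq-net-rx") suggests why a CPU number is not a safe owner ID once the transmit path can be preempted: presumably both tasks ran on the same CPU, so the old test treated the receive thread as the owner and dropped its packet. A minimal model with hypothetical types (not the kernel's structures) that compares the two recursion tests:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct fake_task {
	const char *comm;
	int cpu;				/* CPU the task runs on */
};

struct fake_dev {
	int owner_cpu;				/* old scheme: -1 when unlocked */
	const struct fake_task *owner;		/* task scheme: NULL when unlocked */
};

static void fake_tx_lock(struct fake_dev *dev, const struct fake_task *t)
{
	dev->owner_cpu = t->cpu;
	dev->owner = t;
}

/* Old test: "is the lock held by code on my CPU?" */
static bool recursive_by_cpu(const struct fake_dev *dev,
			     const struct fake_task *t)
{
	return dev->owner_cpu == t->cpu;
}

/* Patched test: "is the lock held by me?" */
static bool recursive_by_task(const struct fake_dev *dev,
			      const struct fake_task *t)
{
	return dev->owner == t;
}
```

If sigd takes the lock on CPU 0 and is preempted by the receive thread on the same CPU, recursive_by_cpu() reports recursion for the receive thread (its packet would be dropped), while recursive_by_task() correctly says another task owns the lock, so the receive thread would simply wait for it.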
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Beauchemin, Mark @ 2007-08-01 14:22 UTC
To: Beauchemin, Mark, Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

sorry.. I sent the wrong patch file. There was a warning in the other one.

diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h	2007-08-01 09:10:01.000000000 -0400
@@ -468,7 +468,11 @@
 	/* cpu id of processor entered to hard_start_xmit or -1,
 	   if nobody entered there.
 	 */
-	int			xmit_lock_owner;
+#ifdef CONFIG_PREEMPT_RT
+	void *			xmit_lock_owner;
+#else
+	int			xmit_lock_owner;
+#endif
 	void			*priv;	/* pointer to private data */
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
@@ -1041,32 +1045,54 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
 	spin_lock(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)current;
+#else
+	dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)current;
+#else
+	dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
 	int ok = spin_trylock(&dev->_xmit_lock);
 	if (likely(ok))
-		dev->xmit_lock_owner = raw_smp_processor_id();
+	{
+#ifdef CONFIG_PREEMPT_RT
+		dev->xmit_lock_owner = (void *)current;
+#else
+		dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
+	}
 	return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)-1;
+#else
 	dev->xmit_lock_owner = -1;
+#endif
 	spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)-1;
+#else
 	dev->xmit_lock_owner = -1;
+#endif
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c	2007-08-01 08:56:02.000000000 -0400
@@ -1592,7 +1592,7 @@
 		 * No need to check for recursion with threaded interrupts:
 		 */
 #ifdef CONFIG_PREEMPT_RT
-		if (1) {
+		if (dev->xmit_lock_owner != (void *)current) {
 #else
 		int cpu = raw_smp_processor_id(); /* ok because BHs are off */
 
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c	2007-08-01 08:57:14.000000000 -0400
@@ -153,7 +153,13 @@
 
 	if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-		netif_tx_lock(dev);
+		if (dev->xmit_lock_owner == (void *)current) {
+			kfree_skb(skb);
+			if (net_ratelimit())
+				printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+			return -1;
+		}
+		netif_tx_lock(dev);
 #else
 		if (netif_tx_trylock(dev))
 			/* Another CPU grabbed the driver tx lock */
* Re: [PATCH -rt] Preemption problem in kernel RT Patch
From: Ingo Molnar @ 2007-08-06 7:13 UTC
To: Beauchemin, Mark; +Cc: Thomas Gleixner, linux-kernel, David Miller

* Beauchemin, Mark <Mark.Beauchemin@sycamorenet.com> wrote:

> sorry.. I sent the wrong patch file. There was a warning in the
> other one.

> -	int			xmit_lock_owner;
> +#ifdef CONFIG_PREEMPT_RT
> +	void *			xmit_lock_owner;
> +#else
> +	int			xmit_lock_owner;
> +#endif

could you please change this to use 'current' (instead of the CPU
number) as the xmit_lock_owner unconditionally? That results in much
fewer #ifdefs and far cleaner code.

	Ingo
* RE: [PATCH -rt] Preemption problem in kernel RT Patch
From: Beauchemin, Mark @ 2007-08-07 19:41 UTC
To: Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

> could you please change this to use 'current' (instead of the CPU
> number) as the xmit_lock_owner unconditionally? That results in much
> fewer #ifdefs and far cleaner code.
>
> 	Ingo

Ingo,

Here's the new patch. Please check me on the non-rt portion. I think the
check is still functionally the same.

Thanks,
Mark

diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h	2007-08-06 09:26:51.000000000 -0400
@@ -468,7 +468,7 @@
 	/* cpu id of processor entered to hard_start_xmit or -1,
 	   if nobody entered there.
 	 */
-	int			xmit_lock_owner;
+	void *			xmit_lock_owner;
 	void			*priv;	/* pointer to private data */
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
@@ -1041,32 +1041,34 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
 	spin_lock(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+	dev->xmit_lock_owner = (void *)current;
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+	dev->xmit_lock_owner = (void *)current;
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
 	int ok = spin_trylock(&dev->_xmit_lock);
 	if (likely(ok))
-		dev->xmit_lock_owner = raw_smp_processor_id();
+	{
+		dev->xmit_lock_owner = (void *)current;
+	}
 	return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
-	dev->xmit_lock_owner = -1;
+	dev->xmit_lock_owner = (void *)-1;
 	spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
-	dev->xmit_lock_owner = -1;
+	dev->xmit_lock_owner = (void *)-1;
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c	2007-08-07 15:22:31.000000000 -0400
@@ -1588,16 +1588,7 @@
 	   Either shot noqueue qdisc, it is even simpler 8)
 	 */
 	if (dev->flags & IFF_UP) {
-		/*
-		 * No need to check for recursion with threaded interrupts:
-		 */
-#ifdef CONFIG_PREEMPT_RT
-		if (1) {
-#else
-		int cpu = raw_smp_processor_id(); /* ok because BHs are off */
-
-		if (dev->xmit_lock_owner != cpu) {
-#endif
+		if (dev->xmit_lock_owner != (void *)current) {
 			HARD_TX_LOCK(dev);
 
 			if (!netif_queue_stopped(dev) &&
@@ -3349,7 +3340,7 @@
 	spin_lock_init(&dev->queue_lock);
 	spin_lock_init(&dev->_xmit_lock);
 	netdev_set_lockdep_class(&dev->_xmit_lock, dev->type);
-	dev->xmit_lock_owner = -1;
+	dev->xmit_lock_owner = (void *)-1;
 	spin_lock_init(&dev->ingress_lock);
 
 	dev->iflink = -1;
diff -ur linux-2.6.23-rc1-rt0/net/mac80211/ieee80211.c linux-2.6.23-rc1-rt0_new/net/mac80211/ieee80211.c
--- linux-2.6.23-rc1-rt0/net/mac80211/ieee80211.c	2007-07-24 15:15:57.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/mac80211/ieee80211.c	2007-08-07 15:21:28.000000000 -0400
@@ -2413,7 +2413,7 @@
 static inline void netif_tx_lock_nested(struct net_device *dev, int subclass)
 {
 	spin_lock_nested(&dev->_xmit_lock, subclass);
-	dev->xmit_lock_owner = smp_processor_id();
+	dev->xmit_lock_owner = (void *)current;
 }
 
 static void ieee80211_set_multicast_list(struct net_device *dev)
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c	2007-08-07 15:20:10.000000000 -0400
@@ -88,7 +88,7 @@
 {
 	int ret;
 
-	if (unlikely(dev->xmit_lock_owner == smp_processor_id())) {
+	if (unlikely(dev->xmit_lock_owner == (void *)current)) {
 		/*
 		 * Same CPU holding the lock. It may be a transient
 		 * configuration error, when hard_start_xmit() recurses. We
@@ -153,7 +153,13 @@
 
 	if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-		netif_tx_lock(dev);
+		if (dev->xmit_lock_owner == (void *)current) {
+			kfree_skb(skb);
+			if (net_ratelimit())
+				printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+			return -1;
+		}
+		netif_tx_lock(dev);
 #else
 		if (netif_tx_trylock(dev))
 			/* Another CPU grabbed the driver tx lock */
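The scheme the final patch settles on (record the owning task after locking, clear it before unlocking, and compare it with the caller before locking) can be sketched in user space. Assumptions, all labeled: POSIX threads and C11 atomics; the address of a `_Thread_local` variable stands in for `current`; NULL replaces the patch's `(void *)-1` sentinel; every name is hypothetical rather than a kernel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical analogue of net_device's _xmit_lock and xmit_lock_owner. */
struct fake_netdev {
	pthread_mutex_t xmit_lock;
	_Atomic(const void *) xmit_lock_owner;	/* NULL: nobody holds it */
};

/* Stand-in for 'current': each live thread has its own task_token. */
static _Thread_local char task_token;
#define CURRENT ((const void *)&task_token)

static void fake_netdev_init(struct fake_netdev *dev)
{
	pthread_mutex_init(&dev->xmit_lock, NULL);
	atomic_init(&dev->xmit_lock_owner, NULL);
}

static void fake_tx_lock(struct fake_netdev *dev)
{
	pthread_mutex_lock(&dev->xmit_lock);
	atomic_store(&dev->xmit_lock_owner, CURRENT);	/* set after locking */
}

static void fake_tx_unlock(struct fake_netdev *dev)
{
	atomic_store(&dev->xmit_lock_owner, NULL);	/* clear before unlocking */
	pthread_mutex_unlock(&dev->xmit_lock);
}

/*
 * Transmit, re-entering itself 'nested' times to mimic a routing loop.
 * Returns 0 on success or -1 if the packet was dropped because this thread
 * already holds the lock. The owner field is read without holding the lock,
 * so it is atomic here to keep the sketch free of C11 data races.
 */
static int fake_xmit(struct fake_netdev *dev, int nested)
{
	int ret = 0;

	if (atomic_load(&dev->xmit_lock_owner) == CURRENT)
		return -1;		/* recursion: drop, don't self-deadlock */
	fake_tx_lock(dev);
	if (nested > 0)
		ret = fake_xmit(dev, nested - 1);
	fake_tx_unlock(dev);
	return ret;
}
```

A plain call, fake_xmit(&dev, 0), returns 0; with one level of re-entry, fake_xmit(&dev, 1) returns -1 because the inner call sees its own thread as the owner, and the owner field is back to NULL afterwards. Another thread seeing a different owner would just block in fake_tx_lock(), which is the behaviour the sigd/softirq-net-rx case needs.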
* RE: [PATCH -rt] Preemption problem in kernel RT Patch 2007-08-07 19:41 ` Beauchemin, Mark @ 2007-09-17 13:03 ` Beauchemin, Mark 2007-09-17 13:59 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Beauchemin, Mark @ 2007-09-17 13:03 UTC (permalink / raw) To: Beauchemin, Mark, Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel Ingo, Any thoughts on the patch? Thanks, Mark -----Original Message----- From: Beauchemin, Mark Sent: Tuesday, August 07, 2007 3:42 PM To: 'Ingo Molnar' Cc: Thomas Gleixner; linux-kernel@vger.kernel.org; David Miller Subject: RE: [PATCH -rt] Preemption problem in kernel RT Patch diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h --- linux-2.6.23-rc1-rt0/include/linux/netdevice.h 2007-07-24 15:17:07.000000000 -0400 +++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h 2007-08-06 09:26:51.000000000 -0400 @@ -468,7 +468,7 @@ /* cpu id of processor entered to hard_start_xmit or -1, if nobody entered there. 
*/ - int xmit_lock_owner; + void * xmit_lock_owner; void *priv; /* pointer to private data */ int (*hard_start_xmit) (struct sk_buff *skb, struct net_device *dev); @@ -1041,32 +1041,34 @@ static inline void netif_tx_lock(struct net_device *dev) { spin_lock(&dev->_xmit_lock); - dev->xmit_lock_owner = raw_smp_processor_id(); + dev->xmit_lock_owner = (void *)current; } static inline void netif_tx_lock_bh(struct net_device *dev) { spin_lock_bh(&dev->_xmit_lock); - dev->xmit_lock_owner = raw_smp_processor_id(); + dev->xmit_lock_owner = (void *)current; } static inline int netif_tx_trylock(struct net_device *dev) { int ok = spin_trylock(&dev->_xmit_lock); if (likely(ok)) - dev->xmit_lock_owner = raw_smp_processor_id(); + { + dev->xmit_lock_owner = (void *)current; + } return ok; } static inline void netif_tx_unlock(struct net_device *dev) { - dev->xmit_lock_owner = -1; + dev->xmit_lock_owner = (void *)-1; spin_unlock(&dev->_xmit_lock); } static inline void netif_tx_unlock_bh(struct net_device *dev) { - dev->xmit_lock_owner = -1; + dev->xmit_lock_owner = (void *)-1; spin_unlock_bh(&dev->_xmit_lock); } diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c --- linux-2.6.23-rc1-rt0/net/core/dev.c 2007-07-24 15:17:07.000000000 -0400 +++ linux-2.6.23-rc1-rt0_new/net/core/dev.c 2007-08-07 15:22:31.000000000 -0400 @@ -1588,16 +1588,7 @@ Either shot noqueue qdisc, it is even simpler 8) */ if (dev->flags & IFF_UP) { - /* - * No need to check for recursion with threaded interrupts: - */ -#ifdef CONFIG_PREEMPT_RT - if (1) { -#else - int cpu = raw_smp_processor_id(); /* ok because BHs are off */ - - if (dev->xmit_lock_owner != cpu) { -#endif + if (dev->xmit_lock_owner != (void *)current) { HARD_TX_LOCK(dev); if (!netif_queue_stopped(dev) && @@ -3349,7 +3340,7 @@ spin_lock_init(&dev->queue_lock); spin_lock_init(&dev->_xmit_lock); netdev_set_lockdep_class(&dev->_xmit_lock, dev->type); - dev->xmit_lock_owner = -1; + dev->xmit_lock_owner = (void *)-1; 
spin_lock_init(&dev->ingress_lock);
dev->iflink = -1;
diff -ur linux-2.6.23-rc1-rt0/net/mac80211/ieee80211.c linux-2.6.23-rc1-rt0_new/net/mac80211/ieee80211.c
--- linux-2.6.23-rc1-rt0/net/mac80211/ieee80211.c 2007-07-24 15:15:57.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/mac80211/ieee80211.c 2007-08-07 15:21:28.000000000 -0400
@@ -2413,7 +2413,7 @@
static inline void netif_tx_lock_nested(struct net_device *dev, int subclass)
{
spin_lock_nested(&dev->_xmit_lock, subclass);
- dev->xmit_lock_owner = smp_processor_id();
+ dev->xmit_lock_owner = (void *)current;
}
static void ieee80211_set_multicast_list(struct net_device *dev)
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c 2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c 2007-08-07 15:20:10.000000000 -0400
@@ -88,7 +88,7 @@
{
int ret;
- if (unlikely(dev->xmit_lock_owner == smp_processor_id())) {
+ if (unlikely(dev->xmit_lock_owner == (void *)current)) {
/*
* Same CPU holding the lock. It may be a transient
* configuration error, when hard_start_xmit() recurses. We
@@ -153,7 +153,13 @@
if (!lockless) {
#ifdef CONFIG_PREEMPT_RT
- netif_tx_lock(dev);
+ if (dev->xmit_lock_owner == (void *)current) {
+ kfree_skb(skb);
+ if (net_ratelimit())
+ printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+ return -1;
+ }
+ netif_tx_lock(dev);
#else
if (netif_tx_trylock(dev))
/* Another CPU grabbed the driver tx lock */
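The patch replaces the CPU id stored in xmit_lock_owner with the owning task (current), and uses that to detect a transmit path re-entering itself. As read from the diff, the motivation is that under PREEMPT_RT the _xmit_lock is a sleeping lock, so its holder can be preempted and a different task can enter the transmit path on the same CPU. A CPU-id comparison then looks like recursion when it is not. The following is a minimal userspace sketch of that reasoning, not kernel code: struct task, current_task, struct sim_dev and all sim_*/is_recursion_* helpers are hypothetical stand-ins, and a context switch is simulated by reassigning current_task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel task; only identity and CPU matter. */
struct task {
	const char *name;
	int cpu;                   /* CPU the task is running on */
};

/* Hypothetical stand-in for the kernel's `current`; reassigning it
 * simulates a context switch on the same CPU. */
static struct task *current_task;

/* Simplified model of the transmit-lock fields of struct net_device. */
struct sim_dev {
	int locked;                /* 1 while the tx lock is held */
	int owner_cpu;             /* old scheme: CPU id, -1 when unowned */
	struct task *owner_task;   /* patched scheme: task, NULL when unowned
	                            * (the patch itself uses (void *)-1) */
};

/* Take the tx lock and record both kinds of owner.
 * Blocking on contention is not modelled: the lock must be free. */
static void sim_tx_lock(struct sim_dev *dev)
{
	assert(!dev->locked);
	dev->locked = 1;
	dev->owner_cpu = current_task->cpu;
	dev->owner_task = current_task;
}

/* Release the tx lock; only the owning task may do so. */
static void sim_tx_unlock(struct sim_dev *dev)
{
	assert(dev->locked && dev->owner_task == current_task);
	dev->owner_cpu = -1;
	dev->owner_task = NULL;
	dev->locked = 0;
}

/* Old test: "recursion" if the holder recorded the current CPU. */
static int is_recursion_by_cpu(const struct sim_dev *dev)
{
	return dev->owner_cpu == current_task->cpu;
}

/* Patched test: recursion only if the running task is the holder. */
static int is_recursion_by_task(const struct sim_dev *dev)
{
	return dev->owner_task == current_task;
}
```

For example, if task A on CPU 0 takes the lock and is then preempted by task B on CPU 0, is_recursion_by_cpu() reports a dead loop for B even though B should simply wait for the lock, while is_recursion_by_task() correctly reports none. If A itself re-enters, both checks fire.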
* Re: [PATCH -rt] Preemption problem in kernel RT Patch
2007-09-17 13:03 ` Beauchemin, Mark
@ 2007-09-17 13:59 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2007-09-17 13:59 UTC
To: Beauchemin, Mark; +Cc: Thomas Gleixner, linux-kernel
* Beauchemin, Mark <Mark.Beauchemin@sycamorenet.com> wrote:
> Ingo,
>
> Any thoughts on the patch?
looks good to me - but it has a number of style issues, please run it
through scripts/checkpatch.pl to see those.
Ingo
* Re: [PATCH -rt] Preemption problem in kernel RT Patch
2007-06-21 19:39 [PATCH -rt] Preemption problem in kernel RT Patch Beauchemin, Mark
2007-06-22 8:03 ` Thomas Gleixner
@ 2009-10-14 12:55 ` dtslinux
2009-10-14 15:38 ` Steven Rostedt
1 sibling, 1 reply; 15+ messages in thread
From: dtslinux @ 2009-10-14 12:55 UTC
To: linux-kernel
Hello All,
I am having an issue in kernel 2.6.24.7 with the RT-27 patch. I am using a block device driver that does I/O operations on a virtual device. The driver uses separate kernel threads to perform read and write operations. The driver works fine on normal kernels, and it also works fine on the 2.6.24.7 kernel with the RT-27 patch, but sometimes I get the following bug when performing a write test with the xdd benchmark (on the 2.6.24.7 kernel with the RT-27 patch):
WARNING: at kernel/rtmutex.c:979 rt_spin_lock()
Pid: 12634, comm: pdflush Tainted: GF 2.6.24.7-rt27 #9
[<c04046b8>] show_trace_log_lvl+0x1f/0x34
[<c0404f67>] show_trace+0x17/0x19
[<c04052e2>] dump_stack+0x6f/0x75
[<c063ac74>] rt_spin_lock+0x4a/0xa2
[<c04f33a4>] cfq_exit_io_context+0x30/0x56
[<c04ed88f>] exit_io_context+0x68/0x72
[<c04206c1>] do_exit+0x6c2/0x739
[<c040432d>] kernel_thread_helper+0xd/0x10
=======================
<1>BUG: unable to handle kernel NULL pointer dereference at virtual address 0000003d
printing eip: c043d18a *pdpt = 00000000349dd001 *pde = 0000000000000000
<0>Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
Modules linked in: sysfs_driver(F) regularcache(F) dts nls_utf8 hfsplus ramdisk_driver bridge autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp ib_ipoib ipv6 rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_mirror dm_multipath dm_mod sbs sbshc battery ac lp floppy sg serio_raw parport_pc parport snd_intel8x0 snd_ac97_codec 8250_pnp ac97_bus snd_seq_dummy 8250 serial_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss button snd_pcm e1000 snd_timer snd soundcore i2c_i801 snd_page_alloc i2c_core pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 12634, comm: pdflush Tainted: GF (2.6.24.7-rt27 #9)
EIP: 0060:[<c043d18a>] EFLAGS: 00010016 CPU: 0
EIP is at task_blocks_on_rt_mutex+0xf8/0x240
EAX: ef42406c EBX: 0000001a ECX: ef424044 EDX: ef424044
ESI: ef424044 EDI: 00000009 EBP: e7ba7eec ESP: e7ba7ebc
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 preempt:00000003
Process pdflush (pid: 12634, ti=e7ba7000 task=e5268060 task.ti=e7ba7000)
Stack: 00000001 ef424054 ef424044 00000296 00000000 e7ba7f04 ef424044 00000009
e7ba7f1c 00000296 ef424044 00000296 e7ba7f58 c063a57e 00000296 00000046
00000000 ffffffff 00000078 e7ba7f08 e7ba7f08 e7ba7f10 e7ba7f10 00000000
Call Trace:
[<c04046b8>] show_trace_log_lvl+0x1f/0x34
[<c0404772>] show_stack_log_lvl+0xa5/0xb9
[<c040483a>] show_registers+0xb4/0x1b8
[<c0404a5e>] die+0x120/0x21b
[<c063ddfc>] do_page_fault+0x845/0xa07
[<c063c29a>] error_code+0x6a/0x70
[<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
[<c063a9d4>] __rt_spin_lock+0x48/0x4b
[<c063acbe>] rt_spin_lock+0x94/0xa2
[<c04f33a4>] cfq_exit_io_context+0x30/0x56
[<c04ed88f>] exit_io_context+0x68/0x72
[<c04206c1>] do_exit+0x6c2/0x739
[<c040432d>] kernel_thread_helper+0xd/0x10
=======================
INFO: lockdep is turned off.
Code: 24 08 4c 00 00 00 c7 44 24 04 83 42 6e c0 c7 04 24 b6 78 6d c0 e8 c0 0d fe ff e8 f5 80 fc ff 8b 7f 08 8b 4d e8 83 ef 0c 89 7d ec <39> 4f 34 74 04 0f 0b eb fe 8b 7d e8 8b 45 e4 83 c7 20 89 fa e8
<0>EIP: [<c043d18a>] task_blocks_on_rt_mutex+0xf8/0x240 SS:ESP 0068:e7ba7ebc
---[ end trace 432e3e53cc0cfa18 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling with irqs disabled: pdflush/0x00000002/12634
caller is do_exit+0xcc/0x739
Pid: 12634, comm: pdflush Tainted: GF D 2.6.24.7-rt27 #9
[<c04046b8>] show_trace_log_lvl+0x1f/0x34
[<c0404f67>] show_trace+0x17/0x19
[<c04052e2>] dump_stack+0x6f/0x75
[<c0638e41>] schedule+0x8a/0x105
[<c04200cb>] do_exit+0xcc/0x739
[<c0404b51>] die+0x213/0x21b
[<c063ddfc>] do_page_fault+0x845/0xa07
[<c063c29a>] error_code+0x6a/0x70
[<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
[<c063a9d4>] __rt_spin_lock+0x48/0x4b
[<c063acbe>] rt_spin_lock+0x94/0xa2
[<c04f33a4>] cfq_exit_io_context+0x30/0x56
[<c04ed88f>] exit_io_context+0x68/0x72
[<c04206c1>] do_exit+0x6c2/0x739
[<c040432d>] kernel_thread_helper+0xd/0x10
=======================
I am confused about whether my driver is causing this problem, because I cannot find any function of my driver in the trace above. All the functions are kernel functions, and pdflush is causing this problem. In the "Modules linked in:" line above, sysfs_driver(F), regularcache(F) and dts are my modules.
I am using CentOS 5 with 1GB RAM on "Intel(R) Pentium(R) 4 CPU 2.80GHz". Please let me know if I am making a mistake.
Thanks!
Furahm
--
This message was sent on behalf of dtslinux@hotmail.com at openSubscriber.com
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/6978704.html
* Re: [PATCH -rt] Preemption problem in kernel RT Patch
2009-10-14 12:55 ` dtslinux
@ 2009-10-14 15:38 ` Steven Rostedt
0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2009-10-14 15:38 UTC
To: dtslinux; +Cc: linux-kernel
Note, LKML receives 600 emails a day. If you want to make sure your email is seen, it is best to Cc those that post the -rt patches. Otherwise, your email is likely to get lost in the noise.
On Wed, Oct 14, 2009 at 08:55:28AM -0400, dtslinux@hotmail.com wrote:
> Hello All,
>
> I am having an issue in kernel 2.6.24.7 with the RT-27 patch. I am using a block device driver that does I/O operations on a virtual device. The driver uses separate kernel threads to perform read and write operations. The driver works fine on normal kernels, and it also works fine on the 2.6.24.7 kernel with the RT-27 patch, but sometimes I get the following bug when performing a write test with the xdd benchmark (on the 2.6.24.7 kernel with the RT-27 patch):
>
> WARNING: at kernel/rtmutex.c:979 rt_spin_lock()
This shows that a non "rtmutex" was used in the rtmutex code.
> Pid: 12634, comm: pdflush Tainted: GF 2.6.24.7-rt27 #9
> [<c04046b8>] show_trace_log_lvl+0x1f/0x34
> [<c0404f67>] show_trace+0x17/0x19
> [<c04052e2>] dump_stack+0x6f/0x75
> [<c063ac74>] rt_spin_lock+0x4a/0xa2
> [<c04f33a4>] cfq_exit_io_context+0x30/0x56
The q->queue_lock used in cfq_exit_single_io_context is not an rtmutex. Yes, this will crash the kernel.
-- Steve
> [<c04ed88f>] exit_io_context+0x68/0x72
> [<c04206c1>] do_exit+0x6c2/0x739
> [<c040432d>] kernel_thread_helper+0xd/0x10
> =======================
>
> <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 0000003d
> printing eip: c043d18a *pdpt = 00000000349dd001 *pde = 0000000000000000
>
> <0>Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> Modules linked in: sysfs_driver(F) regularcache(F) dts nls_utf8 hfsplus ramdisk_driver bridge autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp ib_ipoib ipv6 rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_mirror dm_multipath dm_mod sbs sbshc battery ac lp floppy sg serio_raw parport_pc parport snd_intel8x0 snd_ac97_codec 8250_pnp ac97_bus snd_seq_dummy 8250 serial_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss button snd_pcm e1000 snd_timer snd soundcore i2c_i801 snd_page_alloc i2c_core pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>
> Pid: 12634, comm: pdflush Tainted: GF (2.6.24.7-rt27 #9)
> EIP: 0060:[<c043d18a>] EFLAGS: 00010016 CPU: 0
> EIP is at task_blocks_on_rt_mutex+0xf8/0x240
> EAX: ef42406c EBX: 0000001a ECX: ef424044 EDX: ef424044
> ESI: ef424044 EDI: 00000009 EBP: e7ba7eec ESP: e7ba7ebc
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 preempt:00000003
> Process pdflush (pid: 12634, ti=e7ba7000 task=e5268060 task.ti=e7ba7000)
> Stack: 00000001 ef424054 ef424044 00000296 00000000 e7ba7f04 ef424044 00000009
> e7ba7f1c 00000296 ef424044 00000296 e7ba7f58 c063a57e 00000296 00000046
> 00000000 ffffffff 00000078 e7ba7f08 e7ba7f08 e7ba7f10 e7ba7f10 00000000
> Call Trace:
> [<c04046b8>] show_trace_log_lvl+0x1f/0x34
> [<c0404772>] show_stack_log_lvl+0xa5/0xb9
> [<c040483a>] show_registers+0xb4/0x1b8
> [<c0404a5e>] die+0x120/0x21b
> [<c063ddfc>] do_page_fault+0x845/0xa07
> [<c063c29a>] error_code+0x6a/0x70
> [<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
> [<c063a9d4>] __rt_spin_lock+0x48/0x4b
> [<c063acbe>] rt_spin_lock+0x94/0xa2
> [<c04f33a4>] cfq_exit_io_context+0x30/0x56
> [<c04ed88f>] exit_io_context+0x68/0x72
> [<c04206c1>] do_exit+0x6c2/0x739
> [<c040432d>] kernel_thread_helper+0xd/0x10
> =======================
> INFO: lockdep is turned off.
> Code: 24 08 4c 00 00 00 c7 44 24 04 83 42 6e c0 c7 04 24 b6 78 6d c0 e8 c0 0d fe ff e8 f5 80 fc ff 8b 7f 08 8b 4d e8 83 ef 0c 89 7d ec <39> 4f 34 74 04 0f 0b eb fe 8b 7d e8 8b 45 e4 83 c7 20 89 fa e8
>
> <0>EIP: [<c043d18a>] task_blocks_on_rt_mutex+0xf8/0x240 SS:ESP 0068:e7ba7ebc
> ---[ end trace 432e3e53cc0cfa18 ]---
> Fixing recursive fault but reboot is needed!
> BUG: scheduling with irqs disabled: pdflush/0x00000002/12634
> caller is do_exit+0xcc/0x739
> Pid: 12634, comm: pdflush Tainted: GF D 2.6.24.7-rt27 #9
> [<c04046b8>] show_trace_log_lvl+0x1f/0x34
> [<c0404f67>] show_trace+0x17/0x19
> [<c04052e2>] dump_stack+0x6f/0x75
> [<c0638e41>] schedule+0x8a/0x105
> [<c04200cb>] do_exit+0xcc/0x739
> [<c0404b51>] die+0x213/0x21b
> [<c063ddfc>] do_page_fault+0x845/0xa07
> [<c063c29a>] error_code+0x6a/0x70
> [<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
> [<c063a9d4>] __rt_spin_lock+0x48/0x4b
> [<c063acbe>] rt_spin_lock+0x94/0xa2
> [<c04f33a4>] cfq_exit_io_context+0x30/0x56
> [<c04ed88f>] exit_io_context+0x68/0x72
> [<c04206c1>] do_exit+0x6c2/0x739
> [<c040432d>] kernel_thread_helper+0xd/0x10
> =======================
>
> I am confused about whether my driver is causing this problem, because I cannot find any function of my driver in the trace above. All the functions are kernel functions, and pdflush is causing this problem. In the "Modules linked in:" line above, sysfs_driver(F), regularcache(F) and dts are my modules.
> I am using CentOS 5 with 1GB RAM on "Intel(R) Pentium(R) 4 CPU 2.80GHz". Please let me know if I am making a mistake.
>
> Thanks!
> Furahm
>
> --
> This message was sent on behalf of dtslinux@hotmail.com at openSubscriber.com
> http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/6978704.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
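Steven's diagnosis is that the lock handed to rt_spin_lock() in the CFQ exit path (q->queue_lock) is not an rtmutex, and that the rtmutex code will crash the kernel when it treats it as one. One general way to catch this kind of mismatch before a slow path interprets the object's fields is a runtime type tag checked at the lock entry point. The sketch below is a hypothetical userspace illustration only, not how the -rt tree implements or detects this. The enum sim_lock_kind, struct sim_lock and the sim_* functions are invented for the example, and whatever check produced the WARNING at kernel/rtmutex.c:979 is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical lock kinds, used only for this illustration. */
enum sim_lock_kind {
	SIM_LOCK_UNINIT = 0,     /* zeroed memory, never initialized */
	SIM_LOCK_RAW,            /* spinning lock */
	SIM_LOCK_SLEEPING        /* rtmutex-style sleeping lock */
};

/* Hypothetical lock object carrying a runtime type tag. */
struct sim_lock {
	enum sim_lock_kind kind;
	int held;
};

static void sim_lock_init(struct sim_lock *l, enum sim_lock_kind kind)
{
	l->kind = kind;
	l->held = 0;
}

/* Entry point for sleeping locks: check the tag first and refuse, with a
 * warning, any object not set up as a sleeping lock, instead of going on
 * to interpret its fields as sleeping-lock state. */
static int sim_sleeping_lock(struct sim_lock *l)
{
	if (l->kind != SIM_LOCK_SLEEPING) {
		fprintf(stderr, "WARNING: %p is not a sleeping lock (kind %d)\n",
			(void *)l, (int)l->kind);
		return -1;
	}
	l->held = 1;             /* contention is not modelled */
	return 0;
}

/* Release a sleeping lock; it must be a held sleeping lock. */
static void sim_sleeping_unlock(struct sim_lock *l)
{
	assert(l->kind == SIM_LOCK_SLEEPING && l->held);
	l->held = 0;
}
```

In this model a lock initialized as a raw lock, or memory that was never initialized, is rejected before any sleeping-lock state is touched. The design choice is to fail loudly at the entry point rather than let deeper code dereference fields the object does not actually have.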
end of thread
Thread overview: 15+ messages
2007-06-21 19:39 [PATCH -rt] Preemption problem in kernel RT Patch Beauchemin, Mark
2007-06-22 8:03 ` Thomas Gleixner
2007-06-23 14:08 ` Beauchemin, Mark
2007-06-23 14:26 ` Thomas Gleixner
2007-07-24 15:48 ` Beauchemin, Mark
2007-07-24 19:15 ` Ingo Molnar
2007-07-24 19:36 ` Beauchemin, Mark
2007-08-01 14:15 ` Beauchemin, Mark
2007-08-01 14:22 ` Beauchemin, Mark
2007-08-06 7:13 ` Ingo Molnar
2007-08-07 19:41 ` Beauchemin, Mark
2007-09-17 13:03 ` Beauchemin, Mark
2007-09-17 13:59 ` Ingo Molnar
2009-10-14 12:55 ` dtslinux
2009-10-14 15:38 ` Steven Rostedt