* [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Guennadi Liakhovetski @ 2007-04-05 7:09 UTC
To: Paul Mackerras
Cc: irda-users, linux-rt-users, netdev
Hi all,
As I came in this morning to check the IrNET / PPP test I had started
yesterday, the device was dead and OOM messages were scrolling up the
terminal. I captured a task trace, and the ftp process seems to have been
the original culprit. Below is the backtrace, which looks like a
recursive spinlock; I assume it is this BUG() that triggered:
BUG_ON(rt_mutex_owner(lock) == current);
and ppp is indeed entered recursively in the trace. I'll see if I can
find the reason and a solution, but would also be grateful for any hints.
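For reference, the check that fires is, as far as I can see, this
(a simplified sketch of the -rt spinlock slowpath, not a verbatim quote):

    /* On PREEMPT_RT, spinlocks are sleeping rt_mutexes. A lock the
     * current task already owns can never be acquired again, so the
     * slowpath treats that as a fatal self-deadlock: */
    static void rt_spin_lock_slowlock(struct rt_mutex *lock)
    {
            BUG_ON(rt_mutex_owner(lock) == current);
            /* ... otherwise enqueue ourselves and block ... */
    }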
Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
ftp D [c3c9e460] C01E5838 0 18445 1 20756 14588 (L-TLB)
[<c01e5420>] (__schedule+0x0/0x7e8) from [<c01e5cfc>] (schedule+0x54/0x124)
[<c01e5ca8>] (schedule+0x0/0x124) from [<c017e69c>] (lock_sock_nested+0x94/0xd0)
r5 = C329F06C r4 = C14B9780
[<c017e608>] (lock_sock_nested+0x0/0xd0) from [<c017b878>] (sock_fasync+0x40/0x154)
r7 = C329F040 r6 = C2238A5C r5 = C0AD8B60 r4 = C0AD8B60
[<c017b838>] (sock_fasync+0x0/0x154) from [<c017b9b0>] (sock_close+0x24/0x44)
[<c017b98c>] (sock_close+0x0/0x44) from [<c008dbbc>] (__fput+0x194/0x1c8)
r4 = 00000008
[<c008da28>] (__fput+0x0/0x1c8) from [<c008dc28>] (fput+0x38/0x3c)
r8 = 00000000 r7 = C3251380 r6 = 00000000 r5 = C3251380
r4 = C0AD8B60
[<c008dbf0>] (fput+0x0/0x3c) from [<c008b860>] (filp_close+0x5c/0x88)
[<c008b804>] (filp_close+0x0/0x88) from [<c003f108>] (put_files_struct+0x9c/0xdc)
r6 = C3251388 r5 = 00000007 r4 = 00000001
[<c003f06c>] (put_files_struct+0x0/0xdc) from [<c003fa14>] (do_exit+0x168/0x8b0)
[<c003f8ac>] (do_exit+0x0/0x8b0) from [<c002411c>] (die+0x29c/0x2e8)
[<c0023e80>] (die+0x0/0x2e8) from [<c0025b3c>] (__do_kernel_fault+0x70/0x80)
[<c0025acc>] (__do_kernel_fault+0x0/0x80) from [<c0025cd0>] (do_page_fault+0x60/0x214)
r7 = C1B3B8C0 r6 = C0264418 r5 = C3C9E460 r4 = C02643A8
[<c0025c70>] (do_page_fault+0x0/0x214) from [<c0025fa0>] (do_DataAbort+0x3c/0xa4)
[<c0025f64>] (do_DataAbort+0x0/0xa4) from [<c001fa60>] (__dabt_svc+0x40/0x60)
r8 = 00000001 r7 = A0000013 r6 = C3A43780 r5 = C14B99E0
r4 = FFFFFFFF
[<c0023cd0>] (__bug+0x0/0x2c) from [<c01e753c>] (rt_spin_lock_slowlock+0x1c8/0x1f8)
[<c01e7374>] (rt_spin_lock_slowlock+0x0/0x1f8) from [<c01e7894>] (__lock_text_start+0x44/0x48)
[<c01e7850>] (__lock_text_start+0x0/0x48) from [<bf12b23c>] (ppp_channel_push+0x1c/0xc8 [ppp_generic])
[<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 = 00000000
[<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
[<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
[<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
r8 = BF121560 r7 = 00000002 r6 = C38F4200 r5 = C21418B8
r4 = C21418B8
[<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
[<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
r7 = C3A436E0 r6 = 00000000 r5 = C21418B8 r4 = C1489600
[<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
r7 = 00000021 r6 = C21418B8 r5 = C1489600 r4 = 00000000
[<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
[<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
[<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
r8 = C1BBC914 r7 = C21418B8 r6 = 00000000 r5 = C21418B8
r4 = C1BBC800
[<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
[<c018709c>] (dev_queue_xmit+0x0/0x25c) from [<c01a5f08>] (ip_output+0x150/0x254)
r7 = C329F040 r6 = C21418B8 r5 = 00000000 r4 = C0D02EE0
[<c01a5db8>] (ip_output+0x0/0x254) from [<c01a52ac>] (ip_queue_xmit+0x360/0x4b4)
[<c01a4f4c>] (ip_queue_xmit+0x0/0x4b4) from [<c01b8424>] (tcp_transmit_skb+0x5ec/0x8c0)
[<c01b7e38>] (tcp_transmit_skb+0x0/0x8c0) from [<c01b9ff4>] (tcp_push_one+0xb4/0x13c)
[<c01b9f40>] (tcp_push_one+0x0/0x13c) from [<c01ad640>] (tcp_sendmsg+0x9a8/0xcdc)
r8 = C2EF30A0 r7 = 000005A8 r6 = 00000000 r5 = C329F040
r4 = C2141820
[<c01acc98>] (tcp_sendmsg+0x0/0xcdc) from [<c01ccc94>] (inet_sendmsg+0x60/0x64)
[<c01ccc34>] (inet_sendmsg+0x0/0x64) from [<c017b49c>] (sock_aio_write+0x100/0x104)
r7 = C14B9E94 r6 = 00000001 r5 = C14B9E9C r4 = C2238A20
[<c017b3a0>] (sock_aio_write+0x4/0x104) from [<c008c360>] (do_sync_write+0xc8/0x114)
r8 = C14B9E94 r7 = C14B9EE4 r6 = C14B9E9C r5 = 00000000
r4 = 00000000
[<c008c298>] (do_sync_write+0x0/0x114) from [<c008c524>] (vfs_write+0x178/0x18c)
[<c008c3ac>] (vfs_write+0x0/0x18c) from [<c008c600>] (sys_write+0x4c/0x7c)
[<c008c5b4>] (sys_write+0x0/0x7c) from [<c001fee0>] (ret_fast_syscall+0x0/0x2c)
r8 = C0020084 r7 = 00000004 r6 = 00082000 r5 = 00002000
r4 = BE974C9C
* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Guennadi Liakhovetski @ 2007-04-05 10:59 UTC
To: Paul Mackerras; +Cc: irda-users, linux-rt-users, netdev
Ok, a simple analysis reveals the recursive spinlock:
On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
> [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
===> ^^^^^^^^^^^^^^^^
> r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 = 00000000
===> spin_lock_bh(&pch->downl);
> [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> r8 = BF121560 r7 = 00000002 r6 = C38F4200 r5 = C21418B8
> r4 = C21418B8
> [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> r7 = C3A436E0 r6 = 00000000 r5 = C21418B8 r4 = C1489600
> [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
===> ^^^^^^^^
> r7 = 00000021 r6 = C21418B8 r5 = C1489600 r4 = 00000000
===> spin_lock_bh(&pch->downl);
> [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
> [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
> [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
> r8 = C1BBC914 r7 = C21418B8 r6 = 00000000 r5 = C21418B8
> r4 = C1BBC800
> [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
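Spelled out, the cycle from the trace is (module names as in the
backtrace; a call-chain summary, not code):

    ppp_xmit_process()                            [ppp_generic]
      ppp_push()
        spin_lock_bh(&pch->downl)                 <== first acquisition
        pch->chan->ops->start_xmit()
          ppp_irnet_send()                        [irnet]
            irttp_data_request()                  [irda]
              irttp_run_tx_queue()                [irda]
                irnet_flow_indication()           [irnet]
                  ppp_output_wakeup()             [ppp_generic]
                    ppp_channel_push()            [ppp_generic]
                      spin_lock_bh(&pch->downl)   <== recursive, BUG() on -rt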
Now, does anyone have an idea how best to fix it?
1. Should re-entry into ppp_channel_push() be prevented, and if so, at
which level? Or
2. Should re-entry be allowed and only the recursive spin_lock_bh()
avoided?
Thanks
Guennadi
* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Guennadi Liakhovetski @ 2007-04-05 15:11 UTC
To: Paul Mackerras; +Cc: irda-users, linux-rt-users, netdev
On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
> Ok, a simple analysis reveals the recursive spinlock:
>
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
>
> > [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===> ^^^^^^^^^^^^^^^^
> > r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
>
> > [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> > [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> > r8 = BF121560 r7 = 00000002 r6 = C38F4200 r5 = C21418B8
> > r4 = C21418B8
> > [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> > [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> > r7 = C3A436E0 r6 = 00000000 r5 = C21418B8 r4 = C1489600
> > [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===> ^^^^^^^^
> > r7 = 00000021 r6 = C21418B8 r5 = C1489600 r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
For comments, below is a patch I am testing at the moment. It looks
neither right nor very pretty, but I couldn't come up with anything
better. I'll sign off on it in case nothing better is proposed and it is
decided to push it for 2.6.21.
Thanks
Guennadi
Fix recursion with PPP over IrNET.
Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>
diff -u -r1.1.1.19.4.1 ppp_generic.c
--- a/drivers/net/ppp_generic.c 26 Mar 2007 09:21:32 -0000 1.1.1.19.4.1
+++ b/drivers/net/ppp_generic.c 5 Apr 2007 15:01:45 -0000
@@ -155,6 +155,7 @@
struct ppp_channel *chan; /* public channel data structure */
struct rw_semaphore chan_sem; /* protects `chan' during chan ioctl */
spinlock_t downl; /* protects `chan', file.xq dequeue */
+ struct task_struct *locker; /* owner of the downl lock */
struct ppp *ppp; /* ppp unit we're connected to */
struct list_head clist; /* link in list of channels per unit */
rwlock_t upl; /* protects `ppp' */
@@ -1214,8 +1215,11 @@
spin_lock_bh(&pch->downl);
if (pch->chan) {
+ /* Prevent recursive locking */
+ pch->locker = current;
if (pch->chan->ops->start_xmit(pch->chan, skb))
ppp->xmit_pending = NULL;
+ pch->locker = NULL;
} else {
/* channel got unregistered */
kfree_skb(skb);
@@ -1435,6 +1439,9 @@
struct sk_buff *skb;
struct ppp *ppp;
+ if (pch->locker == current)
+ return;
+
spin_lock_bh(&pch->downl);
if (pch->chan != 0) {
while (!skb_queue_empty(&pch->file.xq)) {
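To spell out the idea (a sketch of the pattern, not the exact code above;
the comments on safety are my own):

    /* ppp_push(): mark the current task as owner of downl around the
     * channel transmit, which may call back into ppp_generic: */
    spin_lock_bh(&pch->downl);
    pch->locker = current;
    pch->chan->ops->start_xmit(pch->chan, skb);   /* may re-enter via
                                                   * irnet_flow_indication() */
    pch->locker = NULL;
    spin_unlock_bh(&pch->downl);

    /* ppp_channel_push(): if we are the task already holding downl,
     * bail out instead of taking the spinlock recursively: */
    if (pch->locker == current)
            return;
    spin_lock_bh(&pch->downl);
    /* ... */

Note that reading pch->locker without holding the lock is only safe for
this "== current" self-test; any other use would race.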
* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Samuel Ortiz @ 2007-04-07 0:59 UTC
To: Guennadi Liakhovetski
Cc: irda-users, Paul Mackerras, linux-rt-users, netdev
Hi Guennadi,
On Thu, Apr 05, 2007 at 12:59:40PM +0200, Guennadi Liakhovetski wrote:
> Ok, a simple analysis reveals the recursive spinlock:
>
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
>
> > [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===> ^^^^^^^^^^^^^^^^
> > r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
>
> > [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> > [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> > r8 = BF121560 r7 = 00000002 r6 = C38F4200 r5 = C21418B8
> > r4 = C21418B8
> > [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> > [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> > r7 = C3A436E0 r6 = 00000000 r5 = C21418B8 r4 = C1489600
> > [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===> ^^^^^^^^
> > r7 = 00000021 r6 = C21418B8 r5 = C1489600 r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
>
> > [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
> > [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
> > [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
> > r8 = C1BBC914 r7 = C21418B8 r6 = 00000000 r5 = C21418B8
> > r4 = C1BBC800
> > [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
>
> Now, does anyone have an idea how best to fix it?
>
> 1. Should re-entry into ppp_channel_push() be prevented, and if so, at
> which level? Or
>
> 2. Should re-entry be allowed and only the recursive spin_lock_bh()
> avoided?
IMHO, irnet_flow_indication() should be called asynchronously by
irttp_run_tx_queue(), through some bottom-half mechanism. That would fix
your locking issue, and it would also reduce the time we spend in the
IrDA code with this lock held.
I will try to come up with some patches for you later this weekend.
Cheers,
Samuel.
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Samuel Ortiz @ 2007-04-10 18:10 UTC
To: Guennadi Liakhovetski; +Cc: Paul Mackerras, irda-users, linux-rt-users, netdev
Hi Guennadi,
On Sat, Apr 07, 2007 at 03:59:26AM +0300, Samuel Ortiz wrote:
> IMHO, irnet_flow_indication() should be called asynchronously by
> irttp_run_tx_queue(), through some bottom-half mechanism. That would fix your
> locking issues, and that would reduce the time we spend in the IrDA code with
> this lock taken.
>
> I will try to come up with some patches for you later this weekend.
The patch below schedules irnet_flow_indication() asynchronously. Could
you please give it a try (it builds, but I couldn't test it...) ? :
diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
index a899e58..941f0f1 100644
--- a/include/net/irda/irttp.h
+++ b/include/net/irda/irttp.h
@@ -128,6 +128,7 @@ struct tsap_cb {
struct net_device_stats stats;
struct timer_list todo_timer;
+ struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */
__u32 max_seg_size; /* Max data that fit into an IrLAP frame */
__u8 max_header_size;
diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
diff --git a/net/irda/irttp.c b/net/irda/irttp.c
index 7069e4a..a0d0f26 100644
--- a/net/irda/irttp.c
+++ b/net/irda/irttp.c
@@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
/*************************** CLIENT CALLS ***************************/
/************************** LMP CALLBACKS **************************/
/* Everything is happily mixed up. Waiting for next clean up - Jean II */
+static void irttp_flow_restart(struct work_struct *work)
+{
+ struct tsap_cb * self =
+ container_of(work, struct tsap_cb, irnet_flow_work);
+
+ if (self == NULL)
+ return;
+
+ /* Check if we can accept more frames from client. */
+ if ((self->tx_sdu_busy) &&
+ (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
+ (!self->close_pend)) {
+ if (self->notify.flow_indication)
+ self->notify.flow_indication(self->notify.instance,
+ self, FLOW_START);
+
+ /* self->tx_sdu_busy is the state of the client.
+ * We don't really have a race here, but it's always safer
+ * to update our state after the client - Jean II */
+ self->tx_sdu_busy = FALSE;
+ }
+}
+
/*
* Function irttp_open_tsap (stsap, notify)
@@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
self->todo_timer.data = (unsigned long) self;
self->todo_timer.function = &irttp_todo_expired;
+ INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
+
/* Initialize callbacks for IrLMP to use */
irda_notify_init(&ttp_notify);
ttp_notify.connect_confirm = irttp_connect_confirm;
@@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
self->stats.tx_packets++;
}
- /* Check if we can accept more frames from client.
- * We don't want to wait until the todo timer to do that, and we
- * can't use tasklets (grr...), so we are obliged to give control
- * to client. That's ok, this test will be true not too often
- * (max once per LAP window) and we are called from places
- * where we can spend a bit of time doing stuff. - Jean II */
if ((self->tx_sdu_busy) &&
(skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
(!self->close_pend))
- {
- if (self->notify.flow_indication)
- self->notify.flow_indication(self->notify.instance,
- self, FLOW_START);
-
- /* self->tx_sdu_busy is the state of the client.
- * We don't really have a race here, but it's always safer
- * to update our state after the client - Jean II */
- self->tx_sdu_busy = FALSE;
- }
+ schedule_work(&self->irnet_flow_work);
/* Reset lock */
self->tx_queue_lock = 0;
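To be explicit about why this should break the recursion (my reasoning;
the mechanics are in the patch above): the flow-restart callback now runs
later, from keventd in process context, instead of synchronously inside
the TX path that still holds pch->downl:

    /* Before: synchronous callback, still under the spin_lock_bh(&pch->downl)
     * taken in ppp_push():
     *   irttp_run_tx_queue() -> irnet_flow_indication()
     *     -> ppp_output_wakeup() -> ppp_channel_push() -> recursive lock
     *
     * After: irttp_run_tx_queue() only queues work; by the time keventd
     * calls irttp_flow_restart() -> irnet_flow_indication(), ppp_push()
     * has returned and downl has been released. */
    schedule_work(&self->irnet_flow_work);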
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Guennadi Liakhovetski @ 2007-04-11 10:02 UTC
To: Samuel Ortiz; +Cc: Paul Mackerras, irda-users, linux-rt-users, netdev
On Tue, 10 Apr 2007, Samuel Ortiz wrote:
> The patch below schedules irnet_flow_indication() asynchronously. Could
> you please give it a try (it builds, but I couldn't test it...) ? :
I'm afraid your patch makes it worse: with my patch to ppp_generic it
worked for 5 hours before IrDA stopped (the link stayed at 4 Mbaud, and
only the discovery timer kept expiring periodically without actually
doing anything); with your patch the same happens in about 5 minutes.
I'll double-check it, and then try the same work-queue approach, but from
ppp_generic rather than from irttp, combining our two patches.
Thanks
Guennadi
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Guennadi Liakhovetski @ 2007-04-30 13:24 UTC
To: Samuel Ortiz; +Cc: Paul Mackerras, irda-users, linux-rt-users, netdev
On Tue, 10 Apr 2007, Samuel Ortiz wrote:
> Hi Guennadi,
>
> The patch below schedules irnet_flow_indication() asynchronously. Could
> you please give it a try (it builds, but I couldn't test it...) ? :
OK, your patch (still below) works too (now that I've fixed that
state-machine race; BTW, we still have to decide on the final form in
which it goes into mainline) __after__ you also add the line
+ INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
in irttp_dup() (remember spin_lock_init()? :-)); otherwise it oopses.
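Roughly like this (a sketch; the memcpy() detail is from reading
irttp_dup(), not from this thread):

    struct tsap_cb *irttp_dup(struct tsap_cb *orig, void *instance)
    {
            struct tsap_cb *new;
            /* ... irttp_dup() clones the instance with memcpy(), which
             * copies the original's live work_struct, so it must be
             * re-initialised for the copy: */
            INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
            /* ... */
    }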
Generally, I like your patch better than mine to ppp_generic.c, where I
explicitly check if a recursion is occurring. Still, I am a bit concerned
about introducing yet another execution context into irda... We have seen
a couple of locking issues there already in the last 2-3 months especially
under rt-preempt... Would you be able to run some tests too? I will be
testing it too, but don't know how much longer and how intensively. Do you
think we might get some new problems with this new context?
Thanks
Guennadi
>
> diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
> index a899e58..941f0f1 100644
> --- a/include/net/irda/irttp.h
> +++ b/include/net/irda/irttp.h
> @@ -128,6 +128,7 @@ struct tsap_cb {
>
> struct net_device_stats stats;
> struct timer_list todo_timer;
> + struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */
>
> __u32 max_seg_size; /* Max data that fit into an IrLAP frame */
> __u8 max_header_size;
> diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
> diff --git a/net/irda/irttp.c b/net/irda/irttp.c
> index 7069e4a..a0d0f26 100644
> --- a/net/irda/irttp.c
> +++ b/net/irda/irttp.c
> @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
> /*************************** CLIENT CALLS ***************************/
> /************************** LMP CALLBACKS **************************/
> /* Everything is happily mixed up. Waiting for next clean up - Jean II */
> +static void irttp_flow_restart(struct work_struct *work)
> +{
> + struct tsap_cb * self =
> + container_of(work, struct tsap_cb, irnet_flow_work);
> +
> + if (self == NULL)
> + return;
> +
> + /* Check if we can accept more frames from client. */
> + if ((self->tx_sdu_busy) &&
> + (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> + (!self->close_pend)) {
> + if (self->notify.flow_indication)
> + self->notify.flow_indication(self->notify.instance,
> + self, FLOW_START);
> +
> + /* self->tx_sdu_busy is the state of the client.
> + * We don't really have a race here, but it's always safer
> + * to update our state after the client - Jean II */
> + self->tx_sdu_busy = FALSE;
> + }
> +}
> +
>
> /*
> * Function irttp_open_tsap (stsap, notify)
> @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
> self->todo_timer.data = (unsigned long) self;
> self->todo_timer.function = &irttp_todo_expired;
>
> + INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
> +
> /* Initialize callbacks for IrLMP to use */
> irda_notify_init(&ttp_notify);
> ttp_notify.connect_confirm = irttp_connect_confirm;
> @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
> self->stats.tx_packets++;
> }
>
> - /* Check if we can accept more frames from client.
> - * We don't want to wait until the todo timer to do that, and we
> - * can't use tasklets (grr...), so we are obliged to give control
> - * to client. That's ok, this test will be true not too often
> - * (max once per LAP window) and we are called from places
> - * where we can spend a bit of time doing stuff. - Jean II */
> if ((self->tx_sdu_busy) &&
> (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> (!self->close_pend))
> - {
> - if (self->notify.flow_indication)
> - self->notify.flow_indication(self->notify.instance,
> - self, FLOW_START);
> -
> - /* self->tx_sdu_busy is the state of the client.
> - * We don't really have a race here, but it's always safer
> - * to update our state after the client - Jean II */
> - self->tx_sdu_busy = FALSE;
> - }
> + schedule_work(&self->irnet_flow_work);
>
> /* Reset lock */
> self->tx_queue_lock = 0;
>
>
>
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Samuel Ortiz @ 2007-05-01 0:53 UTC
To: Guennadi Liakhovetski; +Cc: irda-users, Paul Mackerras, linux-rt-users, netdev
On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:
> On Tue, 10 Apr 2007, Samuel Ortiz wrote:
>
> > Hi Guennadi,
> >
> > The patch below schedules irnet_flow_indication() asynchronously. Could
> > you please give it a try (it builds, but I couldn't test it...) ? :
>
> OK, your patch (still below) works too (now that I've fixed that
> state-machine race; BTW, we still have to decide on the final form in
> which it goes into mainline) __after__ you also add the line
>
> + INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
>
> in irttp_dup() (remember spin_lock_init()? :-)); otherwise it oopses.
Good catch, again... Yes, I do remember the irttp_dup bug ;-)
> Generally, I like your patch better than mine to ppp_generic.c, where I
> explicitly check if a recursion is occurring. Still, I am a bit concerned
> about introducing yet another execution context into irda... We have seen
> a couple of locking issues there already in the last 2-3 months especially
> under rt-preempt... Would you be able to run some tests too?
I think I can run some tests here as well, but probably not as many as you:
I'm not doing IrDA stuff full time while it seems you currently are.
But I will definitely spend some time this week running my IrDA stack with this
patch applied. I didn't bother to do that earlier as you first reported some
oops with this patch applied.
> I will be
> testing it too, but don't know how much longer and how intensively. Do you
> think we might get some new problems with this new context?
It seems quite safe to me. But more importantly, I think we do want to
call the flow indication routine asynchronously in that specific case.
There is one thing this patch is missing, though: we should probably
clean the workqueue right before we destroy the TTP instance. The work
routine checks for a NULL pointer, but still...
Cheers,
Samuel.
> Thanks
> Guennadi
>
> >
> > diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
> > index a899e58..941f0f1 100644
> > --- a/include/net/irda/irttp.h
> > +++ b/include/net/irda/irttp.h
> > @@ -128,6 +128,7 @@ struct tsap_cb {
> >
> > struct net_device_stats stats;
> > struct timer_list todo_timer;
> > + struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */
> >
> > __u32 max_seg_size; /* Max data that fit into an IrLAP frame */
> > __u8 max_header_size;
> > diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
> > diff --git a/net/irda/irttp.c b/net/irda/irttp.c
> > index 7069e4a..a0d0f26 100644
> > --- a/net/irda/irttp.c
> > +++ b/net/irda/irttp.c
> > @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
> > /*************************** CLIENT CALLS ***************************/
> > /************************** LMP CALLBACKS **************************/
> > /* Everything is happily mixed up. Waiting for next clean up - Jean II */
> > +static void irttp_flow_restart(struct work_struct *work)
> > +{
> > + struct tsap_cb * self =
> > + container_of(work, struct tsap_cb, irnet_flow_work);
> > +
> > + if (self == NULL)
> > + return;
> > +
> > + /* Check if we can accept more frames from client. */
> > + if ((self->tx_sdu_busy) &&
> > + (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> > + (!self->close_pend)) {
> > + if (self->notify.flow_indication)
> > + self->notify.flow_indication(self->notify.instance,
> > + self, FLOW_START);
> > +
> > + /* self->tx_sdu_busy is the state of the client.
> > + * We don't really have a race here, but it's always safer
> > + * to update our state after the client - Jean II */
> > + self->tx_sdu_busy = FALSE;
> > + }
> > +}
> > +
> >
> > /*
> > * Function irttp_open_tsap (stsap, notify)
> > @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
> > self->todo_timer.data = (unsigned long) self;
> > self->todo_timer.function = &irttp_todo_expired;
> >
> > + INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
> > +
> > /* Initialize callbacks for IrLMP to use */
> > irda_notify_init(&ttp_notify);
> > ttp_notify.connect_confirm = irttp_connect_confirm;
> > @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
> > self->stats.tx_packets++;
> > }
> >
> > - /* Check if we can accept more frames from client.
> > - * We don't want to wait until the todo timer to do that, and we
> > - * can't use tasklets (grr...), so we are obliged to give control
> > - * to client. That's ok, this test will be true not too often
> > - * (max once per LAP window) and we are called from places
> > - * where we can spend a bit of time doing stuff. - Jean II */
> > if ((self->tx_sdu_busy) &&
> > (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> > (!self->close_pend))
> > - {
> > - if (self->notify.flow_indication)
> > - self->notify.flow_indication(self->notify.instance,
> > - self, FLOW_START);
> > -
> > - /* self->tx_sdu_busy is the state of the client.
> > - * We don't really have a race here, but it's always safer
> > - * to update our state after the client - Jean II */
> > - self->tx_sdu_busy = FALSE;
> > - }
> > + schedule_work(&self->irnet_flow_work);
> >
> > /* Reset lock */
> > self->tx_queue_lock = 0;
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
From: Guennadi Liakhovetski @ 2007-05-02 11:05 UTC
To: Samuel Ortiz; +Cc: irda-users, Paul Mackerras, linux-rt-users, netdev
On Tue, 1 May 2007, Samuel Ortiz wrote:
> But I will definitely spend some time this week running my IrDA stack
> with this patch applied. I didn't bother to do that earlier as you first
> reported some oops with this patch applied.
I think what I reported was not an oops, but the race that we're also
fixing ATM - the one in the state machine. So, unrelated.
> On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:
> > in irttp_dup() (remember spin_lock_init()? :-)); otherwise it oopses.
> Good catch, again... Yes, I do remember the irttp_dup bug ;-)
I've put a tsap_init() function that does those common initialisation
calls, so we have a smaller chance of forgetting the dup() path next
time...
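Sketch of what I mean (the helper's exact contents are whatever both
paths need; this is illustrative, not the committed code):

    /* Shared init for irttp_open_tsap() and irttp_dup(), so the dup
     * path can't silently miss a newly added field again: */
    static void tsap_init(struct tsap_cb *self)
    {
            /* spinlock / timer / queue initialisation as before ... */
            INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
    }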
> > I will be
> > testing it too, but don't know how much longer and how intensively. Do you
> > think we might get some new problems with this new context?
> It seems quite safe to me. But more importantly, I think we do want to
> call the flow indication routine asynchronously in that specific case.
> There is one thing this patch is missing, though: we should probably
> clean the workqueue right before we destroy the TTP instance. The work
> routine checks for a NULL pointer, but still...
I thought about it too, but the only thing you can do is
flush_scheduled_work() - is this what you mean?
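Something like this, I suppose (a sketch; the exact placement in the TTP
instance teardown path is an assumption):

    /* before freeing the tsap_cb: */
    flush_scheduled_work();   /* wait for a pending/running
                               * irttp_flow_restart() to finish */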
The test with your patch stopped inside an ftp transfer - the ftp client
was doing a get, and it stopped half-way through. On the client side only
the control tcp connection was still open; on the server side both forked
ftp-server children were dead, with no connection open. No errors in the
logs. Weird. With my "ugly" patch it ran through the weekend.
...and it just stopped again - this time after just 13 minutes.
A couple of comments on your patch:
> > > +static void irttp_flow_restart(struct work_struct *work)
> > > +{
> > > + struct tsap_cb * self =
> > > + container_of(work, struct tsap_cb, irnet_flow_work);
> > > +
> > > + if (self == NULL)
> > > + return;
self will not be NULL here. How about
+ IRDA_ASSERT(work != NULL, return;);
+ IRDA_ASSERT(self->magic == TTP_TSAP_MAGIC, return;);
> > > - /* Check if we can accept more frames from client.
> > > - * We don't want to wait until the todo timer to do that, and we
> > > - * can't use tasklets (grr...), so we are obliged to give control
> > > - * to client. That's ok, this test will be true not too often
> > > - * (max once per LAP window) and we are called from places
> > > - * where we can spend a bit of time doing stuff. - Jean II */
> > > if ((self->tx_sdu_busy) &&
> > > (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> > > (!self->close_pend))
Is this test still needed here? You do it again in the work function,
and the conditions can change in the meantime - so should we call
schedule_work() unconditionally here?
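I.e. (a sketch of the alternative I mean; whether dropping the early test
is acceptable is exactly the question):

    /* irttp_run_tx_queue(): schedule unconditionally; the full
     * tx_sdu_busy / queue-length / close_pend test is repeated in
     * irttp_flow_restart() anyway, where it is authoritative: */
    schedule_work(&self->irnet_flow_work);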
Thanks
Guennadi
Thread overview (9 messages):
2007-04-05  7:09 [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock Guennadi Liakhovetski
2007-04-05 10:59 ` Guennadi Liakhovetski
2007-04-05 15:11   ` Guennadi Liakhovetski
2007-04-07  0:59   ` Samuel Ortiz
2007-04-10 18:10     ` [irda-users] " Samuel Ortiz
2007-04-11 10:02       ` Guennadi Liakhovetski
2007-04-30 13:24       ` Guennadi Liakhovetski
2007-05-01  0:53         ` Samuel Ortiz
2007-05-02 11:05           ` Guennadi Liakhovetski