* [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Guennadi Liakhovetski @ 2007-04-05 7:09 UTC
To: Paul Mackerras
Cc: irda-users, linux-rt-users, netdev

Hi all,

When I came in this morning to check the IrNET / PPP test I had started
yesterday, the device was dead and OOM messages were scrolling up the
terminal. I captured a task trace, and the ftp process seems to have been
the original culprit. Below is the backtrace, which looks like a recursive
spinlock, since, I assume, it is this BUG() that triggered:

	BUG_ON(rt_mutex_owner(lock) == current);

and ppp is indeed entered recursively in the trace. I'll see whether I can
find the reason and a solution, but would also be grateful for any hints.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

ftp    D [c3c9e460]    C01E5838    0 18445    1    20756 14588 (L-TLB)
[<c01e5420>] (__schedule+0x0/0x7e8) from [<c01e5cfc>] (schedule+0x54/0x124)
[<c01e5ca8>] (schedule+0x0/0x124) from [<c017e69c>] (lock_sock_nested+0x94/0xd0)
 r5 = C329F06C  r4 = C14B9780
[<c017e608>] (lock_sock_nested+0x0/0xd0) from [<c017b878>] (sock_fasync+0x40/0x154)
 r7 = C329F040  r6 = C2238A5C  r5 = C0AD8B60  r4 = C0AD8B60
[<c017b838>] (sock_fasync+0x0/0x154) from [<c017b9b0>] (sock_close+0x24/0x44)
[<c017b98c>] (sock_close+0x0/0x44) from [<c008dbbc>] (__fput+0x194/0x1c8)
 r4 = 00000008
[<c008da28>] (__fput+0x0/0x1c8) from [<c008dc28>] (fput+0x38/0x3c)
 r8 = 00000000  r7 = C3251380  r6 = 00000000  r5 = C3251380  r4 = C0AD8B60
[<c008dbf0>] (fput+0x0/0x3c) from [<c008b860>] (filp_close+0x5c/0x88)
[<c008b804>] (filp_close+0x0/0x88) from [<c003f108>] (put_files_struct+0x9c/0xdc)
 r6 = C3251388  r5 = 00000007  r4 = 00000001
[<c003f06c>] (put_files_struct+0x0/0xdc) from [<c003fa14>] (do_exit+0x168/0x8b0)
[<c003f8ac>] (do_exit+0x0/0x8b0) from [<c002411c>] (die+0x29c/0x2e8)
[<c0023e80>] (die+0x0/0x2e8) from [<c0025b3c>] (__do_kernel_fault+0x70/0x80)
[<c0025acc>] (__do_kernel_fault+0x0/0x80) from [<c0025cd0>] (do_page_fault+0x60/0x214)
 r7 = C1B3B8C0  r6 = C0264418  r5 = C3C9E460  r4 = C02643A8
[<c0025c70>] (do_page_fault+0x0/0x214) from [<c0025fa0>] (do_DataAbort+0x3c/0xa4)
[<c0025f64>] (do_DataAbort+0x0/0xa4) from [<c001fa60>] (__dabt_svc+0x40/0x60)
 r8 = 00000001  r7 = A0000013  r6 = C3A43780  r5 = C14B99E0  r4 = FFFFFFFF
[<c0023cd0>] (__bug+0x0/0x2c) from [<c01e753c>] (rt_spin_lock_slowlock+0x1c8/0x1f8)
[<c01e7374>] (rt_spin_lock_slowlock+0x0/0x1f8) from [<c01e7894>] (__lock_text_start+0x44/0x48)
[<c01e7850>] (__lock_text_start+0x0/0x48) from [<bf12b23c>] (ppp_channel_push+0x1c/0xc8 [ppp_generic])
[<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
 r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000
[<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
[<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
[<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
 r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8  r4 = C21418B8
[<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
[<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
 r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
[<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
 r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000
[<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
[<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
[<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
 r8 = C1BBC914  r7 = C21418B8  r6 = 00000000  r5 = C21418B8  r4 = C1BBC800
[<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
[<c018709c>] (dev_queue_xmit+0x0/0x25c) from [<c01a5f08>] (ip_output+0x150/0x254)
 r7 = C329F040  r6 = C21418B8  r5 = 00000000  r4 = C0D02EE0
[<c01a5db8>] (ip_output+0x0/0x254) from [<c01a52ac>] (ip_queue_xmit+0x360/0x4b4)
[<c01a4f4c>] (ip_queue_xmit+0x0/0x4b4) from [<c01b8424>] (tcp_transmit_skb+0x5ec/0x8c0)
[<c01b7e38>] (tcp_transmit_skb+0x0/0x8c0) from [<c01b9ff4>] (tcp_push_one+0xb4/0x13c)
[<c01b9f40>] (tcp_push_one+0x0/0x13c) from [<c01ad640>] (tcp_sendmsg+0x9a8/0xcdc)
 r8 = C2EF30A0  r7 = 000005A8  r6 = 00000000  r5 = C329F040  r4 = C2141820
[<c01acc98>] (tcp_sendmsg+0x0/0xcdc) from [<c01ccc94>] (inet_sendmsg+0x60/0x64)
[<c01ccc34>] (inet_sendmsg+0x0/0x64) from [<c017b49c>] (sock_aio_write+0x100/0x104)
 r7 = C14B9E94  r6 = 00000001  r5 = C14B9E9C  r4 = C2238A20
[<c017b3a0>] (sock_aio_write+0x4/0x104) from [<c008c360>] (do_sync_write+0xc8/0x114)
 r8 = C14B9E94  r7 = C14B9EE4  r6 = C14B9E9C  r5 = 00000000  r4 = 00000000
[<c008c298>] (do_sync_write+0x0/0x114) from [<c008c524>] (vfs_write+0x178/0x18c)
[<c008c3ac>] (vfs_write+0x0/0x18c) from [<c008c600>] (sys_write+0x4c/0x7c)
[<c008c5b4>] (sys_write+0x0/0x7c) from [<c001fee0>] (ret_fast_syscall+0x0/0x2c)
 r8 = C0020084  r7 = 00000004  r6 = 00082000  r5 = 00002000  r4 = BE974C9C
* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Guennadi Liakhovetski @ 2007-04-05 10:59 UTC
To: Paul Mackerras
Cc: irda-users, linux-rt-users, netdev

Ok, a simple analysis reveals the recursive spinlock:

On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:

> [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
===>              ^^^^^^^^^^^^^^^^
> r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000

===> spin_lock_bh(&pch->downl);

> [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8  r4 = C21418B8
> [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
> [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
===>              ^^^^^^^^
> r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000

===> spin_lock_bh(&pch->downl);

> [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
> [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
> [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
> r8 = C1BBC914  r7 = C21418B8  r6 = 00000000  r5 = C21418B8  r4 = C1BBC800
> [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)

Now, does anyone have an idea how best to fix it?

1. Should re-entry into ppp_channel_push() be prevented, and if so, at
   which level? Or

2. Should re-entry be allowed and only the recursive spin_lock_bh()
   avoided?

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
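In other words: the transmit path takes pch->downl and calls the channel's
start_xmit(), which runs the IrTTP tx queue synchronously; the flow-restart
callback then re-enters ppp_channel_push(), which takes pch->downl a second
time. A minimal sketch of that pattern, with simplified stand-in names
rather than the actual ppp_generic/irnet code:

#include <linux/spinlock.h>

/* Illustrative reduction of the deadlock in the trace above;
 * `downl' stands in for pch->downl. */
static DEFINE_SPINLOCK(downl);

static void channel_push(void)		/* ppp_channel_push() analogue */
{
	spin_lock_bh(&downl);		/* 2nd acquisition: deadlocks,
					 * BUG()s on -rt */
	/* ... push queued frames ... */
	spin_unlock_bh(&downl);
}

static void flow_indication(void)	/* irnet_flow_indication() analogue */
{
	channel_push();			/* synchronous callback */
}

static void xmit_process(void)		/* ppp_xmit_process() analogue */
{
	spin_lock_bh(&downl);		/* 1st acquisition */
	/* start_xmit() -> irttp_data_request() -> irttp_run_tx_queue(),
	 * which invokes the flow-restart callback while the lock is
	 * still held: */
	flow_indication();
	spin_unlock_bh(&downl);
}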
* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Guennadi Liakhovetski @ 2007-04-05 15:11 UTC
To: Paul Mackerras
Cc: irda-users, linux-rt-users, netdev

On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:

> Ok, a simple analysis reveals the recursive spinlock:
>
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
>
> > [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===>              ^^^^^^^^^^^^^^^^
> > r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
>
> > [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> > [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> > r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8  r4 = C21418B8
> > [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> > [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> > r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
> > [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===>              ^^^^^^^^
> > r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);

For comments, below is a patch I am testing at the moment. It looks
neither right nor very pretty, but I couldn't come up with anything
better. I'll sign off on it in case nothing better is proposed and it is
decided to push it for 2.6.21.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

Fix recursion with PPP over IrNET.

Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>

diff -u -r1.1.1.19.4.1 ppp_generic.c
--- a/drivers/net/ppp_generic.c	26 Mar 2007 09:21:32 -0000	1.1.1.19.4.1
+++ b/drivers/net/ppp_generic.c	5 Apr 2007 15:01:45 -0000
@@ -155,6 +155,7 @@
 	struct ppp_channel *chan;	/* public channel data structure */
 	struct rw_semaphore chan_sem;	/* protects `chan' during chan ioctl */
 	spinlock_t	downl;		/* protects `chan', file.xq dequeue */
+	struct task_struct *locker;	/* owner of the downl lock */
 	struct ppp	*ppp;		/* ppp unit we're connected to */
 	struct list_head clist;		/* link in list of channels per unit */
 	rwlock_t	upl;		/* protects `ppp' */
@@ -1214,8 +1215,11 @@
 
 	spin_lock_bh(&pch->downl);
 	if (pch->chan) {
+		/* Prevent recursive locking */
+		pch->locker = current;
 		if (pch->chan->ops->start_xmit(pch->chan, skb))
 			ppp->xmit_pending = NULL;
+		pch->locker = NULL;
 	} else {
 		/* channel got unregistered */
 		kfree_skb(skb);
@@ -1435,6 +1439,9 @@
 	struct sk_buff *skb;
 	struct ppp *ppp;
 
+	if (pch->locker == current)
+		return;
+
 	spin_lock_bh(&pch->downl);
 	if (pch->chan != 0) {
 		while (!skb_queue_empty(&pch->file.xq)) {
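Reduced to its essentials, the approach records the current task as the
lock owner for the duration of the critical section and makes the
re-entrant caller bail out early. A hypothetical, generic form of the
guard (not the patch itself; all names here are made up):

#include <linux/spinlock.h>
#include <linux/sched.h>	/* for `current' */

/* The unlocked read of `owner' is safe only because `owner' is written
 * exclusively while `lock' is held, so a task that does not hold the
 * lock can never observe its own task_struct in it. */
struct guarded_channel {
	spinlock_t		lock;	/* plays the role of pch->downl  */
	struct task_struct	*owner;	/* valid only while lock is held */
};

static void guarded_push(struct guarded_channel *g)
{
	if (g->owner == current)
		return;			/* re-entered from under the lock;
					 * frames stay queued for a later
					 * wakeup */
	spin_lock_bh(&g->lock);
	g->owner = current;
	/* ... work that may call back into guarded_push() ... */
	g->owner = NULL;
	spin_unlock_bh(&g->lock);
}

One consequence of this shape: a wakeup that arrives while the owner
holds the lock is dropped rather than deferred, so the queued frames are
only pushed the next time something else wakes the channel.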
* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Samuel Ortiz @ 2007-04-07 0:59 UTC
To: Guennadi Liakhovetski
Cc: irda-users, Paul Mackerras, linux-rt-users, netdev

Hi Guennadi,

On Thu, Apr 05, 2007 at 12:59:40PM +0200, Guennadi Liakhovetski wrote:
> Ok, a simple analysis reveals the recursive spinlock:
>
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
>
> > [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===>              ^^^^^^^^^^^^^^^^
> > r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
>
> > [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> > [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> > r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8  r4 = C21418B8
> > [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> > [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> > r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
> > [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===>              ^^^^^^^^
> > r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000
>
> ===> spin_lock_bh(&pch->downl);
>
> > [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
> > [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
> > [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
> > r8 = C1BBC914  r7 = C21418B8  r6 = 00000000  r5 = C21418B8  r4 = C1BBC800
> > [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
>
> Now, does anyone have an idea how best to fix it?
>
> 1. Should re-entry into ppp_channel_push() be prevented, and if so, at
>    which level? Or
>
> 2. Should re-entry be allowed and only the recursive spin_lock_bh()
>    avoided?

IMHO, irnet_flow_indication() should be called asynchronously by
irttp_run_tx_queue(), through some bottom-half mechanism. That would fix
your locking issue, and it would also reduce the time we spend in the
IrDA code with this lock taken.

I will try to come up with some patches for you later this weekend.

Cheers,
Samuel.
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Samuel Ortiz @ 2007-04-10 18:10 UTC
To: Guennadi Liakhovetski
Cc: Paul Mackerras, irda-users, linux-rt-users, netdev

Hi Guennadi,

On Sat, Apr 07, 2007 at 03:59:26AM +0300, Samuel Ortiz wrote:
> IMHO, irnet_flow_indication() should be called asynchronously by
> irttp_run_tx_queue(), through some bottom-half mechanism. That would fix
> your locking issue, and it would also reduce the time we spend in the
> IrDA code with this lock taken.
>
> I will try to come up with some patches for you later this weekend.

The patch below schedules irnet_flow_indication() asynchronously. Could
you please give it a try? (It builds, but I couldn't test it...)

diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
index a899e58..941f0f1 100644
--- a/include/net/irda/irttp.h
+++ b/include/net/irda/irttp.h
@@ -128,6 +128,7 @@ struct tsap_cb {
 
 	struct net_device_stats stats;
 	struct timer_list todo_timer;
+	struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */
 
 	__u32 max_seg_size;	/* Max data that fit into an IrLAP frame */
 	__u8  max_header_size;
diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
diff --git a/net/irda/irttp.c b/net/irda/irttp.c
index 7069e4a..a0d0f26 100644
--- a/net/irda/irttp.c
+++ b/net/irda/irttp.c
@@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
 /*************************** CLIENT CALLS ***************************/
 /************************** LMP CALLBACKS **************************/
 /* Everything is happily mixed up. Waiting for next clean up - Jean II */
+static void irttp_flow_restart(struct work_struct *work)
+{
+	struct tsap_cb *self =
+		container_of(work, struct tsap_cb, irnet_flow_work);
+
+	if (self == NULL)
+		return;
+
+	/* Check if we can accept more frames from client. */
+	if ((self->tx_sdu_busy) &&
+	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
+	    (!self->close_pend)) {
+		if (self->notify.flow_indication)
+			self->notify.flow_indication(self->notify.instance,
+						     self, FLOW_START);
+
+		/* self->tx_sdu_busy is the state of the client.
+		 * We don't really have a race here, but it's always safer
+		 * to update our state after the client - Jean II */
+		self->tx_sdu_busy = FALSE;
+	}
+}
+
 
 /*
  * Function irttp_open_tsap (stsap, notify)
@@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
 	self->todo_timer.data = (unsigned long) self;
 	self->todo_timer.function = &irttp_todo_expired;
 
+	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
+
 	/* Initialize callbacks for IrLMP to use */
 	irda_notify_init(&ttp_notify);
 	ttp_notify.connect_confirm = irttp_connect_confirm;
@@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
 		self->stats.tx_packets++;
 	}
 
-	/* Check if we can accept more frames from client.
-	 * We don't want to wait until the todo timer to do that, and we
-	 * can't use tasklets (grr...), so we are obliged to give control
-	 * to client. That's ok, this test will be true not too often
-	 * (max once per LAP window) and we are called from places
-	 * where we can spend a bit of time doing stuff. - Jean II */
 	if ((self->tx_sdu_busy) &&
 	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
 	    (!self->close_pend))
-	{
-		if (self->notify.flow_indication)
-			self->notify.flow_indication(self->notify.instance,
-						     self, FLOW_START);
-
-		/* self->tx_sdu_busy is the state of the client.
-		 * We don't really have a race here, but it's always safer
-		 * to update our state after the client - Jean II */
-		self->tx_sdu_busy = FALSE;
-	}
+		schedule_work(&self->irnet_flow_work);
 
 	/* Reset lock */
 	self->tx_queue_lock = 0;
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Guennadi Liakhovetski @ 2007-04-11 10:02 UTC
To: Samuel Ortiz
Cc: Paul Mackerras, irda-users, linux-rt-users, netdev

On Tue, 10 Apr 2007, Samuel Ortiz wrote:

> The patch below schedules irnet_flow_indication() asynchronously. Could
> you please give it a try? (It builds, but I couldn't test it...)

I'm afraid your patch makes it worse: with my patch to ppp_generic it
worked 5 hours before IrDA stopped, remaining at 4 Mbaud, with only the
discovery timer expiring periodically without actually doing anything;
with your patch the same happens in about 5 minutes.

I'll double-check that, and then will try the same work-queue approach,
but from ppp_generic rather than from irttp, combining our two patches.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Guennadi Liakhovetski @ 2007-04-30 13:24 UTC
To: Samuel Ortiz
Cc: Paul Mackerras, irda-users, linux-rt-users, netdev

On Tue, 10 Apr 2007, Samuel Ortiz wrote:

> Hi Guennadi,
>
> The patch below schedules irnet_flow_indication() asynchronously. Could
> you please give it a try? (It builds, but I couldn't test it...)

Ok, your patch (still below) works too (now that I have fixed that state
machine race; btw, we still have to decide on the final form in which it
goes into the mainline) __after__ you also add the line

+	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);

in irttp_dup() (remember spin_lock_init()? :-)), otherwise it oopses.

Generally, I like your patch better than mine to ppp_generic.c, where I
explicitly check whether a recursion is occurring. Still, I am a bit
concerned about introducing yet another execution context into irda... We
have seen a couple of locking issues there already in the last 2-3
months, especially under rt-preempt... Would you be able to run some
tests too? I will be testing it as well, but I don't know for how much
longer and how intensively. Do you think we might get some new problems
with this new context?

Thanks
Guennadi

> diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
> index a899e58..941f0f1 100644
> --- a/include/net/irda/irttp.h
> +++ b/include/net/irda/irttp.h
> @@ -128,6 +128,7 @@ struct tsap_cb {
>  
>  	struct net_device_stats stats;
>  	struct timer_list todo_timer;
> +	struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */
>  
>  	__u32 max_seg_size;	/* Max data that fit into an IrLAP frame */
>  	__u8  max_header_size;
> diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
> diff --git a/net/irda/irttp.c b/net/irda/irttp.c
> index 7069e4a..a0d0f26 100644
> --- a/net/irda/irttp.c
> +++ b/net/irda/irttp.c
> @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
>  /*************************** CLIENT CALLS ***************************/
>  /************************** LMP CALLBACKS **************************/
>  /* Everything is happily mixed up. Waiting for next clean up - Jean II */
> +static void irttp_flow_restart(struct work_struct *work)
> +{
> +	struct tsap_cb *self =
> +		container_of(work, struct tsap_cb, irnet_flow_work);
> +
> +	if (self == NULL)
> +		return;
> +
> +	/* Check if we can accept more frames from client. */
> +	if ((self->tx_sdu_busy) &&
> +	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> +	    (!self->close_pend)) {
> +		if (self->notify.flow_indication)
> +			self->notify.flow_indication(self->notify.instance,
> +						     self, FLOW_START);
> +
> +		/* self->tx_sdu_busy is the state of the client.
> +		 * We don't really have a race here, but it's always safer
> +		 * to update our state after the client - Jean II */
> +		self->tx_sdu_busy = FALSE;
> +	}
> +}
> +
>  
>  /*
>   * Function irttp_open_tsap (stsap, notify)
> @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
>  	self->todo_timer.data = (unsigned long) self;
>  	self->todo_timer.function = &irttp_todo_expired;
>  
> +	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
> +
>  	/* Initialize callbacks for IrLMP to use */
>  	irda_notify_init(&ttp_notify);
>  	ttp_notify.connect_confirm = irttp_connect_confirm;
> @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
>  		self->stats.tx_packets++;
>  	}
>  
> -	/* Check if we can accept more frames from client.
> -	 * We don't want to wait until the todo timer to do that, and we
> -	 * can't use tasklets (grr...), so we are obliged to give control
> -	 * to client. That's ok, this test will be true not too often
> -	 * (max once per LAP window) and we are called from places
> -	 * where we can spend a bit of time doing stuff. - Jean II */
>  	if ((self->tx_sdu_busy) &&
>  	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
>  	    (!self->close_pend))
> -	{
> -		if (self->notify.flow_indication)
> -			self->notify.flow_indication(self->notify.instance,
> -						     self, FLOW_START);
> -
> -		/* self->tx_sdu_busy is the state of the client.
> -		 * We don't really have a race here, but it's always safer
> -		 * to update our state after the client - Jean II */
> -		self->tx_sdu_busy = FALSE;
> -	}
> +		schedule_work(&self->irnet_flow_work);
>  
>  	/* Reset lock */
>  	self->tx_queue_lock = 0;

---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
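For reference, the missing initialisation would land in irttp_dup()
roughly as sketched below. This is a paraphrase against 2.6.20's
net/irda/irttp.c, not an exact hunk; only the INIT_WORK() line is the
actual fix quoted above:

/* irttp_dup() clones a tsap_cb with a memcpy(), so every embedded object
 * with internal state must be re-initialised afterwards - the earlier
 * spin_lock_init() bug in irttp_dup() had exactly the same shape. */
struct tsap_cb *irttp_dup(struct tsap_cb *orig, void *instance)
{
	struct tsap_cb *new;

	/* ... kmalloc() `new' and memcpy() `orig' into it ... */

	spin_lock_init(&new->lock);
	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);

	/* ... */
	return new;
}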
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Samuel Ortiz @ 2007-05-01 0:53 UTC
To: Guennadi Liakhovetski
Cc: irda-users, Paul Mackerras, linux-rt-users, netdev

On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:
> On Tue, 10 Apr 2007, Samuel Ortiz wrote:
>
> > Hi Guennadi,
> >
> > The patch below schedules irnet_flow_indication() asynchronously.
> > Could you please give it a try? (It builds, but I couldn't test it...)
>
> Ok, your patch (still below) works too (now that I have fixed that state
> machine race; btw, we still have to decide on the final form in which it
> goes into the mainline) __after__ you also add the line
>
> +	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
>
> in irttp_dup() (remember spin_lock_init()? :-)), otherwise it oopses.

Good catch, again... Yes, I do remember the irttp_dup bug ;-)

> Generally, I like your patch better than mine to ppp_generic.c, where I
> explicitly check whether a recursion is occurring. Still, I am a bit
> concerned about introducing yet another execution context into irda...
> We have seen a couple of locking issues there already in the last 2-3
> months, especially under rt-preempt... Would you be able to run some
> tests too?

I think I can run some tests here as well, but probably not as many as
you: I'm not doing IrDA stuff full time, while it seems you currently
are. But I will definitely spend some time this week running my IrDA
stack with this patch applied. I didn't bother to do that earlier, as you
first reported some oops with this patch applied.

> I will be testing it as well, but I don't know for how much longer and
> how intensively. Do you think we might get some new problems with this
> new context?

It seems quite safe to me. But more importantly, I think we do want to
call the flow indication routine asynchronously in that specific case.

There is one thing this patch is missing, though: we should probably
clean up the workqueue right before we destroy the TTP instance. The work
routine checks for a NULL pointer, but still...

Cheers,
Samuel.
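A minimal sketch of that cleanup, assuming it goes into the teardown in
irttp_close_tsap() (the thread leaves the exact placement open; note that
in 2.6.20 flush_scheduled_work() is the available primitive, as
cancel_work_sync() only appeared in later kernels):

/* Sketch: make sure a pending irnet_flow_work cannot run against a
 * freed tsap_cb. flush_scheduled_work() waits for everything queued on
 * the shared keventd workqueue, including our work item. */
int irttp_close_tsap(struct tsap_cb *self)
{
	/* ... existing disconnect handling ... */

	flush_scheduled_work();

	/* ... unlink the TSAP and free `self' ... */
	return 0;
}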
* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

From: Guennadi Liakhovetski @ 2007-05-02 11:05 UTC
To: Samuel Ortiz
Cc: irda-users, Paul Mackerras, linux-rt-users, netdev

On Tue, 1 May 2007, Samuel Ortiz wrote:

> But I will definitely spend some time this week running my IrDA stack
> with this patch applied. I didn't bother to do that earlier, as you
> first reported some oops with this patch applied.

I think what I reported was not an Oops but the race that we're also
fixing at the moment - the one in the state machine. So, unrelated.

> On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:
> > in irttp_dup() (remember spin_lock_init()? :-)), otherwise it oopses.
> Good catch, again... Yes, I do remember the irttp_dup bug ;-)

I've added a tsap_init() function that does those common initialisation
calls, so we have a smaller chance of forgetting the dup() path next
time...

> > I will be testing it as well, but I don't know for how much longer
> > and how intensively. Do you think we might get some new problems with
> > this new context?
> It seems quite safe to me. But more importantly, I think we do want to
> call the flow indication routine asynchronously in that specific case.
> There is one thing this patch is missing, though: we should probably
> clean up the workqueue right before we destroy the TTP instance. The
> work routine checks for a NULL pointer, but still...

I thought about it too, but the only thing you can do is
flush_scheduled_work() - is this what you mean?

The test with your patch stopped inside an ftp transfer - the ftp client
was doing a get, and it stopped half-way through. On the client side only
the control TCP connection was still open; on the server both forked
ftp-server children were dead, no connection open. No errors in the logs.
Weird. With my "ugly" patch it ran the whole weekend through.

...and it just stopped again - this time after only 13 minutes.

A couple of comments on your patch:

> > > +static void irttp_flow_restart(struct work_struct *work)
> > > +{
> > > +	struct tsap_cb *self =
> > > +		container_of(work, struct tsap_cb, irnet_flow_work);
> > > +
> > > +	if (self == NULL)
> > > +		return;

self will not be NULL here. How about

+	IRDA_ASSERT(work != NULL, return;);
+	IRDA_ASSERT(self->magic == TTP_TSAP_MAGIC, return;);

instead?

> > > -	/* Check if we can accept more frames from client.
> > > -	 * We don't want to wait until the todo timer to do that, and we
> > > -	 * can't use tasklets (grr...), so we are obliged to give control
> > > -	 * to client. That's ok, this test will be true not too often
> > > -	 * (max once per LAP window) and we are called from places
> > > -	 * where we can spend a bit of time doing stuff. - Jean II */
> > > 	if ((self->tx_sdu_busy) &&
> > > 	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> > > 	    (!self->close_pend))

Is this test still needed here? You do it again in the work function, and
the conditions can change in between - should we call schedule_work()
unconditionally here?

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
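The thread doesn't show Guennadi's actual tsap_init(); a hypothetical
sketch of what such a helper could look like, with the contents assumed
from the initialisation calls visible in the patches above:

/* Gather the per-instance (re-)initialisation needed by both
 * irttp_open_tsap() and the memcpy()-based irttp_dup(), so the dup path
 * cannot be forgotten again. */
static void tsap_init(struct tsap_cb *self)
{
	spin_lock_init(&self->lock);
	init_timer(&self->todo_timer);
	self->todo_timer.data = (unsigned long) self;
	self->todo_timer.function = &irttp_todo_expired;
	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
}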
Thread overview: 9+ messages
2007-04-05 7:09 [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock Guennadi Liakhovetski
2007-04-05 10:59 ` Guennadi Liakhovetski
2007-04-05 15:11 ` Guennadi Liakhovetski
[not found] ` <Pine.LNX.4.63.0704051250170.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
2007-04-07 0:59 ` Samuel Ortiz
2007-04-10 18:10 ` [irda-users] " Samuel Ortiz
2007-04-11 10:02 ` Guennadi Liakhovetski
2007-04-30 13:24 ` Guennadi Liakhovetski
2007-05-01 0:53 ` Samuel Ortiz
2007-05-02 11:05 ` Guennadi Liakhovetski