netdev.vger.kernel.org archive mirror
* [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
@ 2007-04-05  7:09 Guennadi Liakhovetski
  2007-04-05 10:59 ` Guennadi Liakhovetski
  0 siblings, 1 reply; 9+ messages in thread
From: Guennadi Liakhovetski @ 2007-04-05  7:09 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

Hi all

As I came in this morning to check on the IrNET / PPP test I had started
yesterday, the device was dead and OOM messages were scrolling up the
terminal. I captured a task trace, and the ftp process seems to have been
the original culprit. The backtrace is below. It looks like a recursive
spinlock, since, I assume, it is this BUG() that triggered:

	BUG_ON(rt_mutex_owner(lock) == current);

and ppp is indeed entered recursively in the trace. I'll see if I can
find the reason and a solution, but would also be grateful for any hints.
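
For reference, on -rt spinlocks map onto rt_mutexes, so taking the same
lock twice in one task is caught immediately in the lock slowpath.
Paraphrased (a sketch, not the exact -rt code):

	static void rt_spin_lock_slowlock(struct rt_mutex *lock)
	{
		/* Blocking on a lock this task already owns would sleep
		 * forever, so -rt turns it into an immediate BUG(): */
		BUG_ON(rt_mutex_owner(lock) == current);

		/* ... enqueue the waiter, boost the owner's priority,
		 * and schedule() until the lock is released ... */
	}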

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

ftp           D [c3c9e460] C01E5838     0 18445      1         20756 14588 (L-TLB)
[<c01e5420>] (__schedule+0x0/0x7e8) from [<c01e5cfc>] (schedule+0x54/0x124)
[<c01e5ca8>] (schedule+0x0/0x124) from [<c017e69c>] (lock_sock_nested+0x94/0xd0)
 r5 = C329F06C  r4 = C14B9780 
[<c017e608>] (lock_sock_nested+0x0/0xd0) from [<c017b878>] (sock_fasync+0x40/0x154)
 r7 = C329F040  r6 = C2238A5C  r5 = C0AD8B60  r4 = C0AD8B60
[<c017b838>] (sock_fasync+0x0/0x154) from [<c017b9b0>] (sock_close+0x24/0x44)
[<c017b98c>] (sock_close+0x0/0x44) from [<c008dbbc>] (__fput+0x194/0x1c8)
 r4 = 00000008 
[<c008da28>] (__fput+0x0/0x1c8) from [<c008dc28>] (fput+0x38/0x3c)
 r8 = 00000000  r7 = C3251380  r6 = 00000000  r5 = C3251380
 r4 = C0AD8B60 
[<c008dbf0>] (fput+0x0/0x3c) from [<c008b860>] (filp_close+0x5c/0x88)
[<c008b804>] (filp_close+0x0/0x88) from [<c003f108>] (put_files_struct+0x9c/0xdc)
 r6 = C3251388  r5 = 00000007  r4 = 00000001 
[<c003f06c>] (put_files_struct+0x0/0xdc) from [<c003fa14>] (do_exit+0x168/0x8b0)
[<c003f8ac>] (do_exit+0x0/0x8b0) from [<c002411c>] (die+0x29c/0x2e8)
[<c0023e80>] (die+0x0/0x2e8) from [<c0025b3c>] (__do_kernel_fault+0x70/0x80)
[<c0025acc>] (__do_kernel_fault+0x0/0x80) from [<c0025cd0>] (do_page_fault+0x60/0x214)
 r7 = C1B3B8C0  r6 = C0264418  r5 = C3C9E460  r4 = C02643A8
[<c0025c70>] (do_page_fault+0x0/0x214) from [<c0025fa0>] (do_DataAbort+0x3c/0xa4)
[<c0025f64>] (do_DataAbort+0x0/0xa4) from [<c001fa60>] (__dabt_svc+0x40/0x60)
 r8 = 00000001  r7 = A0000013  r6 = C3A43780  r5 = C14B99E0
 r4 = FFFFFFFF 
[<c0023cd0>] (__bug+0x0/0x2c) from [<c01e753c>] (rt_spin_lock_slowlock+0x1c8/0x1f8)
[<c01e7374>] (rt_spin_lock_slowlock+0x0/0x1f8) from [<c01e7894>] (__lock_text_start+0x44/0x48)
[<c01e7850>] (__lock_text_start+0x0/0x48) from [<bf12b23c>] (ppp_channel_push+0x1c/0xc8 [ppp_generic])
[<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
 r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000
[<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
[<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
[<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
 r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8
 r4 = C21418B8 
[<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
[<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
 r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
[<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
 r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000
[<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
[<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
[<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
 r8 = C1BBC914  r7 = C21418B8  r6 = 00000000  r5 = C21418B8
 r4 = C1BBC800 
[<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
[<c018709c>] (dev_queue_xmit+0x0/0x25c) from [<c01a5f08>] (ip_output+0x150/0x254)
 r7 = C329F040  r6 = C21418B8  r5 = 00000000  r4 = C0D02EE0
[<c01a5db8>] (ip_output+0x0/0x254) from [<c01a52ac>] (ip_queue_xmit+0x360/0x4b4)
[<c01a4f4c>] (ip_queue_xmit+0x0/0x4b4) from [<c01b8424>] (tcp_transmit_skb+0x5ec/0x8c0)
[<c01b7e38>] (tcp_transmit_skb+0x0/0x8c0) from [<c01b9ff4>] (tcp_push_one+0xb4/0x13c)
[<c01b9f40>] (tcp_push_one+0x0/0x13c) from [<c01ad640>] (tcp_sendmsg+0x9a8/0xcdc)
 r8 = C2EF30A0  r7 = 000005A8  r6 = 00000000  r5 = C329F040
 r4 = C2141820 
[<c01acc98>] (tcp_sendmsg+0x0/0xcdc) from [<c01ccc94>] (inet_sendmsg+0x60/0x64)
[<c01ccc34>] (inet_sendmsg+0x0/0x64) from [<c017b49c>] (sock_aio_write+0x100/0x104)
 r7 = C14B9E94  r6 = 00000001  r5 = C14B9E9C  r4 = C2238A20
[<c017b3a0>] (sock_aio_write+0x4/0x104) from [<c008c360>] (do_sync_write+0xc8/0x114)
 r8 = C14B9E94  r7 = C14B9EE4  r6 = C14B9E9C  r5 = 00000000
 r4 = 00000000 
[<c008c298>] (do_sync_write+0x0/0x114) from [<c008c524>] (vfs_write+0x178/0x18c)
[<c008c3ac>] (vfs_write+0x0/0x18c) from [<c008c600>] (sys_write+0x4c/0x7c)
[<c008c5b4>] (sys_write+0x0/0x7c) from [<c001fee0>] (ret_fast_syscall+0x0/0x2c)
 r8 = C0020084  r7 = 00000004  r6 = 00082000  r5 = 00002000
 r4 = BE974C9C 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-04-05  7:09 [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock Guennadi Liakhovetski
@ 2007-04-05 10:59 ` Guennadi Liakhovetski
  2007-04-05 15:11   ` Guennadi Liakhovetski
       [not found]   ` <Pine.LNX.4.63.0704051250170.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
  0 siblings, 2 replies; 9+ messages in thread
From: Guennadi Liakhovetski @ 2007-04-05 10:59 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: irda-users, linux-rt-users, netdev

Ok, a simple analysis reveals the recursive spinlock:

On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:

> [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
===>		^^^^^^^^^^^^^^^^
>  r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000

===>	spin_lock_bh(&pch->downl);

> [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
>  r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8
>  r4 = C21418B8 
> [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
>  r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
> [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
===>		^^^^^^^^
>  r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000

===>		spin_lock_bh(&pch->downl);

> [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
> [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
> [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
>  r8 = C1BBC914  r7 = C21418B8  r6 = 00000000  r5 = C21418B8
>  r4 = C1BBC800 
> [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
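
Stripped of everything incidental, the cycle is (a simplified paraphrase
of the 2.6.20 code paths, not the actual code):

	static void ppp_push(struct ppp *ppp)
	{
		struct sk_buff *skb = ppp->xmit_pending;
		struct channel *pch = list_entry(ppp->channels.next,
						 struct channel, clist);

		spin_lock_bh(&pch->downl);		/* 1st acquisition */
		pch->chan->ops->start_xmit(pch->chan, skb);
		/* start_xmit == ppp_irnet_send() -> irttp_data_request()
		 * -> irttp_run_tx_queue(), which calls the client back
		 * synchronously: flow_indication() == irnet_flow_indication()
		 * -> ppp_output_wakeup() -> ppp_channel_push() */
		spin_unlock_bh(&pch->downl);
	}

	static void ppp_channel_push(struct channel *pch)
	{
		spin_lock_bh(&pch->downl);	/* 2nd acquisition, same
						 * task: BUG() on -rt */
		/* ... drain pch->file.xq ... */
		spin_unlock_bh(&pch->downl);
	}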

Now, does anyone have an idea how best to fix it?

1. Should re-entry into ppp_channel_push() be prevented, and if so, at
which level? Or

2. Should re-entry be allowed and only the recursive spin_lock_bh()
avoided?

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-04-05 10:59 ` Guennadi Liakhovetski
@ 2007-04-05 15:11   ` Guennadi Liakhovetski
       [not found]   ` <Pine.LNX.4.63.0704051250170.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
  1 sibling, 0 replies; 9+ messages in thread
From: Guennadi Liakhovetski @ 2007-04-05 15:11 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: irda-users, linux-rt-users, netdev

On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:

> Ok, a simple analysis reveals the recursive spinlock:
> 
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
> 
> > [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===>		^^^^^^^^^^^^^^^^
> >  r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000
> 
> ===>	spin_lock_bh(&pch->downl);
> 
> > [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> > [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> >  r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8
> >  r4 = C21418B8 
> > [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> > [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> >  r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
> > [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===>		^^^^^^^^
> >  r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000
> 
> ===>		spin_lock_bh(&pch->downl);

For comments, below is a patch I am testing at the moment. It looks
neither right nor particularly pretty, but I couldn't come up with
anything better. I'll sign off on it in case nothing better is proposed
and it is decided to push it for 2.6.21.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

Fix recursion with PPP over IrNET.

Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>

diff -u -r1.1.1.19.4.1 ppp_generic.c
--- a/drivers/net/ppp_generic.c	26 Mar 2007 09:21:32 -0000	1.1.1.19.4.1
+++ b/drivers/net/ppp_generic.c	5 Apr 2007 15:01:45 -0000
@@ -155,6 +155,7 @@
 	struct ppp_channel *chan;	/* public channel data structure */
 	struct rw_semaphore chan_sem;	/* protects `chan' during chan ioctl */
 	spinlock_t	downl;		/* protects `chan', file.xq dequeue */
+	struct task_struct *locker;	/* owner of the downl lock */
 	struct ppp	*ppp;		/* ppp unit we're connected to */
 	struct list_head clist;		/* link in list of channels per unit */
 	rwlock_t	upl;		/* protects `ppp' */
@@ -1214,8 +1215,11 @@
 
 		spin_lock_bh(&pch->downl);
 		if (pch->chan) {
+			/* Prevent recursive locking */
+			pch->locker = current;
 			if (pch->chan->ops->start_xmit(pch->chan, skb))
 				ppp->xmit_pending = NULL;
+			pch->locker = NULL;
 		} else {
 			/* channel got unregistered */
 			kfree_skb(skb);
@@ -1435,6 +1439,9 @@
 	struct sk_buff *skb;
 	struct ppp *ppp;
 
+	if (pch->locker == current)
+		return;
+
 	spin_lock_bh(&pch->downl);
 	if (pch->chan != 0) {
 		while (!skb_queue_empty(&pch->file.xq)) {
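
To spell out the idea: pch->locker is only ever written while pch->downl
is held, so a task on another CPU sees either NULL or a foreign task
pointer and takes the lock normally; only a genuinely recursive call from
the same task finds pch->locker == current and backs out of
ppp_channel_push() instead of deadlocking on its own lock, leaving the
queued frames to be pushed on a later wakeup.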

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
       [not found]   ` <Pine.LNX.4.63.0704051250170.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
@ 2007-04-07  0:59     ` Samuel Ortiz
  2007-04-10 18:10       ` [irda-users] " Samuel Ortiz
  0 siblings, 1 reply; 9+ messages in thread
From: Samuel Ortiz @ 2007-04-07  0:59 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Paul Mackerras,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

Hi Guennadi,

On Thu, Apr 05, 2007 at 12:59:40PM +0200, Guennadi Liakhovetski wrote:
> Ok, a simple analysis reveals the recursive spinlock:
> 
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
> 
> > [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===>		^^^^^^^^^^^^^^^^
> >  r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 00000000
> 
> ===>	spin_lock_bh(&pch->downl);
> 
> > [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] (irnet_flow_indication+0x38/0x3c [irnet])
> > [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] (irttp_data_request+0x128/0x4f8 [irda])
> >  r8 = BF121560  r7 = 00000002  r6 = C38F4200  r5 = C21418B8
> >  r4 = C21418B8 
> > [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] (ppp_irnet_send+0x134/0x238 [irnet])
> > [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] (ppp_push+0x80/0xb8 [ppp_generic])
> >  r7 = C3A436E0  r6 = 00000000  r5 = C21418B8  r4 = C1489600
> > [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===>		^^^^^^^^
> >  r7 = 00000021  r6 = C21418B8  r5 = C1489600  r4 = 00000000
> 
> ===>		spin_lock_bh(&pch->downl);
> 
> > [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] (ppp_start_xmit+0x128/0x254 [ppp_generic])
> > [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] (dev_hard_start_xmit+0x170/0x268)
> > [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] (__qdisc_run+0x60/0x270)
> >  r8 = C1BBC914  r7 = C21418B8  r6 = 00000000  r5 = C21418B8
> >  r4 = C1BBC800 
> > [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] (dev_queue_xmit+0x1b4/0x25c)
> 
> Now, does anyone have an idea how best to fix it?
> 
> 1. Should re-entrance in ppp_channel_push() be prevented, and if yes - at 
> which level? Or
> 
> 2. Should re-entrance be allowed and only recursive spin_lock_bh() 
> avoided?
IMHO, irnet_flow_indication() should be called asynchronously by
irttp_run_tx_queue(), through some bottom-half mechanism. That would fix
your locking issues, and it would also reduce the time we spend in the
IrDA code with this lock held.

I will try to come up with some patches for you later this weekend.

Cheers,
Samuel.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-04-07  0:59     ` Samuel Ortiz
@ 2007-04-10 18:10       ` Samuel Ortiz
  2007-04-11 10:02         ` Guennadi Liakhovetski
  2007-04-30 13:24         ` Guennadi Liakhovetski
  0 siblings, 2 replies; 9+ messages in thread
From: Samuel Ortiz @ 2007-04-10 18:10 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: Paul Mackerras, irda-users, linux-rt-users, netdev

Hi Guennadi,

On Sat, Apr 07, 2007 at 03:59:26AM +0300, Samuel Ortiz wrote:
> IMHO, irnet_flow_indication() should be called asynchronously by
> irttp_run_tx_queue(), through some bottom-half mechanism. That would fix your
> locking issues, and that would reduce the time we spend in the IrDA code with
> this lock taken.
> 
> I will try to come up with some patches for you later this weekend.
The patch below schedules irnet_flow_indication() asynchronously. Could
you please give it a try? (It builds, but I couldn't test it...):

diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
index a899e58..941f0f1 100644
--- a/include/net/irda/irttp.h
+++ b/include/net/irda/irttp.h
@@ -128,6 +128,7 @@ struct tsap_cb {
 
 	struct net_device_stats stats;
 	struct timer_list todo_timer; 
+	struct work_struct irnet_flow_work;   /* irttp asynchronous flow restart */
 
 	__u32 max_seg_size;     /* Max data that fit into an IrLAP frame */
 	__u8  max_header_size;
diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
diff --git a/net/irda/irttp.c b/net/irda/irttp.c
index 7069e4a..a0d0f26 100644
--- a/net/irda/irttp.c
+++ b/net/irda/irttp.c
@@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
 /*************************** CLIENT CALLS ***************************/
 /************************** LMP CALLBACKS **************************/
 /* Everything is happily mixed up. Waiting for next clean up - Jean II */
+static void irttp_flow_restart(struct work_struct *work)
+{
+	struct tsap_cb * self =
+		container_of(work, struct tsap_cb, irnet_flow_work);
+
+	if (self == NULL)
+		return;
+
+	/* Check if we can accept more frames from client. */
+	if ((self->tx_sdu_busy) &&
+	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
+	    (!self->close_pend)) {
+		if (self->notify.flow_indication)
+			self->notify.flow_indication(self->notify.instance,
+						     self, FLOW_START);
+
+		/* self->tx_sdu_busy is the state of the client.
+		 * We don't really have a race here, but it's always safer
+		 * to update our state after the client - Jean II */
+		self->tx_sdu_busy = FALSE;
+	}
+}
+
 
 /*
  * Function irttp_open_tsap (stsap, notify)
@@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
 	self->todo_timer.data     = (unsigned long) self;
 	self->todo_timer.function = &irttp_todo_expired;
 
+	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
+
 	/* Initialize callbacks for IrLMP to use */
 	irda_notify_init(&ttp_notify);
 	ttp_notify.connect_confirm = irttp_connect_confirm;
@@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
 		self->stats.tx_packets++;
 	}
 
-	/* Check if we can accept more frames from client.
-	 * We don't want to wait until the todo timer to do that, and we
-	 * can't use tasklets (grr...), so we are obliged to give control
-	 * to client. That's ok, this test will be true not too often
-	 * (max once per LAP window) and we are called from places
-	 * where we can spend a bit of time doing stuff. - Jean II */
 	if ((self->tx_sdu_busy) &&
 	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
 	    (!self->close_pend))
-	{
-		if (self->notify.flow_indication)
-			self->notify.flow_indication(self->notify.instance,
-						     self, FLOW_START);
-
-		/* self->tx_sdu_busy is the state of the client.
-		 * We don't really have a race here, but it's always safer
-		 * to update our state after the client - Jean II */
-		self->tx_sdu_busy = FALSE;
-	}
+		schedule_work(&self->irnet_flow_work);
 
 	/* Reset lock */
 	self->tx_queue_lock = 0;


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-04-10 18:10       ` [irda-users] " Samuel Ortiz
@ 2007-04-11 10:02         ` Guennadi Liakhovetski
  2007-04-30 13:24         ` Guennadi Liakhovetski
  1 sibling, 0 replies; 9+ messages in thread
From: Guennadi Liakhovetski @ 2007-04-11 10:02 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: Paul Mackerras, irda-users, linux-rt-users, netdev

On Tue, 10 Apr 2007, Samuel Ortiz wrote:

> The patch below schedules irnet_flow_indication() asynchronously. Could
> you please give it a try (it builds, but I couldn't test it...) ? :

I'm afraid your patch makes it worse: with my patch to ppp_generic it
ran for 5 hours before IrDA stopped, stuck at 4 Mbaud, with only the
discovery timer expiring periodically without actually doing anything;
with your patch the same happens in about 5 minutes.

I'll double-check it, and then try the same work-queue approach, but
from ppp_generic rather than from irttp, combining our two patches.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-04-10 18:10       ` [irda-users] " Samuel Ortiz
  2007-04-11 10:02         ` Guennadi Liakhovetski
@ 2007-04-30 13:24         ` Guennadi Liakhovetski
  2007-05-01  0:53           ` Samuel Ortiz
  1 sibling, 1 reply; 9+ messages in thread
From: Guennadi Liakhovetski @ 2007-04-30 13:24 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: Paul Mackerras, irda-users, linux-rt-users, netdev

On Tue, 10 Apr 2007, Samuel Ortiz wrote:

> Hi Guennadi,
> 
> The patch below schedules irnet_flow_indication() asynchronously. Could
> you please give it a try (it builds, but I couldn't test it...) ? :

Ok, your patch (still below) works too, now that I fixed that state
machine race (btw, we still have to decide on the final form in which it
goes into mainline) - but only __after__ you also add the line

+	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);

in irttp_dup() (remember spinlock_init()? :-)), otherwise it oopses.
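
A placement sketch (against the same tree as your patch; untested):
irttp_dup() copies the whole tsap_cb, so the embedded work_struct must be
re-initialised for the copy, just like the spinlock had to be:

	struct tsap_cb *irttp_dup(struct tsap_cb *orig, void *instance)
	{
		struct tsap_cb *new;

		/* ... kmalloc() new, memcpy() *orig into it,
		 *     re-init the lock and the todo timer ... */
		INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
		/* ... */
		return new;
	}
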
Generally, I like your patch better than mine to ppp_generic.c, where I
explicitly check whether a recursion is occurring. Still, I am a bit
concerned about introducing yet another execution context into irda...
We have seen a couple of locking issues there already in the last 2-3
months, especially under rt-preempt... Would you be able to run some
tests too? I will keep testing it as well, but I don't know for how much
longer or how intensively. Do you think we might get new problems with
this new context?

Thanks
Guennadi

> 
> diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
> index a899e58..941f0f1 100644
> --- a/include/net/irda/irttp.h
> +++ b/include/net/irda/irttp.h
> @@ -128,6 +128,7 @@ struct tsap_cb {
>  
>  	struct net_device_stats stats;
>  	struct timer_list todo_timer; 
> +	struct work_struct irnet_flow_work;   /* irttp asynchronous flow restart */
>  
>  	__u32 max_seg_size;     /* Max data that fit into an IrLAP frame */
>  	__u8  max_header_size;
> diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
> diff --git a/net/irda/irttp.c b/net/irda/irttp.c
> index 7069e4a..a0d0f26 100644
> --- a/net/irda/irttp.c
> +++ b/net/irda/irttp.c
> @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
>  /*************************** CLIENT CALLS ***************************/
>  /************************** LMP CALLBACKS **************************/
>  /* Everything is happily mixed up. Waiting for next clean up - Jean II */
> +static void irttp_flow_restart(struct work_struct *work)
> +{
> +	struct tsap_cb * self =
> +		container_of(work, struct tsap_cb, irnet_flow_work);
> +
> +	if (self == NULL)
> +		return;
> +
> +	/* Check if we can accept more frames from client. */
> +	if ((self->tx_sdu_busy) &&
> +	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> +	    (!self->close_pend)) {
> +		if (self->notify.flow_indication)
> +			self->notify.flow_indication(self->notify.instance,
> +						     self, FLOW_START);
> +
> +		/* self->tx_sdu_busy is the state of the client.
> +		 * We don't really have a race here, but it's always safer
> +		 * to update our state after the client - Jean II */
> +		self->tx_sdu_busy = FALSE;
> +	}
> +}
> +
>  
>  /*
>   * Function irttp_open_tsap (stsap, notify)
> @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
>  	self->todo_timer.data     = (unsigned long) self;
>  	self->todo_timer.function = &irttp_todo_expired;
>  
> +	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
> +
>  	/* Initialize callbacks for IrLMP to use */
>  	irda_notify_init(&ttp_notify);
>  	ttp_notify.connect_confirm = irttp_connect_confirm;
> @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
>  		self->stats.tx_packets++;
>  	}
>  
> -	/* Check if we can accept more frames from client.
> -	 * We don't want to wait until the todo timer to do that, and we
> -	 * can't use tasklets (grr...), so we are obliged to give control
> -	 * to client. That's ok, this test will be true not too often
> -	 * (max once per LAP window) and we are called from places
> -	 * where we can spend a bit of time doing stuff. - Jean II */
>  	if ((self->tx_sdu_busy) &&
>  	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
>  	    (!self->close_pend))
> -	{
> -		if (self->notify.flow_indication)
> -			self->notify.flow_indication(self->notify.instance,
> -						     self, FLOW_START);
> -
> -		/* self->tx_sdu_busy is the state of the client.
> -		 * We don't really have a race here, but it's always safer
> -		 * to update our state after the client - Jean II */
> -		self->tx_sdu_busy = FALSE;
> -	}
> +		schedule_work(&self->irnet_flow_work);
>  
>  	/* Reset lock */
>  	self->tx_queue_lock = 0;
> 
> 
> 

---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-04-30 13:24         ` Guennadi Liakhovetski
@ 2007-05-01  0:53           ` Samuel Ortiz
  2007-05-02 11:05             ` Guennadi Liakhovetski
  0 siblings, 1 reply; 9+ messages in thread
From: Samuel Ortiz @ 2007-05-01  0:53 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: irda-users, Paul Mackerras, linux-rt-users, netdev

On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:
> On Tue, 10 Apr 2007, Samuel Ortiz wrote:
> 
> > Hi Guennadi,
> > 
> > The patch below schedules irnet_flow_indication() asynchronously. Could
> > you please give it a try (it builds, but I couldn't test it...) ? :
> 
> Ok, your patch (still below) works too (now that I fixed that state 
> machine race, btw, we still have to decide on the final form how it goes 
> in the mainline) __after__ you also add the line
> 
> +	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
> 
> in irttp_dup() (remember spinlock_init()?:-)), otherwise it oopses.
Good catch, again... Yes, I do remember the irttp_dup bug ;-)

 
> Generally, I like your patch better than mine to ppp_generic.c, where I 
> explicitly check if a recursion is occurring. Still, I am a bit concerned 
> about introducing yet another execution context into irda... We have seen 
> a couple of locking issues there already in the last 2-3 months especially 
> under rt-preempt... Would you be able to run some tests too? 
I think I can run some tests here as well, but probably not as many as
you: I'm not doing IrDA work full time, while it seems you currently
are. But I will definitely spend some time this week running my IrDA
stack with this patch applied. I didn't bother to do that earlier
because you had first reported an oops with this patch applied.

> I will be 
> testing it too, but don't know how much longer and how intensively. Do you 
> think we might get some new problems with this new context?
It seems quite safe to me. But more importantly, I think we do want to
call the flow indication routine asynchronously in this specific case.
There is one thing this patch is missing, though: we should probably
clean up the workqueue right before we destroy the TTP instance. The
work routine checks for a NULL pointer, but still...

Cheers,
Samuel.


> Thanks
> Guennadi
> 
> > 
> > diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
> > index a899e58..941f0f1 100644
> > --- a/include/net/irda/irttp.h
> > +++ b/include/net/irda/irttp.h
> > @@ -128,6 +128,7 @@ struct tsap_cb {
> >  
> >  	struct net_device_stats stats;
> >  	struct timer_list todo_timer; 
> > +	struct work_struct irnet_flow_work;   /* irttp asynchronous flow restart */
> >  
> >  	__u32 max_seg_size;     /* Max data that fit into an IrLAP frame */
> >  	__u8  max_header_size;
> > diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
> > diff --git a/net/irda/irttp.c b/net/irda/irttp.c
> > index 7069e4a..a0d0f26 100644
> > --- a/net/irda/irttp.c
> > +++ b/net/irda/irttp.c
> > @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
> >  /*************************** CLIENT CALLS ***************************/
> >  /************************** LMP CALLBACKS **************************/
> >  /* Everything is happily mixed up. Waiting for next clean up - Jean II */
> > +static void irttp_flow_restart(struct work_struct *work)
> > +{
> > +	struct tsap_cb * self =
> > +		container_of(work, struct tsap_cb, irnet_flow_work);
> > +
> > +	if (self == NULL)
> > +		return;
> > +
> > +	/* Check if we can accept more frames from client. */
> > +	if ((self->tx_sdu_busy) &&
> > +	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> > +	    (!self->close_pend)) {
> > +		if (self->notify.flow_indication)
> > +			self->notify.flow_indication(self->notify.instance,
> > +						     self, FLOW_START);
> > +
> > +		/* self->tx_sdu_busy is the state of the client.
> > +		 * We don't really have a race here, but it's always safer
> > +		 * to update our state after the client - Jean II */
> > +		self->tx_sdu_busy = FALSE;
> > +	}
> > +}
> > +
> >  
> >  /*
> >   * Function irttp_open_tsap (stsap, notify)
> > @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
> >  	self->todo_timer.data     = (unsigned long) self;
> >  	self->todo_timer.function = &irttp_todo_expired;
> >  
> > +	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
> > +
> >  	/* Initialize callbacks for IrLMP to use */
> >  	irda_notify_init(&ttp_notify);
> >  	ttp_notify.connect_confirm = irttp_connect_confirm;
> > @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
> >  		self->stats.tx_packets++;
> >  	}
> >  
> > -	/* Check if we can accept more frames from client.
> > -	 * We don't want to wait until the todo timer to do that, and we
> > -	 * can't use tasklets (grr...), so we are obliged to give control
> > -	 * to client. That's ok, this test will be true not too often
> > -	 * (max once per LAP window) and we are called from places
> > -	 * where we can spend a bit of time doing stuff. - Jean II */
> >  	if ((self->tx_sdu_busy) &&
> >  	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> >  	    (!self->close_pend))
> > -	{
> > -		if (self->notify.flow_indication)
> > -			self->notify.flow_indication(self->notify.instance,
> > -						     self, FLOW_START);
> > -
> > -		/* self->tx_sdu_busy is the state of the client.
> > -		 * We don't really have a race here, but it's always safer
> > -		 * to update our state after the client - Jean II */
> > -		self->tx_sdu_busy = FALSE;
> > -	}
> > +		schedule_work(&self->irnet_flow_work);
> >  
> >  	/* Reset lock */
> >  	self->tx_queue_lock = 0;
> > 
> > 
> > 
> 
> ---------------------------------
> Guennadi Liakhovetski, Ph.D.
> DSA Daten- und Systemtechnik GmbH
> Pascalstr. 28
> D-52076 Aachen
> Germany
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
  2007-05-01  0:53           ` Samuel Ortiz
@ 2007-05-02 11:05             ` Guennadi Liakhovetski
  0 siblings, 0 replies; 9+ messages in thread
From: Guennadi Liakhovetski @ 2007-05-02 11:05 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: irda-users, Paul Mackerras, linux-rt-users, netdev

On Tue, 1 May 2007, Samuel Ortiz wrote:
> But I will definitely spend some time this week running my IrDA stack 
> with this patch applied. I didn't bother to do that earlier as you first 
> reported some oops with this patch applied.

I think what I reported was not an oops, but the race that we're also
fixing at the moment - the one in the state machine. So, unrelated.

> On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:

> > in irttp_dup() (remember spinlock_init()?:-)), otherwise it oopses.
> good catch, again...Yes, I do remember the irttp_dup bug ;-)

I've added a tsap_init() function that does those common initialisation
calls, so we have a smaller chance of forgetting the dup() path next
time...

> > I will be 
> > testing it too, but don't know how much longer and how intensively. Do you 
> > think we might get some new problems with this new context?
> It seems quite safe to me. But more importantly, I thing we do want to call
> the flow indication routine asynchronously in that specific case. 
> There is one thing that this patch is missing though: we should probably
> clean the worqueue right before we destroy the TTP instance. The work routine
> checks for NULL pointer, but still...

I thought about it too, but the only thing you can do is
flush_scheduled_work() - is that what you mean?
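
Something like this, I mean (untested sketch; whether irttp_close_tsap()
is the right place is part of the question, and flush_scheduled_work()
waits for *all* work on the shared keventd queue, so it must not be
called while holding a lock the work function might take):

	int irttp_close_tsap(struct tsap_cb *self)
	{
		/* Make sure a pending irttp_flow_restart() cannot run
		 * against a freed tsap_cb: */
		flush_scheduled_work();

		/* ... existing teardown ... */
		return 0;
	}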

The test with your patch stopped in the middle of an ftp transfer: the
ftp client was doing a get, and it stopped half-way through. On the
client side only the control TCP connection was still open; on the
server both forked ftp-server children were dead, with no connection
open. No errors in the logs. Weird. With my "ugly" patch it ran through
the weekend.

...and it just stopped again - this time after just 13 minutes.

A couple of comments to your patch:

> > > +static void irttp_flow_restart(struct work_struct *work)
> > > +{
> > > +	struct tsap_cb * self =
> > > +		container_of(work, struct tsap_cb, irnet_flow_work);
> > > +
> > > +	if (self == NULL)
> > > +		return;

self will not be NULL here. How about

+	IRDA_ASSERT(work != NULL, return;);
+	IRDA_ASSERT(self->magic == TTP_TSAP_MAGIC, return;);
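
i.e. the function would then start like this (sketch):

	static void irttp_flow_restart(struct work_struct *work)
	{
		struct tsap_cb *self =
			container_of(work, struct tsap_cb, irnet_flow_work);

		/* container_of() on a valid work pointer cannot yield
		 * NULL, so check what can actually go wrong instead: */
		IRDA_ASSERT(work != NULL, return;);
		IRDA_ASSERT(self->magic == TTP_TSAP_MAGIC, return;);
		/* ... */
	}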

> > > -	/* Check if we can accept more frames from client.
> > > -	 * We don't want to wait until the todo timer to do that, and we
> > > -	 * can't use tasklets (grr...), so we are obliged to give control
> > > -	 * to client. That's ok, this test will be true not too often
> > > -	 * (max once per LAP window) and we are called from places
> > > -	 * where we can spend a bit of time doing stuff. - Jean II */
> > >  	if ((self->tx_sdu_busy) &&
> > >  	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
> > >  	    (!self->close_pend))

Is this test still needed here? You do it again in the work function,
and the conditions can change in between, so should we call
schedule_work() unconditionally here?
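
For concreteness, the two variants being weighed (sketch):

	/* Variant A - keep the pre-check as a cheap filter (your patch): */
	if ((self->tx_sdu_busy) &&
	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
	    (!self->close_pend))
		schedule_work(&self->irnet_flow_work);

	/* Variant B - schedule unconditionally and let the one check in
	 * irttp_flow_restart() be authoritative; schedule_work() on
	 * already-pending work is a no-op, so this mostly costs one
	 * extra call per tx-queue run: */
	schedule_work(&self->irnet_flow_work);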

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2007-04-05  7:09 [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock Guennadi Liakhovetski
2007-04-05 10:59 ` Guennadi Liakhovetski
2007-04-05 15:11   ` Guennadi Liakhovetski
     [not found]   ` <Pine.LNX.4.63.0704051250170.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
2007-04-07  0:59     ` Samuel Ortiz
2007-04-10 18:10       ` [irda-users] " Samuel Ortiz
2007-04-11 10:02         ` Guennadi Liakhovetski
2007-04-30 13:24         ` Guennadi Liakhovetski
2007-05-01  0:53           ` Samuel Ortiz
2007-05-02 11:05             ` Guennadi Liakhovetski
