linux-bluetooth.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
@ 2015-06-05 23:11 Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 1/7] Bluetooth: L2CAP ERTM shutdown protect sk and chan Dean Jenkins
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

Details of the issue are described in the thread
[RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
and kernel.org Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=99301

Summary
=======

a) __l2cap_wait_ack() gets stuck in an infinite loop whilst waiting for
   I frame acknowledgements (ack is S or I frame from the remote device).
   The infinite loop occurs because the waiting loop's condition never becomes
   false. __l2cap_wait_ack() periodically sleeps for 200ms.

b) The L2CAP threads for processing received frames and error recovery
   timers become blocked on a mutex held by l2cap_sock_shutdown() which
   prevents updating of the loop condition in__l2cap_wait_ack().

Therefore a deadlock occurs.

This deadlock was observed on kernels 3.8 upto bluetooth-next on x86 and ARM
based platforms including the Raspberry Pi kernel 3.18.14+

The testcase is simple to reproduce the hung tasks, please wait upto 10 minutes
to observe the hung tasks:

On a laptop or other suitable device use the command:
l2test -i hci0 -X ertm -d
This command listens for connections with optional L2CAP ERTM supported and
dumps out the received data.

On the unit under test use the command:
l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 <lap_BT_ADDR>
This command sends 1024 bytes to the laptop via PSM 4113 used by l2test over
an ERTM channel. The command line shows:

l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 5C:F3:70:68:BA:DD
l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64)
l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64)
l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3, handle 11, class 0x000000, priority 0, rcvbuf 163840]
l2test[10366]: Sending ...
l2test[10366]: Closing channel ... 

Note that the channel fails to close because l2cap_sock_shutdown() is prevented
from returning due to __l2cap_wait_ack being stuck in an infinite loop.

Example hung tasks are:

[  601.116119] INFO: task kworker/0:1:19 blocked for more than 120 seconds.
[  601.116156]       Not tainted 3.18.11+ #781
[  601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.116183] kworker/0:1     D c054cbcc     0    19      2 0x00000000
[  601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth]
[  601.116685] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
[  601.116727] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40)
[  601.116812] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
[  601.116856] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c)
[  601.117085] [<c054ef20>] (mutex_lock) from [<bf05173c>] (l2cap_retrans_timeout+0x24/0x5c [bluetooth])
[  601.117371] [<bf05173c>] (l2cap_retrans_timeout [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444)
[  601.117410] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580)
[  601.117445] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4)
[  601.117488] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20)
[  601.117569] INFO: task kworker/u3:2:327 blocked for more than 120 seconds.
[  601.117588]       Not tainted 3.18.11+ #781
[  601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.117614] kworker/u3:2    D c054cbcc     0   327      2 0x00000000
[  601.117806] Workqueue: hci0 hci_rx_work [bluetooth]
[  601.117941] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
[  601.117981] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40)
[  601.118056] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
[  601.118092] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c)
[  601.118307] [<c054ef20>] (mutex_lock) from [<bf04d2c8>] (l2cap_get_chan_by_scid+0x28/0x98 [bluetooth])
[  601.118696] [<bf04d2c8>] (l2cap_get_chan_by_scid [bluetooth]) from [<bf054590>] (l2cap_recv_frame+0xa4/0x2cec [bluetooth])
[  601.119146] [<bf054590>] (l2cap_recv_frame [bluetooth]) from [<bf058194>] (l2cap_recv_acldata+0x214/0x22c [bluetooth])
[  601.119559] [<bf058194>] (l2cap_recv_acldata [bluetooth]) from [<bf02bf88>] (hci_rx_work+0x180/0x288 [bluetooth])
[  601.119801] [<bf02bf88>] (hci_rx_work [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444)
[  601.119839] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580)
[  601.119883] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4)
[  601.119919] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20)

This is not an exhaustive list, there are other hung task flavours.

Solution
========

The solution was to reorganise l2cap_sock_shutdown() to reveal the purpose of
the function. The function shuts down the L2CAP channel but in the case when the
channel type is ERTM, __l2cap_wait_ack() is called to wait for acknowledgements
before the channel is shutdown.

There is some concurrency protection by using a lock on sk to prevent
multiple instances of l2cap_sock_shutdown() from running.

SHUTDOWN_MASK is used to inhibit new I frames and other PDUs from being
queued. Setting this flag was moved to before calling __l2cap_wait_ack() so that
the loop would not be waiting indefinitely for ACKs. There is some risk here
of not sending all pending data, however, an arbitrary cut-off point is needed
otherwise the link could remain connected. This may need improving to inhibit
only new I frames.

Extra protection to the sk and chan structures was added to prevent those
structures being freed whilst the locks were not held. This is defensive and
perhaps unnecessary.

__l2cap_wait_ack() was modified to be only called when necessary and the
while loop was replaced by a do while loop because there must be at least 1
waiting period. References to chan->conn were replaced by chan->state so that
the mutex locks could be removed. Also waiting for ACKs is only valid whilst the
connection is in the BT_CONNECTED state.

The scope of chan locking was reduced so that processing of received frames
and error handling routines could operate whilst waiting for ACKs.

Importantly, for __l2cap_wait_ack() a 10 second maximum wait limit was
implemented to prevent __l2cap_wait_ack() getting stuck in an infinite loop.
If no expected ACKs arrive then the channel is disconnected and the error
code -ENOLINK is returned to indicate that not all data was ACK'ed.

Testing
=======

This is being tested in kernel 3.8 on an ARM based board. Results so far
show that the hung tasks no longer occur.

Note that the failure scenario was Mediaplayer media browsing. Waiting for
ACK's is unimportant in this scenario because it does not matter whether
a media browsing request such as "get item" was successfully transferred at the
point of shutting down the channel.

Feedback
========

Please review and provide comments on this solution. Thanks.

Patchset
========

Patches are based on bluetooth-next:

Dean Jenkins (7):
  Bluetooth: L2CAP ERTM shutdown protect sk and chan
  Bluetooth: Make __l2cap_wait_ack more efficient
  Bluetooth: Unwind l2cap_sock_shutdown()
  Bluetooth: l2cap_sock_shutdown() remove mutex_lock calls
  Bluetooth: l2cap_sock_shutdown() reduce scope of chan locking
  Bluetooth: Add BT_DBG to l2cap_sock_shutdown()
  Bluetooth: __l2cap_wait_ack() limit max waiting time

 net/bluetooth/l2cap_sock.c | 69 +++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 25 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1 1/7] Bluetooth: L2CAP ERTM shutdown protect sk and chan
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 2/7] Bluetooth: Make __l2cap_wait_ack more efficient Dean Jenkins
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins,
	Kautuk Consul

During execution of l2cap_sock_shutdown() which might
sleep, the sk and chan structures can be in an unlocked
condition which potentially allows the structures to be
freed by other running threads. Therefore, there is a
possibility of a malfunction or memory reuse after being
freed.

Keep the sk and chan structures alive during the
exection of l2cap_sock_shutdown() by using their
respective hold and put functions. This allows the structures
to be freeable at the end of l2cap_sock_shutdown().

Signed-off-by: Kautuk Consul <Kautuk_Consul@mentor.com>
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 2442877..8321685 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1098,7 +1098,12 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 	if (!sk)
 		return 0;
 
+	/* prevent sk structure from being freed whilst unlocked */
+	sock_hold(sk);
+
 	chan = l2cap_pi(sk)->chan;
+	/* prevent chan structure from being freed whilst unlocked */
+	l2cap_chan_hold(chan);
 	conn = chan->conn;
 
 	BT_DBG("chan %p state %s", chan, state_to_string(chan->state));
@@ -1134,6 +1139,10 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 	if (conn)
 		mutex_unlock(&conn->chan_lock);
 
+	/* allow chan and sk structures to be freed */
+	l2cap_chan_put(chan);
+	sock_put(sk);
+
 	return err;
 }
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 2/7] Bluetooth: Make __l2cap_wait_ack more efficient
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 1/7] Bluetooth: L2CAP ERTM shutdown protect sk and chan Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown() Dean Jenkins
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

Use chan->state instead of chan->conn because waiting
for ACK's is only possible in the BT_CONNECTED state.
Also avoids reference to the conn structure so makes
locking easier.

Only call __l2cap_wait_ack() when the needed condition
of chan->unacked_frames > 0 && chan->state == BT_CONNECTED
is true and convert the while loop to a do while loop.

__l2cap_wait_ack() change the function prototype to
pass in the chan variable as chan is already available
in the calling function l2cap_sock_shutdown(). Avoids
locking issues.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 8321685..c909f45 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1054,16 +1054,15 @@ static void l2cap_sock_kill(struct sock *sk)
 	sock_put(sk);
 }
 
-static int __l2cap_wait_ack(struct sock *sk)
+static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
 {
-	struct l2cap_chan *chan = l2cap_pi(sk)->chan;
 	DECLARE_WAITQUEUE(wait, current);
 	int err = 0;
 	int timeo = HZ/5;
 
 	add_wait_queue(sk_sleep(sk), &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
-	while (chan->unacked_frames > 0 && chan->conn) {
+	do {
 		if (!timeo)
 			timeo = HZ/5;
 
@@ -1080,7 +1079,9 @@ static int __l2cap_wait_ack(struct sock *sk)
 		err = sock_error(sk);
 		if (err)
 			break;
-	}
+	} while (chan->unacked_frames > 0 &&
+		 chan->state == BT_CONNECTED);
+
 	set_current_state(TASK_RUNNING);
 	remove_wait_queue(sk_sleep(sk), &wait);
 	return err;
@@ -1115,8 +1116,10 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 	lock_sock(sk);
 
 	if (!sk->sk_shutdown) {
-		if (chan->mode == L2CAP_MODE_ERTM)
-			err = __l2cap_wait_ack(sk);
+		if (chan->mode == L2CAP_MODE_ERTM &&
+		    chan->unacked_frames > 0 &&
+		    chan->state == BT_CONNECTED)
+			err = __l2cap_wait_ack(sk, chan);
 
 		sk->sk_shutdown = SHUTDOWN_MASK;
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown()
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 1/7] Bluetooth: L2CAP ERTM shutdown protect sk and chan Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 2/7] Bluetooth: Make __l2cap_wait_ack more efficient Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-09  9:55   ` Szymon Janc
  2015-06-05 23:11 ` [PATCH v1 4/7] Bluetooth: l2cap_sock_shutdown() remove mutex_lock calls Dean Jenkins
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

l2cap_sock_shutdown() is designed to only action shutdown
of the channel when shutdown is not already in progress.
Therefore, reorganise the code flow by adding a goto
to jump to the end of function handling when shutdown is
already being actioned. This removes one level of code
indentation and make the code more readable.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index c909f45..8550886 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1099,6 +1099,13 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 	if (!sk)
 		return 0;
 
+	lock_sock(sk);
+
+	if (sk->sk_shutdown)
+		goto shutdown_already;
+
+	sk->sk_shutdown = SHUTDOWN_MASK;
+
 	/* prevent sk structure from being freed whilst unlocked */
 	sock_hold(sk);
 
@@ -1113,30 +1120,21 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 		mutex_lock(&conn->chan_lock);
 
 	l2cap_chan_lock(chan);
-	lock_sock(sk);
 
-	if (!sk->sk_shutdown) {
-		if (chan->mode == L2CAP_MODE_ERTM &&
-		    chan->unacked_frames > 0 &&
-		    chan->state == BT_CONNECTED)
-			err = __l2cap_wait_ack(sk, chan);
+	if (chan->mode == L2CAP_MODE_ERTM &&
+	    chan->unacked_frames > 0 &&
+	    chan->state == BT_CONNECTED)
+		err = __l2cap_wait_ack(sk, chan);
 
-		sk->sk_shutdown = SHUTDOWN_MASK;
-
-		release_sock(sk);
-		l2cap_chan_close(chan, 0);
-		lock_sock(sk);
-
-		if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
-		    !(current->flags & PF_EXITING))
-			err = bt_sock_wait_state(sk, BT_CLOSED,
-						 sk->sk_lingertime);
-	}
+	release_sock(sk);
+	l2cap_chan_close(chan, 0);
+	lock_sock(sk);
 
-	if (!err && sk->sk_err)
-		err = -sk->sk_err;
+	if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
+	    !(current->flags & PF_EXITING))
+		err = bt_sock_wait_state(sk, BT_CLOSED,
+					 sk->sk_lingertime);
 
-	release_sock(sk);
 	l2cap_chan_unlock(chan);
 
 	if (conn)
@@ -1146,6 +1144,12 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 	l2cap_chan_put(chan);
 	sock_put(sk);
 
+shutdown_already:
+	if (!err && sk->sk_err)
+		err = -sk->sk_err;
+
+	release_sock(sk);
+
 	return err;
 }
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 4/7] Bluetooth: l2cap_sock_shutdown() remove mutex_lock calls
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
                   ` (2 preceding siblings ...)
  2015-06-05 23:11 ` [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown() Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 5/7] Bluetooth: l2cap_sock_shutdown() reduce scope of chan locking Dean Jenkins
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

Remove the calls
mutex_lock(&conn->chan_lock);
mutex_unlock(&conn->chan_lock);

because the conn structure is no longer being referenced in
l2cap_sock_shutdown() as instead the chan->state is being used
to determince the channel connection state.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 8550886..273bfa1 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1091,7 +1091,6 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 {
 	struct sock *sk = sock->sk;
 	struct l2cap_chan *chan;
-	struct l2cap_conn *conn;
 	int err = 0;
 
 	BT_DBG("sock %p, sk %p", sock, sk);
@@ -1112,13 +1111,9 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 	chan = l2cap_pi(sk)->chan;
 	/* prevent chan structure from being freed whilst unlocked */
 	l2cap_chan_hold(chan);
-	conn = chan->conn;
 
 	BT_DBG("chan %p state %s", chan, state_to_string(chan->state));
 
-	if (conn)
-		mutex_lock(&conn->chan_lock);
-
 	l2cap_chan_lock(chan);
 
 	if (chan->mode == L2CAP_MODE_ERTM &&
@@ -1137,9 +1132,6 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 
 	l2cap_chan_unlock(chan);
 
-	if (conn)
-		mutex_unlock(&conn->chan_lock);
-
 	/* allow chan and sk structures to be freed */
 	l2cap_chan_put(chan);
 	sock_put(sk);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 5/7] Bluetooth: l2cap_sock_shutdown() reduce scope of chan locking
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
                   ` (3 preceding siblings ...)
  2015-06-05 23:11 ` [PATCH v1 4/7] Bluetooth: l2cap_sock_shutdown() remove mutex_lock calls Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 6/7] Bluetooth: Add BT_DBG to l2cap_sock_shutdown() Dean Jenkins
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

The only place where chan locking is needed is the call to
l2cap_chan_close(chan, 0) which if necessary closes the channel.
Therefore, move the l2cap_chan_lock(chan) and
l2cap_chan_lock(chan) locking calls to around
l2cap_chan_close(chan, 0).

This allows __l2cap_wait_ack(sk, chan) to be called with no
chan locks being held so L2CAP messaging over the ACL link
can be done unimpaired.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 273bfa1..0d8ae73 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1114,24 +1114,22 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
 
 	BT_DBG("chan %p state %s", chan, state_to_string(chan->state));
 
-	l2cap_chan_lock(chan);
-
 	if (chan->mode == L2CAP_MODE_ERTM &&
 	    chan->unacked_frames > 0 &&
 	    chan->state == BT_CONNECTED)
 		err = __l2cap_wait_ack(sk, chan);
 
+	l2cap_chan_lock(chan);
 	release_sock(sk);
 	l2cap_chan_close(chan, 0);
 	lock_sock(sk);
+	l2cap_chan_unlock(chan);
 
 	if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
 	    !(current->flags & PF_EXITING))
 		err = bt_sock_wait_state(sk, BT_CLOSED,
 					 sk->sk_lingertime);
 
-	l2cap_chan_unlock(chan);
-
 	/* allow chan and sk structures to be freed */
 	l2cap_chan_put(chan);
 	sock_put(sk);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 6/7] Bluetooth: Add BT_DBG to l2cap_sock_shutdown()
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
                   ` (4 preceding siblings ...)
  2015-06-05 23:11 ` [PATCH v1 5/7] Bluetooth: l2cap_sock_shutdown() reduce scope of chan locking Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-05 23:11 ` [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time Dean Jenkins
  2015-06-06  4:13 ` [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Marcel Holtmann
  7 siblings, 0 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

Add helpful BT_DBG debug to l2cap_sock_shutdown()
and __l2cap_wait_ack() so that the code flow can
be analysed.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 0d8ae73..369ad0e 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1063,6 +1063,8 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
 	add_wait_queue(sk_sleep(sk), &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
 	do {
+		BT_DBG("Waiting for %d ACKs", chan->unacked_frames);
+
 		if (!timeo)
 			timeo = HZ/5;
 
@@ -1140,6 +1142,8 @@ shutdown_already:
 
 	release_sock(sk);
 
+	BT_DBG("err: %d", err);
+
 	return err;
 }
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
                   ` (5 preceding siblings ...)
  2015-06-05 23:11 ` [PATCH v1 6/7] Bluetooth: Add BT_DBG to l2cap_sock_shutdown() Dean Jenkins
@ 2015-06-05 23:11 ` Dean Jenkins
  2015-06-06  4:21   ` Marcel Holtmann
  2015-06-06  4:13 ` [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Marcel Holtmann
  7 siblings, 1 reply; 17+ messages in thread
From: Dean Jenkins @ 2015-06-05 23:11 UTC (permalink / raw)
  To: linux-bluetooth
  Cc: marcel, gustavo, johan.hedberg, Joshua_Frkuska, Dean Jenkins

Add a limiter counter to prevent the do while loop
running in an infinite loop. This ensures that the
channel will be instructed to close within 10 seconds
so prevents l2cap_sock_shutdown() getting stuck forever.

Returns -ENOLINK when the limit is reached as the channel
will be subequently closed and not all data was ACK'ed.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 net/bluetooth/l2cap_sock.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 369ad0e..ee6531e 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1059,11 +1059,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
 	DECLARE_WAITQUEUE(wait, current);
 	int err = 0;
 	int timeo = HZ/5;
+	int limiter = 10 * 5;	/* 10 seconds limit */
 
 	add_wait_queue(sk_sleep(sk), &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
 	do {
-		BT_DBG("Waiting for %d ACKs", chan->unacked_frames);
+		BT_DBG("Waiting for %d ACKs, limiter %d",
+			chan->unacked_frames, limiter);
 
 		if (!timeo)
 			timeo = HZ/5;
@@ -1081,6 +1083,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
 		err = sock_error(sk);
 		if (err)
 			break;
+
+		limiter--;
+		if (!limiter) {
+			err = -ENOLINK;
+			break;
+		}
+
 	} while (chan->unacked_frames > 0 &&
 		 chan->state == BT_CONNECTED);
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
  2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
                   ` (6 preceding siblings ...)
  2015-06-05 23:11 ` [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time Dean Jenkins
@ 2015-06-06  4:13 ` Marcel Holtmann
  2015-10-07 14:09   ` Luiz Augusto von Dentz
  7 siblings, 1 reply; 17+ messages in thread
From: Marcel Holtmann @ 2015-06-06  4:13 UTC (permalink / raw)
  To: Dean Jenkins
  Cc: BlueZ development, Gustavo F. Padovan, Johan Hedberg,
	Joshua_Frkuska

Hi Dean,

> Details of the issue are described in the thread
> [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
> and kernel.org Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=99301
> 
> Summary
> =======
> 
> a) __l2cap_wait_ack() gets stuck in an infinite loop whilst waiting for
>   I frame acknowledgements (ack is S or I frame from the remote device).
>   The infinite loop occurs because the waiting loop's condition never becomes
>   false. __l2cap_wait_ack() periodically sleeps for 200ms.
> 
> b) The L2CAP threads for processing received frames and error recovery
>   timers become blocked on a mutex held by l2cap_sock_shutdown() which
>   prevents updating of the loop condition in__l2cap_wait_ack().
> 
> Therefore a deadlock occurs.
> 
> This deadlock was observed on kernels 3.8 upto bluetooth-next on x86 and ARM
> based platforms including the Raspberry Pi kernel 3.18.14+
> 
> The testcase is simple to reproduce the hung tasks, please wait upto 10 minutes
> to observe the hung tasks:
> 
> On a laptop or other suitable device use the command:
> l2test -i hci0 -X ertm -d
> This command listens for connections with optional L2CAP ERTM supported and
> dumps out the received data.
> 
> On the unit under test use the command:
> l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 <lap_BT_ADDR>
> This command sends 1024 bytes to the laptop via PSM 4113 used by l2test over
> an ERTM channel. The command line shows:
> 
> l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 5C:F3:70:68:BA:DD
> l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64)
> l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64)
> l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3, handle 11, class 0x000000, priority 0, rcvbuf 163840]
> l2test[10366]: Sending ...
> l2test[10366]: Closing channel ... 
> 
> Note that the channel fails to close because l2cap_sock_shutdown() is prevented
> from returning due to __l2cap_wait_ack being stuck in an infinite loop.
> 
> Example hung tasks are:
> 
> [  601.116119] INFO: task kworker/0:1:19 blocked for more than 120 seconds.
> [  601.116156]       Not tainted 3.18.11+ #781
> [  601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  601.116183] kworker/0:1     D c054cbcc     0    19      2 0x00000000
> [  601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth]
> [  601.116685] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
> [  601.116727] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40)
> [  601.116812] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
> [  601.116856] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c)
> [  601.117085] [<c054ef20>] (mutex_lock) from [<bf05173c>] (l2cap_retrans_timeout+0x24/0x5c [bluetooth])
> [  601.117371] [<bf05173c>] (l2cap_retrans_timeout [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444)
> [  601.117410] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580)
> [  601.117445] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4)
> [  601.117488] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20)
> [  601.117569] INFO: task kworker/u3:2:327 blocked for more than 120 seconds.
> [  601.117588]       Not tainted 3.18.11+ #781
> [  601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  601.117614] kworker/u3:2    D c054cbcc     0   327      2 0x00000000
> [  601.117806] Workqueue: hci0 hci_rx_work [bluetooth]
> [  601.117941] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
> [  601.117981] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40)
> [  601.118056] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
> [  601.118092] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c)
> [  601.118307] [<c054ef20>] (mutex_lock) from [<bf04d2c8>] (l2cap_get_chan_by_scid+0x28/0x98 [bluetooth])
> [  601.118696] [<bf04d2c8>] (l2cap_get_chan_by_scid [bluetooth]) from [<bf054590>] (l2cap_recv_frame+0xa4/0x2cec [bluetooth])
> [  601.119146] [<bf054590>] (l2cap_recv_frame [bluetooth]) from [<bf058194>] (l2cap_recv_acldata+0x214/0x22c [bluetooth])
> [  601.119559] [<bf058194>] (l2cap_recv_acldata [bluetooth]) from [<bf02bf88>] (hci_rx_work+0x180/0x288 [bluetooth])
> [  601.119801] [<bf02bf88>] (hci_rx_work [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444)
> [  601.119839] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580)
> [  601.119883] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4)
> [  601.119919] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20)
> 
> This is not an exhaustive list, there are other hung task flavours.
> 
> Solution
> ========
> 
> The solution was to reorganise l2cap_sock_shutdown() to reveal the purpose of
> the function. The function shuts down the L2CAP channel but in the case when the
> channel type is ERTM, __l2cap_wait_ack() is called to wait for acknowledgements
> before the channel is shutdown.
> 
> There is some concurrency protection by using a lock on sk to prevent
> multiple instances of l2cap_sock_shutdown() from running.
> 
> SHUTDOWN_MASK is used to inhibit new I frames and other PDUs from being
> queued. Setting this flag was moved to before calling __l2cap_wait_ack() so that
> the loop would not be waiting indefinitely for ACKs. There is some risk here
> of not sending all pending data, however, an arbitrary cut-off point is needed
> otherwise the link could remain connected. This may need improving to inhibit
> only new I frames.
> 
> Extra protection to the sk and chan structures was added to prevent those
> structures being freed whilst the locks were not held. This is defensive and
> perhaps unnecessary.
> 
> __l2cap_wait_ack() was modified to be only called when necessary and the
> while loop was replaced by a do while loop because there must be at least 1
> waiting period. References to chan->conn were replaced by chan->state so that
> the mutex locks could be removed. Also waiting for ACKs is only valid whilst the
> connection is in the BT_CONNECTED state.
> 
> The scope of chan locking was reduced so that processing of received frames
> and error handling routines could operate whilst waiting for ACKs.
> 
> Importantly, for __l2cap_wait_ack() a 10 second maximum wait limit was
> implemented to prevent __l2cap_wait_ack() getting stuck in an infinite loop.
> If no expected ACKs arrive then the channel is disconnected and the error
> code -ENOLINK is returned to indicate that not all data was ACK'ed.
> 
> Testing
> =======
> 
> This is being tested in kernel 3.8 on an ARM based board. Results so far
> show that the hung tasks no longer occur.
> 
> Note that the failure scenario was Mediaplayer media browsing. Waiting for
> ACK's is unimportant in this scenario because it does not matter whether
> a media browsing request such as "get item" was successfully transferred at the
> point of shutting down the channel.

any chance you have run PTS against it to ensure that we are not breaking qualification?

You could use android/pts-l2cap.txt as documentation for the commands for each test case. I would feel a lot more comfortable if we know that qualification is still in place.

Otherwise, this looks pretty good. Thanks for tracking this down.

Regards

Marcel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time
  2015-06-05 23:11 ` [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time Dean Jenkins
@ 2015-06-06  4:21   ` Marcel Holtmann
  2015-06-10 10:31     ` Dean Jenkins
  0 siblings, 1 reply; 17+ messages in thread
From: Marcel Holtmann @ 2015-06-06  4:21 UTC (permalink / raw)
  To: Dean Jenkins
  Cc: BlueZ development, Gustavo F. Padovan, Johan Hedberg,
	Joshua_Frkuska

Hi Dean,

> Add a limiter counter to prevent the do while loop
> running in an infinite loop. This ensures that the
> channel will be instructed to close within 10 seconds
> so prevents l2cap_sock_shutdown() getting stuck forever.
> 
> Returns -ENOLINK when the limit is reached as the channel
> will be subequently closed and not all data was ACK'ed.
> 
> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
> ---
> net/bluetooth/l2cap_sock.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
> index 369ad0e..ee6531e 100644
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -1059,11 +1059,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
> 	DECLARE_WAITQUEUE(wait, current);
> 	int err = 0;
> 	int timeo = HZ/5;
> +	int limiter = 10 * 5;	/* 10 seconds limit */

while reading this, should timeo not be using msecs_to_jiffies() in the first place.

And with that, can we have a little bit better logic on how you get to 10 seconds. I had to scratch my head a bit to realise that this is 50 * 200 msec. It seems a bit error prone in case anyone ever changes something.

> 
> 	add_wait_queue(sk_sleep(sk), &wait);
> 	set_current_state(TASK_INTERRUPTIBLE);
> 	do {
> -		BT_DBG("Waiting for %d ACKs", chan->unacked_frames);
> +		BT_DBG("Waiting for %d ACKs, limiter %d",
> +			chan->unacked_frames, limiter);
> 
> 		if (!timeo)
> 			timeo = HZ/5;

And with that, I have no idea why we are doing this check here. Seems rather pointless unless I misses something.

I know these are not your bugs, but while we are at it, it might be better to really clean this out.

> @@ -1081,6 +1083,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
> 		err = sock_error(sk);
> 		if (err)
> 			break;
> +
> +		limiter--;
> +		if (!limiter) {
> +			err = -ENOLINK;
> +			break;
> +		}
> +
> 	} while (chan->unacked_frames > 0 &&
> 		 chan->state == BT_CONNECTED);

Regards

Marcel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown()
  2015-06-05 23:11 ` [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown() Dean Jenkins
@ 2015-06-09  9:55   ` Szymon Janc
  2015-06-09 10:13     ` Dean Jenkins
  0 siblings, 1 reply; 17+ messages in thread
From: Szymon Janc @ 2015-06-09  9:55 UTC (permalink / raw)
  To: Dean Jenkins
  Cc: linux-bluetooth, marcel, gustavo, johan.hedberg, Joshua_Frkuska

Hi Dean,

On Saturday 06 of June 2015 00:11:12 Dean Jenkins wrote:
> l2cap_sock_shutdown() is designed to only action shutdown
> of the channel when shutdown is not already in progress.
> Therefore, reorganise the code flow by adding a goto
> to jump to the end of function handling when shutdown is
> already being actioned. This removes one level of code
> indentation and make the code more readable.
> 
> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
> ---
>  net/bluetooth/l2cap_sock.c | 44
> ++++++++++++++++++++++++-------------------- 1 file changed, 24
> insertions(+), 20 deletions(-)
> 
> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
> index c909f45..8550886 100644
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -1099,6 +1099,13 @@ static int l2cap_sock_shutdown(struct socket *sock,
> int how) if (!sk)
>  		return 0;
> 
> +	lock_sock(sk);
> +
> +	if (sk->sk_shutdown)
> +		goto shutdown_already;
> +
> +	sk->sk_shutdown = SHUTDOWN_MASK;
> +
>  	/* prevent sk structure from being freed whilst unlocked */
>  	sock_hold(sk);
> 
> @@ -1113,30 +1120,21 @@ static int l2cap_sock_shutdown(struct socket *sock,
> int how) mutex_lock(&conn->chan_lock);
> 
>  	l2cap_chan_lock(chan);
> -	lock_sock(sk);
> 
> -	if (!sk->sk_shutdown) {
> -		if (chan->mode == L2CAP_MODE_ERTM &&
> -		    chan->unacked_frames > 0 &&
> -		    chan->state == BT_CONNECTED)
> -			err = __l2cap_wait_ack(sk, chan);
> +	if (chan->mode == L2CAP_MODE_ERTM &&
> +	    chan->unacked_frames > 0 &&
> +	    chan->state == BT_CONNECTED)
> +		err = __l2cap_wait_ack(sk, chan);
> 
> -		sk->sk_shutdown = SHUTDOWN_MASK;
> -
> -		release_sock(sk);
> -		l2cap_chan_close(chan, 0);
> -		lock_sock(sk);
> -
> -		if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
> -		    !(current->flags & PF_EXITING))
> -			err = bt_sock_wait_state(sk, BT_CLOSED,
> -						 sk->sk_lingertime);
> -	}
> +	release_sock(sk);
> +	l2cap_chan_close(chan, 0);
> +	lock_sock(sk);
> 
> -	if (!err && sk->sk_err)
> -		err = -sk->sk_err;
> +	if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
> +	    !(current->flags & PF_EXITING))
> +		err = bt_sock_wait_state(sk, BT_CLOSED,
> +					 sk->sk_lingertime);
> 
> -	release_sock(sk);
>  	l2cap_chan_unlock(chan);
> 
>  	if (conn)
> @@ -1146,6 +1144,12 @@ static int l2cap_sock_shutdown(struct socket *sock,
> int how) l2cap_chan_put(chan);
>  	sock_put(sk);

Isn't it possible that sk will be freed here? (at least this is what comment 
from patch 1 suggests).

> 
> +shutdown_already:
> +	if (!err && sk->sk_err)
> +		err = -sk->sk_err;
> +
> +	release_sock(sk);
> +
>  	return err;
>  }

-- 
BR
Szymon Janc

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown()
  2015-06-09  9:55   ` Szymon Janc
@ 2015-06-09 10:13     ` Dean Jenkins
  0 siblings, 0 replies; 17+ messages in thread
From: Dean Jenkins @ 2015-06-09 10:13 UTC (permalink / raw)
  To: Szymon Janc
  Cc: linux-bluetooth, marcel, gustavo, johan.hedberg, Joshua_Frkuska

Hi Szymon,

On 09/06/15 10:55, Szymon Janc wrote:
> Hi Dean,
>
> On Saturday 06 of June 2015 00:11:12 Dean Jenkins wrote:
>> l2cap_sock_shutdown() is designed to only action shutdown
>> of the channel when shutdown is not already in progress.
>> Therefore, reorganise the code flow by adding a goto
>> to jump to the end of function handling when shutdown is
>> already being actioned. This removes one level of code
>> indentation and make the code more readable.
>>
>> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
>> ---
>>   net/bluetooth/l2cap_sock.c | 44
>> ++++++++++++++++++++++++-------------------- 1 file changed, 24
>> insertions(+), 20 deletions(-)
>>
>> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
>> index c909f45..8550886 100644
>> --- a/net/bluetooth/l2cap_sock.c
>> +++ b/net/bluetooth/l2cap_sock.c
>> @@ -1099,6 +1099,13 @@ static int l2cap_sock_shutdown(struct socket *sock,
>> int how) if (!sk)
>>   		return 0;
>>
>> +	lock_sock(sk);
>> +
>> +	if (sk->sk_shutdown)
>> +		goto shutdown_already;
>> +
>> +	sk->sk_shutdown = SHUTDOWN_MASK;
>> +
>>   	/* prevent sk structure from being freed whilst unlocked */
>>   	sock_hold(sk);
>>
>> @@ -1113,30 +1120,21 @@ static int l2cap_sock_shutdown(struct socket *sock,
>> int how) mutex_lock(&conn->chan_lock);
>>
>>   	l2cap_chan_lock(chan);
>> -	lock_sock(sk);
>>
>> -	if (!sk->sk_shutdown) {
>> -		if (chan->mode == L2CAP_MODE_ERTM &&
>> -		    chan->unacked_frames > 0 &&
>> -		    chan->state == BT_CONNECTED)
>> -			err = __l2cap_wait_ack(sk, chan);
>> +	if (chan->mode == L2CAP_MODE_ERTM &&
>> +	    chan->unacked_frames > 0 &&
>> +	    chan->state == BT_CONNECTED)
>> +		err = __l2cap_wait_ack(sk, chan);
>>
>> -		sk->sk_shutdown = SHUTDOWN_MASK;
>> -
>> -		release_sock(sk);
>> -		l2cap_chan_close(chan, 0);
>> -		lock_sock(sk);
>> -
>> -		if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
>> -		    !(current->flags & PF_EXITING))
>> -			err = bt_sock_wait_state(sk, BT_CLOSED,
>> -						 sk->sk_lingertime);
>> -	}
>> +	release_sock(sk);
>> +	l2cap_chan_close(chan, 0);
>> +	lock_sock(sk);
>>
>> -	if (!err && sk->sk_err)
>> -		err = -sk->sk_err;
>> +	if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
>> +	    !(current->flags & PF_EXITING))
>> +		err = bt_sock_wait_state(sk, BT_CLOSED,
>> +					 sk->sk_lingertime);
>>
>> -	release_sock(sk);
>>   	l2cap_chan_unlock(chan);
>>
>>   	if (conn)
>> @@ -1146,6 +1144,12 @@ static int l2cap_sock_shutdown(struct socket *sock,
>> int how) l2cap_chan_put(chan);
>>   	sock_put(sk);
> Isn't it possible that sk will be freed here? (at least this is what comment
> from patch 1 suggests).
At this point in the code, lock_sock(sk) is held so it should prevent 
the freeing of sk.
I think my comment is misleading. I will rewrite my comment.

Thanks.
>
>> +shutdown_already:
>> +	if (!err && sk->sk_err)
>> +		err = -sk->sk_err;
>> +
>> +	release_sock(sk);
>> +
>>   	return err;
>>   }


-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time
  2015-06-06  4:21   ` Marcel Holtmann
@ 2015-06-10 10:31     ` Dean Jenkins
  2015-06-10 11:18       ` Marcel Holtmann
  0 siblings, 1 reply; 17+ messages in thread
From: Dean Jenkins @ 2015-06-10 10:31 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: BlueZ development, Gustavo F. Padovan, Johan Hedberg,
	Joshua_Frkuska

Hi Marcel,

This patch is broken. Needs a redo. Please see inline comments.

On 06/06/15 05:21, Marcel Holtmann wrote:
> Hi Dean,
>
>> Add a limiter counter to prevent the do while loop
>> running in an infinite loop. This ensures that the
>> channel will be instructed to close within 10 seconds
>> so prevents l2cap_sock_shutdown() getting stuck forever.
>>
>> Returns -ENOLINK when the limit is reached as the channel
>> will be subequently closed and not all data was ACK'ed.
>>
>> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
>> ---
>> net/bluetooth/l2cap_sock.c | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
>> index 369ad0e..ee6531e 100644
>> --- a/net/bluetooth/l2cap_sock.c
>> +++ b/net/bluetooth/l2cap_sock.c
>> @@ -1059,11 +1059,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
>> 	DECLARE_WAITQUEUE(wait, current);
>> 	int err = 0;
>> 	int timeo = HZ/5;
>> +	int limiter = 10 * 5;	/* 10 seconds limit */
> while reading this, should timeo not be using msecs_to_jiffies() in the first place.
>
> And with that, can we have a little bit better logic on how you get to 10 seconds. I had to scratch my head a bit to realise that this is 50 * 200 msec. It seems a bit error prone in case anyone ever changes something.
Thanks for your comments. I will redo this by using a #define and use 2 
separate commits; one to add limiter, the other to use msecs_to_jiffies()
>
>> 	add_wait_queue(sk_sleep(sk), &wait);
>> 	set_current_state(TASK_INTERRUPTIBLE);
>> 	do {
>> -		BT_DBG("Waiting for %d ACKs", chan->unacked_frames);
>> +		BT_DBG("Waiting for %d ACKs, limiter %d",
>> +			chan->unacked_frames, limiter);
>>
>> 		if (!timeo)
>> 			timeo = HZ/5;
> And with that, I have no idea why we are doing this check here. Seems rather pointless unless I misses something.
Testing shows that schedule_timeout() can return before the timeo time 
period has expired. I do not know why schedule_timeout() is returning 
early, no signal is caught by the signal_pending() statement within the 
loop. This means that the patch is broken because limiter can decrement 
too fast so the elapsed time is less than 10 seconds. The fix is to move 
limiter-- to this location so that timeo is zero before doing limiter--.

The limiter counter will not give an accurate period of 10 seconds and 
will be longer than 10 seconds in most cases. However, the 10 seconds is 
arbitrary so the accuracy is unimportant.

Would you prefer an overall 10 second jiffies counter instead of using 
the limiter loop counter ? Such as
keep looping until jiffies > start_jiffies + 10*HZ
although jiffies overflow needs to be taken into account, right ?


>
> I know these are not your bugs, but while we are at it, it might be better to really clean this out.
>
>> @@ -1081,6 +1083,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
>> 		err = sock_error(sk);
>> 		if (err)
>> 			break;
>> +
>> +		limiter--;
>> +		if (!limiter) {
>> +			err = -ENOLINK;
>> +			break;
>> +		}
>> +
>> 	} while (chan->unacked_frames > 0 &&
>> 		 chan->state == BT_CONNECTED);
> Regards
>
> Marcel
>
Regards,
Dean

-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time
  2015-06-10 10:31     ` Dean Jenkins
@ 2015-06-10 11:18       ` Marcel Holtmann
  0 siblings, 0 replies; 17+ messages in thread
From: Marcel Holtmann @ 2015-06-10 11:18 UTC (permalink / raw)
  To: Dean Jenkins
  Cc: BlueZ development, Gustavo F. Padovan, Johan Hedberg,
	Joshua_Frkuska

Hi Dean,

>>> Add a limiter counter to prevent the do while loop
>>> running in an infinite loop. This ensures that the
>>> channel will be instructed to close within 10 seconds
>>> so prevents l2cap_sock_shutdown() getting stuck forever.
>>> 
>>> Returns -ENOLINK when the limit is reached as the channel
>>> will be subequently closed and not all data was ACK'ed.
>>> 
>>> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
>>> ---
>>> net/bluetooth/l2cap_sock.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
>>> index 369ad0e..ee6531e 100644
>>> --- a/net/bluetooth/l2cap_sock.c
>>> +++ b/net/bluetooth/l2cap_sock.c
>>> @@ -1059,11 +1059,13 @@ static int __l2cap_wait_ack(struct sock *sk, struct l2cap_chan *chan)
>>> 	DECLARE_WAITQUEUE(wait, current);
>>> 	int err = 0;
>>> 	int timeo = HZ/5;
>>> +	int limiter = 10 * 5;	/* 10 seconds limit */
>> while reading this, should timeo not be using msecs_to_jiffies() in the first place.
>> 
>> And with that, can we have a little bit better logic on how you get to 10 seconds. I had to scratch my head a bit to realise that this is 50 * 200 msec. It seems a bit error prone in case anyone ever changes something.
> Thanks for your comments. I will redo this by using a #define and use 2 separate commits; one to add limiter, the other to use msecs_to_jiffies()
>> 
>>> 	add_wait_queue(sk_sleep(sk), &wait);
>>> 	set_current_state(TASK_INTERRUPTIBLE);
>>> 	do {
>>> -		BT_DBG("Waiting for %d ACKs", chan->unacked_frames);
>>> +		BT_DBG("Waiting for %d ACKs, limiter %d",
>>> +			chan->unacked_frames, limiter);
>>> 
>>> 		if (!timeo)
>>> 			timeo = HZ/5;
>> And with that, I have no idea why we are doing this check here. Seems rather pointless unless I misses something.
> Testing shows that schedule_timeout() can return before the timeo time period has expired. I do not know why schedule_timeout() is returning early, no signal is caught by the signal_pending() statement within the loop. This means that the patch is broken because limiter can decrement too fast so the elapsed time is less than 10 seconds. The fix is to move limiter-- to this location so that timeo is zero before doing limiter--.
> 
> The limiter counter will not give an accurate period of 10 seconds and will be longer than 10 seconds in most cases. However, the 10 seconds is arbitrary so the accuracy is unimportant.
> 
> Would you prefer an overall 10 second jiffies counter instead of using the limiter loop counter ? Such as
> keep looping until jiffies > start_jiffies + 10*HZ
> although jiffies overflow needs to be taken into account, right ?

but we have time_after helpers for that right? If so, then try that one.

Regards

Marcel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
  2015-06-06  4:13 ` [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Marcel Holtmann
@ 2015-10-07 14:09   ` Luiz Augusto von Dentz
  2015-10-07 14:43     ` Dean Jenkins
  0 siblings, 1 reply; 17+ messages in thread
From: Luiz Augusto von Dentz @ 2015-10-07 14:09 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, BlueZ development, Gustavo F. Padovan,
	Johan Hedberg, Joshua_Frkuska

Hi Dean,

On Sat, Jun 6, 2015 at 7:13 AM, Marcel Holtmann <marcel@holtmann.org> wrote:
> Hi Dean,
>
>> Details of the issue are described in the thread
>> [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
>> and kernel.org Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=99301

Are you still working on this? I can still lock bluetoothd on close,
btw it seems you are not aware of the following:

       SO_LINGER
              When enabled, a close(2) or shutdown(2) will not return
until all queued messages for the socket have been successfully sent
or the linger timeout  has  been  reached.   Otherwise,  the  call
returns immediately and the closing is done in the background.

So according to the text above we should never call __l2cap_wait_ack
on l2cap_sock_shutdown since it should return immediately, only when
SO_LINGER is used it is allowed to block. Also there doesn't seems to
be anything prevent us to send a L2CAP Disconnect request while there
are unacked packets, at least I could find anything on the core about
waiting for unacked nor the test spec has any test for it.

-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
  2015-10-07 14:09   ` Luiz Augusto von Dentz
@ 2015-10-07 14:43     ` Dean Jenkins
  2015-10-08  8:59       ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 17+ messages in thread
From: Dean Jenkins @ 2015-10-07 14:43 UTC (permalink / raw)
  To: Luiz Augusto von Dentz, Marcel Holtmann
  Cc: BlueZ development, Gustavo F. Padovan, Johan Hedberg,
	Joshua_Frkuska, Craske, Mark

Hi Luiz,

On 07/10/15 15:09, Luiz Augusto von Dentz wrote:
> Hi Dean,
>
> On Sat, Jun 6, 2015 at 7:13 AM, Marcel Holtmann <marcel@holtmann.org> wrote:
>> Hi Dean,
>>
>>> Details of the issue are described in the thread
>>> [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
>>> and kernel.org Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=99301
> Are you still working on this? I can still lock bluetoothd on close,
> btw it seems you are not aware of the following:
>
>         SO_LINGER
>                When enabled, a close(2) or shutdown(2) will not return
> until all queued messages for the socket have been successfully sent
> or the linger timeout  has  been  reached.   Otherwise,  the  call
> returns immediately and the closing is done in the background.
>
> So according to the text above we should never call __l2cap_wait_ack
> on l2cap_sock_shutdown since it should return immediately, only when
> SO_LINGER is used it is allowed to block. Also there doesn't seems to
> be anything prevent us to send a L2CAP Disconnect request while there
> are unacked packets, at least I could find anything on the core about
> waiting for unacked nor the test spec has any test for it.
>
Thanks for the information about SO_LINGER time. So it sounds like 
further changes to the logic of the code are needed.

Yes, we are still working on this issue. We have devised a fix for our 
commercial ARM kernel 3.14. and so far we have no reported failures.

In our fix, we had to ensure "L2CAP Disconnect request" took precedence 
over shutdown.

Note that ACKS are only used for L2CAP ERTM and so only some BT services 
risk getting hung such as audio media browsing over BT.

We are currently preparing a patchset for bluetooth-next of our 2nd 
attempt at resolving the locking issue.

Thanks,

Regards,
Dean

-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
  2015-10-07 14:43     ` Dean Jenkins
@ 2015-10-08  8:59       ` Luiz Augusto von Dentz
  0 siblings, 0 replies; 17+ messages in thread
From: Luiz Augusto von Dentz @ 2015-10-08  8:59 UTC (permalink / raw)
  To: Dean Jenkins
  Cc: Marcel Holtmann, BlueZ development, Gustavo F. Padovan,
	Johan Hedberg, Joshua_Frkuska, Craske, Mark

Hi Dean,

On Wed, Oct 7, 2015 at 5:43 PM, Dean Jenkins <Dean_Jenkins@mentor.com> wrot=
e:
> Hi Luiz,
>
> On 07/10/15 15:09, Luiz Augusto von Dentz wrote:
>>
>> Hi Dean,
>>
>> On Sat, Jun 6, 2015 at 7:13 AM, Marcel Holtmann <marcel@holtmann.org>
>> wrote:
>>>
>>> Hi Dean,
>>>
>>>> Details of the issue are described in the thread
>>>> [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to
>>>> l2cap_sock_shutdown()
>>>> and kernel.org Bugzilla
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=3D99301
>>
>> Are you still working on this? I can still lock bluetoothd on close,
>> btw it seems you are not aware of the following:
>>
>>         SO_LINGER
>>                When enabled, a close(2) or shutdown(2) will not return
>> until all queued messages for the socket have been successfully sent
>> or the linger timeout  has  been  reached.   Otherwise,  the  call
>> returns immediately and the closing is done in the background.
>>
>> So according to the text above we should never call __l2cap_wait_ack
>> on l2cap_sock_shutdown since it should return immediately, only when
>> SO_LINGER is used it is allowed to block. Also there doesn't seems to
>> be anything prevent us to send a L2CAP Disconnect request while there
>> are unacked packets, at least I could find anything on the core about
>> waiting for unacked nor the test spec has any test for it.
>>
> Thanks for the information about SO_LINGER time. So it sounds like furthe=
r
> changes to the logic of the code are needed.
>
> Yes, we are still working on this issue. We have devised a fix for our
> commercial ARM kernel 3.14. and so far we have no reported failures.
>
> In our fix, we had to ensure "L2CAP Disconnect request" took precedence o=
ver
> shutdown.

Alright but then __l2cap_wait_ack should probably not be needed
anymore since the remote should discard any data not processed yet, or
if did process send the pending acks before the disconnect response,
which makes me question why we have this code in the first place.

> Note that ACKS are only used for L2CAP ERTM and so only some BT services
> risk getting hung such as audio media browsing over BT.

Actually for AVRCP browsing it does not matter since it won't transfer
persistent data, GOEP 2.0 would perhaps be a problem since it means
data transfers, which could be files in case of FTP, may not be
complete but OBEX don't have segmentation so you always need to wait
the final of GET or PUT command even in Single Response Mode which
means if the we disconnect before the data completes we should discard
the data so there is no point in waiting for unacked packets in GOEP
2.0.

> We are currently preparing a patchset for bluetooth-next of our 2nd attem=
pt
> at resolving the locking issue.

I wonder if you tried removing __l2cap_wait_ack and related code?

Btw, Ive trace down the original commit introduce this logic to:

commit 6161c0382bbab883a634d284f7367a88bbe88534
Author: Gustavo F. Padovan <padovan@profusion.mobi>
Date:   Sat May 1 16:15:44 2010 -0300

    Bluetooth: Add wait_queue to wait ack of all sent packets

    To guarantee that all packets we sent were received we need to wait for
    theirs ack before shutdown the socket.

    Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
    Reviewed-by: Jo=C3=A3o Paulo Rechi Vita <jprvita@profusion.mobi>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

It looks to me it was trying to prevent that the scenario where you
send a bunch of data and disconnect, probably l2test would do that,
which means there is no protocol on top, which Im now suspecting
doesn't even work with BlueZ as a receiving part because we only ack
if the window is 3/4ths full.

--=20
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-10-08  8:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-05 23:11 [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 1/7] Bluetooth: L2CAP ERTM shutdown protect sk and chan Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 2/7] Bluetooth: Make __l2cap_wait_ack more efficient Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 3/7] Bluetooth: Unwind l2cap_sock_shutdown() Dean Jenkins
2015-06-09  9:55   ` Szymon Janc
2015-06-09 10:13     ` Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 4/7] Bluetooth: l2cap_sock_shutdown() remove mutex_lock calls Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 5/7] Bluetooth: l2cap_sock_shutdown() reduce scope of chan locking Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 6/7] Bluetooth: Add BT_DBG to l2cap_sock_shutdown() Dean Jenkins
2015-06-05 23:11 ` [PATCH v1 7/7] Bluetooth: __l2cap_wait_ack() limit max waiting time Dean Jenkins
2015-06-06  4:21   ` Marcel Holtmann
2015-06-10 10:31     ` Dean Jenkins
2015-06-10 11:18       ` Marcel Holtmann
2015-06-06  4:13 ` [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks Marcel Holtmann
2015-10-07 14:09   ` Luiz Augusto von Dentz
2015-10-07 14:43     ` Dean Jenkins
2015-10-08  8:59       ` Luiz Augusto von Dentz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).