From: Dean Jenkins <Dean_Jenkins@mentor.com>
To: BlueZ development <linux-bluetooth@vger.kernel.org>
Cc: Marcel Holtmann <marcel@holtmann.org>,
Gustavo Padovan <gustavo@padovan.org>,
Johan Hedberg <johan.hedberg@gmail.com>,
"Frkuska, Joshua" <Joshua_Frkuska@mentor.com>
Subject: Re: [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
Date: Fri, 5 Jun 2015 09:35:19 +0100 [thread overview]
Message-ID: <55715F47.7070900@mentor.com> (raw)
In-Reply-To: <55683B1E.7090001@mentor.com>
Hi Marcel, Gustavo and Johan,
We are working on a fix for the Bluetooth ERTM L2CAP deadlock (hung
task) issue. In particular to fix kernel 3.8, however, I have
forward-ported the patches to bluetooth-next and I believe the fix is
appropriate for bluetooth-next as sjanc on IRC was able to reproduce the
deadlock on bluetooth-next.
I have uploaded our set of proposed patches in
l2cap_ertm_shutdown_deadlock_patches.tgz to
https://bugzilla.kernel.org/show_bug.cgi?id=99301
We are still testing so the patches may not be the final versions but
the signs are good that the solution works.
Feel free to comment and collaborate.
I use the nick "djenkins" in the Bluetooth IRC chat room, please connect
me if you wish to discuss the issue.
Thanks for your time.
Regards,
Dean
On 29/05/15 11:10, Dean Jenkins wrote:
> Hi Marcel, Gustavo and Johan,
>
> My original E-mail is below for completeness so that you understand
> the context of this new E-mail.
>
> We have been successful in demonstrating a L2CAP ERTM shutdown
> deadlock by using the Bluez l2test utility. We think the latest 4.x
> kernel can also deadlock (not tested).
>
> Details of the simple test case are as follows:
>
> Apparatus
> ======
>
> Raspberry Pi (RPi) original B variant (single ARM core) running Linux
> Raspian with kernel 3.18.11+ and a USB BT v4.0 dongle. bluez is v5.15
> (locally built)
> Dell laptop running Linux Mageia 4 with kernel 3.14.32 and an on-board
> USB BT dongle. bluez is v5.11
>
> Method
> =====
>
> INITIAL SETTINGS:
>
> Ensure bluetoothd is running on the laptop and RPi so the D-Bus
> services are available for the Bluez utilities to use
> Run bluetoothctl on the laptop and RPi and ensure that an adaptor is
> shown
>
> Type in the following commands on both systems:
> agent on
> default-agent
> power on
> discoverable on
> scan on
>
> Both systems should now detect each other
>
> Make a note of each device's BT address. Call them <lap_BT_ADDR> and
> <RPi_BT_ADDR>
>
> On the RPi pair with the laptop (visa versa will also work):
>
> On the RPi do command
> pair <lap_BT_ADDR>
> Accept the pin code
>
> On the laptop do
> Accept the pin code
>
> Pairing should now be successful. bluetoothd must continue to run so
> that the linkkeys can be serviced by the agent.
>
> bluetoothctl can be quit on both systems
>
> USING L2TEST
>
> l2test can be used in various modes of operation controlled via
> command line options. l2test can be used in a client-server
> arrangement whereby 1 BT device listens for a connection (server) and
> the other BT device connects (client) to the listening device.
>
> For this testcase:
>
> RPi is the l2test client (unit under test)
> laptop is the l2test server
>
> On the laptop use the command:
> l2test -i hci0 -X ertm -d
> This command listens for connections with optional L2CAP ERTM
> supported and dumps out the received data.
>
> TEST 1: NON-ERTM CONNECTION
>
> On the RPi use the command:
> l2test -s -b 1024 -B /dev/urandom -P 4113 <lap_BT_ADDR>
> This command sends 1024 bytes to the laptop via PSM 4113 used by
> l2test over a L2CAP basic mode channel.
>
> TEST 2: ERTM CONNECTION
>
> On the RPi use the command:
> l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 <lap_BT_ADDR>
> This command sends 1024 bytes to the laptop via PSM 4113 used by
> l2test over an ERTM channel
>
> Results
> =====
>
> TEST 1
>
> root@raspberrypi:~# l2test -s -b 1024 -B /dev/urandom -P 4113
> 5C:F3:70:68:BA:DD
> l2test[10346]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64)
> l2test[10346]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64)
> l2test[10346]: Options [imtu 672, omtu 672, flush_to 65535, mode 0,
> handle 11, class 0x000000, priority 0, rcvbuf 163840]
> l2test[10346]: Sending ...
> l2test[10346]: Closing channel ...
> l2test[10346]: Done
> l2test[10346]: Exit
>
> Using basic L2CAP mode, the channel sends and closes OK.
> l2test terminates as expected.
>
> TEST 2
>
> root@raspberrypi:~# l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113
> 5C:F3:70:68:BA:DD
> l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64)
> l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64)
> l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3,
> handle 11, class 0x000000, priority 0, rcvbuf 163840]
> l2test[10366]: Sending ...
> l2test[10366]: Closing channel ...
>
> Using L2CAP ERTM, the channel is instructed to close but hangs.
> l2test fails to naturally terminate.
>
> From Bluez 5.15 tools/l2test.c:
> static void send_mode(int sk)
> {
> do_send(sk);
>
> syslog(LOG_INFO, "Closing channel ...");
> if (shutdown(sk, SHUT_RDWR) < 0)
> syslog(LOG_INFO, "Close failed: %m");
> else
> syslog(LOG_INFO, "Done");
> }
>
> The -s send l2test command line option calls send_mode() to send the
> data via do_send() and then closes the channel with the call to
> shutdown().
> do_send() is probably not checking that the data was successfully sent
> before returning meaning potentially not all the I frames were ACK'ed.
>
> The results shows that shutdown() fails to return as no "Done" textual
> message is seen.
>
> When the RPi is left in this condition for up to 10 minutes (need to
> wait longer than 2 times the hung task detector period plus L2CAP
> timer expiry time), dmesg shows:
>
> [ 601.116119] INFO: task kworker/0:1:19 blocked for more than 120
> seconds.
> [ 601.116156] Not tainted 3.18.11+ #781
> [ 601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 601.116183] kworker/0:1 D c054cbcc 0 19 2 0x00000000
> [ 601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth]
> [ 601.116685] [<c054cbcc>] (__schedule) from [<c054cfa4>]
> (schedule+0x40/0x8c)
> [ 601.116727] [<c054cfa4>] (schedule) from [<c054d3c8>]
> (schedule_preempt_disabled+0x30/0x40)
> [ 601.116812] [<c054d3c8>] (schedule_preempt_disabled) from
> [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
> [ 601.116856] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>]
> (mutex_lock+0x38/0x3c)
> [ 601.117085] [<c054ef20>] (mutex_lock) from [<bf05173c>]
> (l2cap_retrans_timeout+0x24/0x5c [bluetooth])
> [ 601.117371] [<bf05173c>] (l2cap_retrans_timeout [bluetooth]) from
> [<c003aa68>] (process_one_work+0x13c/0x444)
> [ 601.117410] [<c003aa68>] (process_one_work) from [<c003b954>]
> (worker_thread+0x190/0x580)
> [ 601.117445] [<c003b954>] (worker_thread) from [<c003fdb4>]
> (kthread+0xd8/0xf4)
> [ 601.117488] [<c003fdb4>] (kthread) from [<c000e8c8>]
> (ret_from_fork+0x14/0x20)
> [ 601.117569] INFO: task kworker/u3:2:327 blocked for more than 120
> seconds.
> [ 601.117588] Not tainted 3.18.11+ #781
> [ 601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 601.117614] kworker/u3:2 D c054cbcc 0 327 2 0x00000000
> [ 601.117806] Workqueue: hci0 hci_rx_work [bluetooth]
> [ 601.117941] [<c054cbcc>] (__schedule) from [<c054cfa4>]
> (schedule+0x40/0x8c)
> [ 601.117981] [<c054cfa4>] (schedule) from [<c054d3c8>]
> (schedule_preempt_disabled+0x30/0x40)
> [ 601.118056] [<c054d3c8>] (schedule_preempt_disabled) from
> [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
> [ 601.118092] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>]
> (mutex_lock+0x38/0x3c)
> [ 601.118307] [<c054ef20>] (mutex_lock) from [<bf04d2c8>]
> (l2cap_get_chan_by_scid+0x28/0x98 [bluetooth])
> [ 601.118696] [<bf04d2c8>] (l2cap_get_chan_by_scid [bluetooth]) from
> [<bf054590>] (l2cap_recv_frame+0xa4/0x2cec [bluetooth])
> [ 601.119146] [<bf054590>] (l2cap_recv_frame [bluetooth]) from
> [<bf058194>] (l2cap_recv_acldata+0x214/0x22c [bluetooth])
> [ 601.119559] [<bf058194>] (l2cap_recv_acldata [bluetooth]) from
> [<bf02bf88>] (hci_rx_work+0x180/0x288 [bluetooth])
> [ 601.119801] [<bf02bf88>] (hci_rx_work [bluetooth]) from
> [<c003aa68>] (process_one_work+0x13c/0x444)
> [ 601.119839] [<c003aa68>] (process_one_work) from [<c003b954>]
> (worker_thread+0x190/0x580)
> [ 601.119883] [<c003b954>] (worker_thread) from [<c003fdb4>]
> (kthread+0xd8/0xf4)
> [ 601.119919] [<c003fdb4>] (kthread) from [<c000e8c8>]
> (ret_from_fork+0x14/0x20)
>
> Conclusions
> ========
>
> The hung task failure is easy to reproduce with the l2test utility. It
> could be argued that the do_send() function should only return when
> all the data has been successfully acknowledged but on the other hand,
> it is ideal to generate the right conditions for the failure to be
> triggered.
>
> Probably, the low resources on the RPi help to make the failure more
> likely than using a PC for example.
>
> hcidump shows that the S RR frame from the laptop comes 320ms after
> the last I frame from the RPi which generates a window of opportunity
> for the shutdown() call to execute l2cap_sock_shutdown() before the S
> RR frame has been processed.
>
> It can be seen that the root cause function __l2cap_wait_ack() is not
> reported by the hung task detector because the function is running in
> an infinite loop but periodically waiting (sleeping) for 200ms so is
> not detectable by the hung task detector. The hung task detector is
> triggered when the L2CAP recovery timers expire
> (l2cap_retrans_timeout) and gets blocked, or when a new frame is
> received (l2cap_get_chan_by_scid) and processing gets blocked.
>
> We looked at a range of kernels from 3.8 to 4.0 which we think can
> suffer the deadlock.
>
> We managed to get a deadlock on Fedora using kernel 4.0.4 but it was a
> different test case with an iPhone.
>
> Please can we work together to find a solution to the deadlock ?
>
> Note that our interest is in fixing kernels 3.8 and 3.14 so hopefully
> any fix to the latest 4.x kernel can be back-ported to the kernels
> that we are using.
>
> Many thanks for your time.
>
> Regards,
> Dean
>
> On 07/05/15 16:26, Dean Jenkins wrote:
>> Hi Marcel, Gustavo and Johan,
>>
>> Please see the end of the E-mail for a workaround, do you agree with
>> the workaround, any comments ?
>>
>> Description of L2CAP ERTM deadlock
>> ======================
>>
>> Using kernel 3.8 on an ARM i.MX6 based platform, we have identified
>> some deadlock situations involving L2CAP ERTM. We believe kernel
>> v4.0.1 can deadlock as well (not tested it) as the affected code is
>> similar to kernel 3.8.
>>
>> With a L2CAP ERTM channel connected, and l2cap_sock_shutdown() is
>> called, then __l2cap_wait_ack() is called which is supposed to wait
>> for all sent I frames to be acknowledged.
>>
>> l2cap_sock_shutdown() takes all 3 locks namely:
>>
>> mutex_lock(&conn->chan_lock);
>> l2cap_chan_lock(chan);
>> lock_sock(sk);
>>
>> __l2cap_wait_ack() is implemented as follows:
>>
>> int __l2cap_wait_ack(struct sock *sk)
>> {
>> struct l2cap_chan *chan = l2cap_pi(sk)->chan;
>> DECLARE_WAITQUEUE(wait, current);
>> int err = 0;
>> int timeo = HZ/5;
>>
>> add_wait_queue(sk_sleep(sk), &wait);
>> set_current_state(TASK_INTERRUPTIBLE);
>> while (chan->unacked_frames > 0 && chan->conn) {
>> if (!timeo)
>> timeo = HZ/5;
>>
>> if (signal_pending(current)) {
>> err = sock_intr_errno(timeo);
>> break;
>> }
>>
>> release_sock(sk);
>> timeo = schedule_timeout(timeo);
>> lock_sock(sk);
>> set_current_state(TASK_INTERRUPTIBLE);
>>
>> err = sock_error(sk);
>> if (err)
>> break;
>> }
>> set_current_state(TASK_RUNNING);
>> remove_wait_queue(sk_sleep(sk), &wait);
>> return err;
>> }
>>
>> Analysis shows that the while loop checks chan->unacked_frames > 0
>> and chan->conn is !NULL. So when both are true the loop waits
>> (sleeps) for 200ms using schedule_timeout() and then rechecks the
>> while loop conditions before waiting again.
>>
>> Unfortunately, during this waiting (sleeping) period the following
>> locks are still held:
>> mutex_lock(&conn->chan_lock);
>> l2cap_chan_lock(chan);
>>
>> These held locks prevent l2cap_recv_frame() from processing the
>> received ACKs for the sent I frames in ERTM. This means
>> chan->unacked_frames > 0 remains true.
>>
>> chan->conn cannot be set to NULL by the channel being closed by other
>> means as the held locks prevent the state being updated.
>>
>> Therefore a deadlock occurs because __l2cap_wait_ack() gets stuck in
>> an infinite loop which periodically waits (sleeps). The kernel's hung
>> task detector is not able to detect this "hung" thread because the
>> thread periodically runs. This means l2cap_sock_shutdown() does not
>> return. With debug, we have proven that the infinite loop occurs.
>>
>> Here is an example hung task log and LOCKDEP output of a L2CAP ERTM
>> deadlock scenario: (we do have some other different traces as well so
>> more than 1 failure scenario exists)
>>
>> [ 360.497912] INFO: task kworker/u:0H:7 blocked for more than 180
>> seconds.
>> [ 360.497931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 360.497944] kworker/u:0H D 8046bb28 0 7 2 0x00000000
>> [ 360.498051] Backtrace:
>> [ 360.498145] [<8046b624>] (__schedule+0x0/0x790) from [<8046be3c>]
>> (schedule+0x88/0x8c)
>> [ 360.498231] [<8046bdb4>] (schedule+0x0/0x8c) from [<8046c1c8>]
>> (schedule_preempt_disabled+0x18/0x24)
>> [ 360.498318] [<8046c1b0>] (schedule_preempt_disabled+0x0/0x24) from
>> [<80469b70>] (mutex_lock_nested+0x250/0x3e8)
>> [ 360.498495] [<80469920>] (mutex_lock_nested+0x0/0x3e8) from
>> [<7f44d2a8>] (l2cap_get_chan_by_scid+0x24/0x54 [bluetooth])
>> [ 360.498696] [<7f44d284>] (l2cap_get_chan_by_scid+0x0/0x54
>> [bluetooth]) from [<7f453f68>] (l2cap_recv_frame+0x1918/0x1cc8
>> [bluetooth])
>> [ 360.498779] r6:0000006b r5:9773ef00 r4:96bfdc00 r3:0000003f
>> [ 360.498919] [<7f452650>] (l2cap_recv_frame+0x0/0x1cc8 [bluetooth])
>> from [<7f454e3c>] (l2cap_recv_acldata+0xc0/0x1f8 [bluetooth])
>> [ 360.499088] [<7f454d7c>] (l2cap_recv_acldata+0x0/0x1f8
>> [bluetooth]) from [<7f437c74>] (hci_rx_work+0x1b4/0x2ec [bluetooth])
>> [ 360.499176] r6:9759a000 r5:9759a77c r4:9773ef00 r3:00000000
>> [ 360.499259] [<7f437ac0>] (hci_rx_work+0x0/0x2ec [bluetooth]) from
>> [<8003ee1c>] (process_one_work+0x344/0x540)
>> [ 360.499352] [<8003ead8>] (process_one_work+0x0/0x540) from
>> [<8003f2e4>] (worker_thread+0x298/0x3ec)
>> [ 360.499449] [<8003f04c>] (worker_thread+0x0/0x3ec) from
>> [<80044a40>] (kthread+0xb4/0xc0)
>> [ 360.499543] [<8004498c>] (kthread+0x0/0xc0) from [<8000da30>]
>> (ret_from_fork+0x14/0x24)
>> [ 360.499622] r7:00000000 r6:00000000 r5:8004498c r4:a40a3e34
>> [ 360.499880]
>> Showing all locks held in the system:
>> [ 360.499909] 3 locks held by kworker/u:0H/7:
>> [ 360.499921] #0: (hdev->name){++++..}, at: [<8003ed28>]
>> process_one_work+0x250/0x540
>> [ 360.499968] #1: ((&hdev->rx_work)){+.+...}, at: [<8003ed28>]
>> process_one_work+0x250/0x540
>> [ 360.500009] #2: (&conn->chan_lock){+.+...}, at: [<7f44d2a8>]
>> l2cap_get_chan_by_scid+0x24/0x54 [bluetooth]
>> <snip>
>> [ 360.500256] 3 locks held by kworker/1:2/64:
>> [ 360.500266] #0: (events){.+.+..}, at: [<8003ed28>]
>> process_one_work+0x250/0x540
>> [ 360.500307] #1: ((&(&chan->retrans_timer)->work)){+.+...}, at:
>> [<8003ed28>] process_one_work+0x250/0x540
>> [ 360.500346] #2: (&chan->lock){+.+...}, at: [<7f450414>]
>> l2cap_retrans_timeout+0x24/0x58 [bluetooth]
>> <snip>
>> [ 360.502066] 2 locks held by <userland BT daemon>/1433:
>> [ 360.502076] #0: (&conn->chan_lock){+.+...}, at: [<7f455378>]
>> l2cap_sock_shutdown+0x34/0x130 [bluetooth]
>> [ 360.502179] #1: (&chan->lock){+.+...}, at: [<7f455388>]
>> l2cap_sock_shutdown+0x44/0x130 [bluetooth]
>> <snip>
>>
>> It can be seen that l2cap_get_chan_by_scid() is blocked so that the
>> ACK cannot be processed and l2cap_retrans_timeout() has timed out in
>> an attempt to retransmit the I frame but is also blocked. The
>> Bluetooth stack is now paralysed.
>>
>> This scenario will fail every time. The locking seems to be flawed in
>> l2cap_sock_shutdown() for L2CAP ERTM. Did it ever work ?
>>
>>
>> Workaround
>> ========
>>
>> Here is our workaround patch (based on kernel 3.8) which releases all
>> the locks before waiting (sleeping):
>>
>> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
>> index a53f47a..bf49066 100644
>> --- a/net/bluetooth/l2cap_core.c
>> +++ b/net/bluetooth/l2cap_core.c
>> @@ -1766,7 +1766,13 @@ int __l2cap_wait_ack(struct sock *sk)
>> }
>>
>> release_sock(sk);
>> + l2cap_chan_unlock(chan);
>> + if (chan->conn)
>> + mutex_unlock(&chan->conn->chan_lock);
>> timeo = schedule_timeout(timeo);
>> + if (chan->conn)
>> + mutex_lock(&chan->conn->chan_lock);
>> + l2cap_chan_lock(chan);
>> lock_sock(sk);
>> set_current_state(TASK_INTERRUPTIBLE);
>>
>> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
>> index 657b3a4..bd0d1ae 100644
>> --- a/net/bluetooth/l2cap_sock.c
>> +++ b/net/bluetooth/l2cap_sock.c
>> @@ -867,8 +867,10 @@ static int l2cap_sock_shutdown(struct socket
>> *sock, int how)
>> lock_sock(sk);
>>
>> if (!sk->sk_shutdown) {
>> - if (chan->mode == L2CAP_MODE_ERTM)
>> + if (chan->mode == L2CAP_MODE_ERTM) {
>> err = __l2cap_wait_ack(sk);
>> + conn = chan->conn;
>> + }
>>
>> sk->sk_shutdown = SHUTDOWN_MASK;
>>
>> If you think this is the right approach then we will formally create
>> a patch and send it to the mailing list for review.
>>
>> Our initial tests show that the deadlock no longer occurs with the
>> patch applied, but we are worried about side effects.
>>
>> We have some doubts:
>>
>> a) What are the locks protecting ? Do all 3 locks need to be taken ?
>>
>> b) Can a better architectural solution be created that avoids the
>> locks ?
>>
>> Perhaps arm the channel to disconnect, then after all ACKs have been
>> received, call a callback to wake up l2cap_sock_shutdown ?
>>
>> c) What stops new I frames being sent during the waiting period so
>> chan->unacked_frames increases ?
>>
>> Thanks for your time.
>>
>> Regards,
>> Dean
>>
>
--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.
prev parent reply other threads:[~2015-06-05 8:35 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-07 15:26 [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown() Dean Jenkins
2015-05-29 10:10 ` Dean Jenkins
2015-06-05 8:35 ` Dean Jenkins [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55715F47.7070900@mentor.com \
--to=dean_jenkins@mentor.com \
--cc=Joshua_Frkuska@mentor.com \
--cc=gustavo@padovan.org \
--cc=johan.hedberg@gmail.com \
--cc=linux-bluetooth@vger.kernel.org \
--cc=marcel@holtmann.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).