linux-bluetooth.vger.kernel.org archive mirror
* Kernel panic in rfcomm_run - unbalanced refcount on rfcomm_session
@ 2010-02-18  5:04 Nick Pelly
  2010-02-18  7:15 ` Ville Tervo
  2010-02-20  8:17 ` Dave Young
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Pelly @ 2010-02-18  5:04 UTC (permalink / raw)
  To: Bluetooth Linux

Since 2.6.32 we are seeing kernel panics like:

[10651.110229] Unable to handle kernel paging request at virtual address 6b6b6b6b
[10651.111968] Internal error: Oops: 5 [#1] PREEMPT
[10651.113952] CPU: 0    Tainted: G        W   (2.6.32-59979-gd0c97db #1)
[10651.114624] PC is at rfcomm_run+0xa04/0xdbc
<...>
[10651.406188] [<c031ad24>] (rfcomm_run+0xa04/0xdbc) from [<c006ce30>] (kthread+0x78/0x80)
[10651.406585] [<c006ce30>] (kthread+0x78/0x80) from [<c002793c>] (kernel_thread_exit+0x0/0x8)

(rfcomm_run() is all inlined, so there's not much of a stack trace.)

This is a use-after-free on struct rfcomm_session s in the call chain
rfcomm_run() -> rfcomm_process_sessions() -> rfcomm_process_dlcs() ->
list_for_each_safe(p, n, &s->dlcs). The only way this can happen is if
there is an unbalanced refcount on the rfcomm session.
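
To illustrate the failure mode, here is a minimal user-space sketch, not
kernel code. struct demo_session and the demo_* helpers are hypothetical
stand-ins for struct rfcomm_session, rfcomm_session_hold() and
rfcomm_session_put(). The object is only marked as freed rather than
released, so the demo stays well defined. The "free" path fills the list
field with 0x6b, the slab poison byte (POISON_FREE), which would explain
why a pointer loaded from a freed session shows up as 6b6b6b6b in the
oops.

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b /* byte value the kernel slab debug code writes on free */

/* Hypothetical stand-in for struct rfcomm_session; not the kernel layout. */
struct demo_session {
    int refcnt;
    int freed;          /* set instead of calling free(), to keep the demo defined */
    uint32_t dlcs_next; /* stands in for the s->dlcs list pointer */
};

static void demo_session_init(struct demo_session *s)
{
    s->refcnt = 1;          /* the creator owns the first reference */
    s->freed = 0;
    s->dlcs_next = 0x1000;  /* arbitrary "valid" value for the demo */
}

static void demo_session_hold(struct demo_session *s)
{
    s->refcnt++;
}

static void demo_session_put(struct demo_session *s)
{
    /* Dropping the last reference "frees" the object and poisons it. */
    if (--s->refcnt == 0) {
        memset(&s->dlcs_next, POISON_FREE, sizeof(s->dlcs_next));
        s->freed = 1;
    }
}
```

With a balanced hold/put pair the session survives. One put too many
drops the count to zero while another code path may still be walking the
list, and that path then reads the poisoned value.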

We found that reverting the patch
9e726b17422bade75fba94e625cd35fd1353e682 "Bluetooth: Fix rejected
connection not disconnecting ACL link" fixes the issue for us. The
patch itself looks OK: I added some logging to check that the new
refcounts in the patch are balanced, and they are. However, if I remove
the new calls to rfcomm_session_put() and rfcomm_session_hold(), the
crash is resolved. I also found that we can crash without hitting
rfcomm_session_timeout(), so it's not related to Marcel's recent patch
to remove the scheduling-while-atomic warning.
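
The kind of balance check described above could look roughly like the
following user-space sketch. The names ref_audit, audit_hold(),
audit_put() and audit_balance() are hypothetical, and in the kernel the
logging would go through BT_DBG() or printk() rather than fprintf().
Note that this only shows whether the instrumented call sites are
balanced among themselves. It says nothing about holds and puts made
elsewhere.

```c
#include <stdio.h>

/* Hypothetical counters for the instrumented hold/put call sites. */
struct ref_audit {
    int holds;
    int puts;
};

static void audit_hold(struct ref_audit *a, const char *site)
{
    a->holds++;
    fprintf(stderr, "hold at %s: holds=%d puts=%d\n", site, a->holds, a->puts);
}

static void audit_put(struct ref_audit *a, const char *site)
{
    a->puts++;
    fprintf(stderr, "put  at %s: holds=%d puts=%d\n", site, a->holds, a->puts);
}

/* Zero means balanced; negative means more puts than holds were logged. */
static int audit_balance(const struct ref_audit *a)
{
    return a->holds - a->puts;
}
```

A result of zero at the end of a connection attempt matches the
observation that the new calls in the patch are balanced.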

9e726b17422bade75fba94e625cd35fd1353e682 does lead to a delay in
calling rfcomm_session_del() due to the extra refcount while waiting
for the new timeout. I believe that this delay has revealed some more
subtle problem elsewhere that causes an unbalanced refcount and then
the kernel panic.

I have debug kernel logs and hci logs - they are too large to send to
the list but I can send them directly to anyone interested in
debugging.

We see this crash frequently with a number of headsets since 2.6.32,
but not reliably. I do have a 100% repro case with the Nuvi Garmin,
with these exact steps:
1) Make sure the Nuvi is unpaired, the BlueZ stack is unpaired, and the
kernel has been rebooted since unpairing.
2) Initiate device discovery, pairing, and handsfree connection from the Nuvi.
3) Observe the HFP rfcomm connection come up briefly, then disconnect,
followed by the kernel panic.

Our short-term solution is unfortunately to revert
9e726b17422bade75fba94e625cd35fd1353e682.

Nick

Thread overview: 9+ messages
2010-02-18  5:04 Kernel panic in rfcomm_run - unbalanced refcount on rfcomm_session Nick Pelly
2010-02-18  7:15 ` Ville Tervo
2010-02-20  8:17 ` Dave Young
2010-02-21 21:00   ` Nick Pelly
2010-02-26 10:23     ` Ville Tervo
2010-03-09  7:19       ` Ville Tervo
2010-03-09  7:31         ` Nick Pelly
2010-03-19  8:33           ` Andrei Emeltchenko
2010-10-29 12:34             ` Simantini Bhattacharya
