netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
       [not found] <000000000000b7b62e0610db6b8a@google.com>
@ 2024-02-08 15:32 ` syzbot
  2024-02-09 12:37   ` Jonas Dreßler
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2024-02-08 15:32 UTC (permalink / raw)
  To: davem, edumazet, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, luiz.dentz, luiz.von.dentz, marcel, netdev, pabeni,
	syzkaller-bugs, verdre

syzbot has bisected this issue to:

commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
Author: Jonas Dreßler <verdre@v0yd.nl>
Date:   Tue Feb 6 11:08:13 2024 +0000

    Bluetooth: hci_conn: Only do ACL connections sequentially

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
start commit:   b1d3a0e70c38 Add linux-next specific files for 20240208
git tree:       linux-next
final oops:     https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000

Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
  2024-02-08 15:32 ` [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync syzbot
@ 2024-02-09 12:37   ` Jonas Dreßler
  2024-02-09 13:36     ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 6+ messages in thread
From: Jonas Dreßler @ 2024-02-09 12:37 UTC (permalink / raw)
  To: syzbot, davem, edumazet, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, luiz.dentz, luiz.von.dentz, marcel, netdev, pabeni,
	syzkaller-bugs
  Cc: verdre

Hi everyone!

On 08.02.24 16:32, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
> Author: Jonas Dreßler <verdre@v0yd.nl>
> Date:   Tue Feb 6 11:08:13 2024 +0000
> 
>      Bluetooth: hci_conn: Only do ACL connections sequentially
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
> start commit:   b1d3a0e70c38 Add linux-next specific files for 20240208
> git tree:       linux-next
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
> console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
> dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
> 
> Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
> Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Hmm, looking at the backtraces, I think the issue that the introduction of the
sequential connect has introduced another async case: In hci_connect_acl(), when
we call hci_acl_create_connection_sync(), the conn state no longer immediately
gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
queue actually executes __hci_acl_create_connection_sync().

This means that now hci_connect_acl() is happy to do multiple
hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
execute them right after each other. Then the newly introduced hci_abort_conn_sync()
in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
object, so the second time we enter __hci_acl_create_connection_sync(),
things blow up.

It looks to me like in theory the hci_connect_le_sync() logic is prone to a
similar issue, but in practice that's prohibited because in hci_connect_le_sync()
we lookup whether the conn object still exists and bail out if it doesn't.

Even for LE though I think we can queue multiple hci_connect_le_sync() calls
and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
state actually is?

So assuming this analysis is correct, what do we do to fix this? It seems to me
that

1) we want a BT_CONNECT_QUEUED state for connections, so that the state
machine covers this additional stage that we have for ACL and LE connections now.

2) the conn object can still disappear while the __hci_acl_create_connection_sync()
is queued, so we need something like the "if conn doesn't exist anymore, bail out"
check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.

That said, the current check in hci_connect_le_sync() that's using the connection
handle to lookup the conn does not seem great, aren't these handles re-used
after connections are torn down?

Cheers,
Jonas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
  2024-02-09 12:37   ` Jonas Dreßler
@ 2024-02-09 13:36     ` Luiz Augusto von Dentz
  2024-02-09 14:37       ` Luiz Augusto von Dentz
  2024-02-13  0:29       ` Jonas Dreßler
  0 siblings, 2 replies; 6+ messages in thread
From: Luiz Augusto von Dentz @ 2024-02-09 13:36 UTC (permalink / raw)
  To: Jonas Dreßler
  Cc: syzbot, davem, edumazet, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, luiz.von.dentz, marcel, netdev, pabeni,
	syzkaller-bugs

Hi Jonas,

On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Hi everyone!
>
> On 08.02.24 16:32, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
> > Author: Jonas Dreßler <verdre@v0yd.nl>
> > Date:   Tue Feb 6 11:08:13 2024 +0000
> >
> >      Bluetooth: hci_conn: Only do ACL connections sequentially
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
> > start commit:   b1d3a0e70c38 Add linux-next specific files for 20240208
> > git tree:       linux-next
> > final oops:     https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
> >
> > Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
> > Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
> Hmm, looking at the backtraces, I think the issue that the introduction of the
> sequential connect has introduced another async case: In hci_connect_acl(), when
> we call hci_acl_create_connection_sync(), the conn state no longer immediately
> gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
> queue actually executes __hci_acl_create_connection_sync().

Need to double check but I think we do set BT_CONNECT in case of LE
when it is queued so which shall prevent it to be queued multiple
times.

> This means that now hci_connect_acl() is happy to do multiple
> hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
> execute them right after each other. Then the newly introduced hci_abort_conn_sync()
> in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
> object, so the second time we enter __hci_acl_create_connection_sync(),
> things blow up.
>
> It looks to me like in theory the hci_connect_le_sync() logic is prone to a
> similar issue, but in practice that's prohibited because in hci_connect_le_sync()
> we lookup whether the conn object still exists and bail out if it doesn't.
>
> Even for LE though I think we can queue multiple hci_connect_le_sync() calls
> and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
> state actually is?
>
> So assuming this analysis is correct, what do we do to fix this? It seems to me
> that
>
> 1) we want a BT_CONNECT_QUEUED state for connections, so that the state
> machine covers this additional stage that we have for ACL and LE connections now.

BT_CONNECT already indicates that connection procedure is in progress.

> 2) the conn object can still disappear while the __hci_acl_create_connection_sync()
> is queued, so we need something like the "if conn doesn't exist anymore, bail out"
> check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.

Btw, I'd probably clean up the connect function and create something
like hci_connect/hci_connect_sync which takes care of the details
internally like it was done to abort.

> That said, the current check in hci_connect_le_sync() that's using the connection
> handle to lookup the conn does not seem great, aren't these handles re-used
> after connections are torn down?

Well we could perhaps do a lookup by pointer to see if the connection
hasn't been removed in the meantime, that said to force a clash on the
handles it need to happen in between abort, which frees the handle,
and connect, anyway the real culprit here is that we should be able to
abort the cmd_sync callback like we do in LE:

https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943

That way we stop the connect callback to run and don't have to worry
about handle re-use.

> Cheers,
> Jonas



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
  2024-02-09 13:36     ` Luiz Augusto von Dentz
@ 2024-02-09 14:37       ` Luiz Augusto von Dentz
  2024-02-13  0:29       ` Jonas Dreßler
  1 sibling, 0 replies; 6+ messages in thread
From: Luiz Augusto von Dentz @ 2024-02-09 14:37 UTC (permalink / raw)
  To: Jonas Dreßler
  Cc: syzbot, davem, edumazet, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, luiz.von.dentz, marcel, netdev, pabeni,
	syzkaller-bugs

Hi Jonas,

On Fri, Feb 9, 2024 at 8:36 AM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Jonas,
>
> On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
> >
> > Hi everyone!
> >
> > On 08.02.24 16:32, syzbot wrote:
> > > syzbot has bisected this issue to:
> > >
> > > commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
> > > Author: Jonas Dreßler <verdre@v0yd.nl>
> > > Date:   Tue Feb 6 11:08:13 2024 +0000
> > >
> > >      Bluetooth: hci_conn: Only do ACL connections sequentially
> > >
> > > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
> > > start commit:   b1d3a0e70c38 Add linux-next specific files for 20240208
> > > git tree:       linux-next
> > > final oops:     https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
> > >
> > > Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
> > > Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
> > >
> > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
> > Hmm, looking at the backtraces, I think the issue that the introduction of the
> > sequential connect has introduced another async case: In hci_connect_acl(), when
> > we call hci_acl_create_connection_sync(), the conn state no longer immediately
> > gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
> > queue actually executes __hci_acl_create_connection_sync().
>
> Need to double check but I think we do set BT_CONNECT in case of LE
> when it is queued so which shall prevent it to be queued multiple
> times.
>
> > This means that now hci_connect_acl() is happy to do multiple
> > hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
> > execute them right after each other. Then the newly introduced hci_abort_conn_sync()
> > in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
> > object, so the second time we enter __hci_acl_create_connection_sync(),
> > things blow up.
> >
> > It looks to me like in theory the hci_connect_le_sync() logic is prone to a
> > similar issue, but in practice that's prohibited because in hci_connect_le_sync()
> > we lookup whether the conn object still exists and bail out if it doesn't.
> >
> > Even for LE though I think we can queue multiple hci_connect_le_sync() calls
> > and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
> > state actually is?
> >
> > So assuming this analysis is correct, what do we do to fix this? It seems to me
> > that
> >
> > 1) we want a BT_CONNECT_QUEUED state for connections, so that the state
> > machine covers this additional stage that we have for ACL and LE connections now.
>
> BT_CONNECT already indicates that connection procedure is in progress.
>
> > 2) the conn object can still disappear while the __hci_acl_create_connection_sync()
> > is queued, so we need something like the "if conn doesn't exist anymore, bail out"
> > check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.
>
> Btw, I'd probably clean up the connect function and create something
> like hci_connect/hci_connect_sync which takes care of the details
> internally like it was done to abort.
>
> > That said, the current check in hci_connect_le_sync() that's using the connection
> > handle to lookup the conn does not seem great, aren't these handles re-used
> > after connections are torn down?
>
> Well we could perhaps do a lookup by pointer to see if the connection
> hasn't been removed in the meantime, that said to force a clash on the
> handles it need to happen in between abort, which frees the handle,
> and connect, anyway the real culprit here is that we should be able to
> abort the cmd_sync callback like we do in LE:
>
> https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943
>
> That way we stop the connect callback to run and don't have to worry
> about handle re-use.

https://patchwork.kernel.org/project/bluetooth/patch/20240209141612.3554051-1-luiz.dentz@gmail.com/

>
> > Cheers,
> > Jonas
>
>
>
> --
> Luiz Augusto von Dentz



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
  2024-02-09 13:36     ` Luiz Augusto von Dentz
  2024-02-09 14:37       ` Luiz Augusto von Dentz
@ 2024-02-13  0:29       ` Jonas Dreßler
  2024-02-13 13:53         ` Luiz Augusto von Dentz
  1 sibling, 1 reply; 6+ messages in thread
From: Jonas Dreßler @ 2024-02-13  0:29 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: syzbot, davem, edumazet, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, luiz.von.dentz, marcel, netdev, pabeni,
	syzkaller-bugs

Hi Luiz,

On 09.02.24 14:36, Luiz Augusto von Dentz wrote:
> Hi Jonas,
> 
> On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>
>> Hi everyone!
>>
>> On 08.02.24 16:32, syzbot wrote:
>>> syzbot has bisected this issue to:
>>>
>>> commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
>>> Author: Jonas Dreßler <verdre@v0yd.nl>
>>> Date:   Tue Feb 6 11:08:13 2024 +0000
>>>
>>>       Bluetooth: hci_conn: Only do ACL connections sequentially
>>>
>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
>>> start commit:   b1d3a0e70c38 Add linux-next specific files for 20240208
>>> git tree:       linux-next
>>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
>>>
>>> Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
>>> Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
>>>
>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>
>> Hmm, looking at the backtraces, I think the issue that the introduction of the
>> sequential connect has introduced another async case: In hci_connect_acl(), when
>> we call hci_acl_create_connection_sync(), the conn state no longer immediately
>> gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
>> queue actually executes __hci_acl_create_connection_sync().
> 
> Need to double check but I think we do set BT_CONNECT in case of LE
> when it is queued so which shall prevent it to be queued multiple
> times.

Nope, we set it only from within the hci_sync callback, see
hci_connect_le_sync(). IMO that makes sense, because in
hci_abort_conn_sync(), we send a  HCI_OP_CREATE_CONN_CANCEL in case
of conn->state == BT_CONNECT, and this OP we wouldn't want to send
while the command is still waiting in the queue.

> 
>> This means that now hci_connect_acl() is happy to do multiple
>> hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
>> execute them right after each other. Then the newly introduced hci_abort_conn_sync()
>> in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
>> object, so the second time we enter __hci_acl_create_connection_sync(),
>> things blow up.
>>
>> It looks to me like in theory the hci_connect_le_sync() logic is prone to a
>> similar issue, but in practice that's prohibited because in hci_connect_le_sync()
>> we lookup whether the conn object still exists and bail out if it doesn't.
>>
>> Even for LE though I think we can queue multiple hci_connect_le_sync() calls
>> and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
>> state actually is?
>>
>> So assuming this analysis is correct, what do we do to fix this? It seems to me
>> that
>>
>> 1) we want a BT_CONNECT_QUEUED state for connections, so that the state
>> machine covers this additional stage that we have for ACL and LE connections now.
> 
> BT_CONNECT already indicates that connection procedure is in progress.

I still think an additional state is necessary. Alternatively (and maybe
a lot nicer than the extra state) would be to add some functions to hci_sync
to check for and remove queued/ongoing commands, I'm thinking of a new

bool hci_cmd_sync_has(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);
void hci_cmd_sync_cancel(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);

I think if we had those, in addition to not needing the additional state,
we could also simplify the hci_abort_conn() code quite a bit and possibly
get rid of the passing of connection handles to hci_connect_le_sync().

I'll give that a try and propose a small patch tomorrow.

Cheers,
Jonas

> 
>> 2) the conn object can still disappear while the __hci_acl_create_connection_sync()
>> is queued, so we need something like the "if conn doesn't exist anymore, bail out"
>> check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.
> 
> Btw, I'd probably clean up the connect function and create something
> like hci_connect/hci_connect_sync which takes care of the details
> internally like it was done to abort.
> 
>> That said, the current check in hci_connect_le_sync() that's using the connection
>> handle to lookup the conn does not seem great, aren't these handles re-used
>> after connections are torn down?
> 
> Well we could perhaps do a lookup by pointer to see if the connection
> hasn't been removed in the meantime, that said to force a clash on the
> handles it need to happen in between abort, which frees the handle,
> and connect, anyway the real culprit here is that we should be able to
> abort the cmd_sync callback like we do in LE:
> 
> https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943
> 
> That way we stop the connect callback to run and don't have to worry
> about handle re-use.
> 
>> Cheers,
>> Jonas
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
  2024-02-13  0:29       ` Jonas Dreßler
@ 2024-02-13 13:53         ` Luiz Augusto von Dentz
  0 siblings, 0 replies; 6+ messages in thread
From: Luiz Augusto von Dentz @ 2024-02-13 13:53 UTC (permalink / raw)
  To: Jonas Dreßler
  Cc: syzbot, davem, edumazet, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, luiz.von.dentz, marcel, netdev, pabeni,
	syzkaller-bugs

Hi Jonas,

On Mon, Feb 12, 2024 at 7:29 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Hi Luiz,
>
> On 09.02.24 14:36, Luiz Augusto von Dentz wrote:
> > Hi Jonas,
> >
> > On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
> >>
> >> Hi everyone!
> >>
> >> On 08.02.24 16:32, syzbot wrote:
> >>> syzbot has bisected this issue to:
> >>>
> >>> commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
> >>> Author: Jonas Dreßler <verdre@v0yd.nl>
> >>> Date:   Tue Feb 6 11:08:13 2024 +0000
> >>>
> >>>       Bluetooth: hci_conn: Only do ACL connections sequentially
> >>>
> >>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
> >>> start commit:   b1d3a0e70c38 Add linux-next specific files for 20240208
> >>> git tree:       linux-next
> >>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
> >>> kernel config:  https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
> >>> dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
> >>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
> >>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
> >>>
> >>> Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
> >>> Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
> >>>
> >>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >>
> >> Hmm, looking at the backtraces, I think the issue that the introduction of the
> >> sequential connect has introduced another async case: In hci_connect_acl(), when
> >> we call hci_acl_create_connection_sync(), the conn state no longer immediately
> >> gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
> >> queue actually executes __hci_acl_create_connection_sync().
> >
> > Need to double check but I think we do set BT_CONNECT in case of LE
> > when it is queued so which shall prevent it to be queued multiple
> > times.
>
> Nope, we set it only from within the hci_sync callback, see
> hci_connect_le_sync(). IMO that makes sense, because in
> hci_abort_conn_sync(), we send a  HCI_OP_CREATE_CONN_CANCEL in case
> of conn->state == BT_CONNECT, and this OP we wouldn't want to send
> while the command is still waiting in the queue.

Yep, well I did a change which should work regardless of BT_CONNECT
only being set at the callback:

https://patchwork.kernel.org/project/bluetooth/patch/20240209141612.3554051-1-luiz.dentz@gmail.com/

> >
> >> This means that now hci_connect_acl() is happy to do multiple
> >> hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
> >> execute them right after each other. Then the newly introduced hci_abort_conn_sync()
> >> in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
> >> object, so the second time we enter __hci_acl_create_connection_sync(),
> >> things blow up.
> >>
> >> It looks to me like in theory the hci_connect_le_sync() logic is prone to a
> >> similar issue, but in practice that's prohibited because in hci_connect_le_sync()
> >> we lookup whether the conn object still exists and bail out if it doesn't.
> >>
> >> Even for LE though I think we can queue multiple hci_connect_le_sync() calls
> >> and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
> >> state actually is?
> >>
> >> So assuming this analysis is correct, what do we do to fix this? It seems to me
> >> that
> >>
> >> 1) we want a BT_CONNECT_QUEUED state for connections, so that the state
> >> machine covers this additional stage that we have for ACL and LE connections now.
> >
> > BT_CONNECT already indicates that connection procedure is in progress.
>
> I still think an additional state is necessary. Alternatively (and maybe
> a lot nicer than the extra state) would be to add some functions to hci_sync
> to check for and remove queued/ongoing commands, I'm thinking of a new
>
> bool hci_cmd_sync_has(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);
> void hci_cmd_sync_cancel(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);
>
> I think if we had those, in addition to not needing the additional state,
> we could also simplify the hci_abort_conn() code quite a bit and possibly
> get rid of the passing of connection handles to hci_connect_le_sync().
>
> I'll give that a try and propose a small patch tomorrow.

Yeah, I did something like in the past for hci_cmd_sync_queue_once, I
will resend it since it had the following function as well:

bool hci_cmd_sync_lookup(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
             void *data, hci_cmd_sync_work_destroy_t destroy,
             bool cancel);

> Cheers,
> Jonas
>
> >
> >> 2) the conn object can still disappear while the __hci_acl_create_connection_sync()
> >> is queued, so we need something like the "if conn doesn't exist anymore, bail out"
> >> check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.
> >
> > Btw, I'd probably clean up the connect function and create something
> > like hci_connect/hci_connect_sync which takes care of the details
> > internally like it was done to abort.
> >
> >> That said, the current check in hci_connect_le_sync() that's using the connection
> >> handle to lookup the conn does not seem great, aren't these handles re-used
> >> after connections are torn down?
> >
> > Well we could perhaps do a lookup by pointer to see if the connection
> > hasn't been removed in the meantime, that said to force a clash on the
> > handles it need to happen in between abort, which frees the handle,
> > and connect, anyway the real culprit here is that we should be able to
> > abort the cmd_sync callback like we do in LE:
> >
> > https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943
> >
> > That way we stop the connect callback to run and don't have to worry
> > about handle re-use.
> >
> >> Cheers,
> >> Jonas
> >
> >
> >



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-02-13 13:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <000000000000b7b62e0610db6b8a@google.com>
2024-02-08 15:32 ` [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync syzbot
2024-02-09 12:37   ` Jonas Dreßler
2024-02-09 13:36     ` Luiz Augusto von Dentz
2024-02-09 14:37       ` Luiz Augusto von Dentz
2024-02-13  0:29       ` Jonas Dreßler
2024-02-13 13:53         ` Luiz Augusto von Dentz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).