* [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
@ 2026-05-19 11:38 Oliver Hartkopp
2026-05-20 12:47 ` Lee Jones
0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-19 11:38 UTC (permalink / raw)
To: linux-can; +Cc: Lee Jones, Oliver Hartkopp
From: Lee Jones <lee@kernel.org>
Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
synchronize_rcu()") removed the synchronize_rcu() call from
bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
timers from being rearmed during deletion. However, it only applied
this check to op->timer via bcm_rx_starttimer().
It missed the fact that op->thrtimer can also be rearmed by an
in-flight bcm_rx_handler() (which runs as an RCU reader) via
bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
bcm_remove_op() has already cancelled it, leading to a use-after-free
when the timer fires on the deferred-freed struct bcm_op.
Address the omission by checking the RX_NO_AUTOTIMER flag
in bcm_rx_update_and_send() before starting op->thrtimer, effectively
preventing it from being rearmed concurrently with teardown.
[Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
removing the CAN filters following the bcm_delete_rx_op() approach.
Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
Signed-off-by: Lee Jones <lee@kernel.org>
Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
---
net/can/bcm.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/can/bcm.c b/net/can/bcm.c
index a4bef2c48a55..abf7bd2c2e6f 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -537,10 +537,16 @@ static void bcm_rx_update_and_send(struct bcm_op *op,
/* with active throttling timer we are just done here */
if (hrtimer_active(&op->thrtimer))
return;
+ /* bcm_remove_op() may have cancelled thrtimer concurrently with this
+ * RCU-protected handler; do not rearm it. Mirrors bcm_rx_starttimer().
+ */
+ if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
+ return;
+
/* first reception with enabled throttling mode */
if (!op->kt_lastmsg)
goto rx_changed_settime;
/* got a second frame inside a potential throttle period? */
@@ -603,11 +609,11 @@ static void bcm_rx_cmp_to_index(struct bcm_op *op, unsigned int index,
/*
* bcm_rx_starttimer - enable timeout monitoring for CAN frame reception
*/
static void bcm_rx_starttimer(struct bcm_op *op)
{
- if (op->flags & RX_NO_AUTOTIMER)
+ if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
return;
if (op->kt_ival1)
hrtimer_start(&op->timer, op->kt_ival1, HRTIMER_MODE_REL_SOFT);
}
@@ -838,11 +844,11 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
list_for_each_entry_safe(op, n, ops, list) {
if ((op->can_id == mh->can_id) && (op->ifindex == ifindex) &&
(op->flags & CAN_FD_FRAME) == (mh->flags & CAN_FD_FRAME)) {
/* disable automatic timer on frame reception */
- op->flags |= RX_NO_AUTOTIMER;
+ WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
/*
* Don't care if we're bound or not (due to netdev
* problems) can_rx_unregister() is always a save
* thing to do here.
@@ -1618,10 +1624,14 @@ static int bcm_release(struct socket *sock)
list_for_each_entry_safe(op, next, &bo->tx_ops, list)
bcm_remove_op(op);
list_for_each_entry_safe(op, next, &bo->rx_ops, list) {
+
+ /* disable automatic timer on frame reception */
+ WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
+
/*
* Don't care if we're bound or not (due to netdev problems)
* can_rx_unregister() is always a save thing to do here.
*/
if (op->ifindex) {
--
2.53.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-19 11:38 [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER Oliver Hartkopp
@ 2026-05-20 12:47 ` Lee Jones
2026-05-20 12:49 ` Lee Jones
2026-05-20 12:59 ` Oliver Hartkopp
0 siblings, 2 replies; 14+ messages in thread
From: Lee Jones @ 2026-05-20 12:47 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Tue, 19 May 2026, Oliver Hartkopp wrote:
> From: Lee Jones <lee@kernel.org>
>
> Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> synchronize_rcu()") removed the synchronize_rcu() call from
> bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> timers from being rearmed during deletion. However, it only applied
> this check to op->timer via bcm_rx_starttimer().
>
> It missed the fact that op->thrtimer can also be rearmed by an
> in-flight bcm_rx_handler() (which runs as an RCU reader) via
> bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
> bcm_remove_op() has already cancelled it, leading to a use-after-free
> when the timer fires on the deferred-freed struct bcm_op.
>
> Address the omission by checking the RX_NO_AUTOTIMER flag
> in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> preventing it from being rearmed concurrently with teardown.
>
> [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> removing the CAN filters following the bcm_delete_rx_op() approach.
>
> Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
>
> Signed-off-by: Lee Jones <lee@kernel.org>
> Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
You did? Can you add a note saying what you changed please?
> Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
> ---
> net/can/bcm.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/can/bcm.c b/net/can/bcm.c
> index a4bef2c48a55..abf7bd2c2e6f 100644
> --- a/net/can/bcm.c
> +++ b/net/can/bcm.c
> @@ -537,10 +537,16 @@ static void bcm_rx_update_and_send(struct bcm_op *op,
>
> /* with active throttling timer we are just done here */
> if (hrtimer_active(&op->thrtimer))
> return;
>
> + /* bcm_remove_op() may have cancelled thrtimer concurrently with this
> + * RCU-protected handler; do not rearm it. Mirrors bcm_rx_starttimer().
> + */
> + if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
> + return;
> +
> /* first reception with enabled throttling mode */
> if (!op->kt_lastmsg)
> goto rx_changed_settime;
>
> /* got a second frame inside a potential throttle period? */
> @@ -603,11 +609,11 @@ static void bcm_rx_cmp_to_index(struct bcm_op *op, unsigned int index,
> /*
> * bcm_rx_starttimer - enable timeout monitoring for CAN frame reception
> */
> static void bcm_rx_starttimer(struct bcm_op *op)
> {
> - if (op->flags & RX_NO_AUTOTIMER)
> + if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
> return;
>
> if (op->kt_ival1)
> hrtimer_start(&op->timer, op->kt_ival1, HRTIMER_MODE_REL_SOFT);
> }
> @@ -838,11 +844,11 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
> list_for_each_entry_safe(op, n, ops, list) {
> if ((op->can_id == mh->can_id) && (op->ifindex == ifindex) &&
> (op->flags & CAN_FD_FRAME) == (mh->flags & CAN_FD_FRAME)) {
>
> /* disable automatic timer on frame reception */
> - op->flags |= RX_NO_AUTOTIMER;
> + WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
>
> /*
> * Don't care if we're bound or not (due to netdev
> * problems) can_rx_unregister() is always a save
> * thing to do here.
> @@ -1618,10 +1624,14 @@ static int bcm_release(struct socket *sock)
>
> list_for_each_entry_safe(op, next, &bo->tx_ops, list)
> bcm_remove_op(op);
>
> list_for_each_entry_safe(op, next, &bo->rx_ops, list) {
> +
> + /* disable automatic timer on frame reception */
> + WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
> +
> /*
> * Don't care if we're bound or not (due to netdev problems)
> * can_rx_unregister() is always a save thing to do here.
> */
> if (op->ifindex) {
> --
> 2.53.0
>
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 12:47 ` Lee Jones
@ 2026-05-20 12:49 ` Lee Jones
2026-05-20 13:03 ` Oliver Hartkopp
2026-05-20 12:59 ` Oliver Hartkopp
1 sibling, 1 reply; 14+ messages in thread
From: Lee Jones @ 2026-05-20 12:49 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Wed, 20 May 2026, Lee Jones wrote:
> On Tue, 19 May 2026, Oliver Hartkopp wrote:
>
> > From: Lee Jones <lee@kernel.org>
> >
> > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > synchronize_rcu()") removed the synchronize_rcu() call from
> > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > timers from being rearmed during deletion. However, it only applied
> > this check to op->timer via bcm_rx_starttimer().
> >
> > It missed the fact that op->thrtimer can also be rearmed by an
> > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
> > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > when the timer fires on the deferred-freed struct bcm_op.
> >
> > Address the omission by checking the RX_NO_AUTOTIMER flag
> > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > preventing it from being rearmed concurrently with teardown.
> >
> > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > removing the CAN filters following the bcm_delete_rx_op() approach.
> >
> > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> >
> > Signed-off-by: Lee Jones <lee@kernel.org>
> > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
>
> You did? Can you add a note saying what you changed please?
FYI, did you also see the second swing I took at this:
https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
> > Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
> > Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > ---
> > net/can/bcm.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/can/bcm.c b/net/can/bcm.c
> > index a4bef2c48a55..abf7bd2c2e6f 100644
> > --- a/net/can/bcm.c
> > +++ b/net/can/bcm.c
> > @@ -537,10 +537,16 @@ static void bcm_rx_update_and_send(struct bcm_op *op,
> >
> > /* with active throttling timer we are just done here */
> > if (hrtimer_active(&op->thrtimer))
> > return;
> >
> > + /* bcm_remove_op() may have cancelled thrtimer concurrently with this
> > + * RCU-protected handler; do not rearm it. Mirrors bcm_rx_starttimer().
> > + */
> > + if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
> > + return;
> > +
> > /* first reception with enabled throttling mode */
> > if (!op->kt_lastmsg)
> > goto rx_changed_settime;
> >
> > /* got a second frame inside a potential throttle period? */
> > @@ -603,11 +609,11 @@ static void bcm_rx_cmp_to_index(struct bcm_op *op, unsigned int index,
> > /*
> > * bcm_rx_starttimer - enable timeout monitoring for CAN frame reception
> > */
> > static void bcm_rx_starttimer(struct bcm_op *op)
> > {
> > - if (op->flags & RX_NO_AUTOTIMER)
> > + if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
> > return;
> >
> > if (op->kt_ival1)
> > hrtimer_start(&op->timer, op->kt_ival1, HRTIMER_MODE_REL_SOFT);
> > }
> > @@ -838,11 +844,11 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
> > list_for_each_entry_safe(op, n, ops, list) {
> > if ((op->can_id == mh->can_id) && (op->ifindex == ifindex) &&
> > (op->flags & CAN_FD_FRAME) == (mh->flags & CAN_FD_FRAME)) {
> >
> > /* disable automatic timer on frame reception */
> > - op->flags |= RX_NO_AUTOTIMER;
> > + WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
> >
> > /*
> > * Don't care if we're bound or not (due to netdev
> > * problems) can_rx_unregister() is always a save
> > * thing to do here.
> > @@ -1618,10 +1624,14 @@ static int bcm_release(struct socket *sock)
> >
> > list_for_each_entry_safe(op, next, &bo->tx_ops, list)
> > bcm_remove_op(op);
> >
> > list_for_each_entry_safe(op, next, &bo->rx_ops, list) {
> > +
> > + /* disable automatic timer on frame reception */
> > + WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
> > +
> > /*
> > * Don't care if we're bound or not (due to netdev problems)
> > * can_rx_unregister() is always a save thing to do here.
> > */
> > if (op->ifindex) {
> > --
> > 2.53.0
> >
>
> --
> Lee Jones
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 12:47 ` Lee Jones
2026-05-20 12:49 ` Lee Jones
@ 2026-05-20 12:59 ` Oliver Hartkopp
1 sibling, 0 replies; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-20 12:59 UTC (permalink / raw)
To: Lee Jones; +Cc: linux-can
On 20.05.26 14:47, Lee Jones wrote:
This:
>>
>> [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
>> removing the CAN filters following the bcm_delete_rx_op() approach.
>>
>> Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
>> the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
>> potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
>>
>> Signed-off-by: Lee Jones <lee@kernel.org>
>> Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
>
> You did? Can you add a note saying what you changed please?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 12:49 ` Lee Jones
@ 2026-05-20 13:03 ` Oliver Hartkopp
2026-05-20 13:40 ` Lee Jones
0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-20 13:03 UTC (permalink / raw)
To: Lee Jones; +Cc: linux-can
On 20.05.26 14:49, Lee Jones wrote:
> On Wed, 20 May 2026, Lee Jones wrote:
>
>> On Tue, 19 May 2026, Oliver Hartkopp wrote:
>>
>>> From: Lee Jones <lee@kernel.org>
>>>
>>> Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
>>> synchronize_rcu()") removed the synchronize_rcu() call from
>>> bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
>>> timers from being rearmed during deletion. However, it only applied
>>> this check to op->timer via bcm_rx_starttimer().
>>>
>>> It missed the fact that op->thrtimer can also be rearmed by an
>>> in-flight bcm_rx_handler() (which runs as an RCU reader) via
>>> bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
>>> bcm_remove_op() has already cancelled it, leading to a use-after-free
>>> when the timer fires on the deferred-freed struct bcm_op.
>>>
>>> Address the omission by checking the RX_NO_AUTOTIMER flag
>>> in bcm_rx_update_and_send() before starting op->thrtimer, effectively
>>> preventing it from being rearmed concurrently with teardown.
>>>
>>> [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
>>> removing the CAN filters following the bcm_delete_rx_op() approach.
>>>
>>> Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
>>> the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
>>> potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
>>>
>>> Signed-off-by: Lee Jones <lee@kernel.org>
>>> Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
>>
>> You did? Can you add a note saying what you changed please?
>
> FYI, did you also see the second swing I took at this:
>
> https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
Yes, and I answered to your patch.
Is there some lag in the e-mail communication right now?
That's why I also wondered why you sent a patch one day after my v2
proposal.
Best regards,
Oliver
>
>>> Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
>>> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
>>> ---
>>> net/can/bcm.c | 14 ++++++++++++--
>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/can/bcm.c b/net/can/bcm.c
>>> index a4bef2c48a55..abf7bd2c2e6f 100644
>>> --- a/net/can/bcm.c
>>> +++ b/net/can/bcm.c
>>> @@ -537,10 +537,16 @@ static void bcm_rx_update_and_send(struct bcm_op *op,
>>>
>>> /* with active throttling timer we are just done here */
>>> if (hrtimer_active(&op->thrtimer))
>>> return;
>>>
>>> + /* bcm_remove_op() may have cancelled thrtimer concurrently with this
>>> + * RCU-protected handler; do not rearm it. Mirrors bcm_rx_starttimer().
>>> + */
>>> + if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
>>> + return;
>>> +
>>> /* first reception with enabled throttling mode */
>>> if (!op->kt_lastmsg)
>>> goto rx_changed_settime;
>>>
>>> /* got a second frame inside a potential throttle period? */
>>> @@ -603,11 +609,11 @@ static void bcm_rx_cmp_to_index(struct bcm_op *op, unsigned int index,
>>> /*
>>> * bcm_rx_starttimer - enable timeout monitoring for CAN frame reception
>>> */
>>> static void bcm_rx_starttimer(struct bcm_op *op)
>>> {
>>> - if (op->flags & RX_NO_AUTOTIMER)
>>> + if (READ_ONCE(op->flags) & RX_NO_AUTOTIMER)
>>> return;
>>>
>>> if (op->kt_ival1)
>>> hrtimer_start(&op->timer, op->kt_ival1, HRTIMER_MODE_REL_SOFT);
>>> }
>>> @@ -838,11 +844,11 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
>>> list_for_each_entry_safe(op, n, ops, list) {
>>> if ((op->can_id == mh->can_id) && (op->ifindex == ifindex) &&
>>> (op->flags & CAN_FD_FRAME) == (mh->flags & CAN_FD_FRAME)) {
>>>
>>> /* disable automatic timer on frame reception */
>>> - op->flags |= RX_NO_AUTOTIMER;
>>> + WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
>>>
>>> /*
>>> * Don't care if we're bound or not (due to netdev
>>> * problems) can_rx_unregister() is always a save
>>> * thing to do here.
>>> @@ -1618,10 +1624,14 @@ static int bcm_release(struct socket *sock)
>>>
>>> list_for_each_entry_safe(op, next, &bo->tx_ops, list)
>>> bcm_remove_op(op);
>>>
>>> list_for_each_entry_safe(op, next, &bo->rx_ops, list) {
>>> +
>>> + /* disable automatic timer on frame reception */
>>> + WRITE_ONCE(op->flags, op->flags | RX_NO_AUTOTIMER);
>>> +
>>> /*
>>> * Don't care if we're bound or not (due to netdev problems)
>>> * can_rx_unregister() is always a save thing to do here.
>>> */
>>> if (op->ifindex) {
>>> --
>>> 2.53.0
>>>
>>
>> --
>> Lee Jones
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 13:03 ` Oliver Hartkopp
@ 2026-05-20 13:40 ` Lee Jones
2026-05-20 14:06 ` Lee Jones
0 siblings, 1 reply; 14+ messages in thread
From: Lee Jones @ 2026-05-20 13:40 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Wed, 20 May 2026, Oliver Hartkopp wrote:
>
>
> On 20.05.26 14:49, Lee Jones wrote:
> > On Wed, 20 May 2026, Lee Jones wrote:
> >
> > > On Tue, 19 May 2026, Oliver Hartkopp wrote:
> > >
> > > > From: Lee Jones <lee@kernel.org>
> > > >
> > > > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > synchronize_rcu()") removed the synchronize_rcu() call from
> > > > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > > > timers from being rearmed during deletion. However, it only applied
> > > > this check to op->timer via bcm_rx_starttimer().
> > > >
> > > > It missed the fact that op->thrtimer can also be rearmed by an
> > > > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > > > bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
> > > > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > > > when the timer fires on the deferred-freed struct bcm_op.
> > > >
> > > > Address the omission by checking the RX_NO_AUTOTIMER flag
> > > > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > > > preventing it from being rearmed concurrently with teardown.
> > > >
> > > > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > > > removing the CAN filters following the bcm_delete_rx_op() approach.
> > > >
> > > > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > > > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > > > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> > > >
> > > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > >
> > > You did? Can you add a note saying what you changed please?
> >
> > FYI, did you also see the second swing I took at this:
> >
> > https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
>
> Yes, and I answered to your patch.
>
> Is there some lag in the e-mail communication right now?
>
> That's why I also wondered why you sent a patch one day after my v2
> proposal.
Right. I only saw your proposal today.
I've been working the alternative since Jakub NACKed the first submission.
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 13:40 ` Lee Jones
@ 2026-05-20 14:06 ` Lee Jones
2026-05-20 15:23 ` Oliver Hartkopp
0 siblings, 1 reply; 14+ messages in thread
From: Lee Jones @ 2026-05-20 14:06 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Wed, 20 May 2026, Lee Jones wrote:
> On Wed, 20 May 2026, Oliver Hartkopp wrote:
>
> >
> >
> > On 20.05.26 14:49, Lee Jones wrote:
> > > On Wed, 20 May 2026, Lee Jones wrote:
> > >
> > > > On Tue, 19 May 2026, Oliver Hartkopp wrote:
> > > >
> > > > > From: Lee Jones <lee@kernel.org>
> > > > >
> > > > > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > > synchronize_rcu()") removed the synchronize_rcu() call from
> > > > > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > > > > timers from being rearmed during deletion. However, it only applied
> > > > > this check to op->timer via bcm_rx_starttimer().
> > > > >
> > > > > It missed the fact that op->thrtimer can also be rearmed by an
> > > > > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > > > > bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
> > > > > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > > > > when the timer fires on the deferred-freed struct bcm_op.
> > > > >
> > > > > Address the omission by checking the RX_NO_AUTOTIMER flag
> > > > > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > > > > preventing it from being rearmed concurrently with teardown.
> > > > >
> > > > > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > > > > removing the CAN filters following the bcm_delete_rx_op() approach.
> > > > >
> > > > > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > > > > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > > > > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> > > > >
> > > > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > > > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > > >
> > > > You did? Can you add a note saying what you changed please?
> > >
> > > FYI, did you also see the second swing I took at this:
> > >
> > > https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
> >
> > Yes, and I answered to your patch.
> >
> > Is there some lag in the e-mail communication right now?
> >
> > That's why I also wondered why you sent a patch one day after my v2
> > proposal.
>
> Right. I only saw your proposal today.
>
> I've been working the alternative since Jakub NACKed the first submission.
Okay, so I fed both of our v2 fixes into Gemini Next and requested a
critical review of both approaches. The TL;DR is that this v2 is better
than my v1, but still contains the reported race and isn't as solid as
the work queue solution.
In the interest of full disclosure, here is the full analysis for your perusal:
I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.
While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.
Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
performance hot-paths, and upstream validation.
──────
### 1. The TOCTOU Race Window is Still Theoretically Open
The core mechanism of the alternative patch relies on setting RX_NO_AUTOTIMER via WRITE_ONCE and checking it inside the hot-path (
bcm_rx_update_and_send ) via READ_ONCE .
While READ_ONCE / WRITE_ONCE enforce volatile memory accesses and prevent compiler reorderings, they do not provide hardware-level
atomic synchronization or serialization (such as spinlocks or memory barriers like smp_mb() ) between the check and the timer arming.
This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:
CPU A (RCU Reader: bcm_rx_handler) | CPU B (Writer: bcm_delete_rx_op)
-------------------------------------------+-------------------------------------------
READ_ONCE(op->flags) & RX_NO_AUTOTIMER |
/* flag is NOT set; passes check */ |
| WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
| hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
| call_rcu(&op->rcu, bcm_free_op_rcu);
hrtimer_start(&op->thrtimer, ...); |
/* thrtimer is armed AFTER cancellation! */|
| RCU grace period elapses...
| bcm_free_op_rcu() -> kfree(op);
thrtimer fires |
bcm_rx_thr_handler(op) [UAF CRASH!] |
If CPU B executes the write and the cancellation after CPU A has evaluated the READ_ONCE flag check but before CPU A calls
hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.
#### Why Our Solution is Bulletproof:
Our Workqueue-deferred cleanup does not rely on flag synchronization. Because bcm_free_op_work executes after the RCU grace period has
fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
hrtimer_start() has already executed, and our final hrtimer_cancel() right before kfree(op) is guaranteed to catch and safely
neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
──────
### 2. Hot-Path Performance Overhead
• The Alternative Patch:
It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside bcm_rx_update_and_send() . This is the critical
fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
• Our Solution:
Leaves bcm_rx_update_and_send() completely pristine. Zero hot-path overhead is introduced.
──────
### 3. Evaluation Summary Against All Solutions
Architectural Criteria │ Solut… │ Solut… │ Alternative Patch (Commit 2074ea86 ) │ Our Solution: Dedicated Workqueue ( can-bc…
──────────────────────────┼────────┼────────┼─────────────────────────────────────────────┼─────────────────────────────────────────────
TOCTOU Race Resilience │ ❌ │ None │ ⚠️ Weak (TOCTOU race window remains open on │ Bulletproof (RCU grace period guarantees
│ Vulner │ (Fatal │ SMP concurrency). │ all readers have completed).
│ able │ ) │ │
Atomic Context Safety │ Safe │ ❌ │ Safe (Cancellations performed in process │ Safe (Cancellations executed in sleepable
│ │ FATAL │ contexts). │ process context).
│ │ PANIC │ │
│ │ (Calls │ │
│ │ hrtime │ │
│ │ r_canc │ │
│ │ el in │ │
│ │ softir │ │
│ │ q │ │
│ │ callba │ │
│ │ ck). │ │
Fast-Path Packet Latency │ ❌ │ None │ ❌ Overhead (Volatile branch check in │ Zero Latency (Fast-path remains untouched).
│ Overhe │ │ packet hot-path). │
│ ad │ │ │
│ (Hot- │ │ │
│ path │ │ │
│ checks │ │ │
│ ). │ │ │
Module Unloading Safety │ Safe │ None │ Safe │ Safe ( destroy_workqueue flushes WQ before
│ │ │ │ code unmap).
Workqueue Saturation │ None │ None │ None │ Safe (Isolated dedicated WQ prevents system
│ │ │ │ starvation).
──────
### Conclusion: What Will Upstream Say?
If reviewed critically by upstream LLMs or senior maintainers:
1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that READ_ONCE / WRITE_ONCE do not enforce
synchronization between the flag check and hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.
Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
vulnerability.
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 14:06 ` Lee Jones
@ 2026-05-20 15:23 ` Oliver Hartkopp
2026-05-20 16:13 ` Lee Jones
0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-20 15:23 UTC (permalink / raw)
To: Lee Jones; +Cc: linux-can
On 20.05.26 16:06, Lee Jones wrote:
> On Wed, 20 May 2026, Lee Jones wrote:
>
>> On Wed, 20 May 2026, Oliver Hartkopp wrote:
>>
>>>
>>>
>>> On 20.05.26 14:49, Lee Jones wrote:
>>>> On Wed, 20 May 2026, Lee Jones wrote:
>>>>
>>>>> On Tue, 19 May 2026, Oliver Hartkopp wrote:
>>>>>
>>>>>> From: Lee Jones <lee@kernel.org>
>>>>>>
>>>>>> Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
>>>>>> synchronize_rcu()") removed the synchronize_rcu() call from
>>>>>> bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
>>>>>> timers from being rearmed during deletion. However, it only applied
>>>>>> this check to op->timer via bcm_rx_starttimer().
>>>>>>
>>>>>> It missed the fact that op->thrtimer can also be rearmed by an
>>>>>> in-flight bcm_rx_handler() (which runs as an RCU reader) via
>>>>>> bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
>>>>>> bcm_remove_op() has already cancelled it, leading to a use-after-free
>>>>>> when the timer fires on the deferred-freed struct bcm_op.
>>>>>>
>>>>>> Address the omission by checking the RX_NO_AUTOTIMER flag
>>>>>> in bcm_rx_update_and_send() before starting op->thrtimer, effectively
>>>>>> preventing it from being rearmed concurrently with teardown.
>>>>>>
>>>>>> [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
>>>>>> removing the CAN filters following the bcm_delete_rx_op() approach.
>>>>>>
>>>>>> Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
>>>>>> the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
>>>>>> potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
>>>>>>
>>>>>> Signed-off-by: Lee Jones <lee@kernel.org>
>>>>>> Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
>>>>>
>>>>> You did? Can you add a note saying what you changed please?
>>>>
>>>> FYI, did you also see the second swing I took at this:
>>>>
>>>> https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
>>>
>>> Yes, and I answered to your patch.
>>>
>>> Is there some lag in the e-mail communication right now?
>>>
>>> That's why I also wondered why you sent a patch one day after my v2
>>> proposal.
>>
>> Right. I only saw your proposal today.
>>
>> I've been working the alternative since Jakub NACKed the first submission.
>
> Okay, so I fed both of our v2 fixes into Gemini Next and requested a
> critical review of both approaches. The TL;DR is that this v2 is better
> than my v1, but still contains the reported race and isn't as solid as
> the work queue solution.
>
> In the interest of full disclosure, here is the full analysis for your perusal:
>
> I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.
>
> While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
> release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.
>
> Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
> performance hot-paths, and upstream validation.
> ──────
> ### 1. The TOCTOU Race Window is Still Theoretically Open
>
> The core mechanism of the alternative patch relies on setting RX_NO_AUTOTIMER via WRITE_ONCE and checking it inside the hot-path (
> bcm_rx_update_and_send ) via READ_ONCE .
>
> While READ_ONCE / WRITE_ONCE enforce volatile memory accesses and prevent compiler reorderings, they do not provide hardware-level
> atomic synchronization or serialization (such as spinlocks or memory barriers like smp_mb() ) between the check and the timer arming.
>
> This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:
>
> CPU A (RCU Reader: bcm_rx_handler) | CPU B (Writer: bcm_delete_rx_op)
> -------------------------------------------+-------------------------------------------
> READ_ONCE(op->flags) & RX_NO_AUTOTIMER |
> /* flag is NOT set; passes check */ |
> | WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
> | hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
> | call_rcu(&op->rcu, bcm_free_op_rcu);
> hrtimer_start(&op->thrtimer, ...); |
> /* thrtimer is armed AFTER cancellation! */|
> | RCU grace period elapses...
> | bcm_free_op_rcu() -> kfree(op);
> thrtimer fires |
> bcm_rx_thr_handler(op) [UAF CRASH!] |
>
> If CPU B executes the write and the cancellation after CPU A has evaluated the READ_ONCE flag check but before CPU A calls
> hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.
>
> #### Why Our Solution is Bulletproof:
>
> Our Workqueue-deferred cleanup does not rely on flag synchronization. Because bcm_free_op_work executes after the RCU grace period has
> fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
> hrtimer_start() has already executed, and our final hrtimer_cancel() right before kfree(op) is guaranteed to catch and safely
> neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
> ──────
> ### 2. Hot-Path Performance Overhead
>
> • The Alternative Patch:
> It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside bcm_rx_update_and_send() . This is the critical
> fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
> adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
> • Our Solution:
> Leaves bcm_rx_update_and_send() completely pristine. Zero hot-path overhead is introduced.
> ──────
> ### 3. Evaluation Summary Against All Solutions
>
> Architectural Criteria │ Solut… │ Solut… │ Alternative Patch (Commit 2074ea86 ) │ Our Solution: Dedicated Workqueue ( can-bc…
> ──────────────────────────┼────────┼────────┼─────────────────────────────────────────────┼─────────────────────────────────────────────
> TOCTOU Race Resilience │ ❌ │ None │ ⚠️ Weak (TOCTOU race window remains open on │ Bulletproof (RCU grace period guarantees
> │ Vulner │ (Fatal │ SMP concurrency). │ all readers have completed).
> │ able │ ) │ │
> Atomic Context Safety │ Safe │ ❌ │ Safe (Cancellations performed in process │ Safe (Cancellations executed in sleepable
> │ │ FATAL │ contexts). │ process context).
> │ │ PANIC │ │
> │ │ (Calls │ │
> │ │ hrtime │ │
> │ │ r_canc │ │
> │ │ el in │ │
> │ │ softir │ │
> │ │ q │ │
> │ │ callba │ │
> │ │ ck). │ │
> Fast-Path Packet Latency │ ❌ │ None │ ❌ Overhead (Volatile branch check in │ Zero Latency (Fast-path remains untouched).
> │ Overhe │ │ packet hot-path). │
> │ ad │ │ │
> │ (Hot- │ │ │
> │ path │ │ │
> │ checks │ │ │
> │ ). │ │ │
> Module Unloading Safety │ Safe │ None │ Safe │ Safe ( destroy_workqueue flushes WQ before
> │ │ │ │ code unmap).
> Workqueue Saturation │ None │ None │ None │ Safe (Isolated dedicated WQ prevents system
> │ │ │ │ starvation).
> ──────
> ### Conclusion: What Will Upstream Say?
>
> If reviewed critically by upstream LLMs or senior maintainers:
>
> 1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that READ_ONCE / WRITE_ONCE do not enforce
> synchronization between the flag check and hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
> 2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
> the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.
>
> Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
> vulnerability.
>
Ok, thanks for the detailed explanation!
As your updated patch fixes the original patch
f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
synchronize_rcu()")
we should also revert this op->flags setting that has been introduced
with that patch in addition to the rcu stuff, right?
@@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
struct bcm_msg_head *mh,
list_for_each_entry_safe(op, n, ops, list) {
if ((op->can_id == mh->can_id) && (op->ifindex ==
ifindex) &&
(op->flags & CAN_FD_FRAME) == (mh->flags &
CAN_FD_FRAME)) {
+ /* disable automatic timer on frame reception */
+ op->flags |= RX_NO_AUTOTIMER;
+
/*
* Don't care if we're bound or not (due to netdev
* problems) can_rx_unregister() is always a save
* thing to do here.
*/
Best regards,
Oliver
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 15:23 ` Oliver Hartkopp
@ 2026-05-20 16:13 ` Lee Jones
2026-05-20 18:00 ` Oliver Hartkopp
0 siblings, 1 reply; 14+ messages in thread
From: Lee Jones @ 2026-05-20 16:13 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Wed, 20 May 2026, Oliver Hartkopp wrote:
>
>
> On 20.05.26 16:06, Lee Jones wrote:
> > On Wed, 20 May 2026, Lee Jones wrote:
> >
> > > On Wed, 20 May 2026, Oliver Hartkopp wrote:
> > >
> > > >
> > > >
> > > > On 20.05.26 14:49, Lee Jones wrote:
> > > > > On Wed, 20 May 2026, Lee Jones wrote:
> > > > >
> > > > > > On Tue, 19 May 2026, Oliver Hartkopp wrote:
> > > > > >
> > > > > > > From: Lee Jones <lee@kernel.org>
> > > > > > >
> > > > > > > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > > > > synchronize_rcu()") removed the synchronize_rcu() call from
> > > > > > > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > > > > > > timers from being rearmed during deletion. However, it only applied
> > > > > > > this check to op->timer via bcm_rx_starttimer().
> > > > > > >
> > > > > > > It missed the fact that op->thrtimer can also be rearmed by an
> > > > > > > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > > > > > > bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
> > > > > > > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > > > > > > when the timer fires on the deferred-freed struct bcm_op.
> > > > > > >
> > > > > > > Address the omission by checking the RX_NO_AUTOTIMER flag
> > > > > > > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > > > > > > preventing it from being rearmed concurrently with teardown.
> > > > > > >
> > > > > > > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > > > > > > removing the CAN filters following the bcm_delete_rx_op() approach.
> > > > > > >
> > > > > > > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > > > > > > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > > > > > > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> > > > > > >
> > > > > > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > > > > > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > > > > >
> > > > > > You did? Can you add a note saying what you changed please?
> > > > >
> > > > > FYI, did you also see the second swing I took at this:
> > > > >
> > > > > https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
> > > >
> > > > Yes, and I answered to your patch.
> > > >
> > > > Is there some lag in the e-mail communication right now?
> > > >
> > > > That's why I also wondered why you sent a patch one day after my v2
> > > > proposal.
> > >
> > > Right. I only saw your proposal today.
> > >
> > > I've been working the alternative since Jakub NACKed the first submission.
> >
> > Okay, so I fed both of our v2 fixes into Gemini Next and requested a
> > critical review of both approaches. The TL;DR is that this v2 is better
> > than my v1, but still contains the reported race and isn't as solid as
> > the work queue solution.
> >
> > In the interest of full disclosure, here is the full analysis for your perusal:
> >
> > I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.
> >
> > While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
> > release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.
> >
> > Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
> > performance hot-paths, and upstream validation.
> > ──────
> > ### 1. The TOCTOU Race Window is Still Theoretically Open
> >
> > The core mechanism of the alternative patch relies on setting RX_NO_AUTOTIMER via WRITE_ONCE and checking it inside the hot-path (
> > bcm_rx_update_and_send ) via READ_ONCE .
> >
> > While READ_ONCE / WRITE_ONCE enforce volatile memory accesses and prevent compiler reorderings, they do not provide hardware-level
> > atomic synchronization or serialization (such as spinlocks or memory barriers like smp_mb() ) between the check and the timer arming.
> >
> > This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:
> >
> > CPU A (RCU Reader: bcm_rx_handler) | CPU B (Writer: bcm_delete_rx_op)
> > -------------------------------------------+-------------------------------------------
> > READ_ONCE(op->flags) & RX_NO_AUTOTIMER |
> > /* flag is NOT set; passes check */ |
> > | WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
> > | hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
> > | call_rcu(&op->rcu, bcm_free_op_rcu);
> > hrtimer_start(&op->thrtimer, ...); |
> > /* thrtimer is armed AFTER cancellation! */|
> > | RCU grace period elapses...
> > | bcm_free_op_rcu() -> kfree(op);
> > thrtimer fires |
> > bcm_rx_thr_handler(op) [UAF CRASH!] |
> >
> > If CPU B executes the write and the cancellation after CPU A has evaluated the READ_ONCE flag check but before CPU A calls
> > hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.
> >
> > #### Why Our Solution is Bulletproof:
> >
> > Our Workqueue-deferred cleanup does not rely on flag synchronization. Because bcm_free_op_work executes after the RCU grace period has
> > fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
> > hrtimer_start() has already executed, and our final hrtimer_cancel() right before kfree(op) is guaranteed to catch and safely
> > neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
> > ──────
> > ### 2. Hot-Path Performance Overhead
> >
> > • The Alternative Patch:
> > It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside bcm_rx_update_and_send() . This is the critical
> > fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
> > adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
> > • Our Solution:
> > Leaves bcm_rx_update_and_send() completely pristine. Zero hot-path overhead is introduced.
> > ──────
> > ### 3. Evaluation Summary Against All Solutions
> >
> > Architectural Criteria │ Solut… │ Solut… │ Alternative Patch (Commit 2074ea86 ) │ Our Solution: Dedicated Workqueue ( can-bc…
> > ──────────────────────────┼────────┼────────┼─────────────────────────────────────────────┼─────────────────────────────────────────────
> > TOCTOU Race Resilience │ ❌ │ None │ ⚠️ Weak (TOCTOU race window remains open on │ Bulletproof (RCU grace period guarantees
> > │ Vulner │ (Fatal │ SMP concurrency). │ all readers have completed).
> > │ able │ ) │ │
> > Atomic Context Safety │ Safe │ ❌ │ Safe (Cancellations performed in process │ Safe (Cancellations executed in sleepable
> > │ │ FATAL │ contexts). │ process context).
> > │ │ PANIC │ │
> > │ │ (Calls │ │
> > │ │ hrtime │ │
> > │ │ r_canc │ │
> > │ │ el in │ │
> > │ │ softir │ │
> > │ │ q │ │
> > │ │ callba │ │
> > │ │ ck). │ │
> > Fast-Path Packet Latency │ ❌ │ None │ ❌ Overhead (Volatile branch check in │ Zero Latency (Fast-path remains untouched).
> > │ Overhe │ │ packet hot-path). │
> > │ ad │ │ │
> > │ (Hot- │ │ │
> > │ path │ │ │
> > │ checks │ │ │
> > │ ). │ │ │
> > Module Unloading Safety │ Safe │ None │ Safe │ Safe ( destroy_workqueue flushes WQ before
> > │ │ │ │ code unmap).
> > Workqueue Saturation │ None │ None │ None │ Safe (Isolated dedicated WQ prevents system
> > │ │ │ │ starvation).
> > ──────
> > ### Conclusion: What Will Upstream Say?
> >
> > If reviewed critically by upstream LLMs or senior maintainers:
> >
> > 1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that READ_ONCE / WRITE_ONCE do not enforce
> > synchronization between the flag check and hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
> > 2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
> > the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.
> >
> > Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
> > vulnerability.
> >
>
> Ok, thanks for the detailed explanation!
>
> As your updated patch fixes the original patch
>
> f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> synchronize_rcu()")
>
> we should also revert this op->flags setting that has been introduced with
> that patch in addition to the rcu stuff, right?
>
> @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
> struct bcm_msg_head *mh,
>
> list_for_each_entry_safe(op, n, ops, list) {
> if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
> &&
> (op->flags & CAN_FD_FRAME) == (mh->flags &
> CAN_FD_FRAME)) {
>
> + /* disable automatic timer on frame reception */
> + op->flags |= RX_NO_AUTOTIMER;
> +
You mean from v1? I thought that was NACKed and not applied?
My follow-up was a replacement for it.
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 16:13 ` Lee Jones
@ 2026-05-20 18:00 ` Oliver Hartkopp
2026-05-21 11:07 ` Lee Jones
0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-20 18:00 UTC (permalink / raw)
To: Lee Jones; +Cc: linux-can
On 20.05.26 18:13, Lee Jones wrote:
> On Wed, 20 May 2026, Oliver Hartkopp wrote:
>
>>
>>
>> On 20.05.26 16:06, Lee Jones wrote:
>>> On Wed, 20 May 2026, Lee Jones wrote:
>>>
>>>> On Wed, 20 May 2026, Oliver Hartkopp wrote:
>>>>
>>>>>
>>>>>
>>>>> On 20.05.26 14:49, Lee Jones wrote:
>>>>>> On Wed, 20 May 2026, Lee Jones wrote:
>>>>>>
>>>>>>> On Tue, 19 May 2026, Oliver Hartkopp wrote:
>>>>>>>
>>>>>>>> From: Lee Jones <lee@kernel.org>
>>>>>>>>
>>>>>>>> Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
>>>>>>>> synchronize_rcu()") removed the synchronize_rcu() call from
>>>>>>>> bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
>>>>>>>> timers from being rearmed during deletion. However, it only applied
>>>>>>>> this check to op->timer via bcm_rx_starttimer().
>>>>>>>>
>>>>>>>> It missed the fact that op->thrtimer can also be rearmed by an
>>>>>>>> in-flight bcm_rx_handler() (which runs as an RCU reader) via
>>>>>>>> bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
>>>>>>>> bcm_remove_op() has already cancelled it, leading to a use-after-free
>>>>>>>> when the timer fires on the deferred-freed struct bcm_op.
>>>>>>>>
>>>>>>>> Address the omission by checking the RX_NO_AUTOTIMER flag
>>>>>>>> in bcm_rx_update_and_send() before starting op->thrtimer, effectively
>>>>>>>> preventing it from being rearmed concurrently with teardown.
>>>>>>>>
>>>>>>>> [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
>>>>>>>> removing the CAN filters following the bcm_delete_rx_op() approach.
>>>>>>>>
>>>>>>>> Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
>>>>>>>> the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
>>>>>>>> potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
>>>>>>>>
>>>>>>>> Signed-off-by: Lee Jones <lee@kernel.org>
>>>>>>>> Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
>>>>>>>
>>>>>>> You did? Can you add a note saying what you changed please?
>>>>>>
>>>>>> FYI, did you also see the second swing I took at this:
>>>>>>
>>>>>> https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
>>>>>
>>>>> Yes, and I answered to your patch.
>>>>>
>>>>> Is there some lag in the e-mail communication right now?
>>>>>
>>>>> That's why I also wondered why you sent a patch one day after my v2
>>>>> proposal.
>>>>
>>>> Right. I only saw your proposal today.
>>>>
>>>> I've been working the alternative since Jakub NACKed the first submission.
>>>
>>> Okay, so I fed both of our v2 fixes into Gemini Next and requested a
>>> critical review of both approaches. The TL;DR is that this v2 is better
>>> than my v1, but still contains the reported race and isn't as solid as
>>> the work queue solution.
>>>
>>> In the interest of full disclosure, here is the full analysis for your perusal:
>>>
>>> I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.
>>>
>>> While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
>>> release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.
>>>
>>> Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
>>> performance hot-paths, and upstream validation.
>>> ──────
>>> ### 1. The TOCTOU Race Window is Still Theoretically Open
>>>
>>> The core mechanism of the alternative patch relies on setting RX_NO_AUTOTIMER via WRITE_ONCE and checking it inside the hot-path (
>>> bcm_rx_update_and_send ) via READ_ONCE .
>>>
>>> While READ_ONCE / WRITE_ONCE enforce volatile memory accesses and prevent compiler reorderings, they do not provide hardware-level
>>> atomic synchronization or serialization (such as spinlocks or memory barriers like smp_mb() ) between the check and the timer arming.
>>>
>>> This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:
>>>
>>> CPU A (RCU Reader: bcm_rx_handler) | CPU B (Writer: bcm_delete_rx_op)
>>> -------------------------------------------+-------------------------------------------
>>> READ_ONCE(op->flags) & RX_NO_AUTOTIMER |
>>> /* flag is NOT set; passes check */ |
>>> | WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
>>> | hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
>>> | call_rcu(&op->rcu, bcm_free_op_rcu);
>>> hrtimer_start(&op->thrtimer, ...); |
>>> /* thrtimer is armed AFTER cancellation! */|
>>> | RCU grace period elapses...
>>> | bcm_free_op_rcu() -> kfree(op);
>>> thrtimer fires |
>>> bcm_rx_thr_handler(op) [UAF CRASH!] |
>>>
>>> If CPU B executes the write and the cancellation after CPU A has evaluated the READ_ONCE flag check but before CPU A calls
>>> hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.
>>>
>>> #### Why Our Solution is Bulletproof:
>>>
>>> Our Workqueue-deferred cleanup does not rely on flag synchronization. Because bcm_free_op_work executes after the RCU grace period has
>>> fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
>>> hrtimer_start() has already executed, and our final hrtimer_cancel() right before kfree(op) is guaranteed to catch and safely
>>> neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
>>> ──────
>>> ### 2. Hot-Path Performance Overhead
>>>
>>> • The Alternative Patch:
>>> It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside bcm_rx_update_and_send() . This is the critical
>>> fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
>>> adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
>>> • Our Solution:
>>> Leaves bcm_rx_update_and_send() completely pristine. Zero hot-path overhead is introduced.
>>> ──────
>>> ### 3. Evaluation Summary Against All Solutions
>>>
>>> Architectural Criteria │ Solut… │ Solut… │ Alternative Patch (Commit 2074ea86 ) │ Our Solution: Dedicated Workqueue ( can-bc…
>>> ──────────────────────────┼────────┼────────┼─────────────────────────────────────────────┼─────────────────────────────────────────────
>>> TOCTOU Race Resilience │ ❌ │ None │ ⚠️ Weak (TOCTOU race window remains open on │ Bulletproof (RCU grace period guarantees
>>> │ Vulner │ (Fatal │ SMP concurrency). │ all readers have completed).
>>> │ able │ ) │ │
>>> Atomic Context Safety │ Safe │ ❌ │ Safe (Cancellations performed in process │ Safe (Cancellations executed in sleepable
>>> │ │ FATAL │ contexts). │ process context).
>>> │ │ PANIC │ │
>>> │ │ (Calls │ │
>>> │ │ hrtime │ │
>>> │ │ r_canc │ │
>>> │ │ el in │ │
>>> │ │ softir │ │
>>> │ │ q │ │
>>> │ │ callba │ │
>>> │ │ ck). │ │
>>> Fast-Path Packet Latency │ ❌ │ None │ ❌ Overhead (Volatile branch check in │ Zero Latency (Fast-path remains untouched).
>>> │ Overhe │ │ packet hot-path). │
>>> │ ad │ │ │
>>> │ (Hot- │ │ │
>>> │ path │ │ │
>>> │ checks │ │ │
>>> │ ). │ │ │
>>> Module Unloading Safety │ Safe │ None │ Safe │ Safe ( destroy_workqueue flushes WQ before
>>> │ │ │ │ code unmap).
>>> Workqueue Saturation │ None │ None │ None │ Safe (Isolated dedicated WQ prevents system
>>> │ │ │ │ starvation).
>>> ──────
>>> ### Conclusion: What Will Upstream Say?
>>>
>>> If reviewed critically by upstream LLMs or senior maintainers:
>>>
>>> 1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that READ_ONCE / WRITE_ONCE do not enforce
>>> synchronization between the flag check and hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
>>> 2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
>>> the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.
>>>
>>> Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
>>> vulnerability.
>>>
>>
>> Ok, thanks for the detailed explanation!
>>
>> As your updated patch fixes the original patch
>>
>> f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
>> synchronize_rcu()")
>>
>> we should also revert this op->flags setting that has been introduced with
>> that patch in addition to the rcu stuff, right?
>>
>> @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
>> struct bcm_msg_head *mh,
>>
>> list_for_each_entry_safe(op, n, ops, list) {
>> if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
>> &&
>> (op->flags & CAN_FD_FRAME) == (mh->flags &
>> CAN_FD_FRAME)) {
>>
>> + /* disable automatic timer on frame reception */
>> + op->flags |= RX_NO_AUTOTIMER;
>> +
>
> You mean from v1? I thought that was NACKed and not applied?
No. These two lines were introduced in the original patch you aim to
fix. So when fixing the original "use call_rcu() instead of costly
synchronize_rcu()" patch that introduced the rcu stuff, this now
obsolete op->flags |= RX_NO_AUTOTIMER should be removed too.
Best regards,
Oliver
>
> My follow-up was a replacement for it.
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-20 18:00 ` Oliver Hartkopp
@ 2026-05-21 11:07 ` Lee Jones
2026-05-21 11:35 ` Oliver Hartkopp
0 siblings, 1 reply; 14+ messages in thread
From: Lee Jones @ 2026-05-21 11:07 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Wed, 20 May 2026, Oliver Hartkopp wrote:
> > > Ok, thanks for the detailed explanation!
> > >
> > > As your updated patch fixes the original patch
> > >
> > > f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > synchronize_rcu()")
> > >
> > > we should also revert this op->flags setting that has been introduced with
> > > that patch in addition to the rcu stuff, right?
> > >
> > > @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
> > > struct bcm_msg_head *mh,
> > >
> > > list_for_each_entry_safe(op, n, ops, list) {
> > > if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
> > > &&
> > > (op->flags & CAN_FD_FRAME) == (mh->flags &
> > > CAN_FD_FRAME)) {
> > >
> > > + /* disable automatic timer on frame reception */
> > > + op->flags |= RX_NO_AUTOTIMER;
> > > +
> >
> > You mean from v1? I thought that was NACKed and not applied?
>
> No. These two lines were introduced in the original patch you aim to fix. So
> when fixing the original "use call_rcu() instead of costly
> synchronize_rcu()" patch that introduced the rcu stuff, this now obsolete
> op->flags |= RX_NO_AUTOTIMER should be removed too.
Is that not orthogonal to the fix?
If it is, would you be kind enough to submit a fix (and get your own
creds up ;-D). This is usually something that I'd gleefully pick up,
but I'm absolutely inundated with vulnerability reports right now.
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-21 11:07 ` Lee Jones
@ 2026-05-21 11:35 ` Oliver Hartkopp
2026-05-21 13:51 ` Lee Jones
0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-21 11:35 UTC (permalink / raw)
To: Lee Jones; +Cc: linux-can
On 21.05.26 13:07, Lee Jones wrote:
> On Wed, 20 May 2026, Oliver Hartkopp wrote:
>>>> Ok, thanks for the detailed explanation!
>>>>
>>>> As your updated patch fixes the original patch
>>>>
>>>> f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
>>>> synchronize_rcu()")
>>>>
>>>> we should also revert this op->flags setting that has been introduced with
>>>> that patch in addition to the rcu stuff, right?
>>>>
>>>> @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
>>>> struct bcm_msg_head *mh,
>>>>
>>>> list_for_each_entry_safe(op, n, ops, list) {
>>>> if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
>>>> &&
>>>> (op->flags & CAN_FD_FRAME) == (mh->flags &
>>>> CAN_FD_FRAME)) {
>>>>
>>>> + /* disable automatic timer on frame reception */
>>>> + op->flags |= RX_NO_AUTOTIMER;
>>>> +
>>>
>>> You mean from v1? I thought that was NACKed and not applied?
>>
>> No. These two lines were introduced in the original patch you aim to fix. So
>> when fixing the original "use call_rcu() instead of costly
>> synchronize_rcu()" patch that introduced the rcu stuff, this now obsolete
>> op->flags |= RX_NO_AUTOTIMER should be removed too.
>
> Is that not orthogonal to the fix?
>
> If it is, would you be kind enough to submit a fix (and get your own
> creds up ;-D). This is usually something that I'd gleefully pick up,
> but I'm absolutely inundated with vulnerability reports right now.
>
Yes, of course I can clean this up later. But not for getting extra
creds :-)
Btw. have you seen the AI remark on your patch that asks about an
additional rcu_barrier() in bcm_module_exit() ?
https://netdev-ai.bots.linux.dev/ai-review.html?id=9e8842e7-59a2-4ec5-9f65-1df7973c6170
says this:
> @@ -1870,6 +1891,7 @@ static void __exit bcm_module_exit(void)
> can_proto_unregister(&bcm_can_proto);
> unregister_netdevice_notifier(&canbcm_notifier);
> unregister_pernet_subsys(&canbcm_pernet_ops);
> + destroy_workqueue(bcm_wq);
> }
Does bcm_module_exit need an rcu_barrier() call before destroy_workqueue?
Consider this sequence:
A socket is being closed while the module is unloading:
bcm_release()
bcm_remove_op()
call_rcu(&op->rcu, bcm_free_op_rcu) // RCU callback scheduled
Module refcount reaches zero, rmmod proceeds
bcm_module_exit()
can_proto_unregister(&bcm_can_proto)
unregister_netdevice_notifier(&canbcm_notifier)
unregister_pernet_subsys(&canbcm_pernet_ops)
destroy_workqueue(bcm_wq) // bcm_wq destroyed
RCU grace period elapses
bcm_free_op_rcu() // callback fires
INIT_WORK(&op->work, bcm_free_op_work)
queue_work(bcm_wq, &op->work) // UAF: bcm_wq already freed
The RCU callback queues work to bcm_wq, but if the callback hasn't fired
yet when bcm_module_exit runs, it will try to queue_work to a destroyed
workqueue. Before this patch, bcm_free_op_rcu called kfree directly without
using module-global resources, so this wasn't an issue.
Should bcm_module_exit call rcu_barrier() before destroy_workqueue to
ensure all pending RCU callbacks complete first?
Many thanks,
Oliver
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-21 11:35 ` Oliver Hartkopp
@ 2026-05-21 13:51 ` Lee Jones
2026-05-21 17:57 ` Oliver Hartkopp
0 siblings, 1 reply; 14+ messages in thread
From: Lee Jones @ 2026-05-21 13:51 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Thu, 21 May 2026, Oliver Hartkopp wrote:
>
>
> On 21.05.26 13:07, Lee Jones wrote:
> > On Wed, 20 May 2026, Oliver Hartkopp wrote:
> > > > > Ok, thanks for the detailed explanation!
> > > > >
> > > > > As your updated patch fixes the original patch
> > > > >
> > > > > f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > > synchronize_rcu()")
> > > > >
> > > > > we should also revert this op->flags setting that has been introduced with
> > > > > that patch in addition to the rcu stuff, right?
> > > > >
> > > > > @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
> > > > > struct bcm_msg_head *mh,
> > > > >
> > > > > list_for_each_entry_safe(op, n, ops, list) {
> > > > > if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
> > > > > &&
> > > > > (op->flags & CAN_FD_FRAME) == (mh->flags &
> > > > > CAN_FD_FRAME)) {
> > > > >
> > > > > + /* disable automatic timer on frame reception */
> > > > > + op->flags |= RX_NO_AUTOTIMER;
> > > > > +
> > > >
> > > > You mean from v1? I thought that was NACKed and not applied?
> > >
> > > No. These two lines were introduced in the original patch you aim to fix. So
> > > when fixing the original "use call_rcu() instead of costly
> > > synchronize_rcu()" patch that introduced the rcu stuff, this now obsolete
> > > op->flags |= RX_NO_AUTOTIMER should be removed too.
> >
> > Is that not orthogonal to the fix?
> >
> > If it is, would you be kind enough to submit a fix (and get your own
> > creds up ;-D). This is usually something that I'd gleefully pick up,
> > but I'm absolutely inundated with vulnerability reports right now.
> >
>
> Yes, of course I can clean this up later. But not for getting extra creds
> :-)
>
> Btw. have you seen the AI remark on your patch that asks about an additional
> rcu_barrier() in bcm_module_exit() ?
>
> https://netdev-ai.bots.linux.dev/ai-review.html?id=9e8842e7-59a2-4ec5-9f65-1df7973c6170
>
> says this:
>
> > @@ -1870,6 +1891,7 @@ static void __exit bcm_module_exit(void)
> > can_proto_unregister(&bcm_can_proto);
> > unregister_netdevice_notifier(&canbcm_notifier);
> > unregister_pernet_subsys(&canbcm_pernet_ops);
> > + destroy_workqueue(bcm_wq);
> > }
>
> Does bcm_module_exit need an rcu_barrier() call before destroy_workqueue?
> Consider this sequence:
>
> A socket is being closed while the module is unloading:
>
> bcm_release()
> bcm_remove_op()
> call_rcu(&op->rcu, bcm_free_op_rcu) // RCU callback scheduled
>
> Module refcount reaches zero, rmmod proceeds
>
> bcm_module_exit()
> can_proto_unregister(&bcm_can_proto)
> unregister_netdevice_notifier(&canbcm_notifier)
> unregister_pernet_subsys(&canbcm_pernet_ops)
> destroy_workqueue(bcm_wq) // bcm_wq destroyed
>
> RCU grace period elapses
>
> bcm_free_op_rcu() // callback fires
> INIT_WORK(&op->work, bcm_free_op_work)
> queue_work(bcm_wq, &op->work) // UAF: bcm_wq already freed
>
> The RCU callback queues work to bcm_wq, but if the callback hasn't fired
> yet when bcm_module_exit runs, it will try to queue_work to a destroyed
> workqueue. Before this patch, bcm_free_op_rcu called kfree directly without
> using module-global resources, so this wasn't an issue.
>
> Should bcm_module_exit call rcu_barrier() before destroy_workqueue to
> ensure all pending RCU callbacks complete first?
I hadn't seen it (do you get notifications? If so, how do I enable
those?), but it is legitimate. I shall send out a v2 early next week.
--
Lee Jones
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
2026-05-21 13:51 ` Lee Jones
@ 2026-05-21 17:57 ` Oliver Hartkopp
0 siblings, 0 replies; 14+ messages in thread
From: Oliver Hartkopp @ 2026-05-21 17:57 UTC (permalink / raw)
To: Lee Jones; +Cc: linux-can
On 21.05.26 15:51, Lee Jones wrote:
> On Thu, 21 May 2026, Oliver Hartkopp wrote:
>> https://netdev-ai.bots.linux.dev/ai-review.html?id=9e8842e7-59a2-4ec5-9f65-1df7973c6170
>>
>> says this:
>>
>>> @@ -1870,6 +1891,7 @@ static void __exit bcm_module_exit(void)
>>> can_proto_unregister(&bcm_can_proto);
>>> unregister_netdevice_notifier(&canbcm_notifier);
>>> unregister_pernet_subsys(&canbcm_pernet_ops);
>>> + destroy_workqueue(bcm_wq);
>>> }
>>
>> Does bcm_module_exit need an rcu_barrier() call before destroy_workqueue?
>> Consider this sequence:
>>
>> A socket is being closed while the module is unloading:
>>
>> bcm_release()
>> bcm_remove_op()
>> call_rcu(&op->rcu, bcm_free_op_rcu) // RCU callback scheduled
>>
>> Module refcount reaches zero, rmmod proceeds
>>
>> bcm_module_exit()
>> can_proto_unregister(&bcm_can_proto)
>> unregister_netdevice_notifier(&canbcm_notifier)
>> unregister_pernet_subsys(&canbcm_pernet_ops)
>> destroy_workqueue(bcm_wq) // bcm_wq destroyed
>>
>> RCU grace period elapses
>>
>> bcm_free_op_rcu() // callback fires
>> INIT_WORK(&op->work, bcm_free_op_work)
>> queue_work(bcm_wq, &op->work) // UAF: bcm_wq already freed
>>
>> The RCU callback queues work to bcm_wq, but if the callback hasn't fired
>> yet when bcm_module_exit runs, it will try to queue_work to a destroyed
>> workqueue. Before this patch, bcm_free_op_rcu called kfree directly without
>> using module-global resources, so this wasn't an issue.
>>
>> Should bcm_module_exit call rcu_barrier() before destroy_workqueue to
>> ensure all pending RCU callbacks complete first?
>
> I hadn't seen it (do you get notifications? If so, how do I enable
> those?), but it is legitimate. I shall send out a v2 early next week.
>
Thanks! Looking forward to it.
We have this AI bot feature inside Patchwork.
Linux-CAN Patchwork
https://patchwork.kernel.org/project/linux-can/list/
Netdev + BPF Patchwork
https://patchwork.kernel.org/project/netdevbpf/list/
And your patches (with AI bot checks) show up there:
[1/1] can: bcm: defer rx_op deallocation to workqueue to fix thrtimer UAF
https://patchwork.kernel.org/project/linux-can/patch/20260520080523.2513957-1-lee@kernel.org/
[1/1] l2tp: pppol2tp: hold reference to session in pppol2tp_ioctl()
https://patchwork.kernel.org/project/netdevbpf/patch/20260520134837.2780050-1-lee@kernel.org/
(which has some AI feedback too)
Best regards,
Oliver
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-05-21 17:58 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-19 11:38 [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER Oliver Hartkopp
2026-05-20 12:47 ` Lee Jones
2026-05-20 12:49 ` Lee Jones
2026-05-20 13:03 ` Oliver Hartkopp
2026-05-20 13:40 ` Lee Jones
2026-05-20 14:06 ` Lee Jones
2026-05-20 15:23 ` Oliver Hartkopp
2026-05-20 16:13 ` Lee Jones
2026-05-20 18:00 ` Oliver Hartkopp
2026-05-21 11:07 ` Lee Jones
2026-05-21 11:35 ` Oliver Hartkopp
2026-05-21 13:51 ` Lee Jones
2026-05-21 17:57 ` Oliver Hartkopp
2026-05-20 12:59 ` Oliver Hartkopp
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox