[PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF
@ 2026-05-26 10:23 Zhenghang Xiao
  2026-05-26 10:59 ` Oliver Hartkopp
  2026-06-09 18:52 ` sashiko-bot
  0 siblings, 2 replies; 4+ messages in thread
From: Zhenghang Xiao @ 2026-05-26 10:23 UTC (permalink / raw)
  To: Oliver Hartkopp, Marc Kleine-Budde; +Cc: linux-can, Zhenghang Xiao

bcm_delete_rx_op() calls can_rx_unregister() which does hlist_del_rcu(),
preventing new bcm_rx_handler invocations. However, already-dispatched
handlers continue running under rcu_read_lock() on other CPUs. The
subsequent bcm_remove_op() cancels the hrtimers and schedules deferred
free via call_rcu(), but a concurrent bcm_rx_handler can re-arm
op->thrtimer via hrtimer_start() after the cancel returns. The re-armed
timer fires on freed memory after the grace period.

The RX_NO_AUTOTIMER flag added by commit f1b4e32aca08 ("can: bcm: use
call_rcu() instead of costly synchronize_rcu()") only guards op->timer
(via bcm_rx_starttimer), not op->thrtimer (via bcm_rx_update_and_send).

Add synchronize_rcu() between list_del_rcu() and bcm_remove_op(),
matching the pattern in bcm_release(). This ensures all in-flight
handlers complete before timers are cancelled.

Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com>
---
 net/can/bcm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index a4bef2c48a55..ae083f59a9ef 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -870,6 +870,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
 						  bcm_rx_handler, op);

 			list_del_rcu(&op->list);
+			synchronize_rcu();
 			bcm_remove_op(op);
 			return 1; /* done */
 		}
-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF
  2026-05-26 10:23 [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF Zhenghang Xiao
@ 2026-05-26 10:59 ` Oliver Hartkopp
  2026-06-09 18:52 ` sashiko-bot
  1 sibling, 0 replies; 4+ messages in thread
From: Oliver Hartkopp @ 2026-05-26 10:59 UTC (permalink / raw)
  To: Zhenghang Xiao, Marc Kleine-Budde, Lee Jones; +Cc: linux-can

Lee Jones is already working on a workqueue based approach:

https://lore.kernel.org/linux-can/20260520080523.2513957-1-lee@kernel.org/

to not(!) re-introduce the costly synchronize_rcu().

There's only some feedback from the AI bot that has to be considered.

https://netdev-ai.bots.linux.dev/ai-review.html?id=9e8842e7-59a2-4ec5-9f65-1df7973c6170

I expect the updated v2 patch this week.

Best regards,
Oliver

On 26.05.26 12:23, Zhenghang Xiao wrote:
> bcm_delete_rx_op() calls can_rx_unregister() which does hlist_del_rcu(),
> preventing new bcm_rx_handler invocations. However, already-dispatched
> handlers continue running under rcu_read_lock() on other CPUs. The
> subsequent bcm_remove_op() cancels the hrtimers and schedules deferred
> free via call_rcu(), but a concurrent bcm_rx_handler can re-arm
> op->thrtimer via hrtimer_start() after the cancel returns. The re-armed
> timer fires on freed memory after the grace period.
> 
> The RX_NO_AUTOTIMER flag added by commit f1b4e32aca08 ("can: bcm: use
> call_rcu() instead of costly synchronize_rcu()") only guards op->timer
> (via bcm_rx_starttimer), not op->thrtimer (via bcm_rx_update_and_send).
> 
> Add synchronize_rcu() between list_del_rcu() and bcm_remove_op(),
> matching the pattern in bcm_release(). This ensures all in-flight
> handlers complete before timers are cancelled.
> 
> Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
> Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com>
> ---
>   net/can/bcm.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/can/bcm.c b/net/can/bcm.c
> index a4bef2c48a55..ae083f59a9ef 100644
> --- a/net/can/bcm.c
> +++ b/net/can/bcm.c
> @@ -870,6 +870,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
>   						  bcm_rx_handler, op);
>   
>   			list_del_rcu(&op->list);
> +			synchronize_rcu();
>   			bcm_remove_op(op);
>   			return 1; /* done */
>   		}


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF
  2026-05-26 10:23 [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF Zhenghang Xiao
  2026-05-26 10:59 ` Oliver Hartkopp
@ 2026-06-09 18:52 ` sashiko-bot
  2026-06-10  7:29   ` Oliver Hartkopp
  1 sibling, 1 reply; 4+ messages in thread
From: sashiko-bot @ 2026-06-09 18:52 UTC (permalink / raw)
  To: Zhenghang Xiao
  Cc: Oleksij Rempel, linux-can, Marc Kleine-Budde, Vincent Mailhol,
	Oliver Hartkopp

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] Skipping `bcm_rx_unreg()` when `dev_get_by_index()` returns NULL leaves the CAN receiver registered in the core, leading to a memory leak and Use-After-Free.
- [High] Incorrect list primitives used for RCU-traversed lists (`rx_ops` and `tx_ops`), leading to list corruption and crashes.
- [High] Concurrent modification of an existing RX operation in `bcm_rx_setup()` races with `bcm_rx_handler()`, allowing readers to see partially-written data.
--

commit 27083d45b99681b318907aa96e90787a12c08a58
Author: Zhenghang Xiao <kipreyyy@gmail.com>

can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF

This commit adds synchronize_rcu() in bcm_delete_rx_op() before
bcm_remove_op() to ensure all in-flight handlers complete before timers
are cancelled. This prevents a use-after-free where a timer could be
re-armed by a concurrent bcm_rx_handler after being cancelled.

Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")

> diff --git a/net/can/bcm.c b/net/can/bcm.c
> index a4bef2c48a559..ae083f59a9ef5 100644
> --- a/net/can/bcm.c
> +++ b/net/can/bcm.c
> @@ -870,6 +870,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
>  						  bcm_rx_handler, op);
>  
>  			list_del_rcu(&op->list);

[Severity: High]
This isn't a bug introduced by this patch, but looking at the code just
above this addition in bcm_delete_rx_op():

    dev = dev_get_by_index(sock_net(op->sk), op->ifindex);
    if (dev) {
            bcm_rx_unreg(dev, op);
            dev_put(dev);
    }

During device unregistration, unlist_netdevice() makes dev_get_by_index()
return NULL before the NETDEV_UNREGISTER notifier is dispatched.

If user space concurrently closes the socket in bcm_release() or deletes
the RX filter in bcm_delete_rx_op(), these functions check
dev_get_by_index(), see NULL, and bypass calling bcm_rx_unreg().

Does this leave the CAN receiver registered in the core, leaking the
struct receiver and causing a use-after-free when queued frames process
the already freed op pointer?

> +			synchronize_rcu();
>  			bcm_remove_op(op);
>  			return 1; /* done */
>  		}

[Severity: High]
This is a pre-existing issue, but I noticed a potential problem with the
list primitives used for rx_ops and tx_ops in net/can/bcm.c.

In bcm_rx_setup() and bcm_tx_setup(), the code uses list_add():

    list_add(&op->list, &bo->tx_ops);

However, in bcm_proc_show(), the tx_ops list is traversed locklessly:

    list_for_each_entry(op, &bo->tx_ops, list) {

Because list_add() lacks the smp_store_release() memory barriers of
list_add_rcu(), and the traversal doesn't use list_for_each_entry_rcu(),
could this result in readers seeing partially initialized list pointers
and dereferencing a NULL pointer on weakly ordered architectures?

[Severity: High]
This is another pre-existing issue, but there appears to be a race
condition between bcm_rx_setup() and bcm_rx_handler() in net/can/bcm.c.

When bcm_rx_setup() updates an existing RX operation under lock_sock(sk),
it modifies the frames locklessly:

    err = memcpy_from_msg(op->frames, msg, msg_head->nframes * op->cfsiz);
    if (err < 0)
            return err;
    memset(op->last_frames, 0, msg_head->nframes * op->cfsiz);
    op->nframes = msg_head->nframes;

Meanwhile, bcm_rx_handler() accesses these fields in softirq context:

    for (i = 1; i < op->nframes; i++) {
            if ((get_u64(op->frames, 0) & get_u64(rxframe, 0)) ==
                (get_u64(op->frames, 0) &
                 get_u64(op->frames + op->cfsiz * i, 0))) {

Since memcpy_from_msg() can sleep on page faults, could bcm_rx_handler()
read torn or partially overwritten CAN filters, leading to incorrect
message delivery or spurious timer scheduling?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260526102349.94074-1-kipreyyy@gmail.com?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF
  2026-06-09 18:52 ` sashiko-bot
@ 2026-06-10  7:29   ` Oliver Hartkopp
  0 siblings, 0 replies; 4+ messages in thread
From: Oliver Hartkopp @ 2026-06-10  7:29 UTC (permalink / raw)
  To: sashiko-reviews, Zhenghang Xiao
  Cc: Oleksij Rempel, linux-can, Marc Kleine-Budde, Vincent Mailhol

The patch already has been rejected in favor of:

[PATCH v3 1/1] can: bcm: defer rx_op deallocation to workqueue to fix 
thrtimer UAF

https://lore.kernel.org/linux-can/20260526165257.3705239-1-lee@kernel.org/

On 09.06.26 20:52, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> 
> Pre-existing issues:
> - [High] Skipping `bcm_rx_unreg()` when `dev_get_by_index()` returns NULL leaves the CAN receiver registered in the core, leading to a memory leak and Use-After-Free.
> - [High] Incorrect list primitives used for RCU-traversed lists (`rx_ops` and `tx_ops`), leading to list corruption and crashes.
> - [High] Concurrent modification of an existing RX operation in `bcm_rx_setup()` races with `bcm_rx_handler()`, allowing readers to see partially-written data.
> --
> 
> commit 27083d45b99681b318907aa96e90787a12c08a58
> Author: Zhenghang Xiao <kipreyyy@gmail.com>
> 
> can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF
> 
> This commit adds synchronize_rcu() in bcm_delete_rx_op() before
> bcm_remove_op() to ensure all in-flight handlers complete before timers
> are cancelled. This prevents a use-after-free where a timer could be
> re-armed by a concurrent bcm_rx_handler after being cancelled.
> 
> Fixes: f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
> 
>> diff --git a/net/can/bcm.c b/net/can/bcm.c
>> index a4bef2c48a559..ae083f59a9ef5 100644
>> --- a/net/can/bcm.c
>> +++ b/net/can/bcm.c
>> @@ -870,6 +870,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
>>   						  bcm_rx_handler, op);
>>   
>>   			list_del_rcu(&op->list);
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but looking at the code just
> above this addition in bcm_delete_rx_op():
> 
>      dev = dev_get_by_index(sock_net(op->sk), op->ifindex);
>      if (dev) {
>              bcm_rx_unreg(dev, op);
>              dev_put(dev);
>      }
> 
> During device unregistration, unlist_netdevice() makes dev_get_by_index()
> return NULL before the NETDEV_UNREGISTER notifier is dispatched.
> 
> If user space concurrently closes the socket in bcm_release() or deletes
> the RX filter in bcm_delete_rx_op(), these functions check
> dev_get_by_index(), see NULL, and bypass calling bcm_rx_unreg().
> 
> Does this leave the CAN receiver registered in the core, leaking the
> struct receiver and causing a use-after-free when queued frames process
> the already freed op pointer?
> 
>> +			synchronize_rcu();
>>   			bcm_remove_op(op);
>>   			return 1; /* done */
>>   		}
> 
> [Severity: High]
> This is a pre-existing issue, but I noticed a potential problem with the
> list primitives used for rx_ops and tx_ops in net/can/bcm.c.
> 
> In bcm_rx_setup() and bcm_tx_setup(), the code uses list_add():
> 
>      list_add(&op->list, &bo->tx_ops);
> 
> However, in bcm_proc_show(), the tx_ops list is traversed locklessly:
> 
>      list_for_each_entry(op, &bo->tx_ops, list) {
> 
> Because list_add() lacks the smp_store_release() memory barriers of
> list_add_rcu(), and the traversal doesn't use list_for_each_entry_rcu(),
> could this result in readers seeing partially initialized list pointers
> and dereferencing a NULL pointer on weakly ordered architectures?
> 
> [Severity: High]
> This is another pre-existing issue, but there appears to be a race
> condition between bcm_rx_setup() and bcm_rx_handler() in net/can/bcm.c.
> 
> When bcm_rx_setup() updates an existing RX operation under lock_sock(sk),
> it modifies the frames locklessly:
> 
>      err = memcpy_from_msg(op->frames, msg, msg_head->nframes * op->cfsiz);
>      if (err < 0)
>              return err;
>      memset(op->last_frames, 0, msg_head->nframes * op->cfsiz);
>      op->nframes = msg_head->nframes;
> 
> Meanwhile, bcm_rx_handler() accesses these fields in softirq context:
> 
>      for (i = 1; i < op->nframes; i++) {
>              if ((get_u64(op->frames, 0) & get_u64(rxframe, 0)) ==
>                  (get_u64(op->frames, 0) &
>                   get_u64(op->frames + op->cfsiz * i, 0))) {
> 
> Since memcpy_from_msg() can sleep on page faults, could bcm_rx_handler()
> read torn or partially overwritten CAN filters, leading to incorrect
> message delivery or spurious timer scheduling?
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-10  7:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-26 10:23 [PATCH can] can: bcm: add synchronize_rcu() in bcm_delete_rx_op() to fix UAF Zhenghang Xiao
2026-05-26 10:59 ` Oliver Hartkopp
2026-06-09 18:52 ` sashiko-bot
2026-06-10  7:29   ` Oliver Hartkopp

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.