Linux CAN drivers development
 help / color / mirror / Atom feed
From: Lee Jones <lee@kernel.org>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-can@vger.kernel.org
Subject: Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
Date: Wed, 20 May 2026 17:13:08 +0100	[thread overview]
Message-ID: <20260520161308.GL2767592@google.com> (raw)
In-Reply-To: <24a20c37-5cac-4a38-a8f1-ed98b38f7e1d@hartkopp.net>

On Wed, 20 May 2026, Oliver Hartkopp wrote:

> 
> 
> On 20.05.26 16:06, Lee Jones wrote:
> > On Wed, 20 May 2026, Lee Jones wrote:
> > 
> > > On Wed, 20 May 2026, Oliver Hartkopp wrote:
> > > 
> > > > 
> > > > 
> > > > On 20.05.26 14:49, Lee Jones wrote:
> > > > > On Wed, 20 May 2026, Lee Jones wrote:
> > > > > 
> > > > > > On Tue, 19 May 2026, Oliver Hartkopp wrote:
> > > > > > 
> > > > > > > From: Lee Jones <lee@kernel.org>
> > > > > > > 
> > > > > > > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > > > > synchronize_rcu()") removed the synchronize_rcu() call from
> > > > > > > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > > > > > > timers from being rearmed during deletion.  However, it only applied
> > > > > > > this check to op->timer via bcm_rx_starttimer().
> > > > > > > 
> > > > > > > It missed the fact that op->thrtimer can also be rearmed by an
> > > > > > > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > > > > > > bcm_rx_update_and_send().  This allows op->thrtimer to be queued after
> > > > > > > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > > > > > > when the timer fires on the deferred-freed struct bcm_op.
> > > > > > > 
> > > > > > > Address the omission by checking the RX_NO_AUTOTIMER flag
> > > > > > > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > > > > > > preventing it from being rearmed concurrently with teardown.
> > > > > > > 
> > > > > > > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > > > > > > removing the CAN filters following the bcm_delete_rx_op() approach.
> > > > > > > 
> > > > > > > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > > > > > > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > > > > > > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> > > > > > > 
> > > > > > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > > > > > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > > > > > 
> > > > > > You did?  Can you add a note saying what you changed please?
> > > > > 
> > > > > FYI, did you also see the second swing I took at this:
> > > > > 
> > > > > https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
> > > > 
> > > > Yes, and I answered to your patch.
> > > > 
> > > > Is there some lag in the e-mail communication right now?
> > > > 
> > > > That's why I also wondered why you sent a patch one day after my v2
> > > > proposal.
> > > 
> > > Right.  I only saw your proposal today.
> > > 
> > > I've been working the alternative since Jakub NACKed the first submission.
> > 
> > Okay, so I fed both of our v2 fixes into Gemini Next and requested a
> > critical review of both approaches.  The TL;DR is that this v2 is better
> > than my v1, but still contains the reported race and isn't as solid as
> > the work queue solution.
> > 
> > In the interest of full disclosure, here is the full analysis for your perusal:
> > 
> >    I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.
> > 
> >    While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
> >    release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.
> > 
> >    Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
> >    performance hot-paths, and upstream validation.
> >    ──────
> >    ### 1. The TOCTOU Race Window is Still Theoretically Open
> > 
> >    The core mechanism of the alternative patch relies on setting  RX_NO_AUTOTIMER  via  WRITE_ONCE  and checking it inside the hot-path (
> >    bcm_rx_update_and_send ) via  READ_ONCE .
> > 
> >    While  READ_ONCE / WRITE_ONCE  enforce  volatile  memory accesses and prevent compiler reorderings, they do not provide hardware-level
> >    atomic synchronization or serialization (such as spinlocks or memory barriers like  smp_mb() ) between the check and the timer arming.
> > 
> >    This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:
> > 
> >      CPU A (RCU Reader: bcm_rx_handler)         | CPU B (Writer: bcm_delete_rx_op)
> >      -------------------------------------------+-------------------------------------------
> >      READ_ONCE(op->flags) & RX_NO_AUTOTIMER     |
> >      /* flag is NOT set; passes check */        |
> >                                                 | WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
> >                                                 | hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
> >                                                 | call_rcu(&op->rcu, bcm_free_op_rcu);
> >      hrtimer_start(&op->thrtimer, ...);         |
> >      /* thrtimer is armed AFTER cancellation! */|
> >                                                 | RCU grace period elapses...
> >                                                 | bcm_free_op_rcu() -> kfree(op);
> >      thrtimer fires                             |
> >        bcm_rx_thr_handler(op) [UAF CRASH!]      |
> > 
> >    If CPU B executes the write and the cancellation after CPU A has evaluated the  READ_ONCE  flag check but before CPU A calls
> >    hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.
> > 
> >    #### Why Our Solution is Bulletproof:
> > 
> >    Our Workqueue-deferred cleanup does not rely on flag synchronization. Because  bcm_free_op_work  executes after the RCU grace period has
> >    fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
> >    hrtimer_start()  has already executed, and our final  hrtimer_cancel()  right before  kfree(op)  is guaranteed to catch and safely
> >    neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
> >    ──────
> >    ### 2. Hot-Path Performance Overhead
> > 
> >    • The Alternative Patch:
> >    It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside  bcm_rx_update_and_send() . This is the critical
> >    fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
> >    adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
> >    • Our Solution:
> >    Leaves  bcm_rx_update_and_send()  completely pristine. Zero hot-path overhead is introduced.
> >    ──────
> >    ### 3. Evaluation Summary Against All Solutions
> > 
> >     Architectural Criteria   │ Solut… │ Solut… │ Alternative Patch (Commit  2074ea86 )       │ Our Solution: Dedicated Workqueue ( can-bc…
> >    ──────────────────────────┼────────┼────────┼─────────────────────────────────────────────┼─────────────────────────────────────────────
> >     TOCTOU Race Resilience   │ ❌     │ None   │ ⚠️ Weak (TOCTOU race window remains open on │ Bulletproof (RCU grace period guarantees
> >                              │ Vulner │ (Fatal │ SMP concurrency).                           │ all readers have completed).
> >                              │ able   │ )      │                                             │
> >     Atomic Context Safety    │ Safe   │ ❌     │ Safe (Cancellations performed in process    │ Safe (Cancellations executed in sleepable
> >                              │        │ FATAL  │ contexts).                                  │ process context).
> >                              │        │ PANIC  │                                             │
> >                              │        │ (Calls │                                             │
> >                              │        │ hrtime │                                             │
> >                              │        │ r_canc │                                             │
> >                              │        │ el  in │                                             │
> >                              │        │ softir │                                             │
> >                              │        │ q      │                                             │
> >                              │        │ callba │                                             │
> >                              │        │ ck).   │                                             │
> >     Fast-Path Packet Latency │ ❌     │ None   │ ❌ Overhead (Volatile branch check in       │ Zero Latency (Fast-path remains untouched).
> >                              │ Overhe │        │ packet hot-path).                           │
> >                              │ ad     │        │                                             │
> >                              │ (Hot-  │        │                                             │
> >                              │ path   │        │                                             │
> >                              │ checks │        │                                             │
> >                              │ ).     │        │                                             │
> >     Module Unloading Safety  │ Safe   │ None   │ Safe                                        │ Safe ( destroy_workqueue  flushes WQ before
> >                              │        │        │                                             │ code unmap).
> >     Workqueue Saturation     │ None   │ None   │ None                                        │ Safe (Isolated dedicated WQ prevents system
> >                              │        │        │                                             │ starvation).
> >    ──────
> >    ### Conclusion: What Will Upstream Say?
> > 
> >    If reviewed critically by upstream LLMs or senior maintainers:
> > 
> >    1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that  READ_ONCE / WRITE_ONCE  do not enforce
> >    synchronization between the flag check and  hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
> >    2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
> >    the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.
> > 
> >    Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
> >    vulnerability.
> > 
> 
> Ok, thanks for the detailed explanation!
> 
> As your updated patch fixes the original patch
> 
> f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> synchronize_rcu()")
> 
> we should also revert this op->flags setting that has been introduced with
> that patch in addition to the rcu stuff, right?
> 
> @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
> struct bcm_msg_head *mh,
> 
>         list_for_each_entry_safe(op, n, ops, list) {
>                 if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
> &&
>                     (op->flags & CAN_FD_FRAME) == (mh->flags &
> CAN_FD_FRAME)) {
> 
> +                       /* disable automatic timer on frame reception */
> +                       op->flags |= RX_NO_AUTOTIMER;
> +

You mean from v1?  I thought that was NACKed and not applied?

My follow-up was a replacement for it.

-- 
Lee Jones

  reply	other threads:[~2026-05-20 16:13 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19 11:38 [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER Oliver Hartkopp
2026-05-20 12:47 ` Lee Jones
2026-05-20 12:49   ` Lee Jones
2026-05-20 13:03     ` Oliver Hartkopp
2026-05-20 13:40       ` Lee Jones
2026-05-20 14:06         ` Lee Jones
2026-05-20 15:23           ` Oliver Hartkopp
2026-05-20 16:13             ` Lee Jones [this message]
2026-05-20 18:00               ` Oliver Hartkopp
2026-05-21 11:07                 ` Lee Jones
2026-05-21 11:35                   ` Oliver Hartkopp
2026-05-21 13:51                     ` Lee Jones
2026-05-21 17:57                       ` Oliver Hartkopp
2026-05-20 12:59   ` Oliver Hartkopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260520161308.GL2767592@google.com \
    --to=lee@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox