From: Lee Jones <lee@kernel.org>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-can@vger.kernel.org
Subject: Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking RX_NO_AUTOTIMER
Date: Wed, 20 May 2026 17:13:08 +0100
Message-ID: <20260520161308.GL2767592@google.com>
In-Reply-To: <24a20c37-5cac-4a38-a8f1-ed98b38f7e1d@hartkopp.net>
On Wed, 20 May 2026, Oliver Hartkopp wrote:
>
>
> On 20.05.26 16:06, Lee Jones wrote:
> > On Wed, 20 May 2026, Lee Jones wrote:
> >
> > > On Wed, 20 May 2026, Oliver Hartkopp wrote:
> > >
> > > >
> > > >
> > > > On 20.05.26 14:49, Lee Jones wrote:
> > > > > On Wed, 20 May 2026, Lee Jones wrote:
> > > > >
> > > > > > On Tue, 19 May 2026, Oliver Hartkopp wrote:
> > > > > >
> > > > > > > From: Lee Jones <lee@kernel.org>
> > > > > > >
> > > > > > > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > > > > synchronize_rcu()") removed the synchronize_rcu() call from
> > > > > > > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > > > > > > timers from being rearmed during deletion. However, it only applied
> > > > > > > this check to op->timer via bcm_rx_starttimer().
> > > > > > >
> > > > > > > It missed the fact that op->thrtimer can also be rearmed by an
> > > > > > > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > > > > > > bcm_rx_update_and_send(). This allows op->thrtimer to be queued after
> > > > > > > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > > > > > > when the timer fires on the deferred-freed struct bcm_op.
> > > > > > >
> > > > > > > Address the omission by checking the RX_NO_AUTOTIMER flag
> > > > > > > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > > > > > > preventing it from being rearmed concurrently with teardown.
> > > > > > >
> > > > > > > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > > > > > > removing the CAN filters following the bcm_delete_rx_op() approach.
> > > > > > >
> > > > > > > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > > > > > > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > > > > > > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> > > > > > >
> > > > > > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > > > > > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > > > > >
> > > > > > You did? Can you add a note saying what you changed please?
> > > > >
> > > > > FYI, did you also see the second swing I took at this:
> > > > >
> > > > > https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
> > > >
> > > > Yes, and I replied to your patch.
> > > >
> > > > Is there some lag in the e-mail communication right now?
> > > >
> > > > That's why I also wondered why you sent a patch one day after my v2
> > > > proposal.
> > >
> > > Right. I only saw your proposal today.
> > >
> > > I've been working on the alternative since Jakub NACKed the first submission.
> >
> > Okay, so I fed both of our v2 fixes into Gemini Next and requested a
> > critical review of both approaches. The TL;DR is that this v2 is better
> > than my v1, but still contains the reported race and isn't as solid as
> > the work queue solution.
> >
> > In the interest of full disclosure, here is the full analysis for your perusal:
> >
> > I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.
> >
> > While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
> > release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.
> >
> > Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
> > performance hot-paths, and upstream validation.
> > ──────
> > ### 1. The TOCTOU Race Window is Still Theoretically Open
> >
> > The core mechanism of the alternative patch relies on setting RX_NO_AUTOTIMER via WRITE_ONCE and checking it inside the hot-path (
> > bcm_rx_update_and_send ) via READ_ONCE .
> >
> > While READ_ONCE / WRITE_ONCE enforce volatile memory accesses and prevent compiler reorderings, they do not provide hardware-level
> > atomic synchronization or serialization (such as spinlocks or memory barriers like smp_mb() ) between the check and the timer arming.
> >
> > This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:
> >
> > CPU A (RCU Reader: bcm_rx_handler)         | CPU B (Writer: bcm_delete_rx_op)
> > -------------------------------------------+-------------------------------------------
> > READ_ONCE(op->flags) & RX_NO_AUTOTIMER     |
> > /* flag is NOT set; passes check */        |
> >                                            | WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
> >                                            | hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
> >                                            | call_rcu(&op->rcu, bcm_free_op_rcu);
> > hrtimer_start(&op->thrtimer, ...);         |
> > /* thrtimer is armed AFTER cancellation! */|
> >                                            | RCU grace period elapses...
> >                                            | bcm_free_op_rcu() -> kfree(op);
> > thrtimer fires                             |
> > bcm_rx_thr_handler(op) [UAF CRASH!]        |
> >
> > If CPU B executes the write and the cancellation after CPU A has evaluated the READ_ONCE flag check but before CPU A calls
> > hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.
> >
> > #### Why Our Solution is Bulletproof:
> >
> > Our Workqueue-deferred cleanup does not rely on flag synchronization. Because bcm_free_op_work executes after the RCU grace period has
> > fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
> > hrtimer_start() has already executed, and our final hrtimer_cancel() right before kfree(op) is guaranteed to catch and safely
> > neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
> > ──────
> > ### 2. Hot-Path Performance Overhead
> >
> > • The Alternative Patch:
> > It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside bcm_rx_update_and_send() . This is the critical
> > fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
> > adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
> > • Our Solution:
> > Leaves bcm_rx_update_and_send() completely pristine. Zero hot-path overhead is introduced.
> > ──────
> > ### 3. Evaluation Summary Against All Solutions
> >
> > Architectural Criteria    │ Solut…                         │ Solut…                               │ Alternative Patch (Commit 2074ea86)          │ Our Solution: Dedicated Workqueue ( can-bc…
> > ──────────────────────────┼────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────────────┼─────────────────────────────────────────────
> > TOCTOU Race Resilience    │ ❌ Vulnerable                   │ None (Fatal)                         │ ⚠️ Weak (TOCTOU race window remains open on   │ Bulletproof (RCU grace period guarantees
> >                           │                                │                                      │ SMP concurrency).                            │ all readers have completed).
> > Atomic Context Safety     │ Safe                           │ ❌ FATAL PANIC (Calls hrtimer_cancel  │ Safe (Cancellations performed in process     │ Safe (Cancellations executed in sleepable
> >                           │                                │ in softirq callback).                │ contexts).                                   │ process context).
> > Fast-Path Packet Latency  │ ❌ Overhead (Hot-path checks).  │ None                                 │ ❌ Overhead (Volatile branch check in         │ Zero Latency (Fast-path remains untouched).
> >                           │                                │                                      │ packet hot-path).                            │
> > Module Unloading Safety   │ Safe                           │ None                                 │ Safe                                         │ Safe ( destroy_workqueue flushes WQ before
> >                           │                                │                                      │                                              │ code unmap).
> > Workqueue Saturation      │ None                           │ None                                 │ None                                         │ Safe (Isolated dedicated WQ prevents system
> >                           │                                │                                      │                                              │ starvation).
> > ──────
> > ### Conclusion: What Will Upstream Say?
> >
> > If reviewed critically by upstream LLMs or senior maintainers:
> >
> > 1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that READ_ONCE / WRITE_ONCE do not enforce
> > synchronization between the flag check and hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
> > 2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
> > the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.
> >
> > Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
> > vulnerability.
> >
>
> Ok, thanks for the detailed explanation!
>
> As your updated patch fixes the original patch
>
> f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> synchronize_rcu()")
>
> we should also revert this op->flags setting that has been introduced with
> that patch in addition to the rcu stuff, right?
>
> @@ -755,10 +763,13 @@ static int bcm_delete_rx_op(struct list_head *ops,
> struct bcm_msg_head *mh,
>
> list_for_each_entry_safe(op, n, ops, list) {
> if ((op->can_id == mh->can_id) && (op->ifindex == ifindex)
> &&
> (op->flags & CAN_FD_FRAME) == (mh->flags &
> CAN_FD_FRAME)) {
>
> + /* disable automatic timer on frame reception */
> + op->flags |= RX_NO_AUTOTIMER;
> +
You mean from v1? I thought that was NACKed and not applied?
My follow-up was a replacement for it.
--
Lee Jones