From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 679AD372ECF
	for <linux-can@vger.kernel.org>; Wed, 20 May 2026 14:07:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1779286022; cv=none; b=UPpEy/HzZCPiwy8fzzlps8q07j8KFNfk+XfcoSELs4KP/arWE0B5dmEEpNwNrzXmRnWcZ2mAOnU0Z3dv0b8xAKcnVWhvJMMRfhYpRxk8ygJcqY63e8T03bXbTw3our9Ll1RZcUbmBQkQdqzMK0YKQIEnV8vvLCtkgKIDxRngRo0=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1779286022; c=relaxed/simple;
	bh=EljRWYJl4c1hzX/ObDCR0FnwasIUtQ0+54npXZOFUHk=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=gwCDEeCm43YhLOjQuN+KN0SUBdUPIcQ/v2fslDThNSg2wCpnHQeKysVn+oMv7qsrF1wlG+JHy4QVdozYSQJH5C+vAZYo5gwQOcAVFEbQiNaluTfWE8EnA+SA1szvX/UU34tKdqyDzd1DNV5yrJuFqH3h9SeU3gKTMWiqaOzGg7w=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jWTbXJJs; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jWTbXJJs"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4B8171F00893;
	Wed, 20 May 2026 14:07:00 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1779286021;
	bh=EyuhX62vGt1kaM9F4b6+MI1UN8/crU6emmqTaFs05No=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To;
	b=jWTbXJJsDVx76rDTR6NE4fdF6Nr0ZWdvV0YRQR6gtCT6b7vHwnTGm9bPVAj3yFpsW
	 0ASnt2xnt1ejMYI9ZylLpmeZru8wVeA//3tELwPazGiwiBoy79pHQ3fefw7i4L2Wkk
	 7QcY7mNew7WZpixdaTOqnm93gDxFlgY3vMYdzAtbKnfuAoSsGrLuuVnAJ8tQK6L7nn
	 +xJ/xNY9NgoJ64bJKNr46g60etyO13jzjkRK6S43rjaAhKD9UOYVosGVTbtOcH5qxK
	 jTXXm4oCiRg+HDWHG/6w5EaIEtOUW8apqHuEWSIdRuRaQukogKjEBMSiTOKMI0VHth
	 VKJ42+vtkJNbQ==
Date: Wed, 20 May 2026 15:06:57 +0100
From: Lee Jones <lee@kernel.org>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-can@vger.kernel.org
Subject: Re: [PATCH v2] can: bcm: prevent thrtimer UAF in rx path by checking
 RX_NO_AUTOTIMER
Message-ID: <20260520140657.GC2767592@google.com>
References: <20260519113806.85456-1-socketcan@hartkopp.net>
 <20260520124758.GA305027@google.com>
 <20260520124907.GB305027@google.com>
 <442a92c9-5810-4fcd-ab05-5b0acd0f345c@hartkopp.net>
 <20260520134032.GA2767592@google.com>
Precedence: bulk
X-Mailing-List: linux-can@vger.kernel.org
List-Id: <linux-can.vger.kernel.org>
List-Subscribe: <mailto:linux-can+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-can+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20260520134032.GA2767592@google.com>

On Wed, 20 May 2026, Lee Jones wrote:

> On Wed, 20 May 2026, Oliver Hartkopp wrote:
> 
> > 
> > 
> > On 20.05.26 14:49, Lee Jones wrote:
> > > On Wed, 20 May 2026, Lee Jones wrote:
> > > 
> > > > On Tue, 19 May 2026, Oliver Hartkopp wrote:
> > > > 
> > > > > From: Lee Jones <lee@kernel.org>
> > > > > 
> > > > > Commit f1b4e32aca08 ("can: bcm: use call_rcu() instead of costly
> > > > > synchronize_rcu()") removed the synchronize_rcu() call from
> > > > > bcm_delete_rx_op() and introduced the RX_NO_AUTOTIMER flag to prevent
> > > > > timers from being rearmed during deletion.  However, it only applied
> > > > > this check to op->timer via bcm_rx_starttimer().
> > > > > 
> > > > > It missed the fact that op->thrtimer can also be rearmed by an
> > > > > in-flight bcm_rx_handler() (which runs as an RCU reader) via
> > > > > bcm_rx_update_and_send().  This allows op->thrtimer to be queued after
> > > > > bcm_remove_op() has already cancelled it, leading to a use-after-free
> > > > > when the timer fires on the deferred-freed struct bcm_op.
> > > > > 
> > > > > Address the omission by checking the RX_NO_AUTOTIMER flag
> > > > > in bcm_rx_update_and_send() before starting op->thrtimer, effectively
> > > > > preventing it from being rearmed concurrently with teardown.
> > > > > 
> > > > > [Hartkopp] Added the setting of RX_NO_AUTOTIMER also to bcm_release() before
> > > > > removing the CAN filters following the bcm_delete_rx_op() approach.
> > > > > 
> > > > > Additionally WRITE_ONCE()/READ_ONCE() macros have been introduced for
> > > > > the changes of RX_NO_AUTOTIMER at rx op removal time to prevent a
> > > > > potential code reordering of RX_NO_AUTOTIMER setting after CAN filter removal.
> > > > > 
> > > > > Signed-off-by: Lee Jones <lee@kernel.org>
> > > > > Co-developed-by: Oliver Hartkopp <socketcan@hartkopp.net>
> > > > 
> > > > You did?  Can you add a note saying what you changed please?
> > > 
> > > FYI, did you also see the second swing I took at this:
> > > 
> > > https://lore.kernel.org/r/20260520080523.2513957-1-lee@kernel.org
> > 
> > Yes, and I answered to your patch.
> > 
> > Is there some lag in the e-mail communication right now?
> > 
> > That's why I also wondered why you sent a patch one day after my v2
> > proposal.
> 
> Right.  I only saw your proposal today.
>
> I've been working the alternative since Jakub NACKed the first submission.

Okay, so I fed both of our v2 fixes into Gemini Next and requested a
critical review of both approaches.  The TL;DR is that this v2 is better
than my v1, but still contains the reported race and isn't as solid as
the work queue solution.

In the interest of full disclosure, here is the full analysis for your perusal:

  I have critically evaluated the alternative patch ( branch: b-499356389-can-bcm-uaf-v2 ) currently in contention on the mailing list.

  While this alternative patch represents a highly refined version of the "flag check" approach (incorporating memory barriers and socket
  release hooks), it is still architecturally inferior to our Workqueue-Deferred Process Context Cleanup.

  Below is the critical technical comparison of how the alternative patch fares against our solution, focused on race resilience,
  performance hot-paths, and upstream validation.
  ──────
  ### 1. The TOCTOU Race Window is Still Theoretically Open

  The core mechanism of the alternative patch relies on setting  RX_NO_AUTOTIMER  via  WRITE_ONCE  and checking it inside the hot-path (
  bcm_rx_update_and_send ) via  READ_ONCE .

  While  READ_ONCE / WRITE_ONCE  enforce  volatile  memory accesses and prevent compiler reorderings, they do not provide hardware-level
  atomic synchronization or serialization (such as spinlocks or memory barriers like  smp_mb() ) between the check and the timer arming.

  This leaves a classic Time-of-Check to Time-of-Use (TOCTOU) race window open:

    CPU A (RCU Reader: bcm_rx_handler)         | CPU B (Writer: bcm_delete_rx_op)
    -------------------------------------------+-------------------------------------------
    READ_ONCE(op->flags) & RX_NO_AUTOTIMER     |
    /* flag is NOT set; passes check */        |
                                               | WRITE_ONCE(op->flags, flags | RX_NO_AUTOTIMER);
                                               | hrtimer_cancel(&op->thrtimer); /* Returns 0 (no-op) */
                                               | call_rcu(&op->rcu, bcm_free_op_rcu);
    hrtimer_start(&op->thrtimer, ...);         |
    /* thrtimer is armed AFTER cancellation! */|
                                               | RCU grace period elapses...
                                               | bcm_free_op_rcu() -> kfree(op);
    thrtimer fires                             |
      bcm_rx_thr_handler(op) [UAF CRASH!]      |

  If CPU B executes the write and the cancellation after CPU A has evaluated the  READ_ONCE  flag check but before CPU A calls
  hrtimer_start() , the cancellation is a no-op, the timer is successfully armed, and the Use-After-Free occurs once RCU frees the memory.

  #### Why Our Solution is Bulletproof:

  Our Workqueue-deferred cleanup does not rely on flag synchronization. Because  bcm_free_op_work  executes after the RCU grace period has
  fully elapsed, all in-flight RCU readers ( bcm_rx_handler ) are guaranteed to have exited. Therefore, any possible racing
  hrtimer_start()  has already executed, and our final  hrtimer_cancel()  right before  kfree(op)  is guaranteed to catch and safely
  neutralize the active timer. Our design mathematically eliminates the TOCTOU race.
  ──────
  ### 2. Hot-Path Performance Overhead

  • The Alternative Patch:
  It introduces a volatile read check ( READ_ONCE(op->flags) & RX_NO_AUTOTIMER ) inside  bcm_rx_update_and_send() . This is the critical
  fast-path executed for every single CAN frame matching the socket filters. Under high-frequency industrial or automotive CAN bus routing,
  adding branch/conditional instructions to the packet hot-path introduces cumulative latency.
  • Our Solution:
  Leaves  bcm_rx_update_and_send()  completely pristine. Zero hot-path overhead is introduced.
  ──────
  ### 3. Evaluation Summary Against All Solutions

   Architectural Criteria   │ Solut… │ Solut… │ Alternative Patch (Commit  2074ea86 )       │ Our Solution: Dedicated Workqueue ( can-bc…
  ──────────────────────────┼────────┼────────┼─────────────────────────────────────────────┼─────────────────────────────────────────────
   TOCTOU Race Resilience   │ ❌     │ None   │ ⚠️ Weak (TOCTOU race window remains open on │ Bulletproof (RCU grace period guarantees
                            │ Vulner │ (Fatal │ SMP concurrency).                           │ all readers have completed).
                            │ able   │ )      │                                             │
   Atomic Context Safety    │ Safe   │ ❌     │ Safe (Cancellations performed in process    │ Safe (Cancellations executed in sleepable
                            │        │ FATAL  │ contexts).                                  │ process context).
                            │        │ PANIC  │                                             │
                            │        │ (Calls │                                             │
                            │        │ hrtime │                                             │
                            │        │ r_canc │                                             │
                            │        │ el  in │                                             │
                            │        │ softir │                                             │
                            │        │ q      │                                             │
                            │        │ callba │                                             │
                            │        │ ck).   │                                             │
   Fast-Path Packet Latency │ ❌     │ None   │ ❌ Overhead (Volatile branch check in       │ Zero Latency (Fast-path remains untouched).
                            │ Overhe │        │ packet hot-path).                           │
                            │ ad     │        │                                             │
                            │ (Hot-  │        │                                             │
                            │ path   │        │                                             │
                            │ checks │        │                                             │
                            │ ).     │        │                                             │
   Module Unloading Safety  │ Safe   │ None   │ Safe                                        │ Safe ( destroy_workqueue  flushes WQ before
                            │        │        │                                             │ code unmap).
   Workqueue Saturation     │ None   │ None   │ None                                        │ Safe (Isolated dedicated WQ prevents system
                            │        │        │                                             │ starvation).
  ──────
  ### Conclusion: What Will Upstream Say?

  If reviewed critically by upstream LLMs or senior maintainers:

  1. They will point out the TOCTOU race in the alternative patch: Reviewers will recognize that  READ_ONCE / WRITE_ONCE  do not enforce
  synchronization between the flag check and  hrtimer_start() , leaving the UAF theoretically reachable under tight SMP concurrency.
  2. They will prefer our zero-overhead fast-path: Upstream networking maintainers always favor solutions that isolate teardown logic to
  the asynchronous slow-path (workqueues) rather than polluting the hot packet-reception path with cleanup flags.

  Our Dedicated Unbound Workqueue design remains the most mathematically secure, performant, and architecturally elegant fix for this
  vulnerability.

--
Lee Jones