Subject: Re: [RFC PATCH v2 02/10] rv/da: fix per-task da_monitor_destroy() ordering and sync
From: Gabriele Monaco
To: wen.yang@linux.dev
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Steven Rostedt
Date: Tue, 12 May 2026 10:27:49 +0200

On Tue, 2026-05-12 at 02:24 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang
>
> The following two paths race:
>
>   CPU 0 (disable_stall/__rv_disable_monitor)  CPU 1 (wwnr probe handler)
                                                       ^ did you mean stall?
>   ------------------------------------------  -----------------------------
>   disable_stall()
>     da_monitor_destroy()
>       da_monitor_reset_all()             <------ [task T: monitoring=0]
>                                               da_monitor_start(&T->rv[n])
>                                               /* no timer_setup */
>                                               monitoring=1 <----
>   tracepoint_synchronize_unregister()
>   // CPU 1 probe has already returned; sync returns
>
> Later, enable_stall() acquires the same slot and calls da_monitor_init():
>
>   da_monitor_reset_all()
>     da_monitor_reset(&T->rv[slot])    // monitoring=1, timer.function==0
>       ha_monitor_reset_env()
>         ha_cancel_timer()
>           timer_delete(&ha_mon->timer)  // ODEBUG: timer never initialised
>
>   ODEBUG: assert_init not available (active state 0)
>   object type: timer_list
>   Call trace: timer_delete <- da_monitor_reset_all <- enable_stall
>
> Call tracepoint_synchronize_unregister() inside da_monitor_destroy()
> before da_monitor_reset_all().  The unregister_trace_xxx() calls in the
> monitor's disable() have already disconnected the tracepoints; the sync
> here drains any handler still in flight, so no new monitoring=1 can
> appear after da_monitor_reset_all() clears the slot.
>
> Also fix the slot release ordering: release the slot only after
> reset_all() to avoid accessing rv[] with an out-of-bounds index.
>
> Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
> Signed-off-by: Wen Yang
> ---

Thanks for the fix, I have a similar one waiting for submission.

These are technically 2 separate fixes though: the ordering with unset
task_mon_slot (independent of HA) and the synchronisation with pending
tracepoints. They probably deserve separate patches and visibility: the first
has always been around and we're technically overwriting who knows what.

The explanation above is a bit hard to follow though: are you talking about a
handler for the same (stall) monitor running after the reset, effectively
undoing it by setting the monitoring flag?
Then this is indeed an issue with ha_monitor_reset_env(), which expects a clean
environment.

So that's basically what you'd see now much more often because in fact we don't
reset the right slot (though, again, that's a different issue).

Calling tracepoint_synchronize_unregister() there too would surely fix it, but
it used to be kinda slow. But it's probably gotten faster since tracepoints now
use SRCU, so we can wait for a dedicated grace period.

I liked the idea of waiting cumulatively at the end, but that's just making
things harder.

Let's do it like this:
Prepare 2 separate patches as fixes, put the task slot one first (would ease
backporting), mention this issue with the race condition only in the second.
You can send them independently and I'll add them to the tree as urgent.
I'm soon going to send my set of fixes that will also include the task slot
patch (not removing to ease my life with conflicts).

Thanks,
Gabriele

>  include/rv/da_monitor.h | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
> index 00ded3d5ab3f..d04bb3229c75 100644
> --- a/include/rv/da_monitor.h
> +++ b/include/rv/da_monitor.h
> @@ -304,6 +304,20 @@ static int da_monitor_init(void)
>  
>  /*
>   * da_monitor_destroy - return the allocated slot
> + *
> + * Call tracepoint_synchronize_unregister() before reset_all() to close
> + * the race where an in-flight non-HA probe handler sets monitoring=1
> + * (without calling timer_setup()) after da_monitor_reset_all() has
> + * already cleared the slot but before the caller's own sync completes.
> + * Without this barrier, an HA_TIMER_WHEEL monitor that later acquires
> + * the same slot would call timer_delete() on a never-initialised
> + * timer_list, triggering ODEBUG warnings.
> + *
> + * Note: tracepoint_synchronize_unregister() is a system-wide barrier
> + * that waits for all CPUs to finish any in-flight tracepoint handlers.
> + * The caller's own __rv_disable_monitor() issues a second sync after
> + * returning from disable(); that redundant call is harmless on the
> + * infrequent admin (enable/disable) path.
>   */
>  static inline void da_monitor_destroy(void)
>  {
> @@ -311,10 +325,10 @@ static inline void da_monitor_destroy(void)
>  		WARN_ONCE(1, "Disabling a disabled monitor: " __stringify(MONITOR_NAME));
>  		return;
>  	}
> +	tracepoint_synchronize_unregister();
> +	da_monitor_reset_all();
>  	rv_put_task_monitor_slot(task_mon_slot);
>  	task_mon_slot = RV_PER_TASK_MONITOR_INIT;
> -
> -	da_monitor_reset_all();
>  }
>  
>  #elif RV_MON_TYPE == RV_MON_PER_OBJ
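
For reference, the slot-ordering half of the fix can be modelled in plain
userspace C. This is only a rough sketch, not kernel code: every model_*
name and MODEL_* constant is hypothetical, and it assumes the "no slot"
sentinel is an index one past the last valid slot, which is what the
"out-of-bounds index" wording in the commit message suggests. The model
refuses to touch rv[] when the slot is invalid instead of actually writing
out of bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; the names only echo the patch. */
#define MODEL_MAX_SLOTS 2               /* per-task monitor slots (assumed) */
#define MODEL_SLOT_INIT MODEL_MAX_SLOTS /* "no slot" sentinel (assumed) */
#define MODEL_NR_TASKS  3

struct model_monitor { bool monitoring; };
struct model_task { struct model_monitor rv[MODEL_MAX_SLOTS]; };

static struct model_task model_tasks[MODEL_NR_TASKS];
static int model_slot = MODEL_SLOT_INIT;

/* Clear the monitoring flag of every task for the current slot.
 * Returns false, touching nothing, when the slot is not valid. */
static bool model_reset_all(void)
{
	if (model_slot < 0 || model_slot >= MODEL_MAX_SLOTS)
		return false;
	for (size_t i = 0; i < MODEL_NR_TASKS; i++)
		model_tasks[i].rv[model_slot].monitoring = false;
	return true;
}

/* Take a slot and mark every task as monitored on it. */
static void model_enable(int slot)
{
	assert(slot >= 0 && slot < MODEL_MAX_SLOTS);
	model_slot = slot;
	for (size_t i = 0; i < MODEL_NR_TASKS; i++)
		model_tasks[i].rv[slot].monitoring = true;
}

/* True if any task still has the monitoring flag set on this slot. */
static bool model_any_monitoring(int slot)
{
	for (size_t i = 0; i < MODEL_NR_TASKS; i++)
		if (model_tasks[i].rv[slot].monitoring)
			return true;
	return false;
}

/* Old ordering: release the slot first, so the reset sees the sentinel. */
static bool model_destroy_old(void)
{
	model_slot = MODEL_SLOT_INIT;
	return model_reset_all();
}

/* Patched ordering: reset while the slot is still valid, then release. */
static bool model_destroy_fixed(void)
{
	bool ok = model_reset_all();

	model_slot = MODEL_SLOT_INIT;
	return ok;
}
```

In this model, after model_enable(1), model_destroy_old() returns false and
leaves the monitoring flags on slot 1 set, while model_destroy_fixed() clears
them and returns true. The in-flight handler race is a separate problem and
is not modelled here.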