Message-ID: <8e80cbcf739304de95356f1fac677261628977fa.camel@redhat.com>
Subject: Re: [RFC PATCH v2 02/10] rv/da: fix per-task da_monitor_destroy() ordering and sync
From: Gabriele Monaco
To: wen.yang@linux.dev
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Steven Rostedt
Date: Tue, 12 May 2026 11:09:26 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2026-05-12 at 10:27 +0200, Gabriele Monaco wrote:
> On Tue, 2026-05-12 at 02:24 +0800, wen.yang@linux.dev wrote:
> > From: Wen Yang
> >
> > The following two paths race:
> >
> >   CPU 0 (disable_stall/__rv_disable_monitor)  CPU 1 (wwnr probe handler)
>                                                        ^ did you mean stall?

Ok I got it now, so essentially you'd reproduce it like:
* start a DA per-task monitor (no timer)
* stop it, a handler is still running after reset, it sets monitoring back to 1
* start an HA per-task monitor that would use the same slot that is now looking
  like: { monitoring = 1, timer.function = NULL } because it was not initialised
  as HA but monitoring was set back to 1 in the race.

Thinking about this again, it isn't just an issue with per-task monitors, all
monitors reusing slots would suffer from it.
Besides, relying on monitoring can be fragile when using LTL monitors on the
same task (those don't even have monitoring).

Perhaps the solution isn't that trivial, I'm going to give it some more thought,
but thanks again for bringing this up!

Gabriele

> >   ------------------------------------------  -----------------------------
> >   disable_stall()
> >     da_monitor_destroy()
> >       da_monitor_reset_all()          <------ [task T: monitoring=0]
> >                                               da_monitor_start(&T->rv[n])
> >                                               /* no timer_setup */
> >                                                monitoring=1  <----
> >   tracepoint_synchronize_unregister()
> >   // CPU 1 probe has already returned; sync returns
> >
> > Later, enable_stall() acquires the same slot and calls da_monitor_init():
> >
> >   da_monitor_reset_all()
> >     da_monitor_reset(&T->rv[slot])    // monitoring=1, timer.function==0
> >       ha_monitor_reset_env()
> >         ha_cancel_timer()
> >           timer_delete(&ha_mon->timer)  // ODEBUG: timer never initialised
> >
> >   ODEBUG: assert_init not available (active state 0)
> >   object type: timer_list
> >   Call trace: timer_delete <- da_monitor_reset_all <- enable_stall
> >
> > Call tracepoint_synchronize_unregister() inside da_monitor_destroy()
> > before da_monitor_reset_all().  The unregister_trace_xxx() calls in the
> > monitor's disable() have already disconnected the tracepoints; the sync
> > here drains any handler still in flight, so no new monitoring=1 can
> > appear after da_monitor_reset_all() clears the slot.
> >
> > Also fix the slot release ordering: release the slot only after
> > reset_all() to avoid accessing rv[] with an out-of-bounds index.
> >
> > Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
> > Signed-off-by: Wen Yang
> > ---
>
> Thanks for the fix, I have a similar one waiting for submission.
>
> These are technically 2 separate fixes though: the ordering with unset
> task_mon_slot (independent on HA) and the synchronisation with pending
> tracepoints. They probably deserve separate patches and visibility, the first
> has always been around and we're technically overwriting who knows what.
>
>
> The explanation above is a bit hard to follow though, are you talking about a
> handler for the same (stall) monitor running after the reset, effectively
> undoing it by setting the monitoring flag?
>
> Then this is indeed an issue with ha_monitor_reset_env() which expects a clean
> environment.
>
> So that's basically what you'd see now much more often because in fact we
> don't reset the right slot (though, again, that's a different issue).
>
>
> Calling tracepoint_synchronize_unregister() there too would surely fix, but it
> used to be kinda slow. But it's probably gotten faster since now tracepoints
> use SRCU, so we can wait for a dedicated grace period.
>
> I liked the idea to wait cumulatively in the end, but that's just making
> things harder.. Let's do like this:
>
> Prepare 2 separate patches as fixes, put the task slot one first (would ease
> backporting), mention this issue with the race condition only in the second.
> You can send them independently and I'll add them to the tree as urgent.
>
>
> I'm soon going to send my set of fixes that will also include the task slot
> patch (not removing to ease my life with conflicts).
>
> Thanks,
> Gabriele
>
> >  include/rv/da_monitor.h | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
> > index 00ded3d5ab3f..d04bb3229c75 100644
> > --- a/include/rv/da_monitor.h
> > +++ b/include/rv/da_monitor.h
> > @@ -304,6 +304,20 @@ static int da_monitor_init(void)
> >
> >  /*
> >   * da_monitor_destroy - return the allocated slot
> > + *
> > + * Call tracepoint_synchronize_unregister() before reset_all() to close
> > + * the race where an in-flight non-HA probe handler sets monitoring=1
> > + * (without calling timer_setup()) after da_monitor_reset_all() has
> > + * already cleared the slot but before the caller's own sync completes.
> > + * Without this barrier, an HA_TIMER_WHEEL monitor that later acquires
> > + * the same slot would call timer_delete() on a never-initialised
> > + * timer_list, triggering ODEBUG warnings.
> > + *
> > + * Note: tracepoint_synchronize_unregister() is a system-wide barrier
> > + * that waits for all CPUs to finish any in-flight tracepoint handlers.
> > + * The caller's own __rv_disable_monitor() issues a second sync after
> > + * returning from disable(); that redundant call is harmless on the
> > + * infrequent admin (enable/disable) path.
> >   */
> >  static inline void da_monitor_destroy(void)
> >  {
> > @@ -311,10 +325,10 @@ static inline void da_monitor_destroy(void)
> >  		WARN_ONCE(1, "Disabling a disabled monitor: " __stringify(MONITOR_NAME));
> >  		return;
> >  	}
> > +	tracepoint_synchronize_unregister();
> > +	da_monitor_reset_all();
> >  	rv_put_task_monitor_slot(task_mon_slot);
> >  	task_mon_slot = RV_PER_TASK_MONITOR_INIT;
> > -
> > -	da_monitor_reset_all();
> >  }
> >
> >  #elif RV_MON_TYPE == RV_MON_PER_OBJ
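
For reference, the reproduction sequence described at the top of this mail can
be replayed with a tiny user-space model. This is only a sketch under stated
assumptions, not kernel code: struct model_slot and every model_* helper are
made up for illustration and do not exist in include/rv/da_monitor.h. The race
is replayed deterministically by choosing whether the late handler runs before
or after the reset, which is roughly what draining handlers first would
guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one per-task monitor slot (not the kernel struct). */
struct model_slot {
	bool monitoring;        /* set by a probe handler when it starts the monitor */
	void (*timer_fn)(void); /* non-NULL only after an HA-style timer setup */
};

/* Model of a DA (no timer) probe handler: starts monitoring, no timer setup. */
static void model_da_handler(struct model_slot *s)
{
	s->monitoring = true;
}

/* Model of the reset performed when a monitor is disabled. */
static void model_reset(struct model_slot *s)
{
	s->monitoring = false;
}

/*
 * Model of the HA reset on enable: a slot that claims to be monitoring is
 * assumed to own an initialised timer. Returns false when it would touch a
 * timer that was never set up (the ODEBUG case from the report).
 */
static bool model_ha_reset_is_safe(const struct model_slot *s)
{
	return !(s->monitoring && s->timer_fn == NULL);
}

/*
 * Replay disable (DA) followed by enable (HA) on the same slot.
 * drain_first == true:  the in-flight handler finishes before the reset.
 * drain_first == false: the handler runs after the reset, as in the race.
 */
static bool model_disable_then_enable_ha(bool drain_first)
{
	struct model_slot slot = { .monitoring = true, .timer_fn = NULL };

	if (drain_first) {
		model_da_handler(&slot); /* late handler completes first */
		model_reset(&slot);      /* reset sees the final state */
	} else {
		model_reset(&slot);      /* reset runs first */
		model_da_handler(&slot); /* late handler undoes the reset */
	}
	return model_ha_reset_is_safe(&slot);
}
```

In this model, model_disable_then_enable_ha(false) leaves the slot as
{ monitoring = 1, timer_fn = NULL } and reports the HA reset as unsafe, while
model_disable_then_enable_ha(true) leaves it clean. It says nothing about LTL
monitors or about slots that are not per-task.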