Message-ID: <67839f35b6331c4623d60281ff1c26513117bf3d.camel@redhat.com>
Subject: Re: [RFC PATCH v2 02/10] rv/da: fix per-task da_monitor_destroy() ordering and sync
From: Gabriele Monaco
To: Wen Yang
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Steven Rostedt
Date: Wed, 13 May 2026 11:31:28 +0200
References: <8e80cbcf739304de95356f1fac677261628977fa.camel@redhat.com>

On Wed, 2026-05-13 at 13:32 +0800, Wen Yang wrote:
> Thanks for both messages.  Two patches are ready; let me address
> your follow-up concerns before sending.
>
>    1. "all monitors reusing slots would suffer from it"
>
>       Only RV_MON_PER_TASK uses the rv_get/put_task_monitor_slot()
>       pool.  RV_MON_GLOBAL and RV_MON_PER_CPU each have dedicated
>       storage (a single static variable and a per-cpu variable) and
>       never share slots across monitor types.  The race is exclusive
>       to PER_TASK, so fixing that variant's da_monitor_destroy() is
>       the correct scope.
>
>    2. "LTL monitors don't even have monitoring"
>
>       tracepoint_synchronize_unregister() does not rely on the
>       monitoring flag at all.  It is a system-wide barrier - it
>       calls synchronize_rcu_tasks_trace() followed by
>       synchronize_srcu(&tracepoint_srcu) - draining every in-flight
>       tracepoint handler on every CPU regardless of which monitor
>       dispatched it.  LTL handlers are covered without any special
>       treatment.
>
> The slot-ordering issue (patch 1) affects all per-task DA monitors,
> not only HA ones - "independent on HA" - because
> RV_PER_TASK_MONITOR_INIT equals CONFIG_RV_PER_TASK_MONITORS (one
> past the end of rv[]), so da_monitor_reset_all() overwrites whatever
> follows rv[] in task_struct whenever any per-task monitor is
> disabled.

Exactly, and since whatever follows .rv is randomised on a task_struct, this
can get quite nasty.
I included my version of the fix in the series in [1], but feel free to send
yours, you got there first ;)

>
> Also corrected "wwnr probe handler" to "stall probe handler" in
> patch 2 per your annotation.

While tracepoint_synchronize_unregister() does fix the race, I still see a
time bomb in the way we do ha_monitor_reset_env().

Since we reuse the same slots for per-task monitors (not for the others,
you're right, I was brainfarting), we essentially don't know what happened
before we do da_monitor_init(): the same slot could have been used by an LTL
monitor, which cannot even reliably clear the byte used by the monitoring flag.

Now, we either require all monitors to memset the entire slot (union
rv_task_monitor) or we don't assume anything about the slot's state during
initialisation. Any middle ground could reveal pesky bugs as soon as we
refactor the structs.

The latter idea is what I did in [1]. I believe that would make the
synchronisation superfluous.

What do you think?

Thanks,
Gabriele

[1] - https://lore.kernel.org/lkml/20260512140250.262190-8-gmonaco@redhat.com

> Please let me know if the above reasoning addresses your concerns.
>
>
> --
> Best wishes,
> Wen
>
> > >
> > > >   include/rv/da_monitor.h | 18 ++++++++++++++++--
> > > >   1 file changed, 16 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
> > > > index 00ded3d5ab3f..d04bb3229c75 100644
> > > > --- a/include/rv/da_monitor.h
> > > > +++ b/include/rv/da_monitor.h
> > > > @@ -304,6 +304,20 @@ static int da_monitor_init(void)
> > > >
> > > >   /*
> > > >    * da_monitor_destroy - return the allocated slot
> > > > + *
> > > > + * Call tracepoint_synchronize_unregister() before reset_all() to close
> > > > + * the race where an in-flight non-HA probe handler sets monitoring=1
> > > > + * (without calling timer_setup()) after da_monitor_reset_all() has
> > > > + * already cleared the slot but before the caller's own sync completes.
> > > > + * Without this barrier, an HA_TIMER_WHEEL monitor that later acquires
> > > > + * the same slot would call timer_delete() on a never-initialised
> > > > + * timer_list, triggering ODEBUG warnings.
> > > > + *
> > > > + * Note: tracepoint_synchronize_unregister() is a system-wide barrier
> > > > + * that waits for all CPUs to finish any in-flight tracepoint handlers.
> > > > + * The caller's own __rv_disable_monitor() issues a second sync after
> > > > + * returning from disable(); that redundant call is harmless on the
> > > > + * infrequent admin (enable/disable) path.
> > > >    */
> > > >   static inline void da_monitor_destroy(void)
> > > >   {
> > > > @@ -311,10 +325,10 @@ static inline void da_monitor_destroy(void)
> > > >   		WARN_ONCE(1, "Disabling a disabled monitor: "
> > > > __stringify(MONITOR_NAME));
> > > >   		return;
> > > >   	}
> > > > +	tracepoint_synchronize_unregister();
> > > > +	da_monitor_reset_all();
> > > >   	rv_put_task_monitor_slot(task_mon_slot);
> > > >   	task_mon_slot = RV_PER_TASK_MONITOR_INIT;
> > > > -
> > > > -	da_monitor_reset_all();
> > > >   }
> > > >
> > > >   #elif RV_MON_TYPE == RV_MON_PER_OBJ
> >