From: Damien Berget via lttng-dev
Reply-To: Damien Berget
To: Kienan Stewart
Cc: lttng-dev@lists.lttng.org
Subject: Re: [lttng-dev] Trigger snapshots on a watchdog
Date: Thu, 12 Sep 2024 09:14:17 -0700
In-Reply-To: <39b62cc1-66e1-4962-a3a8-0d3ad6e151ef@efficios.com>
References: <39b62cc1-66e1-4962-a3a8-0d3ad6e151ef@efficios.com>
List-Id: LTTng development list

Thanks for the quick response, Kienan,
Your proposal is exactly how we were thinking the monitor application could work, so we'll go with that for now.
Reacting to the absence of an event (a watchdog) would really be a good complement to the existing trigger types.
It's a really useful feature for a flight recorder in embedded, medium real-time applications. Is the team open to feature requests?
Cheers
Damien

On Thu, Sep 12, 2024 at 12:57 AM Kienan Stewart <kstewart@efficios.com> wrote:
Hi Damien,

On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> Good day,
> We are trying to see what is the best way to monitor some applications
> not hitting a deadline. Ideally something like a watchdog that needs
> to be patted regularly and, if the timeout is reached, triggers a snapshot.
>
> Before we reinvent the wheel and code some userland applications, is
> there a canonical way in LTTng to do it? I found this
> <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
> close maybe?
>
I don't think the proposed changes you linked to are useful or
related to what you hope to achieve. The patch series is a concept about
how some types of UST ring buffer stalls might be addressed by the
session daemon. After a quick glance, the monitoring seems to be more
closely related to the 'monitor timer', which is used to sample
statistical information about channels[1].


There is a concept of triggers[2]; however, triggers react to the
presence of events rather than the absence thereof.


I think a small user space application that monitors the state of other
applications is more the direction to head in. There are at least a
couple of ways that a snapshot on unhealthy state could be achieved:


* Use liblttng-ctl to trigger a snapshot from your watchdog
application[3][4].

* Have the watchdog application exec `lttng snapshot record`[5].

* Have the watchdog application emit some sort of "health state" events
with some data (e.g. health_okay, health_bad, ...) per your usage
requirements, and configure a trigger[2] to take a snapshot on the
"health state" events that have the non-okay state.


Depending on your tracing configuration (channel overwrite/discard
mode[6], buffer sizes, blocking mode, and number of events), it is
possible that events may not be recorded. I would favor using
liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
guarantee that your watchdog will cause a snapshot to be taken.
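
The exec-based option could look roughly like the following. This is a minimal, in-process sketch and not code from this thread: the class `SnapshotWatchdog`, its `pat()` method, and the session name `my-session` are hypothetical, and it assumes the `lttng` CLI is on `PATH` and that the session was created in snapshot mode. How heartbeats reach the watchdog from other processes (socket, pipe, shared memory) is left open.

```python
import subprocess
import threading


class SnapshotWatchdog:
    """Calls `on_timeout` once if pat() is not called within `timeout_s` seconds."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer = None

    def pat(self):
        """Restart the countdown; the monitored code calls this regularly."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout_s, self._on_timeout)
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        """Cancel the pending countdown, if any."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


def record_snapshot(session="my-session"):
    # Assumption: `session` is a hypothetical session name that already
    # exists and was created with `lttng create --snapshot`.
    result = subprocess.run(
        ["lttng", "snapshot", "record", f"--session={session}"],
        check=False,
    )
    return result.returncode
```

A caller would create `SnapshotWatchdog(0.5, record_snapshot)`, call `pat()` once to arm it, and keep calling `pat()` on every heartbeat. If no heartbeat arrives within the timeout, the snapshot command runs once; the next `pat()` arms the watchdog again.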


I would love to hear if there are other ideas. Regardless, hope this helps!


thanks,

kienan


[1]: https://lttng.org/docs/v2.13/#doc-channel-timers
[2]: https://lttng.org/docs/v2.13/#doc-trigger
[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng
[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl
[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/
[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode


> Thanks,
> Cheers
>
> --
> *Damien Berget*
> Embedded Platform Lead
> damien.berget@flyzipline.com
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


--
Damien Berget