lttng-dev.lists.lttng.org archive mirror
* [lttng-dev] Trigger snapshots on a watchdog
@ 2024-09-11 22:38 Damien Berget via lttng-dev
  2024-09-12  7:57 ` Kienan Stewart via lttng-dev
  0 siblings, 1 reply; 4+ messages in thread
From: Damien Berget via lttng-dev @ 2024-09-11 22:38 UTC (permalink / raw)
  To: lttng-dev


Good day,
We are trying to see what is the best way to monitor some applications not
hitting a deadline. Ideally something like a watchdog that needs to be patted
regularly and, if the timeout is reached, triggers the snapshot.

Before we reinvent the wheel and code some userland applications, is there
a canonical way in LTTng to do it? I found this
<https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
close maybe?

Thanks,
Cheers

-- 
*Damien Berget*
Embedded Platform Lead
damien.berget@flyzipline.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

* Re: [lttng-dev] Trigger snapshots on a watchdog
  2024-09-11 22:38 [lttng-dev] Trigger snapshots on a watchdog Damien Berget via lttng-dev
@ 2024-09-12  7:57 ` Kienan Stewart via lttng-dev
  2024-09-12 16:14   ` Damien Berget via lttng-dev
  0 siblings, 1 reply; 4+ messages in thread
From: Kienan Stewart via lttng-dev @ 2024-09-12  7:57 UTC (permalink / raw)
  To: Damien Berget, lttng-dev

Hi Damien,

On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> Good day,
> We are trying to see what is the best way to monitor some applications
> not hitting a deadline. Ideally something like a watchdog that needs
> to be patted regularly and, if the timeout is reached, triggers the snapshot.
>
> Before we reinvent the wheel and code some userland applications, is 
> there a canonical way in LTTng to do it? I found this 
> <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously 
> close maybe?
>
I don't think the proposed changes you linked to are useful or
related to what you hope to achieve. The patch series is a concept about
how some types of UST ring buffer stalls might be addressed by the
session daemon. After a quick glance, the monitoring seems to be more
closely related to the 'monitor timer', which is used to sample
statistical information about channels[1].


There is a concept of triggers[2]; however triggers react to the 
presence of events rather than the absence thereof.


I think a small user space application that monitors the state of other
applications is more the direction to head in. There are at least a
couple of ways that a snapshot on unhealthy state could be achieved:


* Use liblttng-ctl to trigger a snapshot from your watchdog 
application[3][4].

* Have the watchdog application exec `lttng snapshot record`[5].

* Have the watchdog application emit some sort of "health state" events 
with some data (e.g. health_okay, health_bad, ...) per your usage 
requirements, and configure a trigger[2] to take a snapshot on the 
"health state" events that have the non-okay state.


Depending on your tracing configuration - channel overwrite/discard
mode[6], buffer sizes, blocking mode, and number of events - it is
possible that events may not be recorded. I would favour using
liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
guarantee that your watchdog will cause a snapshot to be taken.
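
As a rough illustration of the exec option (a sketch, not code from the
LTTng project), a watchdog that runs `lttng snapshot record` when it is
not patted in time could look like the following. The class name
`SnapshotWatchdog`, its `pat()` and `stop()` methods, the session name
and the timeout are all hypothetical. It assumes a snapshot-mode session
has already been created and started, and that the `lttng` binary is on
the PATH.

```python
import subprocess
import threading


class SnapshotWatchdog:
    """Runs `lttng snapshot record` if pat() is not called in time.

    Hypothetical helper for illustration only; not part of LTTng.
    """

    def __init__(self, session, timeout_s, run=subprocess.run):
        # Assumption: `session` names an existing snapshot-mode session.
        self._cmd = ["lttng", "snapshot", "record", f"--session={session}"]
        self._timeout_s = timeout_s
        self._run = run  # injectable so the logic can be tried without lttng
        self._lock = threading.Lock()
        self._timer = None
        self.fired = threading.Event()

    def pat(self):
        """Restart the countdown; call this each time a deadline is met."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout_s, self._expired)
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        """Cancel the pending countdown, if any."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None

    def _expired(self):
        # The deadline was missed: ask the session daemon for a snapshot.
        # check=False: a failed snapshot should not kill the watchdog thread.
        self._run(self._cmd, check=False)
        self.fired.set()
```

The monitored loop would create something like
`SnapshotWatchdog("my-session", 2.0)` and call `pat()` once per
iteration. A timer that expires at the same instant as a `pat()` call
can still fire, so a production version would probably want stricter
synchronisation and some rate limiting of snapshots.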


I would love to hear if there are other ideas. Regardless, hope this helps!


thanks,

kienan


[1]: https://lttng.org/docs/v2.13/#doc-channel-timers

[2]: https://lttng.org/docs/v2.13/#doc-trigger

[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng

[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl

[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/

[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode



* Re: [lttng-dev] Trigger snapshots on a watchdog
  2024-09-12  7:57 ` Kienan Stewart via lttng-dev
@ 2024-09-12 16:14   ` Damien Berget via lttng-dev
  2024-09-13  9:51     ` Kienan Stewart via lttng-dev
  0 siblings, 1 reply; 4+ messages in thread
From: Damien Berget via lttng-dev @ 2024-09-12 16:14 UTC (permalink / raw)
  To: Kienan Stewart; +Cc: lttng-dev


Thanks for the quick response Kienan,
Your proposal is exactly how we were thinking the monitor application could
work, so we'll go with that for now.
Reacting to the absence of an event (a watchdog) would really be a good
complement to the existing trigger types.
It's a really useful feature for a flight recorder in embedded medium
real-time applications. Is the team open to feature requests?
Cheers
Damien

-- 
*Damien Berget*


* Re: [lttng-dev] Trigger snapshots on a watchdog
  2024-09-12 16:14   ` Damien Berget via lttng-dev
@ 2024-09-13  9:51     ` Kienan Stewart via lttng-dev
  0 siblings, 0 replies; 4+ messages in thread
From: Kienan Stewart via lttng-dev @ 2024-09-13  9:51 UTC (permalink / raw)
  To: Damien Berget; +Cc: lttng-dev

Hi Damien,


I've added a very brief feature request issue here[1], referring to
this discussion. If you would like to elaborate or add other details,
that would be most excellent.


thanks,

kienan


[1]: https://bugs.lttng.org/issues/1416

