* [lttng-dev] Trigger snapshots on a watchdog
From: Damien Berget via lttng-dev @ 2024-09-11 22:38 UTC
To: lttng-dev

Good day,

We are trying to find the best way to monitor applications that miss a
deadline. Ideally, something like a watchdog that needs to be patted
regularly and that triggers a snapshot if its timeout is reached.

Before we reinvent the wheel and code some userland applications, is
there a canonical way in LTTng to do it? I found this
<https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
close, maybe?

Thanks,
Cheers

--
*Damien Berget*
Embedded Platform Lead
damien.berget@flyzipline.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

* Re: [lttng-dev] Trigger snapshots on a watchdog
From: Kienan Stewart via lttng-dev @ 2024-09-12 7:57 UTC
To: Damien Berget, lttng-dev

Hi Damien,

On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> Good day,
> We are trying to find the best way to monitor applications that miss a
> deadline. Ideally, something like a watchdog that needs to be patted
> regularly and that triggers a snapshot if its timeout is reached.
>
> Before we reinvent the wheel and code some userland applications, is
> there a canonical way in LTTng to do it? I found this
> <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
> close, maybe?
>

I don't think the proposed changes you linked to are useful for, or
related to, what you hope to achieve. The patch series is a concept of
how some types of UST ring buffer stalls might be addressed by the
session daemon. After a quick glance, the monitoring seems to be more
closely related to the 'monitor timer', which is used to sample
statistical information about channels[1].

There is a concept of triggers[2]; however, triggers react to the
presence of events rather than the absence thereof.

I think a small user space application that monitors the state of other
applications is more the direction to head in. There are at least a
couple of ways that a snapshot on unhealthy state could be achieved:

* Use liblttng-ctl to trigger a snapshot from your watchdog
  application[3][4].

* Have the watchdog application exec `lttng snapshot record`[5].

* Have the watchdog application emit some sort of "health state" events
  with some data (e.g. health_okay, health_bad, ...) per your usage
  requirements, and configure a trigger[2] to take a snapshot on the
  "health state" events that have the non-okay state.

Depending on your tracing configuration (channel overwrite/discard
mode[6], buffer sizes, blocking mode, and number of events), it is
possible that events may not be recorded. I would favor using
liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
guarantee that your watchdog will cause a snapshot to be taken.

I would love to hear if there are other ideas. Regardless, hope this helps!

thanks,
kienan

[1]: https://lttng.org/docs/v2.13/#doc-channel-timers
[2]: https://lttng.org/docs/v2.13/#doc-trigger
[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng
[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl
[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/
[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode
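
A minimal sketch of the second option from the reply above (a watchdog
that execs `lttng snapshot record` when it is not patted in time) might
look like the following. It is an illustration under stated assumptions,
not code from the thread: the `Watchdog` class, the `record_snapshot`
helper, and the session name used in the usage note are hypothetical, and
the sketch assumes the `lttng` CLI is on the PATH and that the recording
session was created in snapshot mode (or has a snapshot output).

```python
import subprocess
import threading
from typing import Callable, Optional


def record_snapshot(session: str) -> None:
    """Exec `lttng snapshot record` for the given recording session.

    `session` is whatever name the session was created with; the name
    is an assumption of this sketch, not something from the thread.
    """
    subprocess.run(
        ["lttng", "snapshot", "record", f"--session={session}"],
        check=True,  # raise if the CLI reports a failure
    )


class Watchdog:
    """Hypothetical helper: calls `on_timeout` once if `pat()` is not
    called again within `timeout_s` seconds of the previous pat."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], None]) -> None:
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def start(self) -> None:
        """Arm the watchdog; equivalent to a first pat."""
        self.pat()

    def pat(self) -> None:
        """Cancel the pending timeout (if any) and arm a fresh one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout_s, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive
            self._timer.start()

    def stop(self) -> None:
        """Disarm the watchdog without firing the timeout action."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

A monitoring process could then create
`Watchdog(2.0, lambda: record_snapshot("my-session"))`, call `start()`,
and call `pat()` each time the monitored application reports progress.
Because the timeout action is an injected callable, the same class could
instead call into liblttng-ctl bindings, or be exercised in tests without
a running session daemon.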

* Re: [lttng-dev] Trigger snapshots on a watchdog
From: Damien Berget via lttng-dev @ 2024-09-12 16:14 UTC
To: Kienan Stewart; +Cc: lttng-dev

Thanks for the quick response, Kienan.

Your proposal is exactly how we were thinking the monitor application
could work, so we'll go with that for now.

Reacting to the absence of an event (a watchdog) would be a really good
complement to the existing trigger types. It's a really useful feature
for a flight recorder in embedded, medium real-time applications. Is the
team open to feature requests?

Cheers
Damien

--
*Damien Berget*

* Re: [lttng-dev] Trigger snapshots on a watchdog
From: Kienan Stewart via lttng-dev @ 2024-09-13 9:51 UTC
To: Damien Berget; +Cc: lttng-dev

Hi Damien,

I've opened a brief feature request issue here[1], referring to this
discussion. If you would like to elaborate or add other details, that
would be most excellent.

thanks,
kienan

[1]: https://bugs.lttng.org/issues/1416

On 2024-09-12 12:14, Damien Berget wrote:
> Thanks for the quick response, Kienan.
> Your proposal is exactly how we were thinking the monitor application
> could work, so we'll go with that for now.
> Reacting to the absence of an event (a watchdog) would be a really good
> complement to the existing trigger types.
> It's a really useful feature for a flight recorder in embedded, medium
> real-time applications. Is the team open to feature requests?
> Cheers
> Damien