* RFC for event logging mechanism
@ 2017-09-05 11:24 Deepak Kodihalli
2017-09-05 22:46 ` Rick Altherr
2017-09-05 23:02 ` Vernon Mauery
0 siblings, 2 replies; 4+ messages in thread
From: Deepak Kodihalli @ 2017-09-05 11:24 UTC (permalink / raw)
To: OpenBMC Maillist
Hello,
I'm working this sprint on designing an event logging mechanism
(https://github.com/openbmc/openbmc/issues/1856). I have a couple
proposals below along with some questions. Hoping to hear thoughts on
which might be a better proposal. Any other feedback is welcome.
Potential requirements
1) Applications should be able to log events of interest. Events could
be used for purposes such as telemetry, analytics, and debug. Examples
of events could be changes in the power/thermal domain (such as operating
temps on a server), boot-related events, user account changes, etc.
2a) Users should be able to query events via REST.
2b) Users should also be able to query events of a certain category or type.
3) Users should also be able to "download" events in a format such as
JSON (This comes for free today with the rest server running on OpenBMC).
4) It should be possible to specify event metadata, which may have use
for a human as well as a program.
5) It should be possible to persist events up to a certain cap.
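
The requirements above can be sketched in a few lines. This is only an illustration under stated assumptions, not the phosphor-logging design: the names Event, EventLog, query and to_json are hypothetical, and an in-memory deque stands in for whatever persistence is eventually chosen.

```python
import json
from collections import deque
from dataclasses import dataclass, field, asdict


@dataclass
class Event:
    """One logged event; all field names here are hypothetical."""
    category: str   # e.g. "boot" or "thermal" (requirement 2b)
    message: str    # human-readable text (requirement 4)
    metadata: dict = field(default_factory=dict)  # machine-readable (req. 4)


class EventLog:
    """In-memory log that keeps at most `cap` events (requirement 5)."""

    def __init__(self, cap):
        # A deque with maxlen drops the oldest entry once the cap is hit.
        self._events = deque(maxlen=cap)

    def log(self, event):
        self._events.append(event)

    def query(self, category=None):
        """Return all events, or only those of one category."""
        return [e for e in self._events
                if category is None or e.category == category]

    def to_json(self, category=None):
        """Serialize the (optionally filtered) events for download."""
        return json.dumps([asdict(e) for e in self.query(category)])
```

Dropping the oldest event at the cap is one possible policy; the requirement itself does not say which events should be discarded.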
Proposal 1 - Leverage existing OpenBMC phosphor-logging
Phosphor-logging works as a supplement to journald - at a high level it
makes it possible to log errors to the journal, as well as create d-bus
objects representing the errors.
- Phosphor-logging uses the Entry interface [1] to describe an error. I
have [2] as the proposed Event interface. It's mostly similar to [1].
The differences are that it leaves out event severity and resolution,
since I wasn't sure if we really need them, and adds an event Category,
which would be handy for handling Requirement 2b).
- Phosphor-logging requires describing errors in yaml (error yaml and
error metadata yaml), which are processed [3] by a script that generates
an error log API, which clients can use. The API is part of a
phosphor-logging client lib. The same yaml structure can be utilized for
events, maybe with the yaml files themselves being named slightly
differently to depict events and event metadata instead of errors. This
means the client lib will have an event API, similar to the existing
elog API [6]. Error yaml files are stored either in the
phosphor-dbus-interfaces repo, or within an application's repo, based on
whether the error corresponds to a d-bus interface failure or not. In
case of events, I think the event yaml files can just be stored in the
app that creates them.
- The event logging API, in addition to logging to journal, will call an
internal phosphor-logging d-bus API, similar to [4], in order to create a
d-bus object depicting the event. Based on the event Category, the d-bus
object will be placed in the right namespace, such as
/xyz/openbmc_project/logging/events/boot/ or
/xyz/openbmc_project/logging/events/thermal/. The phosphor-logging
process, hence, will own these d-bus objects, do the id management (per
category), etc.
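
The per-category namespace and id management described in the last bullet could look roughly like the sketch below. The class name EventPathAllocator and the choice to start ids at 1 are assumptions made for illustration; only the root path comes from the proposal.

```python
from collections import defaultdict
from itertools import count

# Root namespace taken from the proposal text.
EVENTS_ROOT = "/xyz/openbmc_project/logging/events"


class EventPathAllocator:
    """Hands out object paths with an independent id sequence per category."""

    def __init__(self):
        # Each category lazily gets its own counter, starting at 1 (assumed).
        self._next_id = defaultdict(lambda: count(1))

    def allocate(self, category):
        event_id = next(self._next_id[category])
        return f"{EVENTS_ROOT}/{category}/{event_id}"
```

Keeping the counters inside one owner mirrors the idea that the phosphor-logging process alone owns the objects and their ids.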
Proposal 2 - Write d-bus interfaces to describe events
Couple of issues I see with Proposal 1 :
a) It's cumbersome for a BMC app to figure out that a specific event was
reported, or to express interest in a certain category of events. The
d-bus path namespace can help to a certain extent here, but it's
based on paths and properties and not on interfaces being added.
b) Both the existing Entry interface [1] and the proposed Event
interface [2] express metadata as strings, probably not the most elegant
way for an interested program to deal with them.
Given this, it feels more natural to express an event in its own d-bus
interface, such as an Event.Boot or Event.Thermal interface. So, this
proposal looks like :
- Define an Event log interface [5]. Note that this is mostly like [2],
although it has an additional method to create the event d-bus object.
- For specific event types, define their own d-bus interfaces. I don't
have examples for these at the moment, but like I mentioned above, we
could have interfaces for Event.Boot and Event.Thermal to start with.
These interfaces could be placed in the phosphor-dbus-interfaces repo. A
phosphor-logging application will have the code to implement these
well-known event interfaces, and to basically create d-bus objects. This
app will also implement the "Notify" method defined in [5].
- An application interested in reporting an event will make a call to
the "Notify" API defined in [5], stating the event category and the
event metadata. The phosphor-logging application that implements
"Notify", will create d-bus objects based on the event Category and
metadata, and place them in appropriate d-bus path namespaces, similar
to Proposal 1. It can also log the event information to the journal,
though I am not sure why this would be required, aside from the need to
keep the journal as the repository of all events.
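
As a rough sketch of Proposal 2, plain Python classes can stand in for the d-bus interfaces. Everything here is hypothetical: BootEvent and ThermalEvent are placeholders for Event.Boot and Event.Thermal (whose properties are not defined yet), and EventManager.notify only imitates the role of the "Notify" method in [5], whose real signature is not shown in this mail.

```python
from dataclasses import dataclass

EVENTS_ROOT = "/xyz/openbmc_project/logging/events"


@dataclass
class BootEvent:      # placeholder for a hypothetical Event.Boot interface
    stage: str


@dataclass
class ThermalEvent:   # placeholder for a hypothetical Event.Thermal interface
    sensor: str
    celsius: float


class EventManager:
    """Plays the role of the application that implements "Notify"."""

    # Maps a category name to the typed class that describes it.
    _types = {"boot": BootEvent, "thermal": ThermalEvent}

    def __init__(self):
        self.objects = {}      # object path -> typed event
        self._listeners = {}   # event class -> list of callbacks

    def subscribe(self, event_type, callback):
        """Express interest in one event type rather than in a path."""
        self._listeners.setdefault(event_type, []).append(callback)

    def notify(self, category, **metadata):
        """Create a typed event object and call interested listeners."""
        event = self._types[category](**metadata)  # KeyError if unknown
        prefix = f"{EVENTS_ROOT}/{category}/"
        existing = sum(1 for path in self.objects if path.startswith(prefix))
        path = f"{prefix}{existing + 1}"
        self.objects[path] = event
        for callback in self._listeners.get(type(event), []):
            callback(path, event)
        return path
```

The point of the typed classes is issue b) above: a listener receives real fields such as `celsius` instead of parsing metadata strings.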
[1]
https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Entry.interface.yaml
[2] https://gerrit.openbmc-project.xyz/#/c/6405/1
[3]
https://github.com/openbmc/phosphor-logging/blob/master/tools/elog-gen.py,
error yaml example :
https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/Dump
[4]
https://github.com/openbmc/phosphor-logging/blob/master/log_manager.cpp#L27
[5] https://gerrit.openbmc-project.xyz/#/c/6406/1
[6]
https://github.com/openbmc/phosphor-logging/blob/master/phosphor-logging/elog.hpp#L126
Regards,
Deepak
* Re: RFC for event logging mechanism
2017-09-05 11:24 RFC for event logging mechanism Deepak Kodihalli
@ 2017-09-05 22:46 ` Rick Altherr
2017-09-05 23:13 ` Andrew Jeffery
2017-09-05 23:02 ` Vernon Mauery
1 sibling, 1 reply; 4+ messages in thread
From: Rick Altherr @ 2017-09-05 22:46 UTC (permalink / raw)
To: Deepak Kodihalli; +Cc: OpenBMC Maillist
On Tue, Sep 5, 2017 at 4:24 AM, Deepak Kodihalli
<dkodihal@linux.vnet.ibm.com> wrote:
> Hello,
>
> I'm working this sprint on designing an event logging mechanism
> (https://github.com/openbmc/openbmc/issues/1856). I have a couple proposals
> below along with some questions. Hoping to hear thoughts on which might be a
> better proposal. Any other feedback is welcome.
>
> Potential requirements
> 1) Applications should be able to log events of interest. Events could be
> used for purposes such as telemetry, analytics, debug. Examples of events
> could be changes in the power/thermal domain, such as operating temps on a
> server, boot related, user account changes, etc.
My experience at Google is that telemetry and debug, while both
event-oriented, have vastly different usages which lead to different
designs. Telemetry like temperature, voltage, current, boot count, and
f/w version is primarily collected, aggregated, and analyzed by
automation systems. To avoid complexity in the consumers of that
data, it is ideally self-describing, easily parsable, and in the most
intuitive, primitive form. Our internal systems use explicit types
and generous metadata for _each_ piece of telemetry. For example, a
voltage sensor is encoded as a float with metadata describing the rail
being monitored, units (mV or V), time recorded, etc. I expect these
to be consumed as they are generated, either by local processes or by
streaming off-machine.
Debug is almost exclusively consumed by humans in cases where a
problem has already been found and the relevant data for identifying
the root cause is not in the telemetry and unknown. I think of debug
as being comprised mostly of unstructured logs. Off-machine
collection can be done infrequently and in bulk.
All attempts I've seen to mix the two have led to a messy API and/or
data model. Even if the data model and API _can_ be shared, the
higher-level consumers of the two types of data will be different.
Instead of requiring the client to know which are telemetry and which
are debug logs, separating that into two APIs makes the consumer's
life simpler.
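
To make the split concrete, here is a minimal sketch of two separate APIs. It is not a description of any existing system: VoltageSample, TelemetryStream and DebugLog are hypothetical names, and the fields simply follow the voltage example given above (rail, units, time recorded).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoltageSample:
    """Self-describing telemetry: a float plus explicit metadata."""
    rail: str          # which rail is monitored, e.g. "P12V"
    value: float
    units: str         # "mV" or "V"
    timestamp: float   # time recorded, seconds since the epoch

    def in_volts(self):
        return self.value / 1000.0 if self.units == "mV" else self.value


class TelemetryStream:
    """Telemetry API: typed samples pushed to consumers as generated."""

    def __init__(self):
        self._consumers = []

    def attach(self, consumer):
        self._consumers.append(consumer)

    def publish(self, sample):
        for consumer in self._consumers:
            consumer(sample)


class DebugLog:
    """Debug API: unstructured text, collected in bulk when needed."""

    def __init__(self):
        self._lines = []

    def write(self, text):
        self._lines.append(text)

    def dump(self):
        return "\n".join(self._lines)
```

A consumer of TelemetryStream never has to parse text, and a person reading DebugLog.dump() never has to wade through sensor samples.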
> 2a) Users should be able to query events via REST.
> 2b) Users should also be able to query events of a certain category or type.
> 3) Users should also be able to "download" events in a format such as JSON
> (This comes for free today with the rest server running on OpenBMC).
> 4) It should be possible to specify event metadata, which may have use for a
> human as well as a program.
> 5) It should be possible to persist events up to a certain cap.
>
>
>
> Proposal 1 - Leverage existing OpenBMC phosphor-logging
>
> Phosphor-logging works as a supplement to journald - at a high level it
> makes it possible to log errors to the journal, as well as create d-bus
> objects representing the errors.
>
> - Phosphor-logging uses the Entry interface [1] to describe an error. I have
> [2] as the proposed Event interface. It's mostly similar to [1] -
> differences being - I wasn't sure if we really need event severity and
> resolution, plus having an event Category would be handy for handling
> Requirement 2b).
>
> - Phosphor-logging requires describing errors in yaml (error yaml and error
> metadata yaml), which are processed [3] by a script that generates an error
> log API, which clients can use. The API is part of a phosphor-logging client
> lib. The same yaml structure can be utilized for events, maybe with the yaml
> files themselves being named slightly differently to depict events and event
> metadata instead of errors. This means the client lib will have an event
> API, similar to the existing elog API [6]. Error yaml files are stored
> either in the phosphor-dbus-interfaces repo, or within an application's
> repo, based on whether the error corresponds to a d-bus interface failure or
> not. In case of events, I think the event yaml files can just be stored in
> the app that creates them.
>
> - The event logging API, in addition to logging to journal, will call an
> internal phosphor-logging d-bus API, similar to [4], in order to create a d-bus
> object depicting the event. Based on the event Category, the d-bus object
> will be placed in the right namespace, such as
> /xyz/openbmc_project/logging/events/boot/ or
> /xyz/openbmc_project/logging/events/thermal/. The phosphor-logging process,
> hence, will own these d-bus objects, do the id management (per category),
> etc.
>
>
>
> Proposal 2 - Write d-bus interfaces to describe events
>
> Couple of issues I see with Proposal 1 :
>
> a) It's cumbersome for a BMC app to figure out that a specific event was
> reported, or to express interest in a certain category of events. The d-bus
> path namespace can help to a certain extent here though, but it's based on
> paths and properties and not interfaces being added.
> b) Both the existing Entry interface [1] and the proposed Event interface
> [2] express metadata as strings, probably not the most elegant way for an
> interested program to deal with them.
>
> Given this, it feels more natural to express an event in its own d-bus
> interface, such as an Event.Boot or Event.Thermal interface. So, this
> proposal looks like :
>
> - Define an Event log interface [5]. Note that this is mostly like [2],
> although it has an additional method to create the event d-bus object.
>
> - For specific event types, define their own d-bus interfaces. I don't have
> examples for these at the moment, but like I mentioned above, we could have
> interfaces for Event.Boot and Event.Thermal to start with. These interfaces
> could be placed in the phosphor-dbus-interfaces repo. A phosphor-logging
> application will have the code to implement these well-known event
> interfaces, and to basically create d-bus objects. This app will also
> implement the "Notify" method defined in [5].
>
> - An application interested in reporting an event will make a call to the
> "Notify" API defined in [5], stating the event category and the event
> metadata. The phosphor-logging application that implements "Notify", will
> create d-bus objects based on the event Category and metadata, and place
> them in appropriate d-bus path namespaces, similar to Proposal 1. It can
> also log the event information to the journal, though I am not sure why this
> would be required, aside from the having the need to have the journal as the
> repo of all events.
>
>
>
> [1]
> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Entry.interface.yaml
> [2] https://gerrit.openbmc-project.xyz/#/c/6405/1
> [3]
> https://github.com/openbmc/phosphor-logging/blob/master/tools/elog-gen.py,
> error yaml example :
> https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/Dump
> [4]
> https://github.com/openbmc/phosphor-logging/blob/master/log_manager.cpp#L27
> [5] https://gerrit.openbmc-project.xyz/#/c/6406/1
> [6]
> https://github.com/openbmc/phosphor-logging/blob/master/phosphor-logging/elog.hpp#L126
>
> Regards,
> Deepak
>
* Re: RFC for event logging mechanism
2017-09-05 11:24 RFC for event logging mechanism Deepak Kodihalli
2017-09-05 22:46 ` Rick Altherr
@ 2017-09-05 23:02 ` Vernon Mauery
1 sibling, 0 replies; 4+ messages in thread
From: Vernon Mauery @ 2017-09-05 23:02 UTC (permalink / raw)
To: Deepak Kodihalli; +Cc: OpenBMC Maillist
On 05-Sep-2017 04:54 PM, Deepak Kodihalli wrote:
> Hello,
>
> I'm working this sprint on designing an event logging mechanism
> (https://github.com/openbmc/openbmc/issues/1856). I have a couple proposals
> below along with some questions. Hoping to hear thoughts on which might be a
> better proposal. Any other feedback is welcome.
>
> Potential requirements
> 1) Applications should be able to log events of interest. Events could be
> used for purposes such as telemetry, analytics, debug. Examples of events
> could be changes in the power/thermal domain, such as operating temps on a
> server, boot related, user account changes, etc.
> 2a) Users should be able to query events via REST.
> 2b) Users should also be able to query events of a certain category or type.
> 3) Users should also be able to "download" events in a format such as JSON
> (This comes for free today with the rest server running on OpenBMC).
> 4) It should be possible to specify event metadata, which may have use for a
> human as well as a program.
> 5) It should be possible to persist events up to a certain cap.
6) It should be possible to parse the log to make other formats. (Some
IPMI users still may want SEL-formatted logs instead of Redfish/REST)
> Proposal 1 - Leverage existing OpenBMC phosphor-logging
>
> Phosphor-logging works as a supplement to journald - at a high level it
> makes it possible to log errors to the journal, as well as create d-bus
> objects representing the errors.
>
> - Phosphor-logging uses the Entry interface [1] to describe an error. I have
> [2] as the proposed Event interface. It's mostly similar to [1] -
> differences being - I wasn't sure if we really need event severity and
> resolution, plus having an event Category would be handy for handling
> Requirement 2b).
>
> - Phosphor-logging requires describing errors in yaml (error yaml and error
> metadata yaml), which are processed [3] by a script that generates an error
> log API, which clients can use. The API is part of a phosphor-logging client
> lib. The same yaml structure can be utilized for events, maybe with the yaml
> files themselves being named slightly differently to depict events and event
> metadata instead of errors. This means the client lib will have an event
> API, similar to the existing elog API [6]. Error yaml files are stored
> either in the phosphor-dbus-interfaces repo, or within an application's
> repo, based on whether the error corresponds to a d-bus interface failure or
> not. In case of events, I think the event yaml files can just be stored in
> the app that creates them.
>
> - The event logging API, in addition to logging to journal, will call an
> internal phosphor-logging d-bus API, similar to [4], in order to create a d-bus
> object depicting the event. Based on the event Category, the d-bus object
> will be placed in the right namespace, such as
> /xyz/openbmc_project/logging/events/boot/ or
> /xyz/openbmc_project/logging/events/thermal/. The phosphor-logging process,
> hence, will own these d-bus objects, do the id management (per category),
> etc.
>
Possibly with the right metadata, it would be possible to filter and
parse event logs this way into a SEL format. In my ideal world, I want
to say that we would have one log that we could just 'mine' SEL entries
from upon request so that we don't have duplicate entries in various
logs.
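
A rough sketch of "mining" SEL entries from a single event log follows. The 16-byte layout is the standard IPMI system event record; everything else is an assumption for illustration, in particular the metadata keys (generator_id, sensor_type and so on) that an event would need to carry, and the rule of skipping events that lack them.

```python
import struct

# IPMI system event record, 16 bytes, little-endian:
# record id (2), record type (1), timestamp (4), generator id (2),
# EvM rev (1), sensor type (1), sensor number (1),
# event dir/type (1), event data 1-3 (3).
SEL_FORMAT = "<HBIHBBBB3s"
SYSTEM_EVENT_RECORD = 0x02
EVM_REV = 0x04

# Hypothetical metadata keys an event must carry to be expressible as SEL.
REQUIRED_KEYS = ("timestamp", "generator_id", "sensor_type",
                 "sensor_number", "event_dir_type", "event_data")


def to_sel_record(record_id, event):
    """Pack one event dict into a SEL record, or return None to skip it."""
    if not all(key in event for key in REQUIRED_KEYS):
        return None  # not enough metadata to build a SEL entry
    return struct.pack(
        SEL_FORMAT, record_id, SYSTEM_EVENT_RECORD, event["timestamp"],
        event["generator_id"], EVM_REV, event["sensor_type"],
        event["sensor_number"], event["event_dir_type"],
        bytes(event["event_data"]))


def mine_sel(events):
    """Derive SEL records on request from the one event log."""
    records = []
    for event in events:
        record = to_sel_record(len(records) + 1, event)
        if record is not None:
            records.append(record)
    return records
```

Because the records are derived on request, nothing is stored twice; the cost is that the event metadata has to be rich enough to fill the SEL fields.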
>
> Proposal 2 - Write d-bus interfaces to describe events
>
> Couple of issues I see with Proposal 1 :
>
> a) It's cumbersome for a BMC app to figure out that a specific event was
> reported, or to express interest in a certain category of events. The d-bus
> path namespace can help to a certain extent here though, but it's based on
> paths and properties and not interfaces being added.
> b) Both the existing Entry interface [1] and the proposed Event interface
> [2] express metadata as strings, probably not the most elegant way for an
> interested program to deal with them.
Once again, thinking of legacy IPMI interfaces, something like PEF
would be a natural fit for this sort of app listening for certain types
of events. Not every user will want to be using PEF, since it is an
IPMI-era tool, but for old-school system admin types, this is useful.
--Vernon
> Given this, it feels more natural to express an event in its own d-bus
> interface, such as an Event.Boot or Event.Thermal interface. So, this
> proposal looks like :
>
> - Define an Event log interface [5]. Note that this is mostly like [2],
> although it has an additional method to create the event d-bus object.
>
> - For specific event types, define their own d-bus interfaces. I don't have
> examples for these at the moment, but like I mentioned above, we could have
> interfaces for Event.Boot and Event.Thermal to start with. These interfaces
> could be placed in the phosphor-dbus-interfaces repo. A phosphor-logging
> application will have the code to implement these well-known event
> interfaces, and to basically create d-bus objects. This app will also
> implement the "Notify" method defined in [5].
>
> - An application interested in reporting an event will make a call to the
> "Notify" API defined in [5], stating the event category and the event
> metadata. The phosphor-logging application that implements "Notify", will
> create d-bus objects based on the event Category and metadata, and place
> them in appropriate d-bus path namespaces, similar to Proposal 1. It can
> also log the event information to the journal, though I am not sure why this
> would be required, aside from the having the need to have the journal as the
> repo of all events.
>
>
>
> [1] https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Entry.interface.yaml
> [2] https://gerrit.openbmc-project.xyz/#/c/6405/1
> [3]
> https://github.com/openbmc/phosphor-logging/blob/master/tools/elog-gen.py,
> error yaml example : https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/Dump
> [4]
> https://github.com/openbmc/phosphor-logging/blob/master/log_manager.cpp#L27
> [5] https://gerrit.openbmc-project.xyz/#/c/6406/1
> [6] https://github.com/openbmc/phosphor-logging/blob/master/phosphor-logging/elog.hpp#L126
>
> Regards,
> Deepak
>
* Re: RFC for event logging mechanism
2017-09-05 22:46 ` Rick Altherr
@ 2017-09-05 23:13 ` Andrew Jeffery
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Jeffery @ 2017-09-05 23:13 UTC (permalink / raw)
To: Rick Altherr, Deepak Kodihalli; +Cc: OpenBMC Maillist
On Tue, 2017-09-05 at 15:46 -0700, Rick Altherr wrote:
> On Tue, Sep 5, 2017 at 4:24 AM, Deepak Kodihalli
> <dkodihal@linux.vnet.ibm.com> wrote:
> > Hello,
> >
> > I'm working this sprint on designing an event logging mechanism
> > (https://github.com/openbmc/openbmc/issues/1856). I have a couple proposals
> > below along with some questions. Hoping to hear thoughts on which might be a
> > better proposal. Any other feedback is welcome.
> >
> > Potential requirements
> > 1) Applications should be able to log events of interest. Events could be
> > used for purposes such as telemetry, analytics, debug. Examples of events
> > could be changes in the power/thermal domain, such as operating temps on a
> > server, boot related, user account changes, etc.
>
> My experience at Google is that telemetry and debug, while both
> event-oriented, have vastly different usages which lead to different
> designs. Telemetry like temperature, voltage, current, boot count,
> f/w version are primarily collected, aggregated, and analyzed by
> automation systems. To avoid complexity in the consumers of that
> data, it is ideally self-describing, easily parsable, and in the most
> intuitive, primitive form. Our internal systems use explicit types
> and generous metadata for _each_ piece of telemetry. For example, a
> voltage sensor is encoded as a float with metadata describing the rail
> being monitored, units (mV or V), time recorded, etc. I expect these
> to be consumed as they are generated, either by local processes or by
> streaming off-machine.
>
> Debug is almost exclusively consumed by humans in cases where a
> problem has already been found and the relevant data for identifying
> the root cause is not in the telemetry and unknown. I think of debug
> as being comprised mostly of unstructured logs. Off-machine
> collection can be done infrequently and in bulk.
>
> All attempts I've seen to mix the two have led to a messy API and/or
> data model. Even if the data model and API _can_ be shared, the
> higher-level consumers of the two types of data will be different.
> Instead of requiring the client to know which are telemetry and which
> are debug logs, separating that into two APIs makes the consumer's
> life simpler.
Having only briefly skimmed the emails, I support Rick's position.
Having recently tried to debug complex issues with OpenBMC in its
current state, the structured logging made life pretty miserable. It
led to vague error messages that gave no chance of diagnosing the
actual problem at hand, whilst the volume of the conflated
telemetry/debug logging was large.
Andrew