* RFC for Telemetry data collection
@ 2017-09-07 15:20 tomjose
2017-09-07 18:41 ` Rick Altherr
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: tomjose @ 2017-09-07 15:20 UTC (permalink / raw)
To: OpenBMC Maillist; +Cc: rosedahl, thalerj, jkeusema
Hello,
I am working on the issue
(https://github.com/openbmc/openbmc/issues/1957) to design a telemetry
application for OpenBMC. Below is a rough outline of how we plan to go
about it. Please share your thoughts and feedback on this proposal. The
work depends on the designs that come out of the following issues, since
this app would use the capabilities they provide:
https://github.com/openbmc/openbmc/issues/1856 and
https://github.com/openbmc/openbmc/issues/2102.
Summary of the requirements we came across that are relevant to this discussion:
1) BMC telemetry data (for example, VRM rail voltages), where the data is
collected at different rates depending on the data and aggregated by the
BMC app (minimum, maximum and average). The metrics are logged at the
requested collection frequency, so that the user can fetch them for
analytics.
2) Users should be able to set thresholds for temperature limits and
receive alerts. This would allow the user to plan cooling needs.
3) The BMC would act as a route for OCC metrics to be sent to the user.
The OCC would send telemetry data down to the BMC, and the BMC should
figure out a way to alert the user to consume this data.
We will keep the focus of this discussion on requirement 1.
This proposal presupposes that all the resources the telemetry app is
interested in (for example, VRM rail voltages and ambient temperature)
are populated as D-Bus objects that can be queried to read the
instantaneous values. The phosphor-hwmon application already exposes
many of the resources of interest.
The idea is a YAML-based approach in which the policy of the telemetry
app is expressed. The application would consume the YAML and start the
telemetry data collection. The YAML would express the following:
a) D-Bus info (object, interface, property) associated with the resource.
b) Units associated with the value (for example, Celsius) and the
associated scaling factor.
c) Granularity - the time between two measurements.
d) Aggregation methods - min, max, avg, etc.
e) Logging policy - frequency for creating an event and alerting the user.
The application would operate based on the policy and log the telemetry
data. The details of logging will evolve as we progress on the related
issue.
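As a rough illustration of items a) to e), one policy entry and its
aggregation step could be sketched in Python as below. This is only a
sketch under assumptions: the class and field names are hypothetical and
not an agreed schema, the scaling factor is assumed to be a power-of-ten
exponent, and the D-Bus names in the usage note are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TelemetryPolicy:
    """Hypothetical policy entry; fields mirror items a) to e) of the proposal."""
    dbus_object: str        # a) object path
    dbus_interface: str     # a) interface
    dbus_property: str      # a) property
    unit: str               # b) unit of the scaled value, e.g. "celsius"
    scale: int              # b) assumed power-of-ten exponent for raw readings
    granularity_s: float    # c) seconds between two measurements
    aggregations: Tuple[str, ...] = ("min", "max", "avg")  # d)
    log_interval_s: float = 3600.0                          # e)


# Aggregation methods the sketch knows about, keyed by the policy names.
AGGREGATORS = {
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}


def aggregate(policy, raw_samples):
    """Scale raw readings and apply the aggregations named in the policy."""
    if not raw_samples:
        raise ValueError("no samples collected in this interval")
    scaled = [raw * 10.0 ** policy.scale for raw in raw_samples]
    return {name: AGGREGATORS[name](scaled) for name in policy.aggregations}
```

A caller would build one TelemetryPolicy per resource (for instance an
ambient temperature sensor reporting millidegrees with scale=-3), collect
raw readings every granularity_s seconds, and call aggregate() once per
log_interval_s to produce the values to log.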
Regards,
Tom
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: RFC for Telemetry data collection
2017-09-07 15:20 RFC for Telemetry data collection tomjose
@ 2017-09-07 18:41 ` Rick Altherr
2017-09-07 20:04 ` Todd Rosedahl
2017-09-08 1:18 ` Brad Bishop
2017-09-08 1:16 ` Brad Bishop
2018-03-09 13:43 ` Deepak Kodihalli
2 siblings, 2 replies; 11+ messages in thread
From: Rick Altherr @ 2017-09-07 18:41 UTC (permalink / raw)
To: tomjose; +Cc: OpenBMC Maillist, thalerj, jkeusema, rosedahl
[-- Attachment #1.1: Type: text/plain, Size: 3135 bytes --]
I have many opinions on telemetry data formats and APIs. What I'm seeing
in your proposal looks pretty good with some subtlety in the details. For
example, I expect to collect most data at least once-per-second, not log
anything locally, and not alert. I'll do all aggregation and thresholding
at a higher level in the software stack. I also, ideally, want very
descriptive information about where in the system the sensor is. I've
attached a screenshot of what our existing host-based reporting software
makes available to higher-level software. This is a view via the
human-readable web interface; the data is normally served via protobufs.
Rick
On Thu, Sep 7, 2017 at 8:20 AM, tomjose <tomjose@linux.vnet.ibm.com> wrote:
> Hello,
>
> I am working on the issue (https://github.com/openbmc/openbmc/issues/1957)
> to design a telemetry application for the OpenBMC. I would be explaining a
> rough idea of how we plan to go about. Please share your thoughts and
> feedback on this proposal. This issue would depend on the design evolving
> out of following issues, since this app would utilize the capabilities
> provided. (https://github.com/openbmc/openbmc/issues/1856,
> https://github.com/openbmc/openbmc/issues/2102).
>
> Summary of the requirements that we came across relevant to this
> discussion.
>
>
> 1) BMC telemetry data (example VRM rail voltages) where the data is
> collected at different rates depending on the data and aggregated by the
> BMC app (minimum, maximum
> and average). Based on the collection timing request(frequency) the
> metrics are logged, so that the user can fetch it for analytics.
>
> 2) Users should be able to set thresholds for the temperature limits, and
> receive alerts. This would allow user to plan the cooling needs.
>
> 3) BMC would act as route for the OCC metrics to be send to the user. The
> OCC would send down telemetric data to the BMC and BMC should figure out a
> way to
> alert the user to consume this data.
>
>
> We would keep the focus of the discussion on the requirement no 1.
> This proposal presupposes that all the resources( example VRM rail
> voltages, ambient temperature) that the telemetry app is interested in,
> should be populated as dbus objects, which can
> be queried to read the instantaneous values. phosphor-hwmon application
> exposes many of the interested resources.
>
> The idea is to have a yaml based approach, where the policy of the
> telemetry app will be expressed. The application would be able to consume
> the yaml and initiate the telemetry
> data collection. The yaml would express the following:
>
> a) Dbus Info (object, interface, property) associated with the resource.
> b) Units associated with the value (celsius) and the associated scaling
> factor).
> c) Granularity - the time between two measures.
> d) Aggregation methods - min,max,avg..etc.
> e) Logging policy - frequency for creating an event and alerting the user.
>
> The application would operate based on the policy and log the telemetry
> data. The details of logging would evolve as we progress on the related
> issue.
>
> Regards,
> Tom
>
>
>
>
>
[-- Attachment #2: zaius-fan-telemetry.png --]
[-- Type: image/png, Size: 73436 bytes --]
* Re: RFC for Telemetry data collection
2017-09-07 18:41 ` Rick Altherr
@ 2017-09-07 20:04 ` Todd Rosedahl
2017-09-08 1:18 ` Brad Bishop
1 sibling, 0 replies; 11+ messages in thread
From: Todd Rosedahl @ 2017-09-07 20:04 UTC (permalink / raw)
To: Rick Altherr; +Cc: jkeusema, OpenBMC Maillist, thalerj, tomjose
[-- Attachment #1: Type: text/plain, Size: 4706 bytes --]
We do need to provide live reads of a subset of the data, which I think is
what Rick is describing below: for instance, fan speeds, 30-second power
averages, component temperatures, etc., much like the IPMI/DCMI
implementations out there today. These also need to alert based on trip
levels that are set. Higher layers of software can then act on this data
or log it away as they see fit. I like the idea of rich metadata around
these values, but I would think we would use Redfish as the method of
exporting this data.
We also need deep traces where the data is gathered, processed, and logged
locally by the BMC. Then the BMC should alert every X hours and the log
should be collected by the higher layer entity. This would be for things
like VRM currents on every output (hourly min, max, average). It is not
required that any other company use these deep telemetry logs, but they
are required on our systems.
As far as #3 below, this should not be a new requirement. Just export the
OCC telemetry log in the same way that you export all OCC/HOST logs.
Todd Rosedahl
IBM Power and Thermal Management
(507) 250-3275
rosedahl@us.ibm.com
From: Rick Altherr <raltherr@google.com>
To: tomjose <tomjose@linux.vnet.ibm.com>
Cc: OpenBMC Maillist <openbmc@lists.ozlabs.org>, thalerj@us.ibm.com,
jkeusema@us.ibm.com, rosedahl@us.ibm.com
Date: 09/07/2017 01:41 PM
Subject: Re: RFC for Telemetry data collection
> I have many opinions on telemetry data formats and APIs. What I'm seeing
> in your proposal looks pretty good with some subtlety in the details. For
> example, I expect to collect most data at least once-per-second, not log
> anything locally, and not alert. I'll do all aggregation and thresholding
> at a higher level in the software stack. I also, ideally, want very
> descriptive information about where in the system the sensor is. I've
> attached a screenshot of what our existing host-based reporting software
> makes available to higher-level software. This is a view via the
> human-readable web interface, the data is normally served via protobufs.
> Rick
> On Thu, Sep 7, 2017 at 8:20 AM, tomjose <tomjose@linux.vnet.ibm.com> wrote:
>> Hello,
>> I am working on the issue (https://github.com/openbmc/openbmc/issues/1957)
>> to design a telemetry application for the OpenBMC. I would be explaining a
>> rough idea of how we plan to go about. Please share your thoughts and
>> feedback on this proposal. This issue would depend on the design evolving
>> out of following issues, since this app would utilize the capabilities
>> provided. (https://github.com/openbmc/openbmc/issues/1856,
>> https://github.com/openbmc/openbmc/issues/2102).
>> Summary of the requirements that we came across relevant to this
>> discussion.
>> 1) BMC telemetry data (example VRM rail voltages) where the data is
>> collected at different rates depending on the data and aggregated by the
>> BMC app (minimum, maximum
>> and average). Based on the collection timing request(frequency) the
>> metrics are logged, so that the user can fetch it for analytics.
>> 2) Users should be able to set thresholds for the temperature limits, and
>> receive alerts. This would allow user to plan the cooling needs.
>> 3) BMC would act as route for the OCC metrics to be send to the user. The
>> OCC would send down telemetric data to the BMC and BMC should figure out a
>> way to
>> alert the user to consume this data.
>> We would keep the focus of the discussion on the requirement no 1.
>> This proposal presupposes that all the resources( example VRM rail
>> voltages, ambient temperature) that the telemetry app is interested in,
>> should be populated as dbus objects, which can
>> be queried to read the instantaneous values. phosphor-hwmon application
>> exposes many of the interested resources.
>> The idea is to have a yaml based approach, where the policy of the
>> telemetry app will be expressed. The application would be able to consume
>> the yaml and initiate the telemetry
>> data collection. The yaml would express the following:
>> a) Dbus Info (object, interface, property) associated with the resource.
>> b) Units associated with the value (celsius) and the associated scaling
>> factor).
>> c) Granularity - the time between two measures.
>> d) Aggregation methods - min,max,avg..etc.
>> e) Logging policy - frequency for creating an event and alerting the user.
>> The application would operate based on the policy and log the telemetry
>> data. The details of logging would evolve as we progress on the related
>> issue.
>> Regards,
>> Tom
[attachment "zaius-fan-telemetry.png" deleted by Todd
Rosedahl/Rochester/IBM]
* Re: RFC for Telemetry data collection
2017-09-07 15:20 RFC for Telemetry data collection tomjose
2017-09-07 18:41 ` Rick Altherr
@ 2017-09-08 1:16 ` Brad Bishop
2017-09-08 3:29 ` Deepak Kodihalli
2018-03-09 13:43 ` Deepak Kodihalli
2 siblings, 1 reply; 11+ messages in thread
From: Brad Bishop @ 2017-09-08 1:16 UTC (permalink / raw)
To: tomjose; +Cc: OpenBMC Maillist, thalerj, jkeusema, rosedahl
Hi Tom
I don’t really disagree with anything you wrote…I’m just throwing out some additional thoughts.
I would propose that we:
1 - Add support to the REST server to subscribe to DBus signals, and simply forward the signal content in JSON format out over a websocket. This allows an external user to get any async notification that any code running on the BMC can get. I have a lot of questions on specifics here but I’ll save those for later, in case this doesn’t work out.
2 - Write an application to run on the BMC that subscribes to DBus signals and, each time one occurs, creates a DBus object with the content of the signal, again in JSON format. The signals to subscribe to (and therefore the resulting “history” DBus objects) would be defined via per-platform YAML.
From there, implementing different “eventing” interfaces like Redfish, DCMI, SNMP traps, or whatever Rick does with protobufs :-) becomes an exercise in:
1 - If the eventing interface is async and socket based: doing the same as #1 above in a new application, but applying a transformation to the JSON.
2 - If the eventing interface is sync and history based: writing an application that transforms the objects created by #2 above.
Does anyone see this framework not enabling them to meet their requirements? Please poke holes.
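To make the two pieces concrete, here is a minimal Python sketch of the
JSON shape a forwarded signal could take and of a history store standing
in for application #2. It assumes nothing about the real REST server or
any D-Bus binding: signal_to_json, HistoryStore and the JSON keys are
hypothetical names, and a plain list stands in for the "history" DBus
objects.

```python
import json


def signal_to_json(path, interface, member, args):
    """Serialize the content of one D-Bus signal as a JSON text frame."""
    return json.dumps(
        {"path": path, "interface": interface, "member": member, "args": args},
        sort_keys=True,
    )


class HistoryStore:
    """Stand-in for application #2: keeps one JSON record per received signal."""

    def __init__(self, subscribed_interfaces):
        # In the proposal this set would come from per-platform YAML.
        self.subscribed = set(subscribed_interfaces)
        self.records = []

    def on_signal(self, path, interface, member, args):
        """Record the signal if its interface is subscribed; report whether it was kept."""
        if interface not in self.subscribed:
            return False
        self.records.append(signal_to_json(path, interface, member, args))
        return True
```

The async case (#1) would send the string returned by signal_to_json
straight out over the websocket, while the sync case (#2) keeps it for
later retrieval; other eventing interfaces would transform the same JSON.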
-brad
> On Sep 7, 2017, at 11:20 AM, tomjose <tomjose@linux.vnet.ibm.com> wrote:
>
> Hello,
>
> I am working on the issue (https://github.com/openbmc/openbmc/issues/1957) to design a telemetry application for the OpenBMC. I would be explaining a rough idea of how we plan to go about. Please share your thoughts and feedback on this proposal.
* Re: RFC for Telemetry data collection
2017-09-07 18:41 ` Rick Altherr
2017-09-07 20:04 ` Todd Rosedahl
@ 2017-09-08 1:18 ` Brad Bishop
1 sibling, 0 replies; 11+ messages in thread
From: Brad Bishop @ 2017-09-08 1:18 UTC (permalink / raw)
To: Rick Altherr; +Cc: tomjose, OpenBMC Maillist, thalerj, jkeusema, rosedahl
> On Sep 7, 2017, at 2:41 PM, Rick Altherr <raltherr@google.com> wrote:
>
> I have many opinions on telemetry data formats and APIs. What I'm seeing in your proposal looks pretty good with some subtlety in the details. For example, I expect to collect most data at least once-per-second,
This sounds like polling the BMC externally. Did I infer correctly? How do you feel about a push model with persistent connections?
> not log anything locally, and not alert. I'll do all aggregation and thresholding at a higher level in the software stack. I also, ideally, want very descriptive information about where in the system the sensor is. I've attached a screenshot of what our existing host-based reporting software makes available to higher-level software. This is a view via the human-readable web interface, the data is normally served via protobufs.
>
> Rick
* Re: RFC for Telemetry data collection
2017-09-08 1:16 ` Brad Bishop
@ 2017-09-08 3:29 ` Deepak Kodihalli
2017-09-08 4:06 ` Brad Bishop
0 siblings, 1 reply; 11+ messages in thread
From: Deepak Kodihalli @ 2017-09-08 3:29 UTC (permalink / raw)
To: Brad Bishop, OpenBMC Maillist
On 08/09/17 6:46 am, Brad Bishop wrote:
> Hi Tom
>
> I don’t really disagree with anything you wrote…I’m just throwing out some additional thoughts.
>
> I would propose that we:
>
> 1 - Add support to the REST server to subscribe to DBus signals, and simply forward the signal content in JSON format out over a websocket. This allows an external user to get any async notification that any code running on the BMC can get. I have a lot of questions on specifics here but I’ll save those for later, in case this doesn’t work out.
Brad, that sounds fine to me in terms of how a notification can be sent,
although Tom's proposal also talked about describing in YAML how
frequently to read something off of, say, a D-Bus object representing a
sensor. If I understood correctly, are you saying that instead of doing
this every X hours or seconds, we send out a notification whenever a
PropertiesChanged D-Bus signal is caught, and let an off-BMC application
bother about averages and min/max?
Regards,
Deepak
* Re: RFC for Telemetry data collection
2017-09-08 3:29 ` Deepak Kodihalli
@ 2017-09-08 4:06 ` Brad Bishop
0 siblings, 0 replies; 11+ messages in thread
From: Brad Bishop @ 2017-09-08 4:06 UTC (permalink / raw)
To: Deepak Kodihalli; +Cc: OpenBMC Maillist
> On Sep 7, 2017, at 11:29 PM, Deepak Kodihalli <dkodihal@linux.vnet.ibm.com> wrote:
>
> On 08/09/17 6:46 am, Brad Bishop wrote:
>> Hi Tom
>> I don’t really disagree with anything you wrote…I’m just throwing out some additional thoughts.
>> I would propose that we:
>> 1 - Add support to the REST server to subscribe to DBus signals, and simply forward the signal content in JSON format out over a websocket. This allows an external user to get any async notification that any code running on the BMC can get. I have a lot of questions on specifics here but I’ll save those for later, in case this doesn’t work out.
>
You only quoted #1 here. I’m highlighting that because terms like notification, event, and telemetry are being used a little loosely and we are not all on the same page.
I think we need to accommodate both async and sync/historical modes, where async is just streamed out of the BMC and sync, out of necessity, has dbus objects created. Hence my proposal for two applications, one for each mode.
To drive the point home, and to put the above another way: in terms of how I have described these two modes, Tom's thread (this thread) would be tackling async mode and your note thread (https://lists.ozlabs.org/pipermail/openbmc/2017-September/009065.html) would be covering sync/historical mode.
> Brad, that sounds fine to me in terms of how a notification can be sent, although Tom's proposal also talked about describing in yaml, how frequently to read something off of a, say a d-bus object representing a sensor.
My initial thinking is that neither of the proposed applications would ever be doing sync reads to a dbus object. Rather, they just passively sit there and consume signals. If need be they could discard them (see rate-limiting below).
> If I understood correctly, are you saying instead of doing this every X hours or seconds, we send out a notification whenever a PropertyChanged d-bus signal is caught?
Yes and no. I am proposing we consume signals rather than perform reads but I still think we need:
1 - An API to configure async mode (what to subscribe to, maybe some kind of rate limiting). I’m not sure if this would be a DBus API or just a REST thing.
2 - A robust YAML syntax that instructs application #2 (sync mode) to do the same things as #1 (what to subscribe to, rate limiting when creating historical snapshot objects).
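One way to picture the rate limiting mentioned in both items is a
per-signal minimum interval, with anything arriving sooner discarded. The
Python sketch below is only that, a sketch: SignalRateLimiter and its
parameters are hypothetical, and timestamps are passed in explicitly so
the behaviour does not depend on a clock.

```python
class SignalRateLimiter:
    """Forward at most one signal per key every min_interval_s seconds."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last_forwarded = {}  # key -> timestamp of last forwarded signal

    def allow(self, key, now_s):
        """Return True if the signal should be forwarded, False if discarded."""
        last = self._last_forwarded.get(key)
        if last is not None and now_s - last < self.min_interval_s:
            return False
        self._last_forwarded[key] = now_s
        return True
```

The key could be the object path plus property name, so a noisy sensor is
throttled without affecting others. The same limiter would serve both
applications: before writing to the websocket in async mode, and before
creating a historical snapshot object in sync mode.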
> And let an off-BMC application bother about averages, min/max?
>
>
> Regards,
> Deepak
* Re: RFC for Telemetry data collection
2017-09-07 15:20 RFC for Telemetry data collection tomjose
2017-09-07 18:41 ` Rick Altherr
2017-09-08 1:16 ` Brad Bishop
@ 2018-03-09 13:43 ` Deepak Kodihalli
2018-03-12 17:57 ` Deepak Kodihalli
2018-03-13 14:23 ` Kurt Taylor
2 siblings, 2 replies; 11+ messages in thread
From: Deepak Kodihalli @ 2018-03-09 13:43 UTC (permalink / raw)
To: jk, joel, bradleyb, venture, tomjose, rosedahl, vernon.mauery,
openbmc
On 07/09/17 8:50 pm, tomjose wrote:
> Hello,
>
> I am working on the issue
> (https://github.com/openbmc/openbmc/issues/1957) to design a telemetry
> application for the OpenBMC. I would be explaining a rough idea of how
> we plan to go about. Please share your thoughts and feedback on this
> proposal. This issue would depend on the design evolving out of
> following issues, since this app would utilize the capabilities
> provided. (https://github.com/openbmc/openbmc/issues/1856,
> https://github.com/openbmc/openbmc/issues/2102).
>
> Summary of the requirements that we came across relevant to this
> discussion.
>
>
> 1) BMC telemetry data (example VRM rail voltages) where the data is
> collected at different rates depending on the data and aggregated by the
> BMC app (minimum, maximum
> and average). Based on the collection timing request(frequency) the
> metrics are logged, so that the user can fetch it for analytics.
>
> 2) Users should be able to set thresholds for the temperature limits,
> and receive alerts. This would allow user to plan the cooling needs.
>
> 3) BMC would act as route for the OCC metrics to be send to the user.
> The OCC would send down telemetric data to the BMC and BMC should figure
> out a way to
> alert the user to consume this data.
>
>
> We would keep the focus of the discussion on the requirement no 1.
> This proposal presupposes that all the resources( example VRM rail
> voltages, ambient temperature) that the telemetry app is interested in,
> should be populated as dbus objects, which can
> be queried to read the instantaneous values. phosphor-hwmon application
> exposes many of the interested resources.
>
> The idea is to have a yaml based approach, where the policy of the
> telemetry app will be expressed. The application would be able to
> consume the yaml and initiate the telemetry
> data collection. The yaml would express the following:
>
> a) Dbus Info (object, interface, property) associated with the resource.
> b) Units associated with the value (celsius) and the associated scaling
> factor).
> c) Granularity - the time between two measures.
> d) Aggregation methods - min,max,avg..etc.
> e) Logging policy - frequency for creating an event and alerting the user.
>
> The application would operate based on the policy and log the telemetry
> data. The details of logging would evolve as we progress on the related
> issue.
>
> Regards,
> Tom
Hi,
I'd like to bump this topic and add some more details. I'd like to
discuss design proposals/directions for a couple of things:
1) A short/mid-term proposal for telemetry requirements specific to IBM
labs (these need to be implemented in a relatively short span of time,
so there may not be the bandwidth to write an entirely new application
that is not based on D-Bus or the OpenBMC REST API).
2) Industry-standard methods for storing and retrieving telemetry data -
thoughts on how to get there.
1) Telemetry requirements specific to IBM labs
Here are the requirements and a design proposal.
a) Instantaneous readings, such as temperatures, currents, errors,
events, etc. Let's call this Layer 0.
Proposal:
- The D-Bus model is the source for instantaneous readings. This means
there would be D-Bus objects representing this data, and hence an
OpenBMC REST API around them.
- These D-Bus objects would not necessarily implement the same D-Bus
interfaces.
- Interested clients can read these D-Bus objects via the OpenBMC REST API.
- Clients that want to be notified about changes to the readings can use
the existing mechanism for event notification over WebSockets.
b) Instantaneous aggregations - this would mostly apply to, but may not
be limited to, readings such as temperatures and currents. Let's call
this Layer 1. It answers questions such as "what is the
min/max/average over the last X seconds?". We have a requirement to do
such aggregations on the BMC.
Proposal:
- Aggregations are represented as D-Bus objects, created by a telemetry
app. For example, if we need to know the min/max/avg ambient temp for the
last 5 minutes, and the ambient temp is usually at temps/ambient, the
aggregation could be at temps/aggregations/ambient.
- Implement D-Bus interfaces to denote aggregations. For example, the
temps/aggregations/ambient object could implement a D-Bus interface
describing min/max/avg properties.
- Aggregation objects will have the values described in the D-Bus
interface (such as min/max/avg), plus a timestamp, as properties.
- Enable a config (e.g. JSON) to tell the telemetry app things like:
Which (supported) aggregations should be performed (min/max/avg)? Which
D-Bus objects should be aggregated? How frequently should they be
aggregated? What should the paths of the aggregations be? Potentially
add a REST API to allow changing the (JSON) config at run-time.
- It will be possible to read all aggregation objects, or aggregation
objects of a specific type, via one REST call.
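A Layer 1 aggregation of the "min/max/average over the last X seconds"
kind can be sketched in Python as a sliding window over timestamped
readings. This is an illustration under assumptions, not the proposed
implementation: WindowAggregator is a hypothetical name, the returned
dict stands in for the properties of an aggregation D-Bus object, and
timestamps are supplied by the caller.

```python
from collections import deque


class WindowAggregator:
    """Keep readings from the last window_s seconds and summarize them."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._samples = deque()  # (timestamp_s, value), oldest first

    def _evict(self, now_s):
        # Drop readings that are window_s seconds old or older.
        while self._samples and self._samples[0][0] <= now_s - self.window_s:
            self._samples.popleft()

    def add(self, timestamp_s, value):
        """Record one Layer 0 reading taken at timestamp_s."""
        self._samples.append((timestamp_s, value))
        self._evict(timestamp_s)

    def snapshot(self, now_s):
        """Return min/max/avg plus a timestamp, or None if the window is empty."""
        self._evict(now_s)
        values = [value for _, value in self._samples]
        if not values:
            return None
        return {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "timestamp": now_s,
        }
```

With window_s=300 this corresponds to the 5-minute ambient temperature
example; the telemetry app would publish the snapshot at a path such as
temps/aggregations/ambient each time it is recomputed.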
c) Historical aggregations, or snapshots. Let's call this Layer 2. It
addresses needs such as "a reading for every X minutes over a period of
Y hours". This can be a snapshot of Layer 1 or Layer 0 D-Bus objects. We
have a requirement to store this snapshot on the BMC.
Proposal:
- The snapshot will be represented as a set of D-Bus objects. For
example, if one needs an hourly reading for a period of 24 hours, the
objects could be at temps/aggregations/ambient/per-hour/{1..24}.
- Enable a config to tell a telemetry app things like: Which D-Bus
objects should I keep a history of? What is the duration of the
snapshot? At what frequency should entries be added to the snapshot?
Once the snapshot is full, should the entries roll, or should we
restart? Potentially add a REST API to allow changing the (JSON) config
at run-time.
- The historical aggregations can be read via one REST call. For the
REST server it should most likely be one D-Bus call as well, if there's
an object manager at temps/aggregations/ambient/per-hour, for example.
- The objects in the snapshot will implement the same interfaces as
Layer 1 objects, so they will have the same properties (e.g. min/max/avg,
timestamp).
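The "roll or restart" choice for a full snapshot can be illustrated with
a small fixed-capacity history in Python. Again this is a sketch under
assumptions: SnapshotHistory and its roll flag are hypothetical, and a
dict keyed by path stands in for the set of D-Bus objects under a
per-hour style parent.

```python
class SnapshotHistory:
    """Fixed-capacity history of Layer 0 or Layer 1 readings."""

    def __init__(self, capacity, roll=True):
        self.capacity = capacity
        self.roll = roll      # True: drop the oldest entry; False: start over
        self.entries = []     # oldest first

    def add(self, entry):
        """Append one entry, applying the roll or restart policy when full."""
        if len(self.entries) == self.capacity:
            if self.roll:
                self.entries.pop(0)
            else:
                self.entries.clear()
        self.entries.append(entry)

    def paths(self, base):
        """Map entries to numbered child paths under base, starting at 1."""
        return {f"{base}/{i}": e for i, e in enumerate(self.entries, start=1)}
```

For the hourly example, capacity=24 and one add() per hour would keep
entries at temps/aggregations/ambient/per-hour/1 through 24, with each
entry carrying the same min/max/avg and timestamp properties as a Layer 1
object.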
d) Some notes
- With the proposal above, telemetry data is retrieved via the current
OpenBMC REST API, so it may not readily work with telemetry tools that
rely on industry-standard APIs (see point 2 below), but it seems to be
the feasible option for implementing IBM's requirements in the expected
timelines.
- Layer 1 and Layer 2 telemetry apps would be different processes, and
can function independently of each other.
2) Industry-standard methods for storing and retrieving telemetry data
- With the proposal above, the instantaneous readings are D-Bus objects,
and the instantaneous and historical aggregations are D-Bus objects as
well. The API is the OpenBMC REST API.
- Typically, aggregations may not have to happen on the BMC, in which
case one can turn off layers 1 and 2.
- This part is about how the telemetry data is presented, and how we'd
eventually not use the current OpenBMC REST API in production. I've
heard (mostly from people on the To: list) of the following
industry-standard ways to represent/retrieve telemetry data. This would
mean transforming layer 0 D-Bus objects into these:
- Via the Redfish (events) API
- Via IPMI events/PEF
- Via SNMP traps
- Via an SQLite db, with something like Logstash parsing it
- Others?
Discussions are already happening regarding Redfish, so telemetry could
be one aspect to consider there as well.
- Aggregations could be done on the BMC with collectd, and could be
stored in an RRD format. I need to look at both in detail. These are
alternatives to a D-Bus model of aggregations. Thoughts on this? For
example, would this be much less work for both the BMC and the telemetry
data users than the proposed D-Bus model, while still addressing the
requirements I've mentioned?
Do we know which client tools are commonly used for processing
telemetry data, and how they expect the data to be presented?
Thanks,
Deepak
* Re: RFC for Telemetry data collection
2018-03-09 13:43 ` Deepak Kodihalli
@ 2018-03-12 17:57 ` Deepak Kodihalli
2018-03-13 14:23 ` Kurt Taylor
1 sibling, 0 replies; 11+ messages in thread
From: Deepak Kodihalli @ 2018-03-12 17:57 UTC (permalink / raw)
To: jk, joel, bradleyb, venture, tomjose, rosedahl, vernon.mauery,
openbmc
On 09/03/18 7:13 pm, Deepak Kodihalli wrote:
> On 07/09/17 8:50 pm, tomjose wrote:
>> Hello,
>>
>> I am working on the issue
>> (https://github.com/openbmc/openbmc/issues/1957) to design a telemetry
>> application for the OpenBMC. I would be explaining a rough idea of how
>> we plan to go about. Please share your thoughts and feedback on this
>> proposal. This issue would depend on the design evolving out of
>> following issues, since this app would utilize the capabilities
>> provided. (https://github.com/openbmc/openbmc/issues/1856,
>> https://github.com/openbmc/openbmc/issues/2102).
>>
>> Summary of the requirements that we came across relevant to this
>> discussion.
>>
>>
>> 1) BMC telemetry data (example VRM rail voltages) where the data is
>> collected at different rates depending on the data and aggregated by
>> the BMC app (minimum, maximum
>> and average). Based on the collection timing request(frequency)
>> the metrics are logged, so that the user can fetch it for analytics.
>>
>> 2) Users should be able to set thresholds for the temperature limits,
>> and receive alerts. This would allow user to plan the cooling needs.
>>
>> 3) BMC would act as route for the OCC metrics to be send to the user.
>> The OCC would send down telemetric data to the BMC and BMC should
>> figure out a way to
>> alert the user to consume this data.
>>
>>
>> We would keep the focus of the discussion on the requirement no 1.
>> This proposal presupposes that all the resources( example VRM rail
>> voltages, ambient temperature) that the telemetry app is interested
>> in, should be populated as dbus objects, which can
>> be queried to read the instantaneous values. phosphor-hwmon
>> application exposes many of the interested resources.
>>
>> The idea is to have a yaml based approach, where the policy of the
>> telemetry app will be expressed. The application would be able to
>> consume the yaml and initiate the telemetry
>> data collection. The yaml would express the following:
>>
>> a) Dbus Info (object, interface, property) associated with the resource.
>> b) Units associated with the value (celsius) and the associated
>> scaling factor).
>> c) Granularity - the time between two measures.
>> d) Aggregation methods - min,max,avg..etc.
>> e) Logging policy - frequency for creating an event and alerting the
>> user.
>>
>> The application would operate based on the policy and log the
>> telemetry data. The details of logging would evolve as we progress on
>> the related issue.
>>
>> Regards,
>> Tom
>
> Hi,
>
> I'd like to bump this topic and add some more details. I'd like to
> discuss design proposals/directions for a couple things :
>
> 1) A short/mid term proposal for telemetry requirements specific to IBM
> labs (which need to be implemented in a relatively short span of time,
> so there may not be the bandwidth to write an entirely new application
> not based on D-Bus or the OpenBMC REST API).
> 2) Industry standard methods for storing and retrieving telemetry data -
> thoughts on how to get here.
>
>
>
> 1) Telemetry requirements specific to IBM labs
> Here are the requirements and a design proposal.
>
> a) Instantaneous readings, such as temperatures, currents, errors,
> events etc. Let's call this Layer 0.
>
> Proposal:
> - The D-Bus model is the source for instantaneous readings. This means
> there would be D-Bus objects representing this data, and hence an
> OpenBMC REST API around it.
> - These D-Bus objects would not necessarily implement the same D-Bus
> interfaces.
> - Interested clients can read these D-Bus objects via the OpenBMC REST API.
> - If clients are interested in being notified about "changes" to the
> readings, that's possible via the existing event notification over
> WebSockets mechanism.
>
>
> b) Instantaneous aggregations - this would mostly apply to, but may not
> be limited to, readings such as temperatures and currents. Let's call
> this Layer 1. This basically is to solve, for eg, "what is the
> min/max/average over the last X seconds?". We have a requirement to do
> such aggregations on the BMC.
>
> Proposal:
> - Aggregations are represented as D-Bus objects, created by a telemetry
> app. For eg if we need to know the min/max/avg ambient temp for the last
> 5 minutes, and say the ambient temp is usually at temps/ambient, the
> aggregation could be at temps/aggregations/ambient.
> - Implement D-Bus interfaces to denote aggregations, for eg the
> temps/aggregations/ambient object could implement a D-Bus interface
> describing min/max/avg properties.
> - Aggregation objects will have the values as described in the D-Bus
> interface (such as min/max/avg), and a timestamp, as properties.
> - Enable a config (eg JSON) to let the telemetry app know things like :
> What (supported) aggregations should be performed (min/max/avg)? What
> D-Bus objects should be aggregated? How frequently should they be
> aggregated? What should be the paths of the aggregations? Potentially
> add a REST API to allow changing the (JSON) config at run-time.
> - It will be possible to read all aggregation objects, or aggregation
> objects of a specific type via one REST call.
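
As a rough illustration of the Layer 1 idea above ("what is the min/max/average over the last X seconds?"), the following sketch keeps a sliding window of readings and reports min/max/avg plus a timestamp. The class and method names are hypothetical, and the D-Bus plumbing is left out entirely; in the proposal the result would be published as properties of an aggregation object such as temps/aggregations/ambient.

```python
import time
from collections import deque


class WindowAggregator:
    """Keep readings from the last `window_s` seconds and report min/max/avg."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, value, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, value))
        self._expire(now)

    def _expire(self, now):
        # Drop samples that are older than the window.
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def snapshot(self, now=None):
        """Return the aggregation properties, or None if the window is empty."""
        now = time.time() if now is None else now
        self._expire(now)
        if not self.samples:
            return None
        values = [v for _, v in self.samples]
        return {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "timestamp": now,
        }


# Ambient temperature over the last 5 minutes, with explicit timestamps
# so that the example is deterministic.
agg = WindowAggregator(window_s=300)
for t, temp in [(0, 24.0), (100, 26.0), (400, 28.0)]:
    agg.add(temp, now=t)
print(agg.snapshot(now=400))
```

In this run the reading taken at t=0 has aged out by t=400, so only the last two readings contribute to the result.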
>
>
> c) Historical aggregations or snapshot. Let's call this Layer 2. This is
> to solve, for eg, "Need a reading corresponding to every X minutes in a
> period of Y hours". This can be a snapshot of Layer 1 or Layer 0 D-Bus
> objects. We have a requirement to store this snapshot on the BMC.
>
> Proposal:
> - The snapshot will be represented as a set of D-Bus objects. For eg if
> one needs an hourly reading for a period of 24 hours, the objects could
> be at temps/aggregations/ambient/per-hour/{1..24}.
> - Enable a config to let a telemetry app know things like : What
> D-Bus objects should I keep a history of? What is the duration of the
> snapshot? At what frequency should entries be added into the snapshot?
> Once the snapshot is full, should the entries roll, or should we
> restart? Potentially add a REST API to allow changing the (JSON) config
> at run-time.
> - The historical aggregations can be read via one REST call. It should
> be one D-Bus call as well most likely for the REST server, if there's an
> object manager at temps/aggregations/ambient/per-hour for eg.
> - These objects in the snapshot will implement the same interfaces as
> Layer 1 objects, so they will have the same properties (eg min/max/avg,
> timestamp).
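
The "should the entries roll, or should we restart?" question for Layer 2 can be sketched as a fixed-size history with two policies. This is only an illustration under assumptions: the class name, the `on_full` values, and the entry layout are made up, and the numbering simply mirrors the per-hour/{1..24} paths mentioned above.

```python
from collections import deque


class SnapshotHistory:
    """Fixed-size history of aggregation entries (hypothetical helper)."""

    def __init__(self, capacity, on_full="roll"):
        if on_full not in ("roll", "restart"):
            raise ValueError("on_full must be 'roll' or 'restart'")
        self.capacity = capacity
        self.on_full = on_full
        self.entries = deque()

    def add(self, entry):
        if len(self.entries) == self.capacity:
            if self.on_full == "roll":
                self.entries.popleft()  # discard only the oldest entry
            else:
                self.entries.clear()  # start a fresh snapshot
        self.entries.append(entry)

    def as_paths(self, base):
        # Number entries 1..N, oldest first, mirroring per-hour/{1..24}.
        return {f"{base}/{i}": e for i, e in enumerate(self.entries, start=1)}


# A 3-entry history fed with 5 hourly readings, once per policy.
rolling = SnapshotHistory(capacity=3, on_full="roll")
restarting = SnapshotHistory(capacity=3, on_full="restart")
for hour in range(1, 6):
    reading = {"avg": 20.0 + hour, "timestamp": hour * 3600}
    rolling.add(reading)
    restarting.add(reading)

for path, value in rolling.as_paths("temps/aggregations/ambient/per-hour").items():
    print(path, value)
print(len(restarting.entries), "entries after restart policy")
```

With "roll" the history always holds the most recent entries; with "restart" it is emptied once full, so a reader may briefly see far fewer entries than the configured duration.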
>
>
> d) Some notes
> - With the proposal above, the API to retrieve the telemetry data is via
> the current OpenBMC REST API, so it may not readily work with telemetry
> tools relying on industry-standard API (see point 2 below), but it seems
> to be the feasible option to rely on to implement IBM's requirements in
> the expected timelines.
> - Layer 1 and Layer 2 telemetry apps would be different processes, and
> can function independent of each other.
>
>
>
> 2) Industry standard methods for storing and retrieving telemetry data
>
> - With the proposal above, the instantaneous readings are D-Bus objects,
> the instantaneous and historical aggregations are D-Bus objects as well.
> The API is the OpenBMC REST API.
> - Typically, aggregations may not have to happen on the BMC, in which
> case one can turn off layers 1 and 2.
> - This is regarding how the telemetry data is presented, and how we'd
> eventually not use the current OpenBMC REST API in production. I've
> heard (mostly from people on the To: list) of the following
> industry-standard ways to represent/retrieve telemetry data. This would
> mean transforming layer 0 D-Bus objects into these :
> - Via Redfish (events) API
> - Via IPMI events/PEF
> - Via SNMP traps
> - Via an sqlite db, and have something like Logstash parse it
> - Others?
> Discussions are already happening regarding Redfish, so telemetry could
> be one aspect to consider as well.
> - Aggregations could be done on the BMC with collectd. I need to look at
> this in detail. Aggregations could be stored in an RRD format. Need to
> look at this in detail as well. These are as opposed to a D-Bus model of
> aggregations. Thoughts on this? For eg, would this be much less work
> both for the BMC and the telemetry data users than the proposed D-Bus
> model, but at the same time can address the requirements I've mentioned?
> Do we know what are the commonly used client tools for processing
> telemetry data, and how they expect the data to be presented?
>
>
>
> Thanks,
> Deepak
I've added this topic to the agenda in the wiki. I would appreciate
feedback beforehand, so that we can discuss it on the telecon.
Thanks,
Deepak
* Re: RFC for Telemetry data collection
2018-03-09 13:43 ` Deepak Kodihalli
2018-03-12 17:57 ` Deepak Kodihalli
@ 2018-03-13 14:23 ` Kurt Taylor
2018-03-13 14:50 ` Deepak Kodihalli
1 sibling, 1 reply; 11+ messages in thread
From: Kurt Taylor @ 2018-03-13 14:23 UTC (permalink / raw)
To: Deepak Kodihalli
Cc: jk, joel, bradleyb, venture, tomjose, rosedahl, vernon.mauery,
OpenBMC Maillist
On Fri, Mar 9, 2018 at 7:43 AM, Deepak Kodihalli <
dkodihal@linux.vnet.ibm.com> wrote:
> On 07/09/17 8:50 pm, tomjose wrote:
>
>> Hello,
>>
>> I am working on the issue (https://github.com/openbmc/openbmc/issues/1957)
>> to design a telemetry application for the OpenBMC. I would be explaining a
>> rough idea of how we plan to go about. Please share your thoughts and
>> feedback on this proposal. This issue would depend on the design evolving
>> out of following issues, since this app would utilize the capabilities
>> provided. (https://github.com/openbmc/openbmc/issues/1856,
>> https://github.com/openbmc/openbmc/issues/2102).
>>
>> Summary of the requirements that we came across relevant to this
>> discussion.
>>
>> 1) BMC telemetry data (example VRM rail voltages) where the data is
>> collected at different rates depending on the data and aggregated by the
>> BMC app (minimum, maximum
>> and average). Based on the collection timing request(frequency) the
>> metrics are logged, so that the user can fetch it for analytics.
>>
>> 2) Users should be able to set thresholds for the temperature limits,
>> and receive alerts. This would allow user to plan the cooling needs.
>>
>> 3) BMC would act as a route for the OCC metrics to be sent to the user.
>> The OCC would send down telemetry data to the BMC, and the BMC should
>> figure out a way to alert the user to consume this data.
>>
>>
>> We would keep the focus of the discussion on the requirement no 1.
>> This proposal presupposes that all the resources (for example VRM rail
>> voltages, ambient temperature) that the telemetry app is interested in
>> are populated as D-Bus objects, which can be queried to read the
>> instantaneous values. The phosphor-hwmon application exposes many of
>> the resources of interest.
>>
>> The idea is to have a yaml based approach, where the policy of the
>> telemetry app will be expressed. The application would be able to consume
>> the yaml and initiate the telemetry
>> data collection. The yaml would express the following:
>>
>> a) Dbus Info (object, interface, property) associated with the resource.
>> b) Units associated with the value (e.g. Celsius) and the associated
>> scaling factor.
>> c) Granularity - the time between two measurements.
>> d) Aggregation methods - min, max, avg, etc.
>> e) Logging policy - frequency for creating an event and alerting the user.
>>
>> The application would operate based on the policy and log the telemetry
>> data. The details of logging would evolve as we progress on the related
>> issue.
>>
>> Regards,
>> Tom
>>
>
> Hi,
>
> I'd like to bump this topic and add some more details. I'd like to discuss
> design proposals/directions for a couple things :
>
> 1) A short/mid term proposal for telemetry requirements specific to IBM
> labs (which need to be implemented in a relatively short span of time, so
> there may not be the bandwidth to write an entirely new application not
> based on D-Bus or the OpenBMC REST API).
> 2) Industry standard methods for storing and retrieving telemetry data -
> thoughts on how to get here.
>
>
> 1) Telemetry requirements specific to IBM labs
> Here are the requirements and a design proposal.
>
> a) Instantaneous readings, such as temperatures, currents, errors, events
> etc. Let's call this Layer 0.
>
> Proposal:
> - The D-Bus model is the source for instantaneous readings. This means
> there would be D-Bus objects representing this data, and hence an OpenBMC
> REST API around it.
> - These D-Bus objects would not necessarily implement the same D-Bus
> interfaces.
> - Interested clients can read these D-Bus objects via the OpenBMC REST API.
> - If clients are interested in being notified about "changes" to the
> readings, that's possible via the existing event notification over
> WebSockets mechanism.
>
This would also map well into an OBMC MIB extension for example.
>
>
> b) Instantaneous aggregations - this would mostly apply to, but may not be
> limited to, readings such as temperatures and currents. Let's call this
> Layer 1. This basically is to solve, for eg, "what is the min/max/average
> over the last X seconds?". We have a requirement to do such aggregations on
> the BMC.
>
I would be interested in why aggregations (including the historical ones,
Layer 2) are a requirement and not just handled by the monitoring/event
management app, as done in network management. If this work is to be done
on the BMC, it needs to be user-definable and able to be turned off in
resource-critical situations.
>
> Proposal:
> - Aggregations are represented as D-Bus objects, created by a telemetry
> app. For eg if we need to know the min/max/avg ambient temp for the last 5
> minutes, and say the ambient temp is usually at temps/ambient, the
> aggregation could be at temps/aggregations/ambient.
> - Implement D-Bus interfaces to denote aggregations, for eg the
> temps/aggregations/ambient object could implement a D-Bus interface
> describing min/max/avg properties.
> - Aggregation objects will have the values as described in the D-Bus
> interface (such as min/max/avg), and a timestamp, as properties.
> - Enable a config (eg JSON) to let the telemetry app know things like :
> What (supported) aggregations should be performed (min/max/avg)? What D-Bus
> objects should be aggregated? How frequently should they be aggregated?
> What should be the paths of the aggregations? Potentially add a REST API to
> allow changing the (JSON) config at run-time.
> - It will be possible to read all aggregation objects, or aggregation
> objects of a specific type via one REST call.
>
>
> c) Historical aggregations or snapshot. Let's call this Layer 2. This is
> to solve, for eg, "Need a reading corresponding to every X minutes in a
> period of Y hours". This can be a snapshot of Layer 1 or Layer 0 D-Bus
> objects. We have a requirement to store this snapshot on the BMC.
>
> Proposal:
> - The snapshot will be represented as a set of D-Bus objects. For eg if
> one needs an hourly reading for a period of 24 hours, the objects could be
> at temps/aggregations/ambient/per-hour/{1..24}.
> - Enable a config to let a telemetry app know things like : What D-Bus
> objects should I keep a history of? What is the duration of the snapshot?
> At what frequency should entries be added into the snapshot? Once the
> snapshot is full, should the entries roll, or should we restart?
> Potentially add a REST API to allow changing the (JSON) config at run-time.
> - The historical aggregations can be read via one REST call. It should be
> one D-Bus call as well most likely for the REST server, if there's an
> object manager at temps/aggregations/ambient/per-hour for eg.
> - These objects in the snapshot will implement the same interfaces as
> Layer 1 objects, so they will have the same properties (eg min/max/avg,
> timestamp).
>
>
> d) Some notes
> - With the proposal above, the API to retrieve the telemetry data is via
> the current OpenBMC REST API, so it may not readily work with telemetry
> tools relying on industry-standard API (see point 2 below), but it seems to
> be the feasible option to rely on to implement IBM's requirements in the
> expected timelines.
> - Layer 1 and Layer 2 telemetry apps would be different processes, and can
> function independent of each other.
>
>
>
> 2) Industry standard methods for storing and retrieving telemetry data
>
> - With the proposal above, the instantaneous readings are D-Bus objects,
> the instantaneous and historical aggregations are D-Bus objects as well.
> The API is the OpenBMC REST API.
> - Typically, aggregations may not have to happen on the BMC, in which case
> one can turn off layers 1 and 2.
> - This is regarding how the telemetry data is presented, and how we'd
> eventually not use the current OpenBMC REST API in production. I've heard
> (mostly from people on the To: list) of the following industry-standard
> ways to represent/retrieve telemetry data. This would mean transforming
> layer 0 D-Bus objects into these :
> - Via Redfish (events) API
> - Via IPMI events/PEF
>
Meh. I'd stick with Redfish/OBMC REST API over this one.
> - Via SNMP traps
>
If there is interest here, I have experience designing MIB extensions and
sub-agents to support them.
> - Via an sqlite db, and have something like Logstash parse it
>
Seems very heavy for BMC.
> - Others?
> Discussions are already happening regarding Redfish, so telemetry could be
> one aspect to consider as well.
> - Aggregations could be done on the BMC with collectd. I need to look at
> this in detail. Aggregations could be stored in an RRD format. Need to look
> at this in detail as well. These are as opposed to a D-Bus model of
> aggregations. Thoughts on this? For eg, would this be much less work both
> for the BMC and the telemetry data users than the proposed D-Bus model, but
> at the same time can address the requirements I've mentioned? Do we know
> what are the commonly used client tools for processing telemetry data, and
> how they expect the data to be presented?
>
>
>
> Thanks,
> Deepak
>
>
Kurt Taylor (krtaylor)
* Re: RFC for Telemetry data collection
2018-03-13 14:23 ` Kurt Taylor
@ 2018-03-13 14:50 ` Deepak Kodihalli
0 siblings, 0 replies; 11+ messages in thread
From: Deepak Kodihalli @ 2018-03-13 14:50 UTC (permalink / raw)
To: Kurt Taylor, rosedahl; +Cc: OpenBMC Maillist
On 13/03/18 7:53 pm, Kurt Taylor wrote:
> Hi,
>
> I'd like to bump this topic and add some more details. I'd like to
> discuss design proposals/directions for a couple things :
>
> 1) A short/mid term proposal for telemetry requirements specific to
> IBM labs (which need to be implemented in a relatively short span of
> time, so there may not be the bandwidth to write an entirely new
> application not based on D-Bus or the OpenBMC REST API).
> 2) Industry standard methods for storing and retrieving telemetry
> data - thoughts on how to get here.
>
>
> 1) Telemetry requirements specific to IBM labs
> Here are the requirements and a design proposal.
>
> a) Instantaneous readings, such as temperatures, currents, errors,
> events etc. Let's call this Layer 0.
>
> Proposal:
> - The D-Bus model is the source for instantaneous readings. This
> means there would be D-Bus objects representing this data, and hence
> an OpenBMC REST API around it.
> - These D-Bus objects would not necessarily implement the same D-Bus
> interfaces.
> - Interested clients can read these D-Bus objects via the OpenBMC
> REST API.
> - If clients are interested in being notified about "changes" to the
> readings, that's possible via the existing event notification over
> WebSockets mechanism.
>
>
> This would also map well into an OBMC MIB extension for example.
>
>
>
> b) Instantaneous aggregations - this would mostly apply to, but may
> not be limited to, readings such as temperatures and currents. Let's
> call this Layer 1. This basically is to solve, for eg, "what is the
> min/max/average over the last X seconds?". We have a requirement to
> do such aggregations on the BMC.
>
>
> I would be interested in why aggregations (including the historical
> ones, Layer 2) are a requirement and not just handled by the
> monitoring/event management app, as done in network management. If this
> work is to be done on the BMC, it needs to be user-definable and able to
> be turned off in resource-critical situations.
Right, it should be possible to turn off the Layer 1 and Layer 2
aggregation apps, or to leave them out of the BMC image altogether.
As for why the aggregations need to be done on the BMC: I think that's
the expectation of some of the IBM monitoring tools. I'm sure Todd
Rosedahl would have a better answer here.
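
For illustration, a JSON config with per-layer on/off switches could be handled roughly as sketched below. The key names ("layer1", "enabled", "aggregations", "window_seconds" and so on) are assumptions made up for this example; the thread has not settled on a schema.

```python
import json

# Hypothetical config: Layer 1 aggregates the ambient temperature,
# Layer 2 (history) is switched off.
EXAMPLE_CONFIG = """
{
  "layer1": {
    "enabled": true,
    "aggregations": [
      {
        "source": "temps/ambient",
        "path": "temps/aggregations/ambient",
        "methods": ["min", "max", "avg"],
        "window_seconds": 300
      }
    ]
  },
  "layer2": {"enabled": false}
}
"""

SUPPORTED_METHODS = {"min", "max", "avg"}


def enabled_layers(config):
    """Return the names of the layers that are switched on."""
    return [
        name
        for name in ("layer1", "layer2")
        if config.get(name, {}).get("enabled", False)
    ]


def layer1_jobs(config):
    """Return the Layer 1 aggregation entries, or [] if Layer 1 is off."""
    layer = config.get("layer1", {})
    if not layer.get("enabled", False):
        return []
    jobs = []
    for entry in layer.get("aggregations", []):
        unknown = set(entry["methods"]) - SUPPORTED_METHODS
        if unknown:
            raise ValueError(f"unsupported aggregation(s): {sorted(unknown)}")
        jobs.append(entry)
    return jobs


config = json.loads(EXAMPLE_CONFIG)
print(enabled_layers(config))
for job in layer1_jobs(config):
    print(job["source"], "->", job["path"], job["methods"])
```

Leaving an app out of the image altogether is a build-time decision and is not covered by a run-time switch like this one.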
>
> Proposal:
> - Aggregations are represented as D-Bus objects, created by a
> telemetry app. For eg if we need to know the min/max/avg ambient
> temp for the last 5 minutes, and say the ambient temp is usually
> at temps/ambient, the aggregation could be at
> temps/aggregations/ambient.
> - Implement D-Bus interfaces to denote aggregations, for eg the
> temps/aggregations/ambient object could implement a D-Bus interface
> describing min/max/avg properties.
> - Aggregation objects will have the values as described in the D-Bus
> interface (such as min/max/avg), and a timestamp, as properties.
> - Enable a config (eg JSON) to let the telemetry app know things
> like : What (supported) aggregations should be performed
> (min/max/avg)? What D-Bus objects should be aggregated? How
> frequently should they be aggregated? What should be the paths of
> the aggregations? Potentially add a REST API to allow changing the
> (JSON) config at run-time.
> - It will be possible to read all aggregation objects, or
> aggregation objects of a specific type via one REST call.
>
>
> c) Historical aggregations or snapshot. Let's call this Layer 2.
> This is to solve, for eg, "Need a reading corresponding to every X
> minutes in a period of Y hours". This can be a snapshot of Layer 1
> or Layer 0 D-Bus objects. We have a requirement to store this
> snapshot on the BMC.
>
> Proposal:
> - The snapshot will be represented as a set of D-Bus objects. For eg
> if one needs an hourly reading for a period of 24 hours, the objects
> could be at temps/aggregations/ambient/per-hour/{1..24}.
> - Enable a config to let a telemetry app know things like : What
> D-Bus objects should I keep a history of? What is the duration of
> the snapshot? At what frequency should entries be added into the
> snapshot? Once the snapshot is full, should the entries roll, or
> should we restart? Potentially add a REST API to allow changing the
> (JSON) config at run-time.
> - The historical aggregations can be read via one REST call. It
> should be one D-Bus call as well most likely for the REST server, if
> there's an object manager at temps/aggregations/ambient/per-hour for eg.
> - These objects in the snapshot will implement the same interfaces
> as Layer 1 objects, so they will have the same properties (eg
> min/max/avg, timestamp).
>
>
> d) Some notes
> - With the proposal above, the API to retrieve the telemetry data is
> via the current OpenBMC REST API, so it may not readily work with
> telemetry tools relying on industry-standard API (see point 2
> below), but it seems to be the feasible option to rely on to
> implement IBM's requirements in the expected timelines.
> - Layer 1 and Layer 2 telemetry apps would be different processes,
> and can function independent of each other.
>
>
>
> 2) Industry standard methods for storing and retrieving telemetry data
>
> - With the proposal above, the instantaneous readings are D-Bus
> objects, the instantaneous and historical aggregations are D-Bus
> objects as well. The API is the OpenBMC REST API.
> - Typically, aggregations may not have to happen on the BMC, in
> which case one can turn off layers 1 and 2.
> - This is regarding how the telemetry data is presented, and how
> we'd eventually not use the current OpenBMC REST API in production.
> I've heard (mostly from people on the To: list) of the following
> industry-standard ways to represent/retrieve telemetry data. This
> would mean transforming layer 0 D-Bus objects into these :
> - Via Redfish (events) API
> - Via IPMI events/PEF
>
>
> Meh. I'd stick with Redfish/OBMC REST API over this one.
>
> - Via SNMP traps
>
>
> If there is interest here, I have experience designing MIB extensions
> and sub-agents to support them.
>
> - Via an sqlite db, and have something like Logstash parse it
>
>
> Seems very heavy for BMC.
I tend to agree.
> Kurt Taylor (krtaylor)
>
Regards,
Deepak
end of thread
Thread overview: 11+ messages
2017-09-07 15:20 RFC for Telemetry data collection tomjose
2017-09-07 18:41 ` Rick Altherr
2017-09-07 20:04 ` Todd Rosedahl
2017-09-08 1:18 ` Brad Bishop
2017-09-08 1:16 ` Brad Bishop
2017-09-08 3:29 ` Deepak Kodihalli
2017-09-08 4:06 ` Brad Bishop
2018-03-09 13:43 ` Deepak Kodihalli
2018-03-12 17:57 ` Deepak Kodihalli
2018-03-13 14:23 ` Kurt Taylor
2018-03-13 14:50 ` Deepak Kodihalli