Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples

All of lore.kernel.org
 help / color / mirror / Atom feed

* Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
@ 2020-06-18 21:25 Alex Qiu
  2020-06-21 21:59 ` Ed Tanous
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Qiu @ 2020-06-18 21:25 UTC (permalink / raw)
  To: openbmc; +Cc: gBMC Discussions

[-- Attachment #1: Type: text/plain, Size: 4047 bytes --]

(Splitted from main email thread "Feedback on Current OpenBMC and Proposing
Some Improvements")

Difficulties

The current architecture allows different JSON configuration files to be
loaded to represent varieties of FRUs and boards, however, the architecture
still emphasizes individual sensors internally, and there are no solid
concepts of individual FRUs and entities as a whole object in memory. This
data-driven architecture starts to cause difficulties on the implementation
when some sensors require specialized treatments according to requirement
of hardware platforms.

Due to having no place for code as control flow around the sensors or data
purely generated based on configuration files:

   1.

   When the current configuration format cannot express a hardware topology
   or topology discovery logic, a new schema is added case by case. This not
   only causes burden to the configuration files themselves but also the
   configuration file parser. In practice, some code is often preferred to
   intervene around the hardware topology.
   2.

   It is hard for developers to debug against a certain hardware, as the
   developer has to filter the log for it based on various condition
   statements.
   3.

   It is hard for new developers to ramp up on debugging, because there is
   no symbol for developers to quickly look up and find the code related to
   one hardware or module to put debug statements.
   4.

   Unfortunately, debugging hardware issues occupy the life of BMC
   developers quite often, then debug codes are sometimes asked to be turned
   into workarounds which will stay in the code base. The harder to debug, the
   even harder it is to put up an elegant workaround for hardware issues.
   5.

   If the OpenBMC framework does not provide enough flexibility to
   accommodate specialized code for specific requirements, they quickly become
   downstream patches and technical debts, and they cause cost on maintenance
   as an open source software.

Issue Examples

In this section, I will describe issues that we’re facing with the existing
dynamic software stack, and they should all be well handled by
"Improvements".
Configuring device registers according to needs

For the context, related discussions can be found in the mailing list
archive
<https://lists.ozlabs.org/pipermail/openbmc/2020-January/020078.html>
“Configuring shunt_resistor in hwmon”. Although we managed to properly
adjust the IPMI readouts using scales, we later realized that it would
still be better to configure it directly in hwmon sysfs. If we configure it
in hwmon, we can have the correct reading right from the bottom. On the
other hand, an implementation using the device tree is probably against the
idea of having the dynamic software stack to configure hardware only when
discovered.

Also, hardware engineers came up with requests to configure the voltage
regulator outputs, and from my understanding, this is not what hwmon sysfs
interface intended to do, and we needed that within a very short time
period. I had to use shell script to configure the device registers by
issuing raw I2C commands using i2ctools.

All these requirements ended up in shell scripts run as standalone services
aside from OpenBMC applications, which had their own hardware topology
discovery logic inside them. This may be redundant to what we do in the
dynamic software stack.
Handling special requirements and logics

As our hardware program progresses, we are maintaining more and more
patches in Yocto to apply upon entity-manager and dbus-sensors.

We had to replace some code logic due to various reasons, for example, we
found the existing 8-bit / 16-bit addressable EEPROM detection logic is not
stable enough, and we were left with 8-bit addressable EEPROMs with their
content corrupted in the first byte.

(Continuing to thread "Feedback on Current OpenBMC and Proposing Some
Improvements ---- "Improvements" Ideas")

- Alex Qiu

[-- Attachment #2: Type: text/html, Size: 11004 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
  2020-06-18 21:25 Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples Alex Qiu
@ 2020-06-21 21:59 ` Ed Tanous
  2020-06-23 23:47   ` Alex Qiu
  0 siblings, 1 reply; 7+ messages in thread
From: Ed Tanous @ 2020-06-21 21:59 UTC (permalink / raw)
  To: Alex Qiu; +Cc: openbmc, gBMC Discussions

Love the feedback.  I want to start off by thanking you for taking the
right path in finding answers.  You tried the code, found some
problems, and suggested concrete solutions on how it could be made
better.  There are many that could take your email as an example and
run with it, both as a template, and as an actionable change to be
made.

I will be the first to admit, we didn't document a lot of the intent
behind entity manager, and clearly from your response below the data
structures alone weren't enough to convey meaning.  Lets see if we can
get some of your queries answered.

-Ed

On Thu, Jun 18, 2020 at 2:28 PM Alex Qiu <xqiu@google.com> wrote:
>
> (Splitted from main email thread "Feedback on Current OpenBMC and Proposing Some Improvements")
>
> Difficulties
>
> The current architecture allows different JSON configuration files to be loaded to represent varieties of FRUs and boards, however, the architecture still emphasizes individual sensors internally, and there are no solid concepts of individual FRUs and entities as a whole object in memory. This data-driven architecture starts to cause difficulties on the implementation when some sensors require specialized treatments according to requirement of hardware platforms.
>

One important thing that is non obvious when you look at it:  Files in
EM are meant to be 1:1 with a unit.  Each file is intended to
represent a composable organization of a physical components that may
or may not be in the system as a unit.  It could be a baseboard, hot
swap backplane, or in the cases of things like runBMC modules, might
be the BMC board itself.  This is intentionally not exposed to the
rest of the system, (except through a couple targeted association
interfaces for inventory purposes) under the assumption that
downstream code shouldn't need to know whether a component is on
another physical board or not, as it's only dealing with them at
kernel level, and in the end electrically on the board.  In theory,
the only thing that really needs to know these mappings are things
like Redfish, which actually present those associations in a way that
the user can see (and I will note, that code is a mess and could use
some thought on how to simplify it).  Clearly you've found a use case
where you actually need the individual component information
downstream for organization.  Can you elaborate on what you were
hoping to do with the data, if you had it available to you on dbus?

>
> Due to having no place for code as control flow around the sensors or data purely generated based on configuration files:

I don't really follow this.  There is a lot of code that has quite a
bit of control flow in the consumers (like dbus-sensors).  In theory,
the configuration files are intended to be there only to specify the
things that are different between individual types of "exposes"
records.  The intention is that if you have a platform that needs a
different control flow than someone else, you would make that control
flow configurable through some kind of setting, but there are _tons_
of cases where we can assume that all platforms can have the same
control flow, and write it as code with no parameters.

>
> When the current configuration format cannot express a hardware topology or topology discovery logic, a new schema is added case by case. This not only causes burden to the configuration files themselves but also the configuration file parser. In practice, some code is often preferred to intervene around the hardware topology.
Can you give some examples of this?  I'm not really following.  See
above, schemas are added where we make the decision that a platform
might need one behavior, or another depending on their preference.  If
every platform can agree on a single handling of a behavior, or the
correct behavior is detectable at runtime, there's no reason for it to
be in the configuration files at all.  Your point about the parser is
well taken;  I had always intended the majority of the files to be
moved into an optimized compile-time data structure so they didn't
need to be parsed at startup in most cases.  With that said, the file
parsing with the current number of supported config files present is
relatively low overhead.  This trick will stop scaling eventually as
we support more and more boards, but there should be a no external
impact hit to move that code to run at compile time.

>
> It is hard for developers to debug against a certain hardware, as the developer has to filter the log for it based on various condition statements.

This is a very valid complaint that I've had myself many many times in
the past.  I'm assuming you're looking at the dbus logs.  Can you come
up with some concrete suggestions on what could be done to make it
better?  Better naming of dbus paths?  Better tooling?  If you could
do up a design doc, I'm sure there'd be many people willing to help
you.

>
> It is hard for new developers to ramp up on debugging, because there is no symbol for developers to quickly look up and find the code related to one hardware or module to put debug statements.

I'm not quite following this.  Were you hoping to have a "symbol" that
would represent a FRU?  a specific sensor?  A specific subsystem?  A
specific instance (drive 1,2,3 ect) of the previous?  This might be
solvable with documentation.  Coming up with a couple use cases like
"I want to filter on i2c transactions to a device at address X" and
documenting how to do it might be a good start?
Today, the dbus logs give you a way to filter on:
1. All messages from a particular daemon.
2. Messages only from a particular dbus path, which allows you to
filter on all events from a specific piece of hardware. (see the
path_namespace filter)
3. A particular event type, which lets you filter on all events of a
certain class.
and any combination of the above.  I'm guessing we just need to
document this a little better

>
> Unfortunately, debugging hardware issues occupy the life of BMC developers quite often, then debug codes are sometimes asked to be turned into workarounds which will stay in the code base. The harder to debug, the even harder it is to put up an elegant workaround for hardware issues.
>
> If the OpenBMC framework does not provide enough flexibility to accommodate specialized code for specific requirements, they quickly become downstream patches and technical debts, and they cause cost on maintenance as an open source software.
>
> Issue Examples
>
> In this section, I will describe issues that we’re facing with the existing dynamic software stack, and they should all be well handled by "Improvements".
>
> Configuring device registers according to needs
>
> For the context, related discussions can be found in the mailing list archive “Configuring shunt_resistor in hwmon”. Although we managed to properly adjust the IPMI readouts using scales, we later realized that it would still be better to configure it directly in hwmon sysfs. If we configure it in hwmon, we can have the correct reading right from the bottom. On the other hand, an implementation using the device tree is probably against the idea of having the dynamic software stack to configure hardware only when discovered.

Contrary to what you said, at one point the dbus-sensors
implementations used runtime-compiled device tree overlays to
implement their features, using a patch originally written by the
raspberry pi team.  For sensors, these worked pretty well, but
required out of tree patches that were never merged into the mainline
kernel, and were removed because of exactly the problem you identify
later with out of tree patches.  If getting this mainlined is
something you're interested in pursuing, there are several people that
have been down this path, and can likely give you pointers.  Your
shunt resistor is a great example of a parameter (or possibly default
behavior) that could be added to the ADC sensor implementation without
a schema change.  If the shunt_resistor property exists, use that as
the primary application of the ScaleFactor instead of doing it in
userspace integer math.  That seems like a place where an argument
could be made where that could be the always on behavior, as it would
give the best result, but we'd have to collect some info on it.

>
>
> Also, hardware engineers came up with requests to configure the voltage regulator outputs, and from my understanding, this is not what hwmon sysfs interface intended to do, and we needed that within a very short time period. I had to use shell script to configure the device registers by issuing raw I2C commands using i2ctools.

I've long thought that we need a "set these i2c registers to these
values", or "run this bash script" type in entity manager for this
kind of thing for handling things like FPGAs, or configurable hardware
that you don't necessarily need or want a full kernel driver for
(especially in the prototyping stage).  Unfortunately, these kinds of
hacks tended to be easier to do in a custom build with patches against
the specific driver, rather than in a configurable way, so nobody
really built it out.  Also, doing it in the kernel guarantees better
timing, but I still think there's a use case for these.  My initial
ideas for configuration options for a "register setter daemon", and a
"run this command" daemon are below:

{
"Address": "$address",
"Bus": "$bus",
"Register": "0x12,
"Value": "0x42",
"Name": "Set PCIe Retimer to 42",
"Type": "EEPROM"
},
{
"Command": "/usr/bin/echo \"This is the greatest thing ever\"",
"Name": "My \"greatest ever\" command"
"Type": "Command"
}
>
>
> All these requirements ended up in shell scripts run as standalone services aside from OpenBMC applications, which had their own hardware topology discovery logic inside them. This may be redundant to what we do in the dynamic software stack.
>
> Handling special requirements and logics
>
> As our hardware program progresses, we are maintaining more and more patches in Yocto to apply upon entity-manager and dbus-sensors.
Have all of these patches been submitted to
entity-manager/dbus-sensors?  If they're not making breaking changes,
and you've tested them heavily, I'm not sure why this would be a
problem.  If you're fixing issues that were caused by a change, file
the bugs and we as a community can get the patches reverted until we
have a patch that works for all.
With that said, I understand the concern.  Hardware never waits for
firmware, and sometimes downstream patches are a necessary evil to get
a platform launched.  The hope is that after you're through crunch
time, you go back and simplify/cleanup the patches, and submit them
for review, so others won't have to go through the same pain in the
future.  I realize I'm being idealistic here, but downstream patches
and how to handle upstreaming really needs to be handled within your
organization.  Having done it before, I do have opinions on how to
make "downstream" easier to handle as a project, but that's a much
longer discussion.  If you make the patches available, other people
might even upstream them for you (as has happened to me personally
many times).

>
>
> We had to replace some code logic due to various reasons, for example, we found the existing 8-bit / 16-bit addressable EEPROM detection logic is not stable enough, and we were left with 8-bit addressable EEPROMs with their content corrupted in the first byte.
(not knowing the details here, I risk making a mistake, but) This
sounds like a bad case of the above, where someone intended a change
to apply to everyone, under the assumption that it would have no
effect on 8 bit eeproms.  Clearly this isn't the case, and we either
need to find the bug, revert it, or make it configurable per platform.
Report the bug, point to the commit where it occurred, and the
maintainer and community can make a call on the best way to proceed.
Unfortunately, bugs happen.  The best way to prevent them is to be
proactive, and help in testing the patches on your system before they
hit mainline.

-Ed

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
  2020-06-21 21:59 ` Ed Tanous
@ 2020-06-23 23:47   ` Alex Qiu
  2020-06-24  1:20     ` Patrick Williams
  2020-06-24  2:07     ` Ed Tanous
  0 siblings, 2 replies; 7+ messages in thread
From: Alex Qiu @ 2020-06-23 23:47 UTC (permalink / raw)
  To: Ed Tanous, Josh Lehan, Peter Lundgren, Jason Ling
  Cc: OpenBMC Maillist, Kais Belgaied, Ofer Yehielli, Richard Hanley,
	Benjamin Fair, Nancy Yuen, Robert Lippert, Kun Yi

-Internal email list
+A couple of folks who might be interested in this topic

Hi Ed,

Thank you for your reply! As a note, I'm working on Google's OpenBMC
closely on a hardware product schedule within Google's organization,
so chances are my experience is heavily biased within my area of work,
and maybe there's knowledge that I'm not aware of. Please excuse me if
I'm ignorant in some areas, and hopefully my colleagues can make up
the communication gap if they wish to follow up, and hopefully my demo
can help on the illustration as intended. With that said, please see
my other replies inlined. Thanks!

- Alex Qiu


On Sun, Jun 21, 2020 at 2:59 PM Ed Tanous <ed@tanous.net> wrote:
>
> Love the feedback.  I want to start off by thanking you for taking the
> right path in finding answers.  You tried the code, found some
> problems, and suggested concrete solutions on how it could be made
> better.  There are many that could take your email as an example and
> run with it, both as a template, and as an actionable change to be
> made.
>
> I will be the first to admit, we didn't document a lot of the intent
> behind entity manager, and clearly from your response below the data
> structures alone weren't enough to convey meaning.  Lets see if we can
> get some of your queries answered.
>
> -Ed
>
> On Thu, Jun 18, 2020 at 2:28 PM Alex Qiu <xqiu@google.com> wrote:
> >
> > (Splitted from main email thread "Feedback on Current OpenBMC and Proposing Some Improvements")
> >
> > Difficulties
> >
> > The current architecture allows different JSON configuration files to be loaded to represent varieties of FRUs and boards, however, the architecture still emphasizes individual sensors internally, and there are no solid concepts of individual FRUs and entities as a whole object in memory. This data-driven architecture starts to cause difficulties on the implementation when some sensors require specialized treatments according to requirement of hardware platforms.
> >
>
> One important thing that is non obvious when you look at it:  Files in
> EM are meant to be 1:1 with a unit.  Each file is intended to
> represent a composable organization of a physical components that may
> or may not be in the system as a unit.  It could be a baseboard, hot
> swap backplane, or in the cases of things like runBMC modules, might
> be the BMC board itself.  This is intentionally not exposed to the
> rest of the system, (except through a couple targeted association
> interfaces for inventory purposes) under the assumption that
> downstream code shouldn't need to know whether a component is on
> another physical board or not, as it's only dealing with them at
> kernel level, and in the end electrically on the board.  In theory,
> the only thing that really needs to know these mappings are things
> like Redfish, which actually present those associations in a way that
> the user can see (and I will note, that code is a mess and could use
> some thought on how to simplify it).  Clearly you've found a use case
> where you actually need the individual component information
> downstream for organization.  Can you elaborate on what you were
> hoping to do with the data, if you had it available to you on dbus?

For example, some of our FRU has way too many temperature sensors from
hardware design, but we only need the maximum temperature among these
sensors on one FRU to feed to the PID control loop or health
reporting. It would be great to aggregate the sensors according to
individual FRUs. You can see this sensor aggregation feature as a
simple example in my demo:
https://github.com/alex310110/bmc-proto-20q2/blob/master/downstream/card_example_g.py#L69

We also encounter voltage regulator devices requiring current reading
corrections according to their own properties (duty cycle and
temperature). This correction is not preferred to perform within
kernel drivers, and we are suggested to deal with them in user space.
See "Read Output Current" in
https://www.maximintegrated.com/en/design/technical-documents/app-notes/6/6042.html
Example in demo:
https://github.com/alex310110/bmc-proto-20q2/blob/master/device_xam2734.py

>
> >
> > Due to having no place for code as control flow around the sensors or data purely generated based on configuration files:
>
> I don't really follow this.  There is a lot of code that has quite a
> bit of control flow in the consumers (like dbus-sensors).  In theory,
> the configuration files are intended to be there only to specify the
> things that are different between individual types of "exposes"
> records.  The intention is that if you have a platform that needs a
> different control flow than someone else, you would make that control
> flow configurable through some kind of setting, but there are _tons_
> of cases where we can assume that all platforms can have the same
> control flow, and write it as code with no parameters.

I think you are right, and I didn't illustrate my idea well. I'm
thinking of being able to easily add additional control flow as
needed, and also making related code/config organized to be as
adjacent as possible, so developers can view/add/modify related
code/config fast, without the need to jump among different code
repositories. Please also see the demo parts above.

>
> >
> > When the current configuration format cannot express a hardware topology or topology discovery logic, a new schema is added case by case. This not only causes burden to the configuration files themselves but also the configuration file parser. In practice, some code is often preferred to intervene around the hardware topology.
> Can you give some examples of this?  I'm not really following.  See
> above, schemas are added where we make the decision that a platform
> might need one behavior, or another depending on their preference.  If
> every platform can agree on a single handling of a behavior, or the
> correct behavior is detectable at runtime, there's no reason for it to
> be in the configuration files at all.  Your point about the parser is
> well taken;  I had always intended the majority of the files to be
> moved into an optimized compile-time data structure so they didn't
> need to be parsed at startup in most cases.  With that said, the file
> parsing with the current number of supported config files present is
> relatively low overhead.  This trick will stop scaling eventually as
> we support more and more boards, but there should be a no external
> impact hit to move that code to run at compile time.

Please see this email thread on Configuring devices with I2C MUX:
https://lists.ozlabs.org/pipermail/openbmc/2020-January/020144.html

We ended up configuring the device tree to help solving this problem,
but it is still sort of a workaround, and we would be more happy if
entity-manager can handle this dynamic topology by itself.

>
> >
> > It is hard for developers to debug against a certain hardware, as the developer has to filter the log for it based on various condition statements.
>
> This is a very valid complaint that I've had myself many many times in
> the past.  I'm assuming you're looking at the dbus logs.  Can you come
> up with some concrete suggestions on what could be done to make it
> better?  Better naming of dbus paths?  Better tooling?  If you could
> do up a design doc, I'm sure there'd be many people willing to help
> you.

Unfortunately, I don't have a solution for this for now, but only
demonstrated the possibility in my demo, where you can add debug
statements per FRU type easily once you find the source file for the
FRU. If as you said in your previous replies that the component
organization is intentionally not exposed to the dbus, we would
probably have difficulties finding these information in the dbus for
debugging, right?

>
> >
> > It is hard for new developers to ramp up on debugging, because there is no symbol for developers to quickly look up and find the code related to one hardware or module to put debug statements.
>
> I'm not quite following this.  Were you hoping to have a "symbol" that
> would represent a FRU?  a specific sensor?  A specific subsystem?  A
> specific instance (drive 1,2,3 ect) of the previous?  This might be
> solvable with documentation.  Coming up with a couple use cases like
> "I want to filter on i2c transactions to a device at address X" and
> documenting how to do it might be a good start?
> Today, the dbus logs give you a way to filter on:
> 1. All messages from a particular daemon.
> 2. Messages only from a particular dbus path, which allows you to
> filter on all events from a specific piece of hardware. (see the
> path_namespace filter)
> 3. A particular event type, which lets you filter on all events of a
> certain class.
> and any combination of the above.  I'm guessing we just need to
> document this a little better

Apologies that I have little experience debugging with dbus, and this
statement came straight out of my mind with the question: If I am a
new developer for this hardware project, how can I quickly locate the
proper place to intercept the control flow and put debug prints?

>
> >
> > Unfortunately, debugging hardware issues occupy the life of BMC developers quite often, then debug codes are sometimes asked to be turned into workarounds which will stay in the code base. The harder to debug, the even harder it is to put up an elegant workaround for hardware issues.
> >
> > If the OpenBMC framework does not provide enough flexibility to accommodate specialized code for specific requirements, they quickly become downstream patches and technical debts, and they cause cost on maintenance as an open source software.
> >
> > Issue Examples
> >
> > In this section, I will describe issues that we’re facing with the existing dynamic software stack, and they should all be well handled by "Improvements".
> >
> > Configuring device registers according to needs
> >
> > For the context, related discussions can be found in the mailing list archive “Configuring shunt_resistor in hwmon”. Although we managed to properly adjust the IPMI readouts using scales, we later realized that it would still be better to configure it directly in hwmon sysfs. If we configure it in hwmon, we can have the correct reading right from the bottom. On the other hand, an implementation using the device tree is probably against the idea of having the dynamic software stack to configure hardware only when discovered.
>
> Contrary to what you said, at one point the dbus-sensors
> implementations used runtime-compiled device tree overlays to
> implement their features, using a patch originally written by the
> raspberry pi team.  For sensors, these worked pretty well, but
> required out of tree patches that were never merged into the mainline
> kernel, and were removed because of exactly the problem you identify
> later with out of tree patches.  If getting this mainlined is
> something you're interested in pursuing, there are several people that
> have been down this path, and can likely give you pointers.  Your
> shunt resistor is a great example of a parameter (or possibly default
> behavior) that could be added to the ADC sensor implementation without
> a schema change.  If the shunt_resistor property exists, use that as
> the primary application of the ScaleFactor instead of doing it in
> userspace integer math.  That seems like a place where an argument
> could be made where that could be the always on behavior, as it would
> give the best result, but we'd have to collect some info on it.

Some folks within my team would be interested in this information, as
they have been wondering why we didn't use device tree overlay. +Jason
Ling

>
> >
> >
> > Also, hardware engineers came up with requests to configure the voltage regulator outputs, and from my understanding, this is not what hwmon sysfs interface intended to do, and we needed that within a very short time period. I had to use shell script to configure the device registers by issuing raw I2C commands using i2ctools.
>
> I've long thought that we need a "set these i2c registers to these
> values", or "run this bash script" type in entity manager for this
> kind of thing for handling things like FPGAs, or configurable hardware
> that you don't necessarily need or want a full kernel driver for
> (especially in the prototyping stage).  Unfortunately, these kinds of
> hacks tended to be easier to do in a custom build with patches against
> the specific driver, rather than in a configurable way, so nobody
> really built it out.  Also, doing it in the kernel guarantees better
> timing, but I still think there's a use case for these.  My initial
> ideas for configuration options for a "register setter daemon", and a
> "run this command" daemon are below:
>
> {
> "Address": "$address",
> "Bus": "$bus",
> "Register": "0x12,
> "Value": "0x42",
> "Name": "Set PCIe Retimer to 42",
> "Type": "EEPROM"
> },
> {
> "Command": "/usr/bin/echo \"This is the greatest thing ever\"",
> "Name": "My \"greatest ever\" command"
> "Type": "Command"
> }

Putting the register initialization into the kernel driver would have
two problems: 1) That would be a kernel patch to maintain downstream
instead; 2) kernel driver may not have the best knowledge on the value
to configure. For example, both FRU share the same VR device, but one
is required to set voltage to X, and the other is required to set
voltage to Y.

Adding the "command" property in configuration files brings an issue
that what if we need some 'if' statements (or any control flow only
expressible by code) around that command.

A daemon on the other side, would have a redundant logic of hardware
discovery. I have been using this trick actually, but I have to face
an ironic fact that the daemon written in shell script does a reliable
job walking through the hardware topology, including distinguishing
the exact FRU that it should act upon. I believe this is some feature
that the OpenBMC software stack should take good care of.

> >
> >
> > All these requirements ended up in shell scripts run as standalone services aside from OpenBMC applications, which had their own hardware topology discovery logic inside them. This may be redundant to what we do in the dynamic software stack.
> >
> > Handling special requirements and logics
> >
> > As our hardware program progresses, we are maintaining more and more patches in Yocto to apply upon entity-manager and dbus-sensors.
> Have all of these patches been submitted to
> entity-manager/dbus-sensors?  If they're not making breaking changes,
> and you've tested them heavily, I'm not sure why this would be a
> problem.  If you're fixing issues that were caused by a change, file
> the bugs and we as a community can get the patches reverted until we
> have a patch that works for all.
> With that said, I understand the concern.  Hardware never waits for
> firmware, and sometimes downstream patches are a necessary evil to get
> a platform launched.  The hope is that after you're through crunch
> time, you go back and simplify/cleanup the patches, and submit them
> for review, so others won't have to go through the same pain in the
> future.  I realize I'm being idealistic here, but downstream patches
> and how to handle upstreaming really needs to be handled within your
> organization.  Having done it before, I do have opinions on how to
> make "downstream" easier to handle as a project, but that's a much
> longer discussion.  If you make the patches available, other people
> might even upstream them for you (as has happened to me personally
> many times).

Yep, your concern is exactly the case here. Our team doesn't have the
breath to generate a feature to handle special hardware requirements.
Some of the patches that we are maintaining right now are heavily
vendor-specific and can't be upstreamed, either. Due to these
realities, I'm suggesting to have an infrastructure which can handle
short-term bursts without causing too much overhead to maintain
patches.

+Josh Lehan and +Peter Lundgren may have more information on this.

>
> >
> >
> > We had to replace some code logic due to various reasons, for example, we found the existing 8-bit / 16-bit addressable EEPROM detection logic is not stable enough, and we were left with 8-bit addressable EEPROMs with their content corrupted in the first byte.
> (not knowing the details here, I risk making a mistake, but) This
> sounds like a bad case of the above, where someone intended a change
> to apply to everyone, under the assumption that it would have no
> effect on 8 bit eeproms.  Clearly this isn't the case, and we either
> need to find the bug, revert it, or make it configurable per platform.
> Report the bug, point to the commit where it occurred, and the
> maintainer and community can make a call on the best way to proceed.
> Unfortunately, bugs happen.  The best way to prevent them is to be
> proactive, and help in testing the patches on your system before they
> hit mainline.

Point taken. And yes, I think it's another case that we don't have
time to really root cause it. +Peter Lundgren would have more info on
this.

>
> -Ed

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
  2020-06-23 23:47   ` Alex Qiu
@ 2020-06-24  1:20     ` Patrick Williams
  2020-06-24 16:57       ` Alex Qiu
  2020-06-24  2:07     ` Ed Tanous
  1 sibling, 1 reply; 7+ messages in thread
From: Patrick Williams @ 2020-06-24  1:20 UTC (permalink / raw)
  To: Alex Qiu
  Cc: Ed Tanous, Josh Lehan, Peter Lundgren, Jason Ling, Kais Belgaied,
	Benjamin Fair, OpenBMC Maillist, Ofer Yehielli, Richard Hanley

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

Hi Alex,

On Tue, Jun 23, 2020 at 04:47:36PM -0700, Alex Qiu wrote:
> For example, some of our FRU has way too many temperature sensors from
> hardware design, but we only need the maximum temperature among these
> sensors on one FRU to feed to the PID control loop or health
> reporting. It would be great to aggregate the sensors according to
> individual FRUs. You can see this sensor aggregation feature as a
> simple example in my demo:
> https://github.com/alex310110/bmc-proto-20q2/blob/master/downstream/card_example_g.py#L69
> 
> We also encounter voltage regulator devices requiring current reading
> corrections according to their own properties (duty cycle and
> temperature). This correction is not preferred to perform within
> kernel drivers, and we are suggested to deal with them in user space.
> See "Read Output Current" in
> https://www.maximintegrated.com/en/design/technical-documents/app-notes/6/6042.html
> Example in demo:
> https://github.com/alex310110/bmc-proto-20q2/blob/master/device_xam2734.py
> 

We've been discussing a very similar feature to what you're describing
here at Facebook.  There is a very rough design document at:

    https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/32345

We need to have both aggregation sensors (the first) and corrections
(the second).  The design document here allows both.

On correction, my understanding is that we have some sensors that need
an adjustment depending on a *different* sensor value.  For instance, a
voltage regulator might have some error based on the ambient
temperature.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
  2020-06-23 23:47   ` Alex Qiu
  2020-06-24  1:20     ` Patrick Williams
@ 2020-06-24  2:07     ` Ed Tanous
  2020-06-24 23:37       ` Alex Qiu
  1 sibling, 1 reply; 7+ messages in thread
From: Ed Tanous @ 2020-06-24  2:07 UTC (permalink / raw)
  To: Alex Qiu
  Cc: Josh Lehan, Peter Lundgren, Jason Ling, OpenBMC Maillist,
	Kais Belgaied, Ofer Yehielli, Richard Hanley, Benjamin Fair,
	Nancy Yuen, Robert Lippert, Kun Yi

On Tue, Jun 23, 2020 at 4:47 PM Alex Qiu <xqiu@google.com> wrote:
>
> For example, some of our FRU has way too many temperature sensors from
> hardware design, but we only need the maximum temperature among these
> sensors on one FRU to feed to the PID control loop or health
> reporting. It would be great to aggregate the sensors according to
> individual FRUs. You can see this sensor aggregation feature as a
> simple example in my demo:
> https://github.com/alex310110/bmc-proto-20q2/blob/master/downstream/card_example_g.py#L69
>
Too many sensors for what?  Are you overloading dbus?

Margin sensors were something we (note, I say we here, but James did
vastly more work than I did) explicitly did not add, and the argument
I used at the time was threefold.
1. PID control loops can (and in phosphor-pid-control do) use multiple
sensors, and wildcards as inputs to a single PID.  This reduces the
configuration and computation overhead for many-sensored systems down
to what an equivalent "aggregate" sensor would be.
2. If the aggregated data is not put onto dbus (or somewhere else) you
lose all the information.  As an engineer/operator, many users would
ask "which sensor is causing the margins to be so low?" and "Why is
this sensor in the error state?" which would require someone to break
them down again, debug the issue, then reaggregate them.  You could
probably solve this with an interface, but we decided that we'd rather
spend time optimizing the sensor subsystem in other ways (like only
sending temp changes, and batching updates).  Keep in mind, if you're
clever about batching the PropertiesChanged signal on dbus, the
per-sensor cost of unpacking the value is minimized quite a bit, as
the O(1) in the dbus message significantly outweighs the O(N) for the
sensor values.  With that said, I could see if you needed to add this
feature for systems with many, many sensors on a slow processor, but
if it were me, I would focus my efforts on further increasing the
sensor subsystem performance for these cases.  There's a lot of
performance wins that could be had depending on your configuration.
3. If your sensors have different thresholds, that means that they
need to be represented as an integer that is negative when the system
is within margin.  Small negative numbers cause confusion at first
glance, and require a lot more documentation than if all the sensor
values are available, and the end unit (be it webui or control system)
aggregates them by type.  It should be noted, phosphor-pid-control has
a mode where it can treat sensors as a margin against their threshold,
so you can still have this behavior in the PID loops, it's just
aggregated in a less confusing place.

With all of that said, if you still feel like you need aggregated
sensor types, they could certainly be added.

> We also encounter voltage regulator devices requiring current reading
> corrections according to their own properties (duty cycle and
> temperature). This correction is not preferred to perform within
> kernel drivers, and we are suggested to deal with them in user space.
> See "Read Output Current" in
> https://www.maximintegrated.com/en/design/technical-documents/app-notes/6/6042.html
> Example in demo:
> https://github.com/alex310110/bmc-proto-20q2/blob/master/device_xam2734.py
>

Openbmc doesn't do this today anywhere that I'm aware of, so this
would be a feature you need to add.  There's a couple of ways you
could slice this.
A. You say that you don't want to do this in a kernel module, but you
don't say why.   Considering this part is a self-contained unit, and
it's in the appnote for the part, getting the kernel driver to do the
calculation for you seems straightforward, and would help many others
outside of openbmc if you did it in the kernel code.  Can you
elaborate on why you're thinking it would be better to do it in
userspace?
B. If you must do it in userspace, I'd expect you'd want to start by
writing a MAX20743 userspace daemon that pulls in both of those
values, and does the calculation, manages thresholds, then posts to
the sensor interface on dbus .  What I'm not following is what about
this you're finding difficult that we can make better.  There are some
simple <200 line examples that you can grab and extend.  How can we
make this easier for those in the future?  Ideally we would have
supported that maxim chip out of the box, but in this case, let's make
it generic enough so that the next person that gets here doesn't have
to edit code, and can simply add it to their configuration.

> >
> > >
> > > Due to having no place for code as control flow around the sensors or data purely generated based on configuration files:
> >
> > I don't really follow this.  There is a lot of code that has quite a
> > bit of control flow in the consumers (like dbus-sensors).  In theory,
> > the configuration files are intended to be there only to specify the
> > things that are different between individual types of "exposes"
> > records.  The intention is that if you have a platform that needs a
> > different control flow than someone else, you would make that control
> > flow configurable through some kind of setting, but there are _tons_
> > of cases where we can assume that all platforms can have the same
> > control flow, and write it as code with no parameters.
>
> I think you are right, and I didn't illustrate my idea well. I'm
> thinking of being able to easily add additional control flow as
> needed, and also making related code/config organized to be as
> adjacent as possible, so developers can view/add/modify related
> code/config fast, without the need to jump among different code
> repositories. Please also see the demo parts above.

Gotcha.  If I can summarize, you didn't like that the code was in 2
separate repos.  I completely agree with you, and at a couple of
conferences, I spitballed the idea of merging all of the
OpenBMC-specific code into a single repo.  This needs more thought,
and a real design doc, but it's something I long intended on trying to
propose, just never had the time.  Maybe you could start with those
two modules, and come up with a way we could organize it better?

>
> >
> > >
> > > When the current configuration format cannot express a hardware topology or topology discovery logic, a new schema is added case by case. This not only causes burden to the configuration files themselves but also the configuration file parser. In practice, some code is often preferred to intervene around the hardware topology.
> > Can you give some examples of this?  I'm not really following.  See
> > above, schemas are added where we make the decision that a platform
> > might need one behavior, or another depending on their preference.  If
> > every platform can agree on a single handling of a behavior, or the
> > correct behavior is detectable at runtime, there's no reason for it to
> > be in the configuration files at all.  Your point about the parser is
> > well taken;  I had always intended the majority of the files to be
> > moved into an optimized compile-time data structure so they didn't
> > need to be parsed at startup in most cases.  With that said, the file
> > parsing with the current number of supported config files present is
> > relatively low overhead.  This trick will stop scaling eventually as
> > we support more and more boards, but there should be a no external
> > impact hit to move that code to run at compile time.
>
> Please see this email thread on Configuring devices with I2C MUX:
> https://lists.ozlabs.org/pipermail/openbmc/2020-January/020144.html
>
> We ended up configuring the device tree to help solving this problem,
> but it is still sort of a workaround, and we would be more happy if
> entity-manager can handle this dynamic topology by itself.

Without knowing the details of the system you're working on, I'm not
sure how to help more.  This is certainly a place where the i2c
topology is hard to grok, but to my understanding most of the topology
should've been automatically handled by the code James pointed you to.
Clearly you weren't able to make that work, so there's either a bug,
missing feature, or a configuration error present in that scenario.
In your case, it sounds like you opted for a compile-time config
option, which should work just fine if you don't need the dynamic
behavior of adding/removing components at runtime.  If you'd like to
triage a little, I'm sure there are people that could help you.

>
> >
> > >
> > > It is hard for developers to debug against a certain hardware, as the developer has to filter the log for it based on various condition statements.
> >
> > This is a very valid complaint that I've had myself many many times in
> > the past.  I'm assuming you're looking at the dbus logs.  Can you come
> > up with some concrete suggestions on what could be done to make it
> > better?  Better naming of dbus paths?  Better tooling?  If you could
> > do up a design doc, I'm sure there'd be many people willing to help
> > you.
>
> Unfortunately, I don't have a solution for this for now, but only
> demonstrated the possibility in my demo, where you can add debug
> statements per FRU type easily once you find the source file for the
> FRU. If as you said in your previous replies that the component
> organization is intentionally not exposed to the dbus, we would
> probably have difficulties finding these information in the dbus for
> debugging, right?
Yes.  Rarely in working on OpenBMC have I actually needed to know FRU
level logging.  In general, the cases that are more common are "I need
to debug X thing on this card" or "I need to debug all things of class
X".  Very rarely do I need "X, Y, and Z on this card" in a single log
filter.  With that said, that's just how I think.  If you come up with
some concrete solutions, I'm sure many would be able to help you flesh
them out.  Yes, with the current data available, going from log -> FRU
(in that direction) is very difficult.  Go look at the logging code,
what interfaces would you need to be able to build what you want?

>
> >
> > >
> > > It is hard for new developers to ramp up on debugging, because there is no symbol for developers to quickly look up and find the code related to one hardware or module to put debug statements.
> >
> > I'm not quite following this.  Were you hoping to have a "symbol" that
> > would represent a FRU?  a specific sensor?  A specific subsystem?  A
> > specific instance (drive 1,2,3 ect) of the previous?  This might be
> > solvable with documentation.  Coming up with a couple use cases like
> > "I want to filter on i2c transactions to a device at address X" and
> > documenting how to do it might be a good start?
> > Today, the dbus logs give you a way to filter on:
> > 1. All messages from a particular daemon.
> > 2. Messages only from a particular dbus path, which allows you to
> > filter on all events from a specific piece of hardware. (see the
> > path_namespace filter)
> > 3. A particular event type, which lets you filter on all events of a
> > certain class.
> > and any combination of the above.  I'm guessing we just need to
> > document this a little better
>
> Apologies that I have little experience debugging with dbus, and this
> statement came straight out of my mind with the question: If I am a
> new developer for this hardware project, how can I quickly locate the
> proper place to intercept the control flow and put debug prints?
>

Slight correction, openbmc is a software project, not a hardware
project.  Because of that, and how configurable it is, it's really
hard to give generic "put your control flow here".  In the case of
sensors, find the sensor you care about, find the control loop you
care about (most of them are relatively small) and add printing code,
then use the Journal to grab the information.  If you're looking to
debug low level kernel info, enable kernel tracing as you would any
other kernel module.  Have you tried reading about the various logging
capabilities dbus and journald have?  Is there a tool you could add to
phosphor-webui that would capture the logs you want over the websocket
dbus connection?  Because that code is running on the users browser,
there's a lot of freedom to do what you want there.

> >
> > >
> > > Unfortunately, debugging hardware issues occupy the life of BMC developers quite often, then debug codes are sometimes asked to be turned into workarounds which will stay in the code base. The harder to debug, the even harder it is to put up an elegant workaround for hardware issues.
> > >
> > > If the OpenBMC framework does not provide enough flexibility to accommodate specialized code for specific requirements, they quickly become downstream patches and technical debts, and they cause cost on maintenance as an open source software.
> > >
> > > Issue Examples
> > >
> > > In this section, I will describe issues that we’re facing with the existing dynamic software stack, and they should all be well handled by "Improvements".
> > >
> > > Configuring device registers according to needs
> > >
> > > For the context, related discussions can be found in the mailing list archive “Configuring shunt_resistor in hwmon”. Although we managed to properly adjust the IPMI readouts using scales, we later realized that it would still be better to configure it directly in hwmon sysfs. If we configure it in hwmon, we can have the correct reading right from the bottom. On the other hand, an implementation using the device tree is probably against the idea of having the dynamic software stack to configure hardware only when discovered.
> >
> > Contrary to what you said, at one point the dbus-sensors
> > implementations used runtime-compiled device tree overlays to
> > implement their features, using a patch originally written by the
> > raspberry pi team.  For sensors, these worked pretty well, but
> > required out of tree patches that were never merged into the mainline
> > kernel, and were removed because of exactly the problem you identify
> > later with out of tree patches.  If getting this mainlined is
> > something you're interested in pursuing, there are several people that
> > have been down this path, and can likely give you pointers.  Your
> > shunt resistor is a great example of a parameter (or possibly default
> > behavior) that could be added to the ADC sensor implementation without
> > a schema change.  If the shunt_resistor property exists, use that as
> > the primary application of the ScaleFactor instead of doing it in
> > userspace integer math.  That seems like a place where an argument
> > could be made where that could be the always on behavior, as it would
> > give the best result, but we'd have to collect some info on it.
>
> Some folks within my team would be interested in this information, as
> they have been wondering why we didn't use device tree overlay. +Jason
> Ling

Short version of a long answer: To detect a full topology, you need
userspace up to run a lot of code that shouldn't be in the kernel.
Device tree overlays have to be added from inside the kernel (unless
you have the aforementioned patch and can trigger an overlay from
userspace).  Having to reboot the BMC every time the topology changes
is a non-starter.

>
> >
> > >
> > >
> > > Also, hardware engineers came up with requests to configure the voltage regulator outputs, and from my understanding, this is not what hwmon sysfs interface intended to do, and we needed that within a very short time period. I had to use shell script to configure the device registers by issuing raw I2C commands using i2ctools.
> >
> > I've long thought that we need a "set these i2c registers to these
> > values", or "run this bash script" type in entity manager for this
> > kind of thing for handling things like FPGAs, or configurable hardware
> > that you don't necessarily need or want a full kernel driver for
> > (especially in the prototyping stage).  Unfortunately, these kinds of
> > hacks tended to be easier to do in a custom build with patches against
> > the specific driver, rather than in a configurable way, so nobody
> > really built it out.  Also, doing it in the kernel guarantees better
> > timing, but I still think there's a use case for these.  My initial
> > ideas for configuration options for a "register setter daemon", and a
> > "run this command" daemon are below:
> >
> > {
> > "Address": "$address",
> > "Bus": "$bus",
> > "Register": "0x12,
> > "Value": "0x42",
> > "Name": "Set PCIe Retimer to 42",
> > "Type": "EEPROM"
> > },
> > {
> > "Command": "/usr/bin/echo \"This is the greatest thing ever\"",
> > "Name": "My \"greatest ever\" command"
> > "Type": "Command"
> > }
>
> Putting the register initialization into the kernel driver would have
> two problems: 1) That would be a kernel patch to maintain downstream
> instead; 2) kernel driver may not have the best knowledge on the value
> to configure. For example, both FRU share the same VR device, but one
> is required to set voltage to X, and the other is required to set
> voltage to Y.
1. I don't follow why you go directly downstream.  Why not upstream
the patch to the kernel?  Yeah, sometimes it's difficult, but you get
a lot of great eyes on it, and the next person outside openbmc to hit
this problem has a solution already baked in.
2. The kernel doesn't "need" to know what value to set for a given
set.  It can take that direction from userspace and/or devicetree.
All you have to do is expose the appropriate interface to userspace.

>
> Adding the "command" property in configuration files brings an issue
> that what if we need some 'if' statements (or any control flow only
> expressible by code) around that command.
>
> A daemon on the other side, would have a redundant logic of hardware
> discovery. I have been using this trick actually, but I have to face
> an ironic fact that the daemon written in shell script does a reliable
> job walking through the hardware topology, including distinguishing
> the exact FRU that it should act upon. I believe this is some feature
> that the OpenBMC software stack should take good care of.

Again, the above is not for control flow.  Write your control flow in
code (ideally compiled code, to avoid runtime/boot performance
issues).  The above is intended for temporary, band-aid solutions.
Bash is great... for certain things.  Walking a topology that might be
large, complex, and have complexes that could have hardware errors
present is not one of the things bash excels at.  Also keep in mind
that your bash script walking all topologies is affecting the overall
boot time of OpenBMC.
What you've described is essentially what FruDevice/Entity Manager/
dbus-sensors (and others) does.  FruDevice walks the i2c topology, and
identifies all things, but doesn't take an opinion on whether or not
they're interesting.  Entity manager takes those things, and tries to
match them with runtime configurations.  These might be sensors,
daemons that need launched, or settings that need conveyed to the
kernel.  Then the individual pieces take off with their role.  There
are other examples of other topology walkers, but that one is the most
simple.

>
> Yep, your concern is exactly the case here. Our team doesn't have the
> breath to generate a feature to handle special hardware requirements.
> Some of the patches that we are maintaining right now are heavily
> vendor-specific and can't be upstreamed, either. Due to these
> realities, I'm suggesting to have an infrastructure which can handle
> short-term bursts without causing too much overhead to maintain
> patches.

Personal opinion time: This is outside of OpenBMCs scope to make
downstream management easier for you and your team.  Without knowledge
of what patches you have, and what problems you're facing, it's nearly
impossible to build a solution that would work for you.  With that
said, if you have some ideas and they're actionable, happy to hear
them.

>
> Point taken. And yes, I think it's another case that we don't have
> time to really root cause it. +Peter Lundgren would have more info on
> this.
>
I feel the "not enough time" problem.  Getting a real root cause is
less important than creating a bug publically, and saying "this commit
X caused this bad behavior Y, and if I remove it it goes away".  That
puts the onus on the submitter to find the root cause, and gives the
maintainer good reason to revert it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
  2020-06-24  1:20     ` Patrick Williams
@ 2020-06-24 16:57       ` Alex Qiu
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Qiu @ 2020-06-24 16:57 UTC (permalink / raw)
  To: Patrick Williams
  Cc: Ed Tanous, Josh Lehan, Peter Lundgren, Jason Ling, Kais Belgaied,
	Benjamin Fair, OpenBMC Maillist, Ofer Yehielli, Richard Hanley

Hi Patrick,

Thanks for the information! I'm taking a look at it...

- Alex Qiu

On Tue, Jun 23, 2020 at 6:20 PM Patrick Williams <patrick@stwcx.xyz> wrote:
>
> Hi Alex,
>
> On Tue, Jun 23, 2020 at 04:47:36PM -0700, Alex Qiu wrote:
> > For example, some of our FRU has way too many temperature sensors from
> > hardware design, but we only need the maximum temperature among these
> > sensors on one FRU to feed to the PID control loop or health
> > reporting. It would be great to aggregate the sensors according to
> > individual FRUs. You can see this sensor aggregation feature as a
> > simple example in my demo:
> > https://github.com/alex310110/bmc-proto-20q2/blob/master/downstream/card_example_g.py#L69
> >
> > We also encounter voltage regulator devices requiring current reading
> > corrections according to their own properties (duty cycle and
> > temperature). This correction is not preferred to perform within
> > kernel drivers, and we are suggested to deal with them in user space.
> > See "Read Output Current" in
> > https://www.maximintegrated.com/en/design/technical-documents/app-notes/6/6042.html
> > Example in demo:
> > https://github.com/alex310110/bmc-proto-20q2/blob/master/device_xam2734.py
> >
>
> We've been discussing a very similar feature to what you're describing
> here at Facebook.  There is a very rough design document at:
>
>     https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/32345
>
> We need to have both aggregation sensors (the first) and corrections
> (the second).  The design document here allows both.
>
> On correction, my understanding is that we have some sensors that need
> an adjustment depending on a *different* sensor value.  For instance, a
> voltage regulator might have some error based on the ambient
> temperature.
>
> --
> Patrick Williams

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples
  2020-06-24  2:07     ` Ed Tanous
@ 2020-06-24 23:37       ` Alex Qiu
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Qiu @ 2020-06-24 23:37 UTC (permalink / raw)
  To: Ed Tanous, Sui Chen, Andrew Macrae, Jie Yang
  Cc: Josh Lehan, Peter Lundgren, Jason Ling, OpenBMC Maillist,
	Kais Belgaied, Ofer Yehielli, Richard Hanley, Benjamin Fair,
	Nancy Yuen, Robert Lippert, Kun Yi

Hi Ed,

I've noted issues or missing features in these email threads within my
team, and thanks for echoing on them so that we know that's some paths
to work on, not that we are missing anything or we fixed them in the
wrong ways. However, I haven't come up with solutions for them as one
person, and so far I only made a demo to show that these issues may be
easy to solve if we organize the code or the architecture in another
way. And one of the purposes of this demo is to show the issues remain
to solve in code for better illustration purposes, and I'm hoping the
community may come up with solutions for them based on the current
entity-manager and dbus-sensors step by step.

Other replies inlined. Thank you!

- Alex Qiu

On Tue, Jun 23, 2020 at 7:07 PM Ed Tanous <ed@tanous.net> wrote:
>
> On Tue, Jun 23, 2020 at 4:47 PM Alex Qiu <xqiu@google.com> wrote:
> >
> > For example, some of our FRU has way too many temperature sensors from
> > hardware design, but we only need the maximum temperature among these
> > sensors on one FRU to feed to the PID control loop or health
> > reporting. It would be great to aggregate the sensors according to
> > individual FRUs. You can see this sensor aggregation feature as a
> > simple example in my demo:
> > https://github.com/alex310110/bmc-proto-20q2/blob/master/downstream/card_example_g.py#L69
> >
> Too many sensors for what?  Are you overloading dbus?
>
> Margin sensors were something we (note, I say we here, but James did
> vastly more work than I did) explicitly did not add, and the argument
> I used at the time was threefold.
> 1. PID control loops can (and in phosphor-pid-control do) use multiple
> sensors, and wildcards as inputs to a single PID.  This reduces the
> configuration and computation overhead for many-sensored systems down
> to what an equivalent "aggregate" sensor would be.
> 2. If the aggregated data is not put onto dbus (or somewhere else) you
> lose all the information.  As an engineer/operator, many users would
> ask "which sensor is causing the margins to be so low?" and "Why is
> this sensor in the error state?" which would require someone to break
> them down again, debug the issue, then reaggregate them.  You could
> probably solve this with an interface, but we decided that we'd rather
> spend time optimizing the sensor subsystem in other ways (like only
> sending temp changes, and batching updates).  Keep in mind, if you're
> clever about batching the PropertiesChanged signal on dbus, the
> per-sensor cost of unpacking the value is minimized quite a bit, as
> the O(1) in the dbus message significantly outweighs the O(N) for the
> sensor values.  With that said, I could see if you needed to add this
> feature for systems with many, many sensors on a slow processor, but
> if it were me, I would focus my efforts on further increasing the
> sensor subsystem performance for these cases.  There's a lot of
> performance wins that could be had depending on your configuration.
> 3. If your sensors have different thresholds, that means that they
> need to be represented as an integer that is negative when the system
> is within margin.  Small negative numbers cause confusion at first
> glance, and require a lot more documentation than if all the sensor
> values are available, and the end unit (be it webui or control system)
> aggregates them by type.  It should be noted, phosphor-pid-control has
> a mode where it can treat sensors as a margin against their threshold,
> so you can still have this behavior in the PID loops, it's just
> aggregated in a less confusing place.
>
> With all of that said, if you still feel like you need aggregated
> sensor types, they could certainly be added.

In the end, we probably just want one aggregated temperature from a
lot of temperature sensors as one temperature reading at the protocol
layer, and the PID loop should also run based on that one. The hard
limit is that we are running out of IPMI sensor space if we don't
aggregate them into one sensor. Definitely we can deal with this issue
before these many readings are dumped onto dbus, but I would prefer
the existing OpenBMC to have a good framework to handle that by
itself.

On overloading dbus, could you elaborate more on this? What kind of
number limit have you seen on dbus due to many sensors based on your
experience? We are right now actively investigating an issue that dbus
is very slow, even though we haven't filled up the IPMI sensor space
yet.
+Sui Chen +Andrew Macrae +Jie Yang +Peter Lundgren +Josh Lehan

>
> > We also encounter voltage regulator devices requiring current reading
> > corrections according to their own properties (duty cycle and
> > temperature). This correction is not preferred to perform within
> > kernel drivers, and we are suggested to deal with them in user space.
> > See "Read Output Current" in
> > https://www.maximintegrated.com/en/design/technical-documents/app-notes/6/6042.html
> > Example in demo:
> > https://github.com/alex310110/bmc-proto-20q2/blob/master/device_xam2734.py
> >
>
> Openbmc doesn't do this today anywhere that I'm aware of, so this
> would be a feature you need to add.  There's a couple of ways you
> could slice this.
> A. You say that you don't want to do this in a kernel module, but you
> don't say why.   Considering this part is a self-contained unit, and
> it's in the appnote for the part, getting the kernel driver to do the
> calculation for you seems straightforward, and would help many others
> outside of openbmc if you did it in the kernel code.  Can you
> elaborate on why you're thinking it would be better to do it in
> userspace?
> B. If you must do it in userspace, I'd expect you'd want to start by
> writing a MAX20743 userspace daemon that pulls in both of those
> values, and does the calculation, manages thresholds, then posts to
> the sensor interface on dbus .  What I'm not following is what about
> this you're finding difficult that we can make better.  There are some
> simple <200 line examples that you can grab and extend.  How can we
> make this easier for those in the future?  Ideally we would have
> supported that maxim chip out of the box, but in this case, let's make
> it generic enough so that the next person that gets here doesn't have
> to edit code, and can simply add it to their configuration.

This is from my hearsays, and based on my knowledge of the PMBus
drivers, it's not a trivial task. +Jason Ling here.

For user space, again, there's the trouble for the daemon itself to
perform the topology discovery. Or the entity-manager should be able
to export the details of topology discovery and notify downstream
daemons.

>
> > >
> > > >
> > > > Due to having no place for code as control flow around the sensors or data purely generated based on configuration files:
> > >
> > > I don't really follow this.  There is a lot of code that has quite a
> > > bit of control flow in the consumers (like dbus-sensors).  In theory,
> > > the configuration files are intended to be there only to specify the
> > > things that are different between individual types of "exposes"
> > > records.  The intention is that if you have a platform that needs a
> > > different control flow than someone else, you would make that control
> > > flow configurable through some kind of setting, but there are _tons_
> > > of cases where we can assume that all platforms can have the same
> > > control flow, and write it as code with no parameters.
> >
> > I think you are right, and I didn't illustrate my idea well. I'm
> > thinking of being able to easily add additional control flow as
> > needed, and also making related code/config organized to be as
> > adjacent as possible, so developers can view/add/modify related
> > code/config fast, without the need to jump among different code
> > repositories. Please also see the demo parts above.
>
> Gotcha.  If I can summarize, you didn't like that the code was in 2
> separate repos.  I completely agree with you, and at a couple of
> conferences, I spitballed the idea of merging all of the
> OpenBMC-specific code into a single repo.  This needs more thought,
> and a real design doc, but it's something I long intended on trying to
> propose, just never had the time.  Maybe you could start with those
> two modules, and come up with a way we could organize it better?
>
> >
> > >
> > > >
> > > > When the current configuration format cannot express a hardware topology or topology discovery logic, a new schema is added case by case. This not only causes burden to the configuration files themselves but also the configuration file parser. In practice, some code is often preferred to intervene around the hardware topology.
> > > Can you give some examples of this?  I'm not really following.  See
> > > above, schemas are added where we make the decision that a platform
> > > might need one behavior, or another depending on their preference.  If
> > > every platform can agree on a single handling of a behavior, or the
> > > correct behavior is detectable at runtime, there's no reason for it to
> > > be in the configuration files at all.  Your point about the parser is
> > > well taken;  I had always intended the majority of the files to be
> > > moved into an optimized compile-time data structure so they didn't
> > > need to be parsed at startup in most cases.  With that said, the file
> > > parsing with the current number of supported config files present is
> > > relatively low overhead.  This trick will stop scaling eventually as
> > > we support more and more boards, but there should be a no external
> > > impact hit to move that code to run at compile time.
> >
> > Please see this email thread on Configuring devices with I2C MUX:
> > https://lists.ozlabs.org/pipermail/openbmc/2020-January/020144.html
> >
> > We ended up configuring the device tree to help solving this problem,
> > but it is still sort of a workaround, and we would be more happy if
> > entity-manager can handle this dynamic topology by itself.
>
> Without knowing the details of the system you're working on, I'm not
> sure how to help more.  This is certainly a place where the i2c
> topology is hard to grok, but to my understanding most of the topology
> should've been automatically handled by the code James pointed you to.
> Clearly you weren't able to make that work, so there's either a bug,
> missing feature, or a configuration error present in that scenario.
> In your case, it sounds like you opted for a compile-time config
> option, which should work just fine if you don't need the dynamic
> behavior of adding/removing components at runtime.  If you'd like to
> triage a little, I'm sure there are people that could help you.
>
> >
> > >
> > > >
> > > > It is hard for developers to debug against a certain hardware, as the developer has to filter the log for it based on various condition statements.
> > >
> > > This is a very valid complaint that I've had myself many many times in
> > > the past.  I'm assuming you're looking at the dbus logs.  Can you come
> > > up with some concrete suggestions on what could be done to make it
> > > better?  Better naming of dbus paths?  Better tooling?  If you could
> > > do up a design doc, I'm sure there'd be many people willing to help
> > > you.
> >
> > Unfortunately, I don't have a solution for this for now, but only
> > demonstrated the possibility in my demo, where you can add debug
> > statements per FRU type easily once you find the source file for the
> > FRU. If as you said in your previous replies that the component
> > organization is intentionally not exposed to the dbus, we would
> > probably have difficulties finding these information in the dbus for
> > debugging, right?
> Yes.  Rarely in working on OpenBMC have I actually needed to know FRU
> level logging.  In general, the cases that are more common are "I need
> to debug X thing on this card" or "I need to debug all things of class
> X".  Very rarely do I need "X, Y, and Z on this card" in a single log
> filter.  With that said, that's just how I think.  If you come up with
> some concrete solutions, I'm sure many would be able to help you flesh
> them out.  Yes, with the current data available, going from log -> FRU
> (in that direction) is very difficult.  Go look at the logging code,
> what interfaces would you need to be able to build what you want?
>
> >
> > >
> > > >
> > > > It is hard for new developers to ramp up on debugging, because there is no symbol for developers to quickly look up and find the code related to one hardware or module to put debug statements.
> > >
> > > I'm not quite following this.  Were you hoping to have a "symbol" that
> > > would represent a FRU?  a specific sensor?  A specific subsystem?  A
> > > specific instance (drive 1,2,3 ect) of the previous?  This might be
> > > solvable with documentation.  Coming up with a couple use cases like
> > > "I want to filter on i2c transactions to a device at address X" and
> > > documenting how to do it might be a good start?
> > > Today, the dbus logs give you a way to filter on:
> > > 1. All messages from a particular daemon.
> > > 2. Messages only from a particular dbus path, which allows you to
> > > filter on all events from a specific piece of hardware. (see the
> > > path_namespace filter)
> > > 3. A particular event type, which lets you filter on all events of a
> > > certain class.
> > > and any combination of the above.  I'm guessing we just need to
> > > document this a little better
> >
> > Apologies that I have little experience debugging with dbus, and this
> > statement came straight out of my mind with the question: If I am a
> > new developer for this hardware project, how can I quickly locate the
> > proper place to intercept the control flow and put debug prints?
> >
>
> Slight correction, openbmc is a software project, not a hardware
> project.  Because of that, and how configurable it is, it's really
> hard to give generic "put your control flow here".  In the case of
> sensors, find the sensor you care about, find the control loop you
> care about (most of them are relatively small) and add printing code,
> then use the Journal to grab the information.  If you're looking to
> debug low level kernel info, enable kernel tracing as you would any
> other kernel module.  Have you tried reading about the various logging
> capabilities dbus and journald have?  Is there a tool you could add to
> phosphor-webui that would capture the logs you want over the websocket
> dbus connection?  Because that code is running on the users browser,
> there's a lot of freedom to do what you want there.
>
> > >
> > > >
> > > > Unfortunately, debugging hardware issues occupy the life of BMC developers quite often, then debug codes are sometimes asked to be turned into workarounds which will stay in the code base. The harder to debug, the even harder it is to put up an elegant workaround for hardware issues.
> > > >
> > > > If the OpenBMC framework does not provide enough flexibility to accommodate specialized code for specific requirements, they quickly become downstream patches and technical debts, and they cause cost on maintenance as an open source software.
> > > >
> > > > Issue Examples
> > > >
> > > > In this section, I will describe issues that we’re facing with the existing dynamic software stack, and they should all be well handled by "Improvements".
> > > >
> > > > Configuring device registers according to needs
> > > >
> > > > For the context, related discussions can be found in the mailing list archive “Configuring shunt_resistor in hwmon”. Although we managed to properly adjust the IPMI readouts using scales, we later realized that it would still be better to configure it directly in hwmon sysfs. If we configure it in hwmon, we can have the correct reading right from the bottom. On the other hand, an implementation using the device tree is probably against the idea of having the dynamic software stack to configure hardware only when discovered.
> > >
> > > Contrary to what you said, at one point the dbus-sensors
> > > implementations used runtime-compiled device tree overlays to
> > > implement their features, using a patch originally written by the
> > > raspberry pi team.  For sensors, these worked pretty well, but
> > > required out of tree patches that were never merged into the mainline
> > > kernel, and were removed because of exactly the problem you identify
> > > later with out of tree patches.  If getting this mainlined is
> > > something you're interested in pursuing, there are several people that
> > > have been down this path, and can likely give you pointers.  Your
> > > shunt resistor is a great example of a parameter (or possibly default
> > > behavior) that could be added to the ADC sensor implementation without
> > > a schema change.  If the shunt_resistor property exists, use that as
> > > the primary application of the ScaleFactor instead of doing it in
> > > userspace integer math.  That seems like a place where an argument
> > > could be made where that could be the always on behavior, as it would
> > > give the best result, but we'd have to collect some info on it.
> >
> > Some folks within my team would be interested in this information, as
> > they have been wondering why we didn't use device tree overlay. +Jason
> > Ling
>
> Short version of a long answer: To detect a full topology, you need
> userspace up to run a lot of code that shouldn't be in the kernel.
> Device tree overlays have to be added from inside the kernel (unless
> you have the aforementioned patch and can trigger an overlay from
> userspace).  Having to reboot the BMC every time the topology changes
> is a non-starter.
>
> >
> > >
> > > >
> > > >
> > > > Also, hardware engineers came up with requests to configure the voltage regulator outputs, and from my understanding, this is not what hwmon sysfs interface intended to do, and we needed that within a very short time period. I had to use shell script to configure the device registers by issuing raw I2C commands using i2ctools.
> > >
> > > I've long thought that we need a "set these i2c registers to these
> > > values", or "run this bash script" type in entity manager for this
> > > kind of thing for handling things like FPGAs, or configurable hardware
> > > that you don't necessarily need or want a full kernel driver for
> > > (especially in the prototyping stage).  Unfortunately, these kinds of
> > > hacks tended to be easier to do in a custom build with patches against
> > > the specific driver, rather than in a configurable way, so nobody
> > > really built it out.  Also, doing it in the kernel guarantees better
> > > timing, but I still think there's a use case for these.  My initial
> > > ideas for configuration options for a "register setter daemon", and a
> > > "run this command" daemon are below:
> > >
> > > {
> > > "Address": "$address",
> > > "Bus": "$bus",
> > > "Register": "0x12,
> > > "Value": "0x42",
> > > "Name": "Set PCIe Retimer to 42",
> > > "Type": "EEPROM"
> > > },
> > > {
> > > "Command": "/usr/bin/echo \"This is the greatest thing ever\"",
> > > "Name": "My \"greatest ever\" command"
> > > "Type": "Command"
> > > }
> >
> > Putting the register initialization into the kernel driver would have
> > two problems: 1) That would be a kernel patch to maintain downstream
> > instead; 2) kernel driver may not have the best knowledge on the value
> > to configure. For example, both FRU share the same VR device, but one
> > is required to set voltage to X, and the other is required to set
> > voltage to Y.
> 1. I don't follow why you go directly downstream.  Why not upstream
> the patch to the kernel?  Yeah, sometimes it's difficult, but you get
> a lot of great eyes on it, and the next person outside openbmc to hit
> this problem has a solution already baked in.
> 2. The kernel doesn't "need" to know what value to set for a given
> set.  It can take that direction from userspace and/or devicetree.
> All you have to do is expose the appropriate interface to userspace.

Because we don't want people outside of the project to see the
confidential code names and register specs. Unfortunately, I don't
know the reason for it further behind, and I just follow what I was
told to do. I see your point on the kernel driver parameter, then I
agree that we are still missing a feature on this for entity-manager
to export these parameters to kernel drivers.

>
> >
> > Adding the "command" property in configuration files brings an issue
> > that what if we need some 'if' statements (or any control flow only
> > expressible by code) around that command.
> >
> > A daemon on the other side, would have a redundant logic of hardware
> > discovery. I have been using this trick actually, but I have to face
> > an ironic fact that the daemon written in shell script does a reliable
> > job walking through the hardware topology, including distinguishing
> > the exact FRU that it should act upon. I believe this is some feature
> > that the OpenBMC software stack should take good care of.
>
> Again, the above is not for control flow.  Write your control flow in
> code (ideally compiled code, to avoid runtime/boot performance
> issues).  The above is intended for temporary, band-aid solutions.
> Bash is great... for certain things.  Walking a topology that might be
> large, complex, and have complexes that could have hardware errors
> present is not one of the things bash excels at.  Also keep in mind
> that your bash script walking all topologies is affecting the overall
> boot time of OpenBMC.
> What you've described is essentially what FruDevice/Entity Manager/
> dbus-sensors (and others) does.  FruDevice walks the i2c topology, and
> identifies all things, but doesn't take an opinion on whether or not
> they're interesting.  Entity manager takes those things, and tries to
> match them with runtime configurations.  These might be sensors,
> daemons that need launched, or settings that need conveyed to the
> kernel.  Then the individual pieces take off with their role.  There
> are other examples of other topology walkers, but that one is the most
> simple.

Yep, and I hope entity-manager can have a framework to take more
complicated topology discovery logics as code, so I don't have to do
it with shell script.

>
> >
> > Yep, your concern is exactly the case here. Our team doesn't have the
> > breath to generate a feature to handle special hardware requirements.
> > Some of the patches that we are maintaining right now are heavily
> > vendor-specific and can't be upstreamed, either. Due to these
> > realities, I'm suggesting to have an infrastructure which can handle
> > short-term bursts without causing too much overhead to maintain
> > patches.
>
> Personal opinion time: This is outside of OpenBMCs scope to make
> downstream management easier for you and your team.  Without knowledge
> of what patches you have, and what problems you're facing, it's nearly
> impossible to build a solution that would work for you.  With that
> said, if you have some ideas and they're actionable, happy to hear
> them.
>
> >
> > Point taken. And yes, I think it's another case that we don't have
> > time to really root cause it. +Peter Lundgren would have more info on
> > this.
> >
> I feel the "not enough time" problem.  Getting a real root cause is
> less important than creating a bug publically, and saying "this commit
> X caused this bad behavior Y, and if I remove it it goes away".  That
> puts the onus on the submitter to find the root cause, and gives the
> maintainer good reason to revert it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-06-24 23:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-18 21:25 Feedback on Current OpenBMC and Proposing Some Improvements ---- Difficulties and Issue Examples Alex Qiu
2020-06-21 21:59 ` Ed Tanous
2020-06-23 23:47   ` Alex Qiu
2020-06-24  1:20     ` Patrick Williams
2020-06-24 16:57       ` Alex Qiu
2020-06-24  2:07     ` Ed Tanous
2020-06-24 23:37       ` Alex Qiu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.