Summarizing Meeting on BMC Aggregation

* Summarizing Meeting on BMC Aggregation
@ 2020-01-16 20:15 Richard Hanley
  2020-01-18  2:39 ` Michael Richardson
  2020-01-27  9:49 ` vishwa
  0 siblings, 2 replies; 4+ messages in thread
From: Richard Hanley @ 2020-01-16 20:15 UTC
  To: Neeraj Ladkani, vishwa, OpenBMC Maillist; +Cc: Nancy Yuen, Sossy Mansourian

Hi everyone,

We had a meeting today to talk about BMC aggregation.  I wanted to
thank everyone who joined.

Below is my summary of the topics we discussed, and some of the action
items I took from the meeting.  Please let me know if there was something
important that I missed or mischaracterized.
------------------------------------------------------------------------------------------------------

There is a strong need to aggregate data and control features from multiple
BMCs into a single uniform view of a "machine."

The definition of a machine here is relatively opaque, but it can be
thought of as an atomic physical unit for management.  A machine is then
split into multiple domains, each of which is managed by some management
controller (in most cases it would be a BMC).  There may be some cases where a
domain has multiple BMCs for redundancy.

Domains are relatively close to each other physically.  Sometimes they will
be in the same chassis/enclosure, while in other cases they will be in an
adjacent tray.

One key point that was discussed in this meeting was that the data and
transport of these domains are relatively unconstrained.  Domains may be
connected to the aggregator via a LAN, but there is a community need to
support other transports like SMBus and PCIe.

An aggregator will likely need to be split up into three layers:

1) The lowest layer would detect, import, and transform individual domains
into a common data model.  We would need to provide a specification for
that data model and tooling for implementers to create their own instance
of a domain's data.

2) An aggregation layer would take the instances of these domain-level data
models and aggregate them into a single view or graph of the system.  This
process could largely be automated graph manipulation.

3) A presentation layer would take that aggregate, and expose it to the
outside world.  This presentation layer could be Redfish, but there is some
divergence on that (see below). Regardless, we would need tooling to
program against the data model for implementers to modify their
presentation layers as needed.

There is fairly broad agreement that Layer 1 would need to support
multiple protocols, including Redfish, PLDM/MCTP, and legacy IPMI systems.
There would need to be support for creating custom drivers that import data
over these various transports into a common data model.
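
The three layers and the per-protocol drivers described above could be
sketched roughly as follows.  This is only an illustration under stated
assumptions: the names Resource, DomainDriver, FakeRedfishDriver,
FakeIpmiDriver, aggregate, and present are hypothetical and were not
discussed in the meeting, the drivers return canned data instead of
talking to real hardware, and the presentation output is a plain
dictionary rather than real Redfish.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Resource:
    """One entry in the common data model (hypothetical shape)."""
    domain: str
    name: str
    properties: dict = field(default_factory=dict)


class DomainDriver(Protocol):
    """Layer 1: imports one domain into the common data model."""
    def import_domain(self) -> list[Resource]: ...


class FakeRedfishDriver:
    """Stand-in for a Redfish importer; returns canned data, no network."""
    def __init__(self, domain: str) -> None:
        self.domain = domain

    def import_domain(self) -> list[Resource]:
        return [Resource(self.domain, "system", {"PowerState": "On"})]


class FakeIpmiDriver:
    """Stand-in for a legacy IPMI importer; returns canned data."""
    def __init__(self, domain: str) -> None:
        self.domain = domain

    def import_domain(self) -> list[Resource]:
        return [Resource(self.domain, "chassis", {"PowerState": "Off"})]


def aggregate(drivers: list[DomainDriver]) -> dict[str, list[Resource]]:
    """Layer 2: merge per-domain instances into one view keyed by domain."""
    machine: dict[str, list[Resource]] = {}
    for driver in drivers:
        for res in driver.import_domain():
            machine.setdefault(res.domain, []).append(res)
    return machine


def present(machine: dict[str, list[Resource]]) -> dict:
    """Layer 3: expose the aggregate; a plain dict here, not real Redfish."""
    return {
        "Machine": {
            domain: {r.name: r.properties for r in resources}
            for domain, resources in machine.items()
        }
    }


view = present(aggregate([FakeRedfishDriver("tray0"), FakeIpmiDriver("tray1")]))
print(view)
```

Keeping the driver interface this narrow is one way to let a new
protocol be added without touching the aggregation or presentation
code, which seems to match the goal of supporting custom drivers.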

There are some diverging needs when it comes to the presentation layer.
Here at Google, we were planning to have the presentation layer be
primarily Redfish, and the common data model would be more Redfish-focused.
Neeraj pointed out that there are some needs for other presentation layers
besides Redfish.

Some other design considerations include the hardware target for this
aggregator.  This aggregator will have to run on an OpenBMC platform, but
Google has some need for an aggregator to run on host Linux machines for
legacy platforms without an out-of-band connection.

Another consideration is the security of this aggregator.  The aggregation
layer will have the primary responsibility of adjudicating authentication
and authorization for the subordinate nodes.
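
To make the authorization half of that responsibility concrete, here is
a minimal sketch in which the aggregator checks a request once and only
then forwards it to the subordinate nodes.  Everything in it is an
assumption for illustration: the role names, the ROLE_PRIVILEGES table,
and the authorize and forward helpers are hypothetical, authentication
is not shown at all, and the per-node call is a placeholder.

```python
# Hypothetical role table; a real deployment would load this from policy.
ROLE_PRIVILEGES = {
    "admin": {"read", "power", "configure"},
    "operator": {"read", "power"},
    "readonly": {"read"},
}


def authorize(role: str, action: str) -> bool:
    """Return True if the role may perform the action on subordinate nodes."""
    return action in ROLE_PRIVILEGES.get(role, set())


def forward(role: str, action: str, nodes: list[str]) -> dict[str, str]:
    """Check once at the aggregator, then pass the action to each node."""
    if not authorize(role, action):
        raise PermissionError(f"role {role!r} may not perform {action!r}")
    # Placeholder for the real per-node call over Redfish, PLDM, IPMI, ...
    return {node: f"{action} accepted" for node in nodes}


print(forward("operator", "power", ["bmc0", "bmc1"]))
```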

One of the key takeaways (for me anyways) from this meeting is that there
is a community interest in keeping this aggregator generic, and not tied to
too closely to a particular protocol, transport, or presentation layer.  The
CIM data model was mentioned as one that may be appropriate for this
situation.

We will be having follow-up meetings because this project is going to take
some time to scope out and design.  I will be researching prior art for
existing data models that we could build a presentation layer off of.

Regards,
Richard


* Re: Summarizing Meeting on BMC Aggregation
  2020-01-16 20:15 Summarizing Meeting on BMC Aggregation Richard Hanley
@ 2020-01-18  2:39 ` Michael Richardson
  2020-01-27  9:49 ` vishwa
  1 sibling, 0 replies; 4+ messages in thread
From: Michael Richardson @ 2020-01-18  2:39 UTC
  To: OpenBMC Maillist

Thanks for the interesting call yesterday.
I don't think I will have the time to attend regularly, but we'll see.
I have code to write, ya know :-)

Richard Hanley <rhanley@google.com> wrote:
    > The definition of a machine here is relatively opaque, but it can be
    > thought of as an atomic physical unit for management. A machine is
    > then split into multiple domains, each of which is managed by some
    > management controller (in most cases it would be a BMC). There may be
    > some cases where a domain has multiple BMCs for redundancy.

    > Domains are relatively close to each other physically. Sometimes they
    > will be in the same chassis/enclosure, while in other cases they will
    > be in an adjacent tray.

    > One key point that was discussed in this meeting was that the data and
    > transport of these domains are relatively unconstrained. Domains may be
    > connected to the aggregator via a LAN, but there is a community need
    > to support other transports like SMBus and PCIe.

If I were designing this, I would define a standard way to transport IPv6 over
SMBus and PCIe, and then use IPv6 Link-Local addresses, and call it all a
LAN.  This has three effects in my opinion:
1) all transports need and get security resulting in fewer bugs
2) no need to re-invent TCP or HTTPS
3) directly connected hosts have less inherent privilege.
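
As a small illustration of the link-local idea, the sketch below derives
an fe80::/64 address from a 48-bit hardware address using the modified
EUI-64 method (RFC 4291, Appendix A).  It assumes the link exposes a
MAC-style address, which SMBus and PCIe do not natively provide; how an
interface identifier would be chosen on those transports is left open.
The function name link_local_from_mac and the example MAC are
hypothetical.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build an fe80::/64 address from a 48-bit MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC like 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to form the 64-bit interface identifier.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + iid)


addr = link_local_from_mac("00:11:22:33:44:55")
print(addr, addr.is_link_local)  # fe80::211:22ff:fe33:4455 True
```

Note that actually connecting to a link-local address also requires the
interface scope (for example fe80::211:22ff:fe33:4455%eth0), since the
same address can exist on more than one link.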

The IETF ANIMA working group
    https://datatracker.ietf.org/wg/anima/documents/
has created an O&M mechanism called the Autonomic Control Plane.
It has a discovery and negotiation protocol (GRASP), as well as an
onboarding protocol (BRSKI).  It is designed for exactly this kind of use.
https://datatracker.ietf.org/doc/rfc8368/  describes some of the high-level
design goals.  The documents are stuck in the RFC-EDITOR queue due to
cross-references, but will get RFC-numbers very soon.
I am one of the authors of BRSKI.

In addition, the IETF Remote Attestation WG (RATS), at:
   https://datatracker.ietf.org/wg/rats/documents/

has been working on an architectural document.   (We have people from TCG,
FIDO, Android, Global Platform, ...)   Actually, we have a few such
documents, and we are merging them; the live process is visible at:
   https://github.com/ietf-rats-wg/architecture

In particular, I point you to this pull request which was discussed this past
Tuesday:
  https://github.com/ietf-rats-wg/architecture/pull/13/files#diff-daea007baaef3c42f94e996f540dcd76

Doesn't the composite device use case look very much like the aggregator
situation you are trying to create?  If you care about attestation (and I
think you said you did), then it seems like there are significant synergies
here.

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [


* Re: Summarizing Meeting on BMC Aggregation
  2020-01-16 20:15 Summarizing Meeting on BMC Aggregation Richard Hanley
  2020-01-18  2:39 ` Michael Richardson
@ 2020-01-27  9:49 ` vishwa
  2020-01-27 14:58   ` vishwa
  1 sibling, 1 reply; 4+ messages in thread
From: vishwa @ 2020-01-27  9:49 UTC
  To: Richard Hanley, Neeraj Ladkani, OpenBMC Maillist
  Cc: murunata, kusripat@in.ibm.com, shahjsha@in.ibm.com,
	sgundura@in.ibm.com, vikantan@in.ibm.com

Hi Richard,

Thanks for capturing and sharing the discussion here. If I am reading it
all correctly, it looks like the aggregator here is an external entity and
not part of one of the BMCs in the domain. To somewhat relate, this is
kind of an aggregator like Nagios. Did I get that right?

The email mentions "data and control". Could you give an example
solution showing how the problem statements below would be seen and
executed by the proposed aggregator?

*Hypothetical Problems*:

Case-1: I have 4 nodes in the rack, each with a BMC inside that is
responsible for managing THAT node.
I want to power on all the nodes in the rack, and I want to use Redfish
from a management console.
Where is the aggregator in this setup, and how is it orchestrated?

Case-2: Some BMC fails to power on the node that contains it, and it needs
to report the error back to the initiator.

Thank you very much for taking this initiative,

!! Vishwa !!


* Re: Summarizing Meeting on BMC Aggregation
  2020-01-27  9:49 ` vishwa
@ 2020-01-27 14:58   ` vishwa
  0 siblings, 0 replies; 4+ messages in thread
From: vishwa @ 2020-01-27 14:58 UTC
  To: Richard Hanley, Neeraj Ladkani, OpenBMC Maillist
  Cc: sgundura@in.ibm.com, shahjsha@in.ibm.com, vikantan@in.ibm.com,
	kusripat@in.ibm.com, murunata

Missed mentioning this variant.

All 4 nodes in the rack together form 1 machine. So a power-on would
mean powering on all the nodes. Similarly, "Get the data" would mean
getting the data from all the nodes.

From an external entity, there is ONE power-on. However, it needs to be
translated into 4 power-on requests, one per BMC in the rack.
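
One way to picture this variant, together with the error reporting from
Case-2, is a fan-out that turns a single machine-level power-on into one
request per BMC and collects a per-node result for the initiator.  This
is only a sketch under assumptions: power_on is a hypothetical stand-in
for the real per-BMC call, the node names are made up, and "node2" is
hard-coded to fail so that the error path is visible.

```python
from concurrent.futures import ThreadPoolExecutor


def power_on(node: str) -> None:
    """Hypothetical per-BMC call; 'node2' fails to illustrate Case-2."""
    if node == "node2":
        raise RuntimeError("power sequencing fault")


def power_on_machine(nodes: list[str]) -> dict[str, str]:
    """Translate one machine-level power-on into one request per BMC."""
    def attempt(node: str) -> tuple[str, str]:
        try:
            power_on(node)
            return node, "On"
        except Exception as exc:
            # Keep the failure so it can be reported back to the initiator.
            return node, f"Failed: {exc}"

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(pool.map(attempt, nodes))


result = power_on_machine(["node0", "node1", "node2", "node3"])
print(result)
```

Whether a partial failure like this should count as a failed machine
power-on, or be surfaced as a degraded state, is a policy question the
sketch does not try to answer.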

Thanks,

!!Vishwa !!
