Netdev List
* Re: [PATCH net-next 00/12] Add support for PSE port priority
       [not found]       ` <ZwdpQRRGst1Z0eQE@pengutronix.de>
@ 2024-10-15  9:43         ` Kory Maincent
  2024-10-17 10:35           ` Kory Maincent
  0 siblings, 1 reply; 4+ messages in thread
From: Kory Maincent @ 2024-10-15  9:43 UTC
  To: Oleksij Rempel
  Cc: Kyle Swenson, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Donald Hunter, Thomas Petazzoni,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, Dent Project, kernel@pengutronix.de

Hello,

On Thu, 10 Oct 2024 07:42:25 +0200
Oleksij Rempel <o.rempel@pengutronix.de> wrote:

> > The condition where we've exceeded our system-level power
> > budget is a little different, in that it causes a port to be shut down
> > despite that port not exceeding its class power limit.  This condition
> > is the case I'm concerned we're solving in this series, and solving it
> > for the PD692xx case only, and it's based off dynamic power consumption.
> > 
> > So I guess I'm suggesting that we take the power budgeting concept out
> > of the PSE drivers, and put it into software (either kernel, userspace)
> > instead of the PSE hardware.  
> >   
> > >   I can't find global power budget concept for the TPS23881.   
> > 
> > This is because this idea doesn't exist on the TPS2388x.  
> >   
> > >   I couldn't test this case because I don't have enough load. In fact,
> > > maybe by setting the PD692x0 power bank limit low it could work.  
> > 
> > Hopefully this helps clarify.  
> 
> 
> Thank you for your detailed insights. Before we dive deeper into policies and
> implementations, I’d like to clarify an important point to avoid confusion
> later. When comparing different PSE components, it's crucial to note that the
> Microchip PD692x0 operates in two distinct categories:
> 1. PoE controller (PD692x0)
> 2. PoE manager (PD6920x)
> 
> Comparing the PoE controller (PD692x0) with TPS2388x or LTC4266 isn't entirely
> fair, as TPS2388x and LTC4266 are more comparable to the PoE manager
> (PD6920x). The functionalities provided by the PoE controller (PD692x0) are
> things we would need to implement ourselves on the software stack (kernel or
> userspace). The budget heuristic that is implemented in the PD692x0's
> firmware is absent in TPS2388x and LTC4266.
> 
> Policy Variants and Implementation
> 
> In cases where we are discussing prioritization, we are fundamentally talking
> about over-provisioning. This typically means that while a device advertises a
> certain maximum per-port power capacity (e.g., 95W), the total system power
> budget (e.g., 300W) is insufficient to supply maximum power to all ports
> simultaneously. This is often due to various system limitations, and if there
> were no power limits, prioritization wouldn't be necessary.
> 
> The challenge then becomes how to squeeze more Powered Devices (PDs) onto one
> PSE system. Here are two methods for over-provisioning:
> 
> 1. Static Method:
>  
>    This method involves distributing power based on PD classification. It’s
>    straightforward and stable, with the software (probably within the PSE
>    framework) keeping track of the budget and subtracting the power requested
> by each PD’s class. 
>  
>    Advantages: Every PD gets its promised power at any time, which guarantees
>    reliability. 
> 
>    Disadvantages: PD classification steps are large, meaning devices request
>    much more power than they actually need. As a result, the power supply may
>    only operate at, say, 50% capacity, which is inefficient and wastes money.
> 
> 2. Dynamic Method:  
> 
>    To address the inefficiencies of the static method, vendors like Microchip
>    have introduced dynamic power budgeting, as seen in the PD692x0 firmware.
>    This method monitors the current consumption per port and subtracts it from
>    the available power budget. When the budget is exceeded, lower-priority
>    ports are shut down.  
> 
>    Advantages: This method optimizes resource utilization, saving costs.
> 
>    Disadvantages: Low-priority devices may experience instability. A possible
>    improvement could involve using LLDP protocols to dynamically configure
>    power limits per port, thus allowing us to reduce power on over-consuming
>    ports rather than shutting them down entirely.

Indeed, only the static method will be available for PSE controllers that do
not support system power budget management, such as the TPS2388x or LTC4266.
Both methods could be supported for "smart" PSE controllers like the PD692x0.

Let's begin with the static method implementation in the PSE framework for now.
It will need the power domain notion you have talked about.
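
A minimal sketch of the static accounting described above, assuming one shared
budget per power domain and a reservation equal to the full class power. The
PowerDomain class and its method names are hypothetical and are not the PSE
framework API; the per-class values are the IEEE 802.3 PSE output power levels.

```python
# Hypothetical sketch of static (class-based) budgeting; not kernel code.
# Maximum PSE output power per IEEE 802.3 class, in milliwatts.
CLASS_POWER_MW = {0: 15400, 1: 4000, 2: 7000, 3: 15400, 4: 30000,
                  5: 45000, 6: 60000, 7: 75000, 8: 90000}


class PowerDomain:
    """Tracks the power reserved by class within one shared budget."""

    def __init__(self, budget_mw):
        self.budget_mw = budget_mw
        self.allocated = {}  # port id -> reserved milliwatts

    def available_mw(self, ignore_port=None):
        """Budget left after all reservations, optionally ignoring one port."""
        used = sum(mw for port, mw in self.allocated.items()
                   if port != ignore_port)
        return self.budget_mw - used

    def try_allocate(self, port, pd_class):
        """Reserve the full class power for a port, or refuse if it does not fit."""
        need = CLASS_POWER_MW[pd_class]
        if need > self.available_mw(ignore_port=port):
            return False
        self.allocated[port] = need
        return True

    def release(self, port):
        """Give the reservation back when the PD is unplugged or powered off."""
        self.allocated.pop(port, None)


domain = PowerDomain(budget_mw=300_000)
for port in range(3):
    domain.try_allocate(port, 8)      # three class 8 PDs reserve 270 W
fits = domain.try_allocate(3, 4)      # class 4 needs 30 W, exactly what is left
refused = domain.try_allocate(4, 1)   # nothing left, even for a class 1 PD
```

Because the reservation is by class and not by measured consumption, a PD is
never shut down later for budget reasons, at the cost of leaving part of the
supply unused.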

> Recommendations for Software Handling
> 
> Both methods have their pros and cons. Since the dynamic method is not always
> desirable, and if there's no way to disable it in the PD692x0's firmware, one
> potential workaround could be handling the budget in software and dynamically
> setting per-port limits. For instance, with a total budget of 300W and unused
> ports, we could initially set 95W limits per port. As high-priority PDs (e.g.,
> three 95W devices) are powered, we could dynamically reduce the power limit on
> the remaining ports to 15W, ensuring that no device exceeds that
> classification threshold.
> 
> This is just one idea, and there are likely other policy variants we could
> explore. Importantly, I believe these heuristics don’t belong in the kernel
> itself. Instead, the kernel should simply provide the necessary interfaces,
> leaving the policy implementation to userspace management software. At least
> this is a lesson learned from Thermal Management talk at LPC :D

I think PSE notification events are the only thing the kernel is missing before
it can leave the port priority policy to userspace.
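
The arithmetic of the per-port limit workaround quoted above can be sketched as
follows. This is only an illustration of the 300W / 95W / 15W example; the
function name and the milliwatt units are assumptions, and whether such a policy
lives in the kernel or in userspace is exactly what is being discussed.

```python
# Hypothetical helper: the limit to program on every still-unpowered port.
def unpowered_port_limit_mw(budget_mw, powered_mw, port_max_mw):
    """Return min(per-port maximum, budget not yet taken by powered ports)."""
    remaining = max(budget_mw - sum(powered_mw), 0)
    return min(port_max_mw, remaining)


# No PD powered yet: every port may offer the full 95 W.
initial = unpowered_port_limit_mw(300_000, [], 95_000)

# Three 95 W PDs powered: 285 W used, so the remaining ports drop to 15 W.
reduced = unpowered_port_limit_mw(300_000, [95_000] * 3, 95_000)
```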

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH net-next 00/12] Add support for PSE port priority
  2024-10-15  9:43         ` [PATCH net-next 00/12] Add support for PSE port priority Kory Maincent
@ 2024-10-17 10:35           ` Kory Maincent
  2024-10-18  6:14             ` Oleksij Rempel
  0 siblings, 1 reply; 4+ messages in thread
From: Kory Maincent @ 2024-10-17 10:35 UTC
  To: Oleksij Rempel
  Cc: Kyle Swenson, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Donald Hunter, Thomas Petazzoni,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, Dent Project, kernel@pengutronix.de

On Tue, 15 Oct 2024 11:43:52 +0200
Kory Maincent <kory.maincent@bootlin.com> wrote:

> > Policy Variants and Implementation
> > 
> > In cases where we are discussing prioritization, we are fundamentally
> > talking about over-provisioning. This typically means that while a device
> > advertises a certain maximum per-port power capacity (e.g., 95W), the total
> > system power budget (e.g., 300W) is insufficient to supply maximum power to
> > all ports simultaneously. This is often due to various system limitations,
> > and if there were no power limits, prioritization wouldn't be necessary.
> > 
> > The challenge then becomes how to squeeze more Powered Devices (PDs) onto
> > one PSE system. Here are two methods for over-provisioning:
> > 
> > 1. Static Method:
> >  
> >    This method involves distributing power based on PD classification. It’s
> >    straightforward and stable, with the software (probably within the PSE
> >    framework) keeping track of the budget and subtracting the power
> > requested by each PD’s class. 
> >  
> >    Advantages: Every PD gets its promised power at any time, which
> > guarantees reliability. 
> > 
> >    Disadvantages: PD classification steps are large, meaning devices request
> >    much more power than they actually need. As a result, the power supply
> > may only operate at, say, 50% capacity, which is inefficient and wastes
> > money.
> > 
> > 2. Dynamic Method:  
> > 
> >    To address the inefficiencies of the static method, vendors like
> > Microchip have introduced dynamic power budgeting, as seen in the PD692x0
> > firmware. This method monitors the current consumption per port and
> > subtracts it from the available power budget. When the budget is exceeded,
> > lower-priority ports are shut down.  
> > 
> >    Advantages: This method optimizes resource utilization, saving costs.
> > 
> >    Disadvantages: Low-priority devices may experience instability. A
> > possible improvement could involve using LLDP protocols to dynamically
> > configure power limits per port, thus allowing us to reduce power on
> > over-consuming ports rather than shutting them down entirely.  
> 
> Indeed we will have only static method for PSE controllers not supporting
> system power budget management like the TPS2388x or LTC4266.
> Both methods could be supported for "smart" PSE controllers like PD692x0.
> 
> Let's begin with the static method implementation in the PSE framework for
> now. It will need the power domain notion you have talked about.

While developing the software support for port priority with the static method,
I ran into an issue.

Suppose that plugging in a new PD would exceed the power budget. The port power
must not be enabled right away, or magic smoke will appear. So we have to
separate the detection part, which tells us the needs of the PD, from the
power enable part.

Currently the hardware enables port power automatically after the detection
process. The PD692x0 controller offers no way to separate the port power
process from the detection process. On the TPS23881 it could be done by
configuring manual mode, but: "The use of this mode is intended for system
diagnostic purposes only in the event that ports cannot be powered in
accordance with the IEEE 802.3bt standard from semiauto or auto modes."
Not sure we want that.

So in fact the workaround you talked about above will be needed for both PSE
controllers.
 
> Both methods have their pros and cons. Since the dynamic method is not always
> desirable, and if there's no way to disable it in the PD692x0's firmware, one
> potential workaround could be handling the budget in software and dynamically
> setting per-port limits. For instance, with a total budget of 300W and unused
> ports, we could initially set 95W limits per port. As high-priority PDs (e.g.,
> three 95W devices) are powered, we could dynamically reduce the power limit on
> the remaining ports to 15W, ensuring that no device exceeds that
> classification threshold.

We would set the port overcurrent limit on all unpowered ports whenever the
available power budget is less than the max PI power (100W), as you described.
If a newly plugged PD exceeds the overcurrent limit, it will raise an interrupt,
and we could deal with the power budget at that time by turning off
low-priority ports.

Hmm, in fact I cannot tell whether an overcurrent interrupt comes from a
newly plugged PD or not.

One option: when we get a new PD plug interrupt event, we wait until the end of
the classification time (Tpon 400ms) and read the interrupt states again to
know whether there is an overcurrent on the port.
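
The option above can be sketched as follows, assuming a driver-provided callable
that returns the set of pending event names for a port. Both read_events and the
"overcurrent" event name are hypothetical stand-ins for the real interrupt
status registers; the 400ms delay is the Tpon value mentioned above.

```python
import time

TPON_S = 0.4  # Tpon value quoted in the message above


def overcurrent_from_new_pd(port, read_events, sleep=time.sleep):
    """Run from a (hypothetical) PD-plug event handler.

    Waits for Tpon, then re-reads the port events, so that an overcurrent
    seen at this point can be attributed to the newly plugged PD.
    """
    sleep(TPON_S)
    return "overcurrent" in read_events(port)


# Usage with a fake event source and no real delay.
fake_events = {0: {"overcurrent"}, 1: set()}
hit = overcurrent_from_new_pd(0, fake_events.get, sleep=lambda s: None)
clean = overcurrent_from_new_pd(1, fake_events.get, sleep=lambda s: None)
```

The sketch does not handle an overcurrent on an already powered port arriving
during the same window, which would still have to be told apart by port number.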

What do you think?

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH net-next 00/12] Add support for PSE port priority
  2024-10-17 10:35           ` Kory Maincent
@ 2024-10-18  6:14             ` Oleksij Rempel
  2024-10-18 12:37               ` Kory Maincent
  0 siblings, 1 reply; 4+ messages in thread
From: Oleksij Rempel @ 2024-10-18  6:14 UTC
  To: Kory Maincent
  Cc: Kyle Swenson, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Donald Hunter, Thomas Petazzoni,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, Dent Project, kernel@pengutronix.de

On Thu, Oct 17, 2024 at 12:35:57PM +0200, Kory Maincent wrote:
> On Tue, 15 Oct 2024 11:43:52 +0200
> Kory Maincent <kory.maincent@bootlin.com> wrote:
> 
> > > Policy Variants and Implementation
> > > 
> > > In cases where we are discussing prioritization, we are fundamentally
> > > talking about over-provisioning. This typically means that while a device
> > > advertises a certain maximum per-port power capacity (e.g., 95W), the total
> > > system power budget (e.g., 300W) is insufficient to supply maximum power to
> > > all ports simultaneously. This is often due to various system limitations,
> > > and if there were no power limits, prioritization wouldn't be necessary.
> > > 
> > > The challenge then becomes how to squeeze more Powered Devices (PDs) onto
> > > one PSE system. Here are two methods for over-provisioning:
> > > 
> > > 1. Static Method:
> > >  
> > >    This method involves distributing power based on PD classification. It’s
> > >    straightforward and stable, with the software (probably within the PSE
> > >    framework) keeping track of the budget and subtracting the power
> > > requested by each PD’s class. 
> > >  
> > >    Advantages: Every PD gets its promised power at any time, which
> > > guarantees reliability. 
> > > 
> > >    Disadvantages: PD classification steps are large, meaning devices request
> > >    much more power than they actually need. As a result, the power supply
> > > may only operate at, say, 50% capacity, which is inefficient and wastes
> > > money.
> > > 
> > > 2. Dynamic Method:  
> > > 
> > >    To address the inefficiencies of the static method, vendors like
> > > Microchip have introduced dynamic power budgeting, as seen in the PD692x0
> > > firmware. This method monitors the current consumption per port and
> > > subtracts it from the available power budget. When the budget is exceeded,
> > > lower-priority ports are shut down.  
> > > 
> > >    Advantages: This method optimizes resource utilization, saving costs.
> > > 
> > >    Disadvantages: Low-priority devices may experience instability. A
> > > possible improvement could involve using LLDP protocols to dynamically
> > > configure power limits per port, thus allowing us to reduce power on
> > > over-consuming ports rather than shutting them down entirely.  
> > 
> > Indeed we will have only static method for PSE controllers not supporting
> > system power budget management like the TPS2388x or LTC4266.
> > Both methods could be supported for "smart" PSE controllers like PD692x0.
> > 
> > Let's begin with the static method implementation in the PSE framework for
> > now. It will need the power domain notion you have talked about.
> 
> While developing the software support for port priority in static method, I
> faced an issue.
> 
> Supposing we are exceeding the power budget when we plug a new PD.
> The port power should not be enabled directly or magic smoke will appear.
> So we have to separate the detection part to know the needs of the PD from the
> power enable part.
> 
> Currently the port power is enabled on the hardware automatically after the
> detection process. There is no way to separate power port process and detection
> process with the PD692x0 controller and it could be done on the TPS23881 by
> configuring it to manual mode but: "The use of this mode is intended for system
> diagnostic purposes only in the event that ports cannot be powered in
> accordance with the IEEE 802.3bt standard from semiauto or auto modes."
> Not sure we want that.
> 
> So in fact the workaround you talked about above will be needed for the two PSE
> controllers.

For the TPS23881, "9.1.1.2 Semiauto" seems to be exactly what we want:
"The port performs detection and classification (if valid detection
occurs) continuously. Registers are updated each time a detection or
classification occurs. The port power is not automatically turned on. A
Power Enable command is required to turn on the port"

For the PD692x0 controller, I'm not 100% sure. There is the "4.3.5 Set
Enable/Disable Channels" command, "Sets individual port Enable (Delivering power
enable) or Disable (Delivering power disable)." 

As I understand it, "Delivering power" is the state after
classification. So it is what we want too.

If it works in both cases, it would be a more elegant way to go. The
controller does auto-detection and classification; what we should do in
software is decide whether the PD can be enabled based on
classification results, priority and available budget.
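
A minimal sketch of that software decision, assuming the controller has already
classified the PD and waits for a power enable command. The function name, the
data layout, and the convention that a lower priority value means a more
important port are all assumptions made for illustration.

```python
# Hypothetical admission decision for a classified but still unpowered port.
def decide_power_on(budget_mw, powered, new_prio, new_need_mw):
    """Decide whether a new PD may be powered.

    powered maps port -> (priority, reserved_mw); lower priority value means
    more important (assumption). Returns (enable, ports_to_power_off).
    """
    available = budget_mw - sum(mw for _, mw in powered.values())
    if new_need_mw <= available:
        return True, []

    # Only strictly less important ports may be sacrificed, least important first.
    victims = sorted((p for p, (prio, _) in powered.items() if prio > new_prio),
                     key=lambda p: powered[p][0], reverse=True)
    to_off = []
    for port in victims:
        to_off.append(port)
        available += powered[port][1]
        if new_need_mw <= available:
            return True, to_off
    return False, []  # not enough room even after preemption: keep port off


powered = {1: (0, 60_000), 2: (2, 30_000)}   # 90 W reserved out of 100 W
preempt = decide_power_on(100_000, powered, new_prio=1, new_need_mw=30_000)
denied = decide_power_on(100_000, powered, new_prio=3, new_need_mw=30_000)
```

In the first call port 2 is less important than the new PD, so it is selected
for power off; in the second call no powered port is less important, so the new
PD stays unpowered.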

> > Both methods have their pros and cons. Since the dynamic method is not always
> > desirable, and if there's no way to disable it in the PD692x0's firmware, one
> > potential workaround could be handling the budget in software and dynamically
> > setting per-port limits. For instance, with a total budget of 300W and unused
> > ports, we could initially set 95W limits per port. As high-priority PDs (e.g.,
> > three 95W devices) are powered, we could dynamically reduce the power limit on
> > the remaining ports to 15W, ensuring that no device exceeds that
> > classification threshold.
> 
> We would set port overcurrent limit for all unpowered ports when the power
> budget available is less than max PI power 100W as you described.
> If a new PD plugged exceed the overcurrent limit then it will raise an interrupt
> and we could deal with the power budget to turn off low priority ports at that
> time. 

> Mmh in fact I could not know if the overcurrent event interrupt comes from a
> newly plugged PD or not.

Hm..  in the case of the PD692x0, maybe by using event counters?

> An option: When we get new PD device plug interrupt event, we wait the end of
> classification time (Tpon 400ms) and read the interrupt states again to know if
> there is an overcurrent or not on the port.

Let's try Semiauto mode for TPS23881 first, I assume it is designed
exactly for this use case.

And then, test if PD692x0 supports a way to disable auto power delivery
in the 4.3.5 command.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH net-next 00/12] Add support for PSE port priority
  2024-10-18  6:14             ` Oleksij Rempel
@ 2024-10-18 12:37               ` Kory Maincent
  0 siblings, 0 replies; 4+ messages in thread
From: Kory Maincent @ 2024-10-18 12:37 UTC
  To: Oleksij Rempel
  Cc: Kyle Swenson, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Donald Hunter, Thomas Petazzoni,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, Dent Project, kernel@pengutronix.de

On Fri, 18 Oct 2024 08:14:26 +0200
Oleksij Rempel <o.rempel@pengutronix.de> wrote:

> On Thu, Oct 17, 2024 at 12:35:57PM +0200, Kory Maincent wrote:
> > On Tue, 15 Oct 2024 11:43:52 +0200
> > Kory Maincent <kory.maincent@bootlin.com> wrote:
> >   
>  [...]  
> > > 
> > > Indeed we will have only static method for PSE controllers not supporting
> > > system power budget management like the TPS2388x or LTC4266.
> > > Both methods could be supported for "smart" PSE controllers like PD692x0.
> > > 
> > > Let's begin with the static method implementation in the PSE framework for
> > > now. It will need the power domain notion you have talked about.  
> > 
> > While developing the software support for port priority in static method, I
> > faced an issue.
> > 
> > Supposing we are exceeding the power budget when we plug a new PD.
> > The port power should not be enabled directly or magic smoke will appear.
> > So we have to separate the detection part to know the needs of the PD from
> > the power enable part.
> > 
> > Currently the port power is enabled on the hardware automatically after the
> > detection process. There is no way to separate power port process and
> > detection process with the PD692x0 controller and it could be done on the
> > TPS23881 by configuring it to manual mode but: "The use of this mode is
> > intended for system diagnostic purposes only in the event that ports cannot
> > be powered in accordance with the IEEE 802.3bt standard from semiauto or
> > auto modes." Not sure we want that.
> > 
> > So in fact the workaround you talked about above will be needed for the two
> > PSE controllers.  
> 
> For the TPS23881, "9.1.1.2 Semiauto" seems to be exactly what we want:
> "The port performs detection and classification (if valid detection
> occurs) continuously. Registers are updated each time a detection or
> classification occurs. The port power is not automatically turned on. A
> Power Enable command is required to turn on the port"

I had been reading the assigned class register and not the requested class
register, so I thought it was not working, but indeed it detects the class even
if the port power is off. That's what I was looking for, nice!
I also just figured out that calling pwoff resets detection, classification,
power policy... So the port needs to be set up again after a pwoff.
 
> For the PD692x0 controller, I'm not 100% sure. There is the "4.3.5 Set
> Enable/Disable Channels" command, "Sets individual port Enable (Delivering power
> enable) or Disable (Delivering power disable)." 
> 
> As I understand it, "Delivering power" is the state after
> classification. So it is what we want too.

On the PD692x0 there is also a requested class and power value, but it stays at
the "no class detected" value (0xc) if the port is not enabled.
I did not find a way to detect the class and keep port power off.
 
> If it works in both cases, it would be a more elegant way to go. The
> controller does auto-detection and classification; what we should do in
> software is decide whether the PD can be enabled based on
> classification results, priority and available budget.
> 
> > > Both methods have their pros and cons. Since the dynamic method is not
> > > always desirable, and if there's no way to disable it in the PD692x0's
> > > firmware, one potential workaround could be handling the budget in
> > > software and dynamically setting per-port limits. For instance, with a
> > > total budget of 300W and unused ports, we could initially set 95W limits
> > > per port. As high-priority PDs (e.g., three 95W devices) are powered, we
> > > could dynamically reduce the power limit on the remaining ports to 15W,
> > > ensuring that no device exceeds that classification threshold.  
> > 
> > We would set port overcurrent limit for all unpowered ports when the power
> > budget available is less than max PI power 100W as you described.
> > If a new PD plugged exceed the overcurrent limit then it will raise an
> > interrupt and we could deal with the power budget to turn off low priority
> > ports at that time.   
> 
> > Mmh in fact I could not know if the overcurrent event interrupt comes from a
> > newly plugged PD or not.  
> 
> Hm..  in the case of the PD692x0, maybe by using event counters?

Counters? I don't see how.

> > An option: When we get new PD device plug interrupt event, we wait the end
> > of classification time (Tpon 400ms) and read the interrupt states again to
> > know if there is an overcurrent or not on the port.  
> 
> Let's try Semiauto mode for TPS23881 first, I assume it is designed
> exactly for this use case.

Yes,

> And then, test if PD692x0 supports a way to disable auto power delivery
> in the 4.3.5 command.

I don't have this 4.3.5 command. Are you referring to a document other than the
communication protocol version 3.55 document?

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


Thread overview: 4+ messages
     [not found] <20241002-feature_poe_port_prio-v1-0-787054f74ed5@bootlin.com>
     [not found] ` <ZwaLDW6sKcytVhYX@p620.local.tld>
     [not found]   ` <20241009170400.3988b2ac@kmaincent-XPS-13-7390>
     [not found]     ` <ZwbAYyciOcjt7q3e@est-xps15>
     [not found]       ` <ZwdpQRRGst1Z0eQE@pengutronix.de>
2024-10-15  9:43         ` [PATCH net-next 00/12] Add support for PSE port priority Kory Maincent
2024-10-17 10:35           ` Kory Maincent
2024-10-18  6:14             ` Oleksij Rempel
2024-10-18 12:37               ` Kory Maincent
