Subject: Re: [PATCH 1/4] dt-bindings: opp: Introduce opp-sustainable bindings
To: Nishanth Menon
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
 devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 vireshk@kernel.org, robh+dt@kernel.org, sboyd@kernel.org, rafael@kernel.org,
 sudeep.holla@arm.com, daniel.lezcano@linaro.org, Dietmar.Eggemann@arm.com
References: <20201028140847.1018-1-lukasz.luba@arm.com>
 <20201028140847.1018-2-lukasz.luba@arm.com>
 <20201028214713.zttk47qtua5jhieo@pureness>
 <5b3a99a8-6972-5c60-6cc5-00ec84387b97@arm.com>
 <20201029125932.fvhaj6fsgt3qvmoc@gloomily>
 <792cce7c-a316-fee4-fbf6-3cfc688b7088@arm.com>
 <20201029134953.k3ag72zujqsdt3fm@platonic>
From: Lukasz Luba
Date: Thu, 29 Oct 2020 14:20:48 +0000
In-Reply-To: <20201029134953.k3ag72zujqsdt3fm@platonic>
X-Mailing-List: devicetree@vger.kernel.org

On 10/29/20 1:49 PM, Nishanth Menon wrote:
> On 13:33-20201029, Lukasz Luba wrote:
>>
>>
>> On 10/29/20 12:59 PM, Nishanth Menon wrote:
>>> On 10:04-20201029, Lukasz Luba wrote:
>>>>
>>>>
>>>> On 10/28/20 9:47 PM, Nishanth Menon wrote:
>>>>> On 14:08-20201028, Lukasz Luba wrote:
>>>>>> Add opp-sustainable as an additional property in the OPP node to describe
>>>>>> the sustainable performance level of the device. This will help to
>>>>>> estimate the sustainable performance of the whole system.
>>>>>>
>>>>>> Signed-off-by: Lukasz Luba
>>>>>> ---
>>>>>> Documentation/devicetree/bindings/opp/opp.txt | 4 ++++
>>>>>> 1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
>>>>>> index 9847dfeeffcb..cd01028de305 100644
>>>>>> --- a/Documentation/devicetree/bindings/opp/opp.txt
>>>>>> +++ b/Documentation/devicetree/bindings/opp/opp.txt
>>>>>> @@ -154,6 +154,10 @@ Optional properties:
>>>>>> - opp-suspend: Marks the OPP to be used during device suspend. If multiple OPPs
>>>>>> in the table have this, the OPP with highest opp-hz will be used.
>>>>>> +- opp-sustainable: Marks the OPP as sustainable. This property can be used for
>>>>>> + estimating sustainable performance of the whole system. If multiple OPPs in
>>>>>> + the table have this, the OPP with highest opp-hz will be used.
>>>>>
>>>>>
>>>>> By "sustainable", do you mean sustainable across Process, Voltage and
>>>>> Temperature corners up to the max rated operational Power-ON hours
>>>>> without IDLE state being achieved on the processor?
>>>>
>>>> Yes, in case of CPU: running 100% without idle at that particular OPP.
>>>> Running above that OPP would lead to crossing the control temperature.
>>>
>>> We need to tighten the definitions a lot more here and add that to the
>>> binding. What we are stating, if I am not misunderstanding is an OPP
>>> that is guaranteed by SoC vendor that across Process Voltage and
>>> Temperature corners - aka across the entire production spectrum
>>> for the part number, *all* devices will operate at this OPP for the
>>> mandated power-on-hours rating without hitting IDLE.
>>>
>>> Example: So -40C to 125C, across the process (hot/cold/nominal), 100s of
>>> thousands/millions of units can operate up to 125,0000 power-on-hours
>>> while running a tight deadloop OR maybe high processing function or even
>>> cpuburn[1]?
>>
>> I think I know what you mean. But this would lead to redefining a lot
>> more than just this optional field. This wide range -40C to 125C is for
>> automotive chips, then what about opp-suspend, when the device cannot
>> even reach that OPP under some stress test e.g. outside temp
>> ~100-110C...
>> Or opp-turbo, shall all the OPPs have a multidimensional table to reflect
>> the temperature dependency for all affected optional fields?
>
> yes, and down the rabbit hole we will go :)
>
>>
>>>
>>>
>>> Can you give me one SoC vendor and part that guarantees this? I am
>>> wondering if this is all theoretical... There are tons of parameters
>>> that come into play for "reliability" "sustainability" etc. Those are
>>> tricky terminology that typically makes legal folks pretty happy to
>>> debate for decades..
>>
>> Yes, but the outside temperature is probably most important for this use
>> case.
>>
>>>
>>> just my 2 cents.
>>>>
>>>>>
>>>>> OR do you mean to leave it up to interpretation?
>>>>
>>>> I can tell how I would use them. There is thermal governor IPA, which
>>>> needs sustainable power either from DT or uses internal algorithm to
>>>> estimate it based on lowest allowed freq OPPs. Then it estimates
>>>> internal coefficients based on that value, which is not optimal
>>>> for lowest OPPs. When some higher OPP could be marked as sustainable,
>>>> it would lead to better estimation and better power budget split.
>>>
>>> Seeing your series, I got an idea about how you plan on using it, I
>>> just think we need to be more precise in our definition..
>>
>> Thank you for having a look at that and understanding the motivation
>> behind this series.
>>
>> How about adding a description that this sustainable OPP is considered
>> for normal room temp (20-25C)?
>
> You could.. but then, practically as we go into smaller process nodes,
> the 20-25C reliability is just theoretical. I mean, we Texans in summer
> or Finns in winter would probably define "normal room temperature" as
> something different in practice (ISO notwithstanding ;) ).. Challenge
> of reliability has always been on the edge of the PVT ranges. To ignore
> that OR to have a scheme that does not scale to describe that, IMHO is a
> lacking definition.
>
> My entire point is, if we can avoid getting into rabbit hole
> definitions, we probably should.. IMHO.. keep things as simple as
> possible.
>
>>
>> BTW, in the Arm SCMI spec definition of that value (used in patch 4/4),
>
> You mean [1] Table 11 Performance Domain Levels with Special
> Significance

Yes, Table 11 from that SCMI doc (under the link you provided).

>> there is no specific temperature for it, just:
>> 'This is the maximum performance level that the platform can
>> sustain under normal conditions. In exceptional circumstances,
>> such as thermal runaway, the platform might not be able to
>> guarantee this level.'
>>
>
> Hehe.. Vincent and SCMI teams have been having fun there :)... But, I
> think the definition has little practical significance for the very
> reasons I made above IMHO, and with full respect to SCMI team (defining
> SCMI is not an easy task, I admit) - it is at best a theoretical,
> "works at the engineer's cube definition", as typical "nominal
> operation conditions" escape clauses tend to be, OR at the worst
> ignoring to define the parameters that constitute what would bound
> things in a closed box precisely (example: doesn't mention process, so
> just nominal OR considers all process corners - what does omission of
> that factor really mean?).
>
>
>> I can put this whole description into the DT binding, if you like.
>
> Will leave it to Viresh and others to comment and guide, the terminology
> got my attention, since I almost got bit by a similar usage.. my 2 cents:
> I don't think that suffices unfortunately. what it lacks are the
> parameters of what that terminology really means.
>
> One actual production part that demonstrates this will probably help
> guide the discussion, I guess..
>
> /me goes back to OPP hibernation
>
>>
>
> [1] https://developer.arm.com/documentation/den0056/b
>