Linux Power Management development

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
From: Turquette, Mike @ 2011-08-01 22:01 UTC
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner,
	linux-pm
In-Reply-To: <CAJ0PZbT0+ovjPAnVKaaJNVChG+kU3cgjC9sEU4ppnMGqGL+UoA@mail.gmail.com>

On Sun, Jul 31, 2011 at 11:22 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> Hello.
>
> On Sat, Jul 30, 2011 at 10:02 AM, Turquette, Mike <mturquette@ti.com> wrote:
>> On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>> On Friday, July 29, 2011, Turquette, Mike wrote:
>>>> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>>> > On Friday, July 15, 2011, MyungJoo Ham wrote:
>>>> >> For a usage example, please look at
>>>> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>>> >>
>>>> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>>>> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>>>> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>>>> >> and other related clocks simply follow the determined DDR RAM clock.
>>>> >>
>>>> >> The DEVFREQ driver for Exynos4210 memory bus is at
>>>> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>>>> >>
>>>> >> MyungJoo Ham (3):
>>>> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>>> >>     OPPs
>>>> >>   PM / DEVFREQ: add example governors
>>>> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
>>>> >
>>>> > OK, I'm going to take the patches for 3.2.
>>>>
>>>> Have any other platforms signed up to use this mechanism to manage
>>>> their peripheral DVFS?
>>>
>>> Not that I know of, but one initial user is sufficient for me.
>>> So if you have anything _against_ the patches, please speak up.
>>
>> I do have some concerns.  Let me start by saying that I'm defining a
>> "governor" as some active piece of executing code, probably a looping
>> workqueue that inspects activity/idleness of a device and then makes a
>> determination regarding clock frequency.
>>
>> devfreq seems to be a good framework for creating DVFS governors.
>> However I think that most scalable devices on an SoC do *not* need a
>> governor, and many scalable devices won't have performance counters or
>> any other way to implement such introspection.
>
> Yes. Apart from some static or userspace-driven governors (such as
> "performance", "powersave", and "userspace", although "userspace" is
> not implemented for devfreq yet), governors run a looping workqueue
> that inspects the activity/idleness of a device and determines its
> frequency. However, the inspection is done through a callback provided
> by each device, not directly by devfreq itself. Therefore, if there is
> any way to measure activity (not just performance counters; the number
> of requests or function calls should be fine in many cases), normal
> governors like "simple-ondemand" will work.
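
To make the polling model concrete, here is a minimal C sketch of one ondemand-style decision step. It assumes a hypothetical device callback that reports busy time and total time for the last sampling window. `struct activity_sample`, `ondemand_target()` and the 90%/5% thresholds are illustrative assumptions, not code from the devfreq patches.

```c
#include <assert.h>

/* Hypothetical activity sample filled in by a device-specific callback. */
struct activity_sample {
	unsigned long busy_time;  /* time the device was busy in the window */
	unsigned long total_time; /* length of the sampling window */
	unsigned long cur_freq;   /* frequency (kHz) during the window */
};

#define UP_THRESHOLD 90 /* assumed: go to max above 90% load */
#define DOWN_DIFF     5 /* assumed: hysteresis band below the threshold */

/*
 * Ondemand-style choice: run at max_freq when heavily loaded, keep the
 * current frequency inside the hysteresis band, otherwise scale cur_freq
 * so the projected load lands at about 88% (UP_THRESHOLD - DOWN_DIFF / 2).
 */
unsigned long ondemand_target(const struct activity_sample *s,
			      unsigned long max_freq)
{
	unsigned long long busy = s->busy_time;
	unsigned long long total = s->total_time;

	assert(total > 0);

	/* Load above the threshold: jump straight to the maximum. */
	if (busy * 100 > total * UP_THRESHOLD)
		return max_freq;

	/* Load within the hysteresis band: leave the frequency alone. */
	if (busy * 100 > total * (UP_THRESHOLD - DOWN_DIFF))
		return s->cur_freq;

	/* Lightly loaded: scale down proportionally, capped at max_freq. */
	unsigned long long f = busy * s->cur_freq * 100ULL /
			       (total * (UP_THRESHOLD - DOWN_DIFF / 2));
	return f > max_freq ? max_freq : (unsigned long)f;
}
```

For example, a window that was 44% busy at 400000 kHz yields a 200000 kHz target, i.e. roughly the frequency at which the same work would keep the device about 88% busy.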

Maybe I'm not understanding how the devfreq requests would be made
from drivers.  Can you explain an example where a single target device
named X has constraints placed on its clock rate from two different
drivers Y & Z?  Imagine in this case that there are no performance
counters or any way in hardware to monitor device saturation.

>> Some examples include an MMC controller, which might change its clock
>> rate depending on the class of card that the user has inserted, or
>> even a "smartish" device like a GPU lacking performance counters: its
>> driver will ramp up frequency when there is work to be done and kick
>> off a timeout.  If no new work comes in before the timeout, the
>> driver will drop the frequency.
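
The event-driven GPU pattern described above needs no governor at all. A minimal sketch, with hypothetical names and explicit millisecond timestamps standing in for a real kernel timer or delayed work item:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-driver boost state; all times are in milliseconds. */
struct boost_state {
	unsigned long low_khz;     /* idle frequency */
	unsigned long high_khz;    /* frequency used while work is pending */
	unsigned long timeout_ms;  /* how long a boost lasts without new work */
	unsigned long deadline_ms; /* when the current boost expires */
	bool boosted;
};

/* New work submitted: ramp up and (re)arm the timeout. */
unsigned long on_work_submitted(struct boost_state *s, unsigned long now_ms)
{
	assert(s->high_khz >= s->low_khz);
	s->boosted = true;
	s->deadline_ms = now_ms + s->timeout_ms;
	return s->high_khz;
}

/* Timer callback: drop back to the low frequency once the timeout passes. */
unsigned long on_timer(struct boost_state *s, unsigned long now_ms)
{
	if (s->boosted && now_ms >= s->deadline_ms)
		s->boosted = false;
	return s->boosted ? s->high_khz : s->low_khz;
}
```

Each submission re-arms the deadline, so a steady stream of work keeps the boost in place. The returned value would be handed to whatever rate-setting interface the platform provides, which is left unspecified here.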
>
> In the "simple MMC controller w/o performance counter" case, there are
> the following ways to use devfreq even if counting requests or
> function calls is not possible.
>
> Method 1) use the "userspace" governor and let a user process choose
> the frequency based on the card class

I'm less interested in userspace control of MMC controller operating
frequency and much more interested in how devfreq might arbitrate QoS
requests from multiple "client" devices.

> Method 2) use any "reasonable" governor and let the device driver
> enable only "valid" frequencies.

Can you elaborate on this?  I'm not sure I understand how this will
look in driver code.  Maybe the example I requested above will shed
some light.

>   For a rough example: if class < 6, disable frequencies above 40MHz;
> if class < 10, disable frequencies above 80MHz; and so on. If we have
> no performance counters or any other mechanism to monitor activity,
> the "performance" governor along with a clock-gated MMC driver will
> save enough power.
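
The class-based rule above could look roughly like this. `struct opp_entry` is a hypothetical stand-in for an OPP table entry (the real OPP library has its own types and enable/disable calls), frequencies are in kHz, and the function assumes it is the only code touching the enable flags:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry: frequency in kHz plus an enable flag. */
struct opp_entry {
	unsigned long khz;
	bool enabled;
};

/* Ceiling derived from the card class, following the rough rule above. */
static unsigned long class_ceiling_khz(int card_class)
{
	if (card_class < 6)
		return 40000;
	if (card_class < 10)
		return 80000;
	return ~0UL; /* faster cards: no ceiling */
}

/* Disable every OPP above the ceiling and enable those at or below it. */
void mmc_apply_card_class(struct opp_entry *opps, size_t n, int card_class)
{
	unsigned long ceiling = class_ceiling_khz(card_class);

	assert(opps != NULL);
	for (size_t i = 0; i < n; i++)
		opps[i].enabled = opps[i].khz <= ceiling;
}
```

A governor such as "performance" would then simply pick the highest frequency still enabled.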
>
> For GPUs without any way to monitor activity, we may do the same as
> in the MMC case.
>
> However, the H/W I have now (Exynos4210) has performance
> counters (PPMU) for many blocks: 3D (MALI GPU), ACP, CAMIF, CPU, DMC0,
> DMC1 (memory controllers), FSYS, IMAGE, LCD0, LCD1, MFC_L, MFC_R, TV,
> LEFT_BUS, and RIGHT_BUS. I don't think Exynos4 is an exceptionally
> fancy SoC (millions have already been sold in phones), and other mobile
> SoCs (at least flagship models) will have them very soon (or already
> do). In the example git branch linked with this patch set, we control
> the DMC0/DMC1 blocks.

I agree devfreq is well-suited for such hardware.

>> A governor is not required in these cases (as they are event driven)
>> and devfreq is quite heavyweight for such applications.  What is
>> needed is a QoS-style software layer that allows throughput requests
>> to be made from an initiator device towards a target device.  This
>> layer should aggregate requests since many initiator devices may make
>> requests to the same target device.  This layer I'm describing, which
>> does not exist today, should be where the actual DVFS transition takes
>> place.  That could take the form of a clk_set_rate call in the clock
>> framework (as described by Colin in V1 of this series), or some other
>> not-yet-realized dvfs_set_opp, or something like Jean Pihet's
>> per-device PM QoS patches or whatever.  For the purposes of this email
>> I don't really care which framework implements the QoS request
>> aggregation.
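
A minimal sketch of the aggregation layer being proposed, using hypothetical names and a fixed number of request slots per target. It aggregates with max(), which fits minimum-throughput requests; a bandwidth-oriented layer might sum requests instead.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQS 8

/* Hypothetical per-target aggregator: one slot per initiator request. */
struct qos_target {
	unsigned long req_khz[MAX_REQS]; /* 0 means the slot is unused */
	unsigned long min_khz;           /* floor when nobody asks for more */
};

/* Add or update the request held in `slot`; passing 0 removes it. */
void qos_update_request(struct qos_target *t, size_t slot, unsigned long khz)
{
	assert(slot < MAX_REQS);
	t->req_khz[slot] = khz;
}

/* The target must satisfy its most demanding initiator. */
unsigned long qos_aggregate(const struct qos_target *t)
{
	unsigned long agg = t->min_khz;

	for (size_t i = 0; i < MAX_REQS; i++)
		if (t->req_khz[i] > agg)
			agg = t->req_khz[i];
	return agg;
}
```

In this model a devfreq governor would be just one more slot, and the aggregated value is what eventually reaches the clock or OPP layer.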
>
> Such aggregation could also be done with governors. If a
> governor-device pair does not want to poll, devfreq won't loop unless
> some other governor-device pair wants to. If a device is
> event-driven, users may simply allow/disallow frequencies with the OPP
> framework, and devfreq will choose the proper frequency for the device
> with the given governor. If every device uses "static" or
> "event-driven" governors such as powersave/performance/userspace,
> there will be no polling/looping.
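
Under this model a static governor never polls; it is only re-evaluated when the set of enabled OPPs changes. A sketch of what "performance" and "powersave" would compute, using hypothetical names rather than devfreq's actual governor interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry: frequency in kHz plus an enable flag. */
struct opp_entry {
	unsigned long khz;
	bool enabled;
};

enum policy { POLICY_PERFORMANCE, POLICY_POWERSAVE };

/* Highest (performance) or lowest (powersave) enabled OPP; 0 if none. */
unsigned long static_governor_target(const struct opp_entry *opps, size_t n,
				     enum policy p)
{
	unsigned long best = 0;

	assert(opps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!opps[i].enabled)
			continue;
		if (best == 0 ||
		    (p == POLICY_PERFORMANCE && opps[i].khz > best) ||
		    (p == POLICY_POWERSAVE && opps[i].khz < best))
			best = opps[i].khz;
	}
	return best;
}
```

Something still has to trigger this re-evaluation when a driver disables an OPP, which is the notification step questioned in the reply that follows.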

So drivers must disable OPPs, and then the non-polling devfreq
governor will have to be notified by the OPP code and then run its
->target code again?  This sounds backwards to me.

devfreq seems like an ideal bit of code to understand the constraints
needed by a device (via the workqueue/monitor loop) and then request
those needs via the proper API.  It seems entirely wrong to me to have
other device drivers send their QoS needs to devfreq.

I'm starting to sound like a broken record though, and I've rescinded
my NAK in my reply to Rafael.  If you could explain how multiple
drivers can request their performance needs to a devfreq governor
(same question I asked above) then that would be really helpful.

Thanks,
Mike

> When it is going to be directly controlled by userspace, we'll need a
> "userspace" governor (like the userspace governor of cpufreq).
>
> If there is a QoS request for a devfreq-ed device, the request could
> be served with OPP's frequency enable/disable. If a device must run at
> 400MHz or faster, all frequencies below 400MHz could simply be
> disabled w/ OPP. Devfreq governors cannot override such frequency
> enable/disable settings.
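
Expressing a QoS floor as disabled OPPs can be sketched as follows. The names are hypothetical, the table is assumed sorted by ascending frequency, and the fallback for a floor above every OPP is a design assumption rather than something the thread specifies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry: frequency in kHz plus an enable flag. */
struct opp_entry {
	unsigned long khz;
	bool enabled;
};

/*
 * Enforce a QoS floor by disabling every OPP below floor_khz.
 * Assumes opps[] is sorted by ascending frequency and n > 0. If the floor
 * exceeds every OPP, keep the highest one enabled so the device can run.
 */
void qos_apply_floor(struct opp_entry *opps, size_t n, unsigned long floor_khz)
{
	assert(opps != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		opps[i].enabled = opps[i].khz >= floor_khz;
	if (opps[n - 1].khz < floor_khz)
		opps[n - 1].enabled = true;
}
```

Whatever governor runs afterwards can only choose among the remaining entries, which is why it cannot undercut the floor.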
>
> However, if such QoS requests need delays (timers) like tickle, a
> generalized tickle taking a frequency or a percentage of the maximum
> frequency might work (i.e., tickle(dev, frequency, duration);). This
> generalized tickle would then hold the device at the requested
> frequency or higher by temporarily disabling lower frequencies.
>
>>
>> The point of describing this non-existent API is that devfreq should
>> really be just another input into it.  A governor that can measure bus
>> saturation is really cool, but it may not yield optimal results
>> compared to several drivers which make QoS-style requests and ensure
>> that performance is guaranteed for their particular needs during their
>> transactions.  The good news is that we don't have to choose between
>> performance counter introspection and software QoS requests: both the
>> driver requests and the governor should all feed as inputs into the
>> QoS-style DVFS mechanism.
>>
>> Taking that logic to its inevitable conclusion, tickle doesn't belong
>> inside the governor at all.  If some device X wants to ramp up the
>> frequency of device Y, it should just make a QoS-style throughput
>> request towards device Y, possibly with a timeout (keeping the
>> original idea of tickle intact).  This is entirely a separate idea
>> from a governor's introspective workqueue loop.
>
> Although tickle shares the same loop as the governors, tickle does
> not belong inside governors. Tickle overrides the decisions of
> governors; a governor's decision function is not called while the
> device is being tickled. However, generalizing the tickle function so
> that it may take "at least xx % of max frequency" or "operate at
> least at xx kHz" as an option seems reasonable for QoS requests, and
> such options might be implemented in the next version of devfreq. This
> requires modifying the tickle function interface or adding another
> interface for it. However, if such QoS requests do not need a
> duration, we can simply use OPP's frequency enable/disable and
> disable frequencies lower than the QoS requirement.
>
> Thus, I think this QoS issue is not very significant for devfreq,
> and it can easily be mitigated by adding another interface or
> modifying the interface of the tickle function.
>
>>
>> For userspace, a sysfs entry for tickle would also not feed into the
>> governor, but some dummy struct device *user would probably be the
>> initiator device and it would simply call the QoS-style throughput
>> API.
>>
>> In summary my objections to this series are:
>> 1) devfreq should not be the *final* software layer to invoke a DVFS
>> transition as it has not taken all constraints into account.
>> 2) a devfreq governor represents just one constraint out of many to be
>> considered for any given scalable device.
>
> If the concern is about QoS requests, I guess generalizing tickle as
> above would be sufficient. For devices without performance counters
> or any other mechanism to infer usage statistics, the "performance"
> governor with event-driven OPP freq-enable/disable should be fine.
>
>>
>> My objection to these patches getting merged is that I think they are
>> a bit ahead of their time.  We need to know what the real DVFS API
>> looks like underneath devfreq first, since devfreq should really be
>> built on top of it.
>>
>> Regards,
>> Mike
>>
>>> Thanks,
>>> Rafael
>>>
>> _______________________________________________
>> linux-pm mailing list
>> linux-pm@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>>
>
> Cheers!
> MyungJoo.
>
>
> --
> MyungJoo Ham (함명주), Ph.D.
> Mobile Software Platform Lab,
> Digital Media and Communications (DMC) Business
> Samsung Electronics
> cell: 82-10-6714-2858
>