* Common clock and dvfs
       [not found] <4DC07F10.6010305@ti.com>
@ 2011-05-05 5:08 ` Cousson, Benoit
  2011-05-05 6:11   ` Colin Cross
  2011-05-05 6:25   ` Paul Walmsley
  0 siblings, 2 replies; 22+ messages in thread
From: Cousson, Benoit @ 2011-05-05 5:08 UTC
To: linux-arm-kernel

(Cc folks with some DVFS interest)

Hi Colin,

On Fri, 22 Apr 2011, Colin Cross wrote:
> Now that we are approaching a common clock management implementation,
> I was thinking it might be the right place to put a common dvfs
> implementation as well.
>
> It is very common for SoC manufacturers to provide a table of the
> minimum voltage required on a voltage rail for a clock to run at a
> given frequency.  There may be multiple clocks in a voltage rail that
> each can specify their own minimum voltage, and one clock may affect
> multiple voltage rails.  I have seen two ways to handle keeping the
> clocks and voltages within spec:
>
> The Tegra way is to put everything dvfs related under the clock
> framework.  Enabling (or preparing, in the new clock world) or raising
> the frequency calls dvfs_set_rate before touching the clock, which
> looks up the required voltage on a voltage rail, aggregates it with
> the other voltage requests, and passes the minimum voltage required to
> the regulator api.  Disabling or unpreparing, or lowering the
> frequency changes the clock first, and then calls dvfs_set_rate.  For
> a generic implementation, an SoC would provide the clock/dvfs
> framework with a list of clocks, the voltages required for each
> frequency step on the clock, and the regulator name to change.  The
> frequency/voltage tables are similar to OPP, except that OPP gets
> voltages for a device instead of a clock.
> In a few odd cases (Tegra
> always has a few odd cases), a clock that is internal to a device and
> not exposed to the clock framework (pclk output on the display, for
> example) has a voltage requirement, which requires some devices to
> manually call dvfs_set_rate directly, but with a common clock
> framework it would probably be possible for the display driver to
> export pclk as a real clock.

Those kinds of exceptions are somehow the rules for an OMAP4 device.
Most scalable devices are using some internal dividers or even internal
PLL to control the scalable clock rate (DSS, HSI, MMC, McBSP... the
OMAP4430 Data Manual [1] provides the various clock rate limitations
depending on the OPP).
And none of these internal dividers are handled by the clock fmwk today.

For sure, it should be possible to extend the clock data with internal
device clock nodes (like the UART baud rate divider for example), but
then we will have to handle a bunch of nodes that may not always be
available depending on device state. In order to do that, you have to
tie these clock nodes to the device that contains them.

And for the clocks that do not belong to any device, like most PRCM
source clocks or DPLL inside OMAP, we can easily define a PRCM device or
several CM (Clock Manager) devices that will handle all these clock
nodes.

> The proposed OMAP4 way (I believe, correct me if I am wrong) is to
> create a new api outside the clock api that calls into both the clock
> api and the regulator api in the correct order for each operation,
> using OPP to determine the voltage.  This has a few disadvantages
> (obviously, I am biased, having written the Tegra code) - clocks and
> voltages are tied to a device, which is not always the case for
> platforms outside of OMAP, and drivers must know if their hardware
> requires voltage scaling.  The clock api becomes unsafe to use on any
> device that requires dvfs, as it could change the frequency higher
> than the supported voltage.

You have to tie clock and voltage to a device. Most of the time a clock
does not have any clear relation with a voltage domain. It can even
cross power / voltage domain without any issue.
The efficiency of the DVFS technique is mainly due to the reduction of
the voltage rail that supplies a device. In order to achieve that you
have to reduce the clock rate of one or several clock nodes that supply
the critical path inside the HW.

The clock node itself does not know anything about the device and that's
why it should not be the proper structure to do DVFS.

OMAP moved away from using the clock nodes to represent IP blocks
because the clock abstraction was not enough to represent the way an IP
is interacting with clocks. That's why omap_hwmod was introduced to
represent an IP block.

> Is the clock api the right place to do dvfs, or should the clock api
> be kept simple, and more complicated operations like dvfs be kept
> outside?

In terms of SW layering, so far we have the clock fmwk and the regulator
fmwk. Since DVFS is about both clock and voltage scaling, it makes more
sense to me to handle DVFS on top of both existing fmwks. Let's stick to
the "do one thing and do it well" principle instead of hacking an
existing fmwk with what I consider to be an unrelated functionality.

Moreover, the only existing DVFS SW on Linux today is CPUFreq, so
extending this fmwk to a devfreq kind of fmwk seems a more logical
approach to me.

The important point is that IMO, the device should be the central
component of any DVFS implementation. Both clock and voltage are just
some device resources that have to change synchronously to reduce the
power consumption of the device.

Because the clock is not the central piece of the DVFS sequence, I don't
think it deserves to handle the whole sequence including voltage
scaling.

A change to a clock rate might trigger a voltage change, but the
opposite is true as well.
A reduction of the voltage could trigger the clock rate change inside
all the devices that belong to the voltage domain.
Because of that, both fmwks are siblings. This is not a parent-child
relationship.

Another important point is that in order to trigger a DVFS sequence you
have to do some voting to take into account shared clock and shared
voltage domains.

Moreover, playing directly with a clock rate is not necessarily
appropriate or sufficient for some devices. For example, the
interconnect should expose a BW knob instead of a clock rate one.
In general, some more abstract information like BW, latency or
performance level (P-state) should be the ones to be exposed at driver
level.

By exposing such knobs, the underlying DVFS fmwk will be able to do
voting based on all the system constraints and then set the proper clock
rate using clock fmwk if the divider is exposed as a clock node or let
the driver convert the final device recommendation using whatever
register that will adjust the critical clock path rate.

Regards,
Benoit

[1] http://focus.ti.com/pdfs/wtbu/OMAP4430_ES2.x_DM_Public_Book_vC.pdf

^ permalink raw reply [flat|nested] 22+ messages in thread
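
The "BW knob plus voting" idea in the message above can be illustrated
with a small Python toy model. This is only a sketch under stated
assumptions, not kernel code: the class name BandwidthVoter, the 8-byte
bus width, the three clock rates and the rule that concurrent bandwidth
requests simply add up are all invented for the illustration.

```python
from dataclasses import dataclass, field

# Hypothetical interconnect parameters; the numbers are made up.
BUS_WIDTH_BYTES = 8
OPP_RATES_HZ = [100_000_000, 200_000_000, 400_000_000]


@dataclass
class BandwidthVoter:
    """Collects per-driver bandwidth requests and picks a bus clock rate."""

    requests: dict = field(default_factory=dict)  # driver name -> bytes/s

    def request(self, driver, bytes_per_s):
        """Record (or replace) one driver's request, return the new rate."""
        self.requests[driver] = bytes_per_s
        return self.resolve()

    def release(self, driver):
        """Drop one driver's request, return the new rate."""
        self.requests.pop(driver, None)
        return self.resolve()

    def resolve(self):
        # Assumption: requests on a shared interconnect add up.
        total = sum(self.requests.values())
        # Pick the lowest clock rate that still provides the total bandwidth.
        for rate in OPP_RATES_HZ:
            if rate * BUS_WIDTH_BYTES >= total:
                return rate
        raise ValueError("requested bandwidth exceeds the highest OPP")
```

Drivers in this model never name a clock rate; they only state what
they need, and the voter turns the combined demand into a rate that
could then be handed to the clock fmwk or to a device-internal divider.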
* Common clock and dvfs
  2011-05-05 5:08 ` Common clock and dvfs Cousson, Benoit
@ 2011-05-05 6:11   ` Colin Cross
  2011-05-05 6:35     ` Paul Walmsley
  2011-05-05 21:08     ` Cousson, Benoit
  2011-05-05 6:25   ` Paul Walmsley
  1 sibling, 2 replies; 22+ messages in thread
From: Colin Cross @ 2011-05-05 6:11 UTC
To: linux-arm-kernel

On Wed, May 4, 2011 at 10:08 PM, Cousson, Benoit <b-cousson@ti.com> wrote:
> (Cc folks with some DVFS interest)
>
> Hi Colin,
>
> On Fri, 22 Apr 2011, Colin Cross wrote:
>>
>> Now that we are approaching a common clock management implementation,
>> I was thinking it might be the right place to put a common dvfs
>> implementation as well.
>>
>> It is very common for SoC manufacturers to provide a table of the
>> minimum voltage required on a voltage rail for a clock to run at a
>> given frequency.  There may be multiple clocks in a voltage rail that
>> each can specify their own minimum voltage, and one clock may affect
>> multiple voltage rails.  I have seen two ways to handle keeping the
>> clocks and voltages within spec:
>>
>> The Tegra way is to put everything dvfs related under the clock
>> framework.  Enabling (or preparing, in the new clock world) or raising
>> the frequency calls dvfs_set_rate before touching the clock, which
>> looks up the required voltage on a voltage rail, aggregates it with
>> the other voltage requests, and passes the minimum voltage required to
>> the regulator api.  Disabling or unpreparing, or lowering the
>> frequency changes the clock first, and then calls dvfs_set_rate.  For
>> a generic implementation, an SoC would provide the clock/dvfs
>> framework with a list of clocks, the voltages required for each
>> frequency step on the clock, and the regulator name to change.  The
>> frequency/voltage tables are similar to OPP, except that OPP gets
>> voltages for a device instead of a clock.
>> In a few odd cases (Tegra
>> always has a few odd cases), a clock that is internal to a device and
>> not exposed to the clock framework (pclk output on the display, for
>> example) has a voltage requirement, which requires some devices to
>> manually call dvfs_set_rate directly, but with a common clock
>> framework it would probably be possible for the display driver to
>> export pclk as a real clock.
>
> Those kinds of exceptions are somehow the rules for an OMAP4 device. Most
> scalable devices are using some internal dividers or even internal PLL to
> control the scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 Data
> Manual [1] provides the various clock rate limitations depending on the
> OPP).
> And none of these internal dividers are handled by the clock fmwk today.
>
> For sure, it should be possible to extend the clock data with internal
> device clock nodes (like the UART baud rate divider for example), but then
> we will have to handle a bunch of nodes that may not always be available
> depending on device state. In order to do that, you have to tie these clock
> nodes to the device that contains them.

I agree there are cases where the clock framework may not be a fit for
a specific divider, but it would be simple to export the same
dvfs_set_rate functions that the generic clk_set_rate calls, and allow
drivers that need to scale their own clocks to take advantage of the
common tables.

> And for the clocks that do not belong to any device, like most PRCM source
> clocks or DPLL inside OMAP, we can easily define a PRCM device or several CM
> (Clock Manager) devices that will handle all these clock nodes.
>
>> The proposed OMAP4 way (I believe, correct me if I am wrong) is to
>> create a new api outside the clock api that calls into both the clock
>> api and the regulator api in the correct order for each operation,
>> using OPP to determine the voltage.
>> This has a few disadvantages
>> (obviously, I am biased, having written the Tegra code) - clocks and
>> voltages are tied to a device, which is not always the case for
>> platforms outside of OMAP, and drivers must know if their hardware
>> requires voltage scaling.  The clock api becomes unsafe to use on any
>> device that requires dvfs, as it could change the frequency higher
>> than the supported voltage.
>
> You have to tie clock and voltage to a device. Most of the time a clock does
> not have any clear relation with a voltage domain. It can even cross power /
> voltage domain without any issue.
> The efficiency of the DVFS technique is mainly due to the reduction of the
> voltage rail that supplies a device. In order to achieve that you have to
> reduce the clock rate of one or several clock nodes that supply the
> critical path inside the HW.

A clock crossing a voltage domain is not a problem, a single clock can
have relationships to multiple regulators.  But a clock does not need
to be tied to a device.  From the silicon perspective, it doesn't
matter how you divide up the devices in the kernel, a clock is just a
line toggling at a rate, and the maximum speed it can toggle is
determined by the silicon it feeds and the voltage that silicon is
operating at.  If a device can be turned on or off, that's a clock
gate, and the line downstream from the clock gate is a separate clock.

> The clock node itself does not know anything about the device and that's why
> it should not be the proper structure to do DVFS.

One of us is confused here.  The clock node does not know about the
device, and it doesn't need to.  All the clock needs to know is that
the manufacturer has specified that for a single node to toggle at
some rate, a voltage rail must be set to some minimum voltage.  The
devices are irrelevant.

Imagine a chip where a clock can feed devices A, B, and C.
If the devices are always clocked at the same rate, and can't gate their
clocks, the minimum voltage that can be applied to a rail is
determined ONLY by the rate of the clock.
If device A can be disabled, with its clock gated, then the devices no
longer share a clock.  Device A is controlled by clock 1, and devices
B and C are controlled by clock 2, where clock 2 is the parent of
clock 1, and clock 1 is just a "clock gate" building block from the
generic clock code.  If clock 1 is enabled, both clock 1 and clock 2
apply their own, independent minimum voltage requirements on a
regulator.  If clock 1 is disabled, only the voltage requirement of
clock 2 is applied.  No knowledge of the device is required, only the
voltage requirement for the toggling rate at each node, and each node
can be 0, 1, or more devices.

> OMAP moved away from using the clock nodes to represent IP blocks because
> the clock abstraction was not enough to represent the way an IP is
> interacting with clocks. That's why omap_hwmod was introduced to represent
> an IP block.

omap_hwmod is entirely omap specific, and any generic solution cannot
be based on it.

>> Is the clock api the right place to do dvfs, or should the clock api
>> be kept simple, and more complicated operations like dvfs be kept
>> outside?
>
> In terms of SW layering, so far we have the clock fmwk and the regulator
> fmwk. Since DVFS is about both clock and voltage scaling, it makes more
> sense to me to handle DVFS on top of both existing fmwks. Let's stick to the
> "do one thing and do it well" principle instead of hacking an existing fmwk
> with what I consider to be an unrelated functionality.

There are two reasons I hate putting DVFS above the clock framework.

First, it breaks existing users of the clock api.  Any driver that
calls the clock api directly risks raising the frequency above the
silicon specs.
Instead, you introduce a new api, something like
dvfs_set_rate(struct device, frequency), which takes the same
arguments as the clock api, except a device instead of a clock, which
I have already argued against.  If it needs the same arguments to run,
and it provides a superset of the functionality, and it is trivial to
fall back to the old behavior if the clock is not a dvfs clock, why
does it need a new api?

> Moreover, the only existing DVFS SW on Linux today is CPUFreq, so extending
> this fmwk to a devfreq kind of fmwk seems a more logical approach to me.

I think this is where we disagree most.  CPUFreq is NOT a DVFS
implementation.  It is a frequency scaling implementation only.  If it
happens to scale the voltage, it is only because that is the logical
place to do it.  Every CPUFreq driver that scales the voltage has to
look like this:

pick the cpu frequency
if the frequency is increasing, raise the voltage based on the new frequency
set the cpu frequency
if the frequency is decreasing, lower the voltage based on the new frequency

Note that the last 3 lines are a completely generic clock-based
voltage scaling, and could be moved into the dvfs api under the clock
api.

> The important point is that IMO, the device should be the central component
> of any DVFS implementation. Both clock and voltage are just some device
> resources that have to change synchronously to reduce the power consumption
> of the device.

They don't just have to change synchronously, one exactly determines
the other.  Given a table from the manufacturer, and a clock
frequency, you can always set the voltage rails correctly.

> Because the clock is not the central piece of the DVFS sequence, I don't
> think it deserves to handle the whole sequence including voltage scaling.
>
> A change to a clock rate might trigger a voltage change, but the opposite is
> true as well. A reduction of the voltage could trigger the clock rate change
> inside all the devices that belong to the voltage domain.
> Because of that, both fmwks are siblings. This is not a parent-child
> relationship.

In what case would you ever trigger a voltage change first?  Devices
never care about their voltage, they only care about how fast they can
run.  The only case I can think of is thermal throttling, but that could
just as well be implemented as lowering the clock frequency to allow
the voltage to drop.

> Another important point is that in order to trigger a DVFS sequence you have
> to do some voting to take into account shared clock and shared voltage
> domains.

This is conflating frequency selection with voltage selection.  The
voltage only depends on the maximum clock that is voted, and the
voltage is always a minimum voltage, so other clocks in the same
voltage domain can request a higher voltage, which needs to be handled
by the regulator api.

> Moreover, playing directly with a clock rate is not necessarily appropriate
> or sufficient for some devices. For example, the interconnect should expose
> a BW knob instead of a clock rate one.
> In general, some more abstract information like BW, latency or performance
> level (P-state) should be the ones to be exposed at driver level.

Yes, but again you are conflating frequency selection with voltage
selection.  BW, latency, and performance are all knobs that will
determine one or more clock frequencies, but the voltage is determined
only from those final clock frequencies.  I agree there is a need for
some sort of governor above the clock api, but that governor generally
does not need to know voltages.  It may be useful to expose power
numbers for the different clock frequencies to it, so it knows what
the best clock frequencies to select are based on power vs.
performance.

> By exposing such knobs, the underlying DVFS fmwk will be able to do voting
> based on all the system constraints and then set the proper clock rate using
> clock fmwk if the divider is exposed as a clock node or let the driver
> convert the final device recommendation using whatever register that will
> adjust the critical clock path rate.

Note that you only referred to setting clock registers - the governor
has no need to directly modify voltages.

> Regards,
> Benoit
>
>
> [1] http://focus.ti.com/pdfs/wtbu/OMAP4430_ES2.x_DM_Public_Book_vC.pdf
>
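
The clock-centric scheme argued for in the message above (each clock
node carries a rate-to-minimum-voltage table, votes are aggregated on
the rail, voltage goes up before the clock and comes down after it, and
a gated clock drops its vote) can be sketched as a Python toy model.
Nothing here is Tegra or kernel code: Rail, DvfsClock, the trace list
and all table values are hypothetical, and the parent/child clock tree
is reduced to two independent voters on one rail.

```python
import bisect


class Rail:
    """Toy voltage rail that applies the highest minimum-voltage vote."""

    def __init__(self):
        self.votes = {}      # clock name -> required millivolts
        self.millivolts = 0
        self.trace = []      # order of operations, for inspection

    def vote(self, clock_name, millivolts):
        self.votes[clock_name] = millivolts
        self.millivolts = max(self.votes.values())
        self.trace.append(("rail", self.millivolts))


class DvfsClock:
    """Clock node with a rate -> minimum voltage table (made-up values)."""

    def __init__(self, name, rail, table):
        self.name = name
        self.rail = rail
        self.table = sorted(table)  # [(max_rate_hz, min_millivolts), ...]
        self.rate = 0
        self.enabled = False

    def _required_mv(self, rate):
        rates = [r for r, _ in self.table]
        idx = bisect.bisect_left(rates, rate)
        if idx == len(rates):
            raise ValueError("rate above the highest table entry")
        return self.table[idx][1]

    def _apply(self, rate, enabled):
        # A disabled (gated) clock places no requirement on the rail.
        old = self._required_mv(self.rate) if self.enabled else 0
        new = self._required_mv(rate) if enabled else 0
        if new > old:                    # going up: voltage first
            self.rail.vote(self.name, new)
        self.rate, self.enabled = rate, enabled
        self.rail.trace.append(("clock", self.name, rate, enabled))
        if new < old:                    # going down: clock first
            self.rail.vote(self.name, new)

    def set_rate(self, rate):
        self._apply(rate, self.enabled)

    def enable(self):
        self._apply(self.rate, True)

    def disable(self):
        self._apply(self.rate, False)
```

In this model a caller only ever touches set_rate, enable and disable;
the rail follows from the tables, which is the property the message
above argues for.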
* Common clock and dvfs
  2011-05-05 6:11 ` Colin Cross
@ 2011-05-05 6:35   ` Paul Walmsley
  2011-05-05 6:50     ` Colin Cross
  2011-05-05 21:08   ` Cousson, Benoit
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Walmsley @ 2011-05-05 6:35 UTC
To: linux-arm-kernel

On Wed, 4 May 2011, Colin Cross wrote:

> Imagine a chip where a clock can feed devices A, B, and C.  If the
> devices are always clocked at the same rate, and can't gate their
> clocks, the minimum voltage that can be applied to a rail is
> determined ONLY by the rate of the clock.

That's not so -- although admittedly it's a side issue, and not
particularly related to DVFS.

For example, the device may have some external I/O lines which need to be
at least some minimum voltage level for the externally-connected device to
function.  This minimum voltage level can be unrelated to the device's
clock frequency.


- Paul
* Common clock and dvfs
  2011-05-05 6:35 ` Paul Walmsley
@ 2011-05-05 6:50   ` Colin Cross
  2011-05-05 13:59     ` Mark Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Colin Cross @ 2011-05-05 6:50 UTC
To: linux-arm-kernel

On Wed, May 4, 2011 at 11:35 PM, Paul Walmsley <paul@pwsan.com> wrote:
> On Wed, 4 May 2011, Colin Cross wrote:
>
>> Imagine a chip where a clock can feed devices A, B, and C.  If the
>> devices are always clocked at the same rate, and can't gate their
>> clocks, the minimum voltage that can be applied to a rail is
>> determined ONLY by the rate of the clock.
>
> That's not so -- although admittedly it's a side issue, and not
> particularly related to DVFS.
>
> For example, the device may have some external I/O lines which need to be
> at least some minimum voltage level for the externally-connected device to
> function.  This minimum voltage level can be unrelated to the device's
> clock frequency.

True, that was an oversimplification.  I meant the minimum voltage that
scales with clock frequencies only depends on the clock frequency, not
the device.  Devices do need to be able to specify a higher minimum
voltage, and the regulator api needs to handle it.
* Common clock and dvfs
  2011-05-05 6:50 ` Colin Cross
@ 2011-05-05 13:59   ` Mark Brown
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Brown @ 2011-05-05 13:59 UTC
To: linux-arm-kernel

On Wed, May 04, 2011 at 11:50:52PM -0700, Colin Cross wrote:

> True, that was an oversimplification.  I meant the minimum voltage that
> scales with clock frequencies only depends on the clock frequency, not
> the device.  Devices do need to be able to specify a higher minimum
> voltage, and the regulator api needs to handle it.

The regulator API already supports this so we're fine there.
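
The point settled in this sub-thread, that a rail may have to honour a
device floor (such as an external I/O level) on top of the
clock-derived minimum, comes down to intersecting the consumers'
allowed ranges. The Python function below is a toy model of that
reconciliation, not the regulator core itself; resolve_rail_voltage and
the consumer names are hypothetical, and it assumes each consumer
states a (min, max) range in microvolts.

```python
def resolve_rail_voltage(requests):
    """Return the lowest voltage (in microvolts) inside every consumer's range.

    requests maps a consumer name to a (min_uV, max_uV) tuple.
    """
    if not requests:
        raise ValueError("no consumer requests")
    # The rail must sit at or above every consumer's minimum ...
    lowest_allowed = max(min_uv for min_uv, _ in requests.values())
    # ... and at or below every consumer's maximum.
    highest_allowed = min(max_uv for _, max_uv in requests.values())
    if lowest_allowed > highest_allowed:
        raise ValueError("consumer voltage ranges do not overlap")
    # Lowest voltage that satisfies everyone, to save power.
    return lowest_allowed
```

With a clock-derived request of (950000, 1300000) and an I/O floor of
(1100000, 1300000), the function returns 1100000: the I/O consumer
wins even though the clock table alone would allow a lower voltage.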
* Common clock and dvfs
  2011-05-05 6:11 ` Colin Cross
  2011-05-05 6:35   ` Paul Walmsley
@ 2011-05-05 21:08   ` Cousson, Benoit
  2011-05-05 23:15     ` Colin Cross
  2011-05-06 8:13     ` MyungJoo Ham
  1 sibling, 2 replies; 22+ messages in thread
From: Cousson, Benoit @ 2011-05-05 21:08 UTC
To: linux-arm-kernel

On 5/5/2011 8:11 AM, Colin Cross wrote:
> On Wed, May 4, 2011 at 10:08 PM, Cousson, Benoit <b-cousson@ti.com> wrote:
>> (Cc folks with some DVFS interest)
>>
>> Hi Colin,
>>
>> On Fri, 22 Apr 2011, Colin Cross wrote:
>>>
>>> Now that we are approaching a common clock management implementation,
>>> I was thinking it might be the right place to put a common dvfs
>>> implementation as well.
>>>
>>> It is very common for SoC manufacturers to provide a table of the
>>> minimum voltage required on a voltage rail for a clock to run at a
>>> given frequency.  There may be multiple clocks in a voltage rail that
>>> each can specify their own minimum voltage, and one clock may affect
>>> multiple voltage rails.  I have seen two ways to handle keeping the
>>> clocks and voltages within spec:
>>>
>>> The Tegra way is to put everything dvfs related under the clock
>>> framework.  Enabling (or preparing, in the new clock world) or raising
>>> the frequency calls dvfs_set_rate before touching the clock, which
>>> looks up the required voltage on a voltage rail, aggregates it with
>>> the other voltage requests, and passes the minimum voltage required to
>>> the regulator api.  Disabling or unpreparing, or lowering the
>>> frequency changes the clock first, and then calls dvfs_set_rate.  For
>>> a generic implementation, an SoC would provide the clock/dvfs
>>> framework with a list of clocks, the voltages required for each
>>> frequency step on the clock, and the regulator name to change.  The
>>> frequency/voltage tables are similar to OPP, except that OPP gets
>>> voltages for a device instead of a clock.
>>> In a few odd cases (Tegra
>>> always has a few odd cases), a clock that is internal to a device and
>>> not exposed to the clock framework (pclk output on the display, for
>>> example) has a voltage requirement, which requires some devices to
>>> manually call dvfs_set_rate directly, but with a common clock
>>> framework it would probably be possible for the display driver to
>>> export pclk as a real clock.
>>
>> Those kinds of exceptions are somehow the rules for an OMAP4 device. Most
>> scalable devices are using some internal dividers or even internal PLL to
>> control the scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 Data
>> Manual [1] provides the various clock rate limitations depending on the
>> OPP).
>> And none of these internal dividers are handled by the clock fmwk today.
>>
>> For sure, it should be possible to extend the clock data with internal
>> device clock nodes (like the UART baud rate divider for example), but then
>> we will have to handle a bunch of nodes that may not always be available
>> depending on device state. In order to do that, you have to tie these clock
>> nodes to the device that contains them.
>
> I agree there are cases where the clock framework may not be a fit for
> a specific divider, but it would be simple to export the same
> dvfs_set_rate functions that the generic clk_set_rate calls, and allow
> drivers that need to scale their own clocks to take advantage of the
> common tables.
>
>> And for the clocks that do not belong to any device, like most PRCM source
>> clocks or DPLL inside OMAP, we can easily define a PRCM device or several CM
>> (Clock Manager) devices that will handle all these clock nodes.
>>
>>> The proposed OMAP4 way (I believe, correct me if I am wrong) is to
>>> create a new api outside the clock api that calls into both the clock
>>> api and the regulator api in the correct order for each operation,
>>> using OPP to determine the voltage.
>>> This has a few disadvantages
>>> (obviously, I am biased, having written the Tegra code) - clocks and
>>> voltages are tied to a device, which is not always the case for
>>> platforms outside of OMAP, and drivers must know if their hardware
>>> requires voltage scaling.  The clock api becomes unsafe to use on any
>>> device that requires dvfs, as it could change the frequency higher
>>> than the supported voltage.
>>
>> You have to tie clock and voltage to a device. Most of the time a clock does
>> not have any clear relation with a voltage domain. It can even cross power /
>> voltage domain without any issue.
>> The efficiency of the DVFS technique is mainly due to the reduction of the
>> voltage rail that supplies a device. In order to achieve that you have to
>> reduce the clock rate of one or several clock nodes that supply the
>> critical path inside the HW.
>
> A clock crossing a voltage domain is not a problem, a single clock can
> have relationships to multiple regulators.  But a clock does not need
> to be tied to a device.  From the silicon perspective, it doesn't
> matter how you divide up the devices in the kernel, a clock is just a
> line toggling at a rate, and the maximum speed it can toggle is
> determined by the silicon it feeds and the voltage that silicon is
> operating at.  If a device can be turned on or off, that's a clock
> gate, and the line downstream from the clock gate is a separate clock.

Fully agree. Just to clarify the terminology, I'm using device to
represent the IP block as well. The mapping is not necessarily one to
one, but for most relevant IPs this is mostly true. In our case, the
hwmod will represent the HW device.

My point is that an SoC with just clocks and voltage domains will be
pretty useless. We do have as well a bunch of IPs that are represented
by devices, and these IPs are the relevant pieces of HW we have to
manage. Clocks and voltages are just some resources needed by an IP to
work properly.
Hence the importance of the device.

>> The clock node itself does not know anything about the device and that's why
>> it should not be the proper structure to do DVFS.
>
> One of us is confused here.  The clock node does not know about the
> device, and it doesn't need to.  All the clock needs to know is that
> the manufacturer has specified that for a single node to toggle at
> some rate, a voltage rail must be set to some minimum voltage.  The
> devices are irrelevant.

The manufacturer will specify the IP (represented by a device)
characteristics in terms of voltage rails, clock input, IRQ... This is
all about the IP, the clock is just a parameter.
The clock itself even tied with a voltage domain is of no use if not
connected to an IP.

The DSP DPLL that belongs to the IVA voltage domain can probably run up
to 2 GHz at 1.1 V without any issue. As soon as you connect that clock
to the DSP... suddenly you cannot run the DPLL anymore at that rate. You
have to reduce it to 400MHz. The constraint is purely due to the IP
connected to that clock.

Imagine now a new release of the SoC (ES2.0 for example) with an updated
DSP block that can run at 500MHz... Same clock tree, same voltage domain
partitioning but because of the new IP version, you can run faster...
What piece of HW is really relevant in that change? It is neither the
clock nor the voltage domain. It is only the device that has to update
its requirements toward its resource suppliers.

> Imagine a chip where a clock can feed devices A, B, and C.  If the
> devices are always clocked at the same rate, and can't gate their
> clocks, the minimum voltage that can be applied to a rail is
> determined ONLY by the rate of the clock.
> If device A can be disabled, with its clock gated, then the devices no
> longer share a clock.  Device A is controlled by clock 1, and devices
> B and C are controlled by clock 2, where clock 2 is the parent of
> clock 1, and clock 1 is just a "clock gate" building block from the
> generic clock code.
> If clock 1 is enabled, both clock 1 and clock 2
> apply their own, independent minimum voltage requirements on a
> regulator.

As previously explained, a clock node cannot have any voltage
requirement toward a voltage domain. It will depend on the devices
supplied by this clock node.
Only the HW device can have frequency requirements and voltage
requirements according to its HW characteristics.

> If clock 1 is disabled, only the voltage requirement of
> clock 2 is applied.  No knowledge of the device is required, only the
> voltage requirement for the toggling rate at each node, and each node
> can be 0, 1, or more devices.
>
>> OMAP moved away from using the clock nodes to represent IP blocks because
>> the clock abstraction was not enough to represent the way an IP is
>> interacting with clocks. That's why omap_hwmod was introduced to represent
>> an IP block.
>
> omap_hwmod is entirely omap specific, and any generic solution cannot
> be based on it.

For the moment, because it is a fairly new design, but nothing should
prevent us from making it generic if this abstraction is relevant for
other SoCs.

>>> Is the clock api the right place to do dvfs, or should the clock api
>>> be kept simple, and more complicated operations like dvfs be kept
>>> outside?
>>
>> In terms of SW layering, so far we have the clock fmwk and the regulator
>> fmwk. Since DVFS is about both clock and voltage scaling, it makes more
>> sense to me to handle DVFS on top of both existing fmwks. Let's stick to the
>> "do one thing and do it well" principle instead of hacking an existing fmwk
>> with what I consider to be an unrelated functionality.
>
> There are two reasons I hate putting DVFS above the clock framework.
>
> First, it breaks existing users of the clock api.  Any driver that
> calls the clock api directly risks raising the frequency above the
> silicon specs.
> Instead, you introduce a new api, something like
> dvfs_set_rate(struct device, frequency), which takes the same
> arguments as the clock api, except a device instead of a clock, which
> I have already argued against.  If it needs the same arguments to run,
> and it provides a superset of the functionality, and it is trivial to
> fall back to the old behavior if the clock is not a dvfs clock, why
> does it need a new api?

Because it does not have the same purpose. And it does not break the
user of the clock API. It is even the opposite. You are breaking the
expectation of the current user of the clock API.
Adding DVFS under the clock set_rate will completely change the
behaviour of an existing API.
A set_rate call that used to last a couple of microseconds and that was
atomic will potentially last 10 ms because a voltage change sequence
will be done under the hood.
I think this is quite a huge side effect that a user of that API might
not expect at all.
Just because of that, I think it is worth having another API.

>> Moreover, the only existing DVFS SW on Linux today is CPUFreq, so extending
>> this fmwk to a devfreq kind of fmwk seems a more logical approach to me.
>
> I think this is where we disagree most.  CPUFreq is NOT a DVFS
> implementation.  It is a frequency scaling implementation only.

I don't think we have such a strong disagreement here. I do agree that
CPUFreq is not a full DVFS implementation. It is indeed more focused on
the governor / decision part.
The interesting part is the CPUFreq driver layer part that is, from my
point of view, the missing layer we have between the decision layer and
the clock / regulator fmwk.

> If it
> happens to scale the voltage, it is only because that is the logical
> place to do it.
Every CPUFreq driver that scales the voltage has to > look like this: > > pick the cpu frequency > if the frequency is increasing, raise the voltage based on the new frequency > set the cpu frequency > if the frequency is decreasing, lower the voltage based on the new frequency > > Note that the last 3 lines are a completely generic clock-based > voltage scaling, and could be moved into the dvfs api under the clock > api. Except in the ACPI world... That does not have necessarily a clock fmwk. >> The important point is that IMO, the device should be the central component >> of any DVFS implementation. Both clock and voltage are just some device >> resources that have to change synchronously to reduce the power consumption >> of the device. > > The don't just have to change synchronously, one exactly determines > the other. No not necessarily, there is a big difference between the clock / voltage you can use based on the actual constraints and the ones you actually use. A set_rate user does expect the rate to be changed or to fail. A DVFS constraint will be expressed using some kind of set_minimum_rate API that will just give the minimum clock frequency value that will allow the device to work properly for the expected task. The real frequency will change based on the various constraint the system have. And that can change whenever someone change any constraint in the system. A user might require only 200MHz for the DSP for example, but if at least one other device inside the DSP voltage domain does require the highest voltage, there is no point reducing the DSP frequency. It is much more efficient to run it at 400MHz whenever this is possible. That's why we do need another API, because the set_rate API is the one that will effectively change the frequency. Most driver / user should use this kind of set_minimum_rate API and not the set_rate. Most of the time they do not care or should not care about the exact clock rate. 
they just have to ensure that the clock will run at the sufficient rate to do its work properly. > Given a table from the manufacturer, and a clock > frequency, you can always set the voltage rails correctly. I do agree, my point is just that this should be a HW device related table. >> Because the clock is not the central piece of the DVFS sequence, I don't >> think it deserves to handle the whole sequence including voltage scaling. >> >> A change to a clock rate might trigger a voltage change, but the opposite is >> true as well. A reduction of the voltage could trigger the clock rate change >> inside all the devices that belong to the voltage domain. >> Because of that, both fmwks are siblings. This is not a parent-child >> relationship. > In what case would you ever trigger a voltage change first? Devices > never care about their voltage, they only care about how fast they can > run. The only case I can think of is thermal throttling, but could > just as well be implemented as lowering the clock frequency to allow > the voltage to drop. Devices will indeed never care about voltage directly, but that will happen indirectly because of: - voltage domains dependency: Changing the MPU or IVA voltage domain might force the CORE voltage to increase its voltage due to HW limitation. We cannot have the CPU at 1GHz while the interconnect is at the lowest OPP. - voltage domain increase due to one device frequency increase might force the other voltage domain devices to increase their frequency. - Thermal management might be a good example as well, but in general changing the main contributors frequency (MPU, GPU) should be enough. In both cases, the indirect voltage change will trigger potentially frequency change. vdd1 <--> vdd2 | | +----+ +----+ | | | | devA devB devC devD With such partitioning, an increase of devA OPP, will increase vdd1 that will trigger an increase of vdd2 that will then broadcast to devices that belong to it. 
devC and devD might or not increase their frequency to reduce the energy consumption. Any devices like processors that can run fast and idle must run at the max frequency allowed by the current voltage. >> Another important point is that in order to trigger a DVFS sequence you have >> to do some voting to take into accountn shared clock and shared voltage >> domains. > This is conflating frequency selection with voltage selection. The > voltage only depends on the maximum clock that is voted, and the > voltage is always a minimum voltage, so other clocks in the same > voltage domain can request a higher voltage, which needs to be handled > by the regulator api. > >> Moreover, playing directly with a clock rate is not necessarily appropriate >> or sufficient for some devices. For example, the interconnect should expose >> a BW knob instead of a clock rate one. >> In general, some more abstract information like BW, latency or performance >> level (P-state) should be the ones to be exposed at driver level. > Yes, but again you are conflating frequency selection with voltage > selection. BW, latency, and performance are all knobs that will > determine one or more clock frequencies, but the voltage is determined > only from those final clock frequencies. Not I'm not, I do agree with your point. the final frequency will indeed allow to chose the proper voltage. I do not have any confusion about that. My whole point is that the freq <-> voltage dependency is bi-directional as explained before, that's why you do need an intermediate layer that will select both freq and voltage depending of the various constraints. > I agree there is a need for > some sort of governor above the clock api, but that governor generally > does not need to know voltages. It is not necessarily a governor but more some kind of QoS at device level. Exposing a clock set_rate on a input clock to a driver is, in general, not very good since it might make the driver platform dependent. 
Whereas exposing some abstract QoS APIs will avoid a driver to use directly a low level clock set_rate API. > It may be useful to expose power > numbers for the different clock frequencies to it, so it knows what > the best clock frequencies to select are based on power vs. > performance. > >> By exposing such knobs, the underlying DVFS fmwk will be able to do voting >> based on all the system constraints and then set the proper clock rate using >> clock fmwk if the divider is exposed as a clock node or let the driver >> convert the final device recommendation using whatever register that will >> adjust the critical clock path rate. > Note that you only referred to setting clock registers - the governor > has no need to directly modify voltages. You're right, let's rephrase: ...using whatever register that will adjust the critical clock path rate and then change the voltage if needed. I do not have any disagreement with you on that point. A freq change might trigger a voltage change. But a voltage change might trigger as well a frequency change to another clock. That's why a parent-child relationship does not seems appropriate here for my point of view. Regards, Benoit ^ permalink raw reply [flat|nested] 22+ messages in thread
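A minimal sketch of the set_minimum_rate idea argued for in the message above, assuming a per-device table sorted by ascending rate with non-decreasing voltages. The names struct opp and pick_rate and the voltage values are hypothetical; only the 200MHz and 400MHz DSP rates come from the message. It is not OMAP or kernel code, just an illustration of "run as fast as the current voltage allows, but never below the requested minimum":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: a rate and the minimum rail voltage it needs. */
struct opp {
    unsigned long rate_hz;
    int min_uv;
};

/*
 * Resolve a minimum-rate constraint against the present rail voltage.
 * The table must be sorted by ascending rate_hz.
 * Returns the highest rate that meets min_rate_hz and is already allowed by
 * rail_uv. If no such rate exists, returns the lowest rate that meets
 * min_rate_hz (the caller would then have to raise the rail).
 * Returns 0 if the table cannot satisfy min_rate_hz at all.
 */
static unsigned long pick_rate(const struct opp *table, size_t n,
                               unsigned long min_rate_hz, int rail_uv)
{
    unsigned long best = 0;

    assert(table != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].rate_hz < min_rate_hz)
            continue;                       /* too slow for the request */
        if (best == 0)
            best = table[i].rate_hz;        /* lowest rate meeting the minimum */
        else if (table[i].min_uv <= rail_uv)
            best = table[i].rate_hz;        /* faster at no extra voltage */
    }
    return best;
}
```

With a two-entry DSP table, a 200MHz request resolves to 400MHz when another user already holds the rail at the higher voltage, and stays at 200MHz when the rail is low. This is the behaviour a plain set_rate call cannot express, because set_rate is expected either to set exactly the requested rate or to fail.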
* Common clock and dvfs 2011-05-05 21:08 ` Cousson, Benoit @ 2011-05-05 23:15 ` Colin Cross 2011-05-06 17:36 ` Paul Walmsley 2011-05-06 8:13 ` MyungJoo Ham 1 sibling, 1 reply; 22+ messages in thread From: Colin Cross @ 2011-05-05 23:15 UTC (permalink / raw) To: linux-arm-kernel On Thu, May 5, 2011 at 2:08 PM, Cousson, Benoit <b-cousson@ti.com> wrote: > On 5/5/2011 8:11 AM, Colin Cross wrote: >> >> On Wed, May 4, 2011 at 10:08 PM, Cousson, Benoit<b-cousson@ti.com> ?wrote: >>> >>> (Cc folks with some DVFS interest) >>> >>> Hi Colin, >>> >>> On Fri, 22 Apr 2011, Colin Cross wrote: >>>> >>>> Now that we are approaching a common clock management implementation, >>>> I was thinking it might be the right place to put a common dvfs >>>> implementation as well. >>>> >>>> It is very common for SoC manufacturers to provide a table of the >>>> minimum voltage required on a voltage rail for a clock to run at a >>>> given frequency. ?There may be multiple clocks in a voltage rail that >>>> each can specify their own minimum voltage, and one clock may affect >>>> multiple voltage rails. ?I have seen two ways to handle keeping the >>>> clocks and voltages within spec: >>>> >>>> The Tegra way is to put everything dvfs related under the clock >>>> framework. ?Enabling (or preparing, in the new clock world) or raising >>>> the frequency calls dvfs_set_rate before touching the clock, which >>>> looks up the required voltage on a voltage rail, aggregates it with >>>> the other voltage requests, and passes the minimum voltage required to >>>> the regulator api. ?Disabling or unpreparing, or lowering the >>>> frequency changes the clock first, and then calls dvfs_set_rate. ?For >>>> a generic implementation, an SoC would provide the clock/dvfs >>>> framework with a list of clocks, the voltages required for each >>>> frequency step on the clock, and the regulator name to change. 
?The >>>> frequency/voltage tables are similar to OPP, except that OPP gets >>>> voltages for a device instead of a clock. ?In a few odd cases (Tegra >>>> always has a few odd cases), a clock that is internal to a device and >>>> not exposed to the clock framework (pclk output on the display, for >>>> example) has a voltage requirement, which requires some devices to >>>> manually call dvfs_set_rate directly, but with a common clock >>>> framework it would probably be possible for the display driver to >>>> export pclk as a real clock. >>> >>> Those kinds of exceptions are somehow the rules for an OMAP4 device. Most >>> scalable devices are using some internal dividers or even internal PLL to >>> control the scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 >>> Data >>> Manual [1] is providing the various clock rate limitation depending of >>> the >>> OPP). >>> And none of these internal dividers are handled by the clock fmwk today. >>> >>> For sure, it should be possible to extend the clock data with internal >>> devices clock nodes (like the UART baud rate divider for example), but >>> then >>> we will have to handle a bunch of nodes that may not be always available >>> depending of device state. In order to do that, you have to tie these >>> clocks >>> node to the device that contains them. >> >> I agree there are cases where the clock framework may not be a fit for >> a specific divider, but it would be simple to export the same >> dvfs_set_rate functions that the generic clk_set_rate calls, and allow >> drivers that need to scale their own clocks to take advantage of the >> common tables. >> >>> And for the clocks that do not belong to any device, like most PRCM >>> source >>> clocks or DPLL inside OMAP, we can easily define a PRCM device or several >>> CM >>> (Clock Manager) devices that will handle all these clock nodes. 
>>> >>>> The proposed OMAP4 way (I believe, correct me if I am wrong) is to >>>> create a new api outside the clock api that calls into both the clock >>>> api and the regulator api in the correct order for each operation, >>>> using OPP to determine the voltage. ?This has a few disadvantages >>>> (obviously, I am biased, having written the Tegra code) - clocks and >>>> voltages are tied to a device, which is not always the case for >>>> platforms outside of OMAP, and drivers must know if their hardware >>>> requires voltage scaling. ?The clock api becomes unsafe to use on any >>>> device that requires dvfs, as it could change the frequency higher >>>> than the supported voltage. >>> >>> You have to tie clock and voltage to a device. Most of the time a clock >>> does >>> not have any clear relation with a voltage domain. It can even cross >>> power / >>> voltage domain without any issue. >>> The efficiency of the DVFS technique is mainly due to the reduction of >>> the >>> voltage rail that supply a device. In order to achieve that you have to >>> reduce the clock rate of one or several clocks nodes that supply the >>> critical path inside the HW. >> >> A clock crossing a voltage domain is not a problem, a single clock can >> have relationships to multiple regulators. ?But a clock does not need >> to be tied to a device. ?From the silicon perspective, it doesn't >> matter how you divide up the devices in the kernel, a clock is just a >> line toggling at a rate, and the maximum speed it can toggle is >> determined by the silicon it feeds and the voltage that silicon is >> operating at. ?If a device can be turned on or off, that's a clock >> gate, and the line downstream from the clock gate is a separate clock. > > Fully agree. > > Just to clarify the terminology, I'm using device to represent the IP block > as well. The mapping is not necessarily one to one, but for most relevant > IPs this is mostly true. In our case, the hwmod will represent the HW > device. 
Let's be clearer: "struct device" means the kernel's view of a single device, "IP block" means a piece of silicon, and we should avoid the bare word "device" completely. > My point is that a Soc with just clocks and voltage domains will be pretty > useless. > We do have as well a bunch of IPs that are represented by devices, and these > IPs are the relevant piece of HW we have to managed. > > Clocks and voltages are just some resources needed by an IP to work > properly. > Hence the importance of the device. I agree that an IP block needs a clock, but I disagree that, in most cases, an IP block needs a voltage directly. In 99% of cases, the voltage is relevant to all the silicon connected to a clock node, and not to a specific IP block. >>> The clock node itself does not know anything about the device and that's >>> why >>> it should not be the proper structure to do DVFS. >> >> One of us is confused here. The clock node does not know about the >> device, and it doesn't need to. All the clock needs to know is that >> the manufacturer has specified that for a single node to toggle at >> some rate, a voltage rail must be set some minimum voltage. The >> devices are irrelevant. > > The manufacturer will specify the IP (represented by a device) > characteristics in term of voltage rails, clock input, IRQ... > This is all about the IP, the clock is just a parameter. No, TI may specify the voltage and clock input for a device, but nVidia specifies a clock node and a voltage. One is going to need to be converted to the other. > The clock itself even tied with a voltage domain is of no use if not > connected to an IP. Clocks are always tied to IP - that's what clk_get(struct device *, const char *con_id) is for. > The DSP DPLL that belongs to the IVA voltage domain can probably run up to 2 > GHz at 1.1v without any issue. > As soon as you connect that clock to the DSP... suddenly you cannot run the > DPLL anymore at that rate. You have to reduce it to 400MHz.
> The constraint is purely due the the IP connected to that clock. I think we have another terminology issue. Two wires in the silicon are only the same "clock node" if they always toggle at the same rate. If one wire can be disabled while the other one is enabled (the DSP clock and the DPLL, in your example), they are not the same node. The DSP clock is a child of the DPLL clock. Maintaining the relationships between multiple clocks is clearly necessary, but voltages are irrelevant in your example. It doesn't matter that the DPLL can run really fast; the constraint is that the DSP clock is limiting it. This would have to be handled by "devfreq", but devfreq would set the DPLL to 400 MHz, and DVFS would lower its constraint on the IVA voltage. > Imagine now a new release of the SoC (ES2.0 for Ex) with an updated DSP > block that can run at 500MHz... Same clock tree, same voltage domain > partitioning but because of the new IP version, you can run faster... > > What piece of HW is really relevant in that change? It is neither the clock > nor the voltage domain. It is only the device that have to update its > requirement toward its resources suppliers. I agree, and devfreq will need to have knowledge of struct devices, but, once again, voltage is irrelevant here. >> Imagine a chip where a clock can feed devices A, B, and C. If the >> devices are always clocked at the same rate, and can't gate their >> clocks, the minimum voltage that can be applied to a rail is >> determined ONLY by the rate of the clock. >> If device A can be disabled, with its clock gated, then the devices no >> longer share a clock. Device A is controlled by clock 1, and devices >> B and C are controlled by clock 2, where clock 2 is the parent of >> clock 1, and clock 1 is just a "clock gate" building block from the >> generic clock code. If clock 1 is enabled, both clock 1 and clock 2 >> apply their own, independent minimum voltage requirements on a >> regulator.
> > As previously explained, a clock node cannot have any voltage requirement > toward a voltage domain. It will depend of the devices supplied by this > clock node. Only the HW device can have frequency requirement and voltage > requirement according to its HW characteristics. I think this disagreement is related to our conflicting definitions of clock node. The PLL that feeds multiple IP blocks is one clock node, which may have no voltage constraints at all. Each IP block that is connected to the PLL has its own clock gates (in the PRCM, for OMAP), so each IP block has its own clock node, represented by a struct clk whose parent is the struct clk of the PLL, and each of those clock nodes may have voltage constraints. Note that there is no mention of "struct device" - those IP blocks may not have a struct device. A cpufreq driver has no struct device, but it could have a struct clk *, and want the voltage to scale when it updates that clock. >> If clock 1 is disabled, only the voltage requirement of >> clock 2 is applied. No knowledge of the device is required, only the >> voltage requirement for the toggling rate at each node, and each node >> can be 0, 1, or more devices. >> >>> OMAP moved away from using the clock nodes to represent IP blocks because >>> the clock abstraction was not enough to represent the way an IP is >>> interacting with clocks. That's why omap_hwmod was introduced to >>> represent >>> an IP block. >> >> omap_hwmod is entirely omap specific, and any generic solution cannot >> be based on it. > > For the moment, because it is a fairly new design, but nothing should > prevent us to make it generic if this abstraction is relevant for other SoC. That's not how you design abstractions. You can't abstract one case, without considering other SoCs, and then make it generic if it fits other SoCs - it will never fit other SoCs. You have to consider all the cases you want it to cover, and design an abstraction that makes sense for the superset.
OPP is an example of what happens when you design a generic API based off a TI TRM - you end up with something that is irrelevant to half the other SoCs. >>>> Is the clock api the right place to do dvfs, or should the clock api >>>> be kept simple, and more complicated operations like dvfs be kept >>>> outside? >>> >>> In term of SW layering, so far we have the clock fmwk and the regulator >>> fmwk. Since DVFS is about both clock and voltage scaling, it makes more >>> sense to me to handle DVFS on top of both existing fmwks. Let stick to >>> the >>> "do one thing and do it well" principle instead of hacking an existing >>> fmwk >>> with what I consider to be an unrelated functionality. >> >> There are two reasons I hate putting DVFS above the clock framework. >> >> First, it breaks existing users of the clock api. ?Any driver that >> calls the clock api directly risks raising the frequency above the >> silicon specs. ?Instead, you introduce a new api, something like >> dvfs_set_rate(struct device, frequency), which takes the same >> arguments as the clock api, except a device instead of a clock, which >> I have already argued against. ?If needs the same arguments to run, >> and it provides a superset of the functionality, and it is trivial to >> fall back to the old behavior if the clock is not a dvfs clock, why >> does it need a new api? > > Because it does not have the same purpose. > > And it does not break the user of the clock API. It is even the opposite. > You are breaking the expectation of the current user of the clock API. > Adding DVFS under the clock set_rate will completely change the behaviour of > an existing API. > A set_rate call that use to last a couple of micro second and that was > atomic will last potentially 10ms because a voltage change sequence will be > done under the hood. I think this is quite a huge side effect that an user > of that API might not expect at all. Not true. 
It has recently been clarified that clk_set_rate is a sleepable call, and it is likely that future calls to clk_set_rate in a generic implementation will take a global or semi-global mutex. > Just because of that, I think it worth having another API. What happens when a common driver, something like EHCI, or an 8250 driver, calls clk_enable? If the clock, or one of its parents, has a voltage constraint, it gets undefined behavior when it operates the silicon out of spec. You are requiring every call to clk_* from a driver that is shared across SoCs to be updated to use dvfs_*, and dvfs_* to be implemented on every SoC that uses those drivers. And your new API takes the exact same arguments as the old api - some sort of token, and a frequency. >>> Moreover, the only exiting DVFS SW on Linux today is CPUFreq, so >>> extending >>> this fmwk to a devfreq kind of fwmk seems a more logical approach to me. >> >> I think this is where we disagree most. ?CPUFreq is NOT a DVFS >> implementation. ?It is a frequency scaling implementation only. > > I don't think we have such a strong disagreement here. I do agree that > CPUFreq is not a full DFVS implementation. > It is indeed more focused on the governor / decision part. > The interesting part is the CPUFreq driver layer part that is for my point > of view the missing layer we have between the decision layer and the clock / > regulator fmwk. > >> If it >> happens to scale the voltage, it is only because that is the logical >> place to do it. ?Every CPUFreq driver that scales the voltage has to >> look like this: >> >> pick the cpu frequency >> if the frequency is increasing, raise the voltage based on the new >> frequency >> set the cpu frequency >> if the frequency is decreasing, lower the voltage based on the new >> frequency >> >> Note that the last 3 lines are a completely generic clock-based >> voltage scaling, and could be moved into the dvfs api under the clock >> api. > > Except in the ACPI world... 
That does not have necessarily a clock fmwk. If it doesn't have a clock framework, how is it relevant to this discussion? >>> The important point is that IMO, the device should be the central >>> component >>> of any DVFS implementation. Both clock and voltage are just some device >>> resources that have to change synchronously to reduce the power >>> consumption >>> of the device. >> >> The don't just have to change synchronously, one exactly determines >> the other. > > No not necessarily, there is a big difference between the clock / voltage > you can use based on the actual constraints and the ones you actually use. Agree - and devices need to be able to specify constraints on the minimum frequency, and some sort of governor needs to decide what the best frequency is to use. That's a perfectly reasonable job for devfreq. > A set_rate user does expect the rate to be changed or to fail. > A DVFS constraint will be expressed using some kind of set_minimum_rate API > that will just give the minimum clock frequency value that will allow the > device to work properly for the expected task. > The real frequency will change based on the various constraint the system > have. And that can change whenever someone change any constraint in the > system. > A user might require only 200MHz for the DSP for example, but if at least > one other device inside the DSP voltage domain does require the highest > voltage, there is no point reducing the DSP frequency. It is much more > efficient to run it at 400MHz whenever this is possible. > That's why we do need another API, because the set_rate API is the one that > will effectively change the frequency. > > Most driver / user should use this kind of set_minimum_rate API and not the > set_rate. > Most of the time they do not care or should not care about the exact clock > rate. they just have to ensure that the clock will run at the sufficient > rate to do its work properly. 
> >> Given a table from the manufacturer, and a clock >> frequency, you can always set the voltage rails correctly. > > I do agree, my point is just that this should be a HW device related table. A constraint in TI land may always be on an IP block (I assume that's what you mean by HW device), but it could also be a constraint on a cluster of IP blocks, or it could be a constraint on an IP block that doesn't have a struct device, for example the CPU. So you can't use struct device as the token for the table lookup. >>> Because the clock is not the central piece of the DVFS sequence, I don't >>> think it deserves to handle the whole sequence including voltage >>> scaling. >>> >>> A change to a clock rate might trigger a voltage change, but the opposite >>> is >>> true as well. A reduction of the voltage could trigger the clock rate >>> change >>> inside all the devices that belong to the voltage domain. >>> Because of that, both fmwks are siblings. This is not a parent-child >>> relationship. >> >> In what case would you ever trigger a voltage change first? Devices >> never care about their voltage, they only care about how fast they can >> run. The only case I can think of is thermal throttling, but could >> just as well be implemented as lowering the clock frequency to allow >> the voltage to drop. > > Devices will indeed never care about voltage directly, but that will happen > indirectly because of: > - voltage domains dependency: Changing the MPU or IVA voltage domain might > force the CORE voltage to increase its voltage due to HW limitation. We > cannot have the CPU at 1GHz while the interconnect is at the lowest OPP. > - voltage domain increase due to one device frequency increase might force > the other voltage domain devices to increase their frequency. > - Thermal management might be a good example as well, but in general > changing the main contributors frequency (MPU, GPU) should be enough.
> > In both cases, the indirect voltage change will trigger potentially > frequency change. >
> vdd1 <--> vdd2
>  |         |
>  +----+    +----+
>  |    |    |    |
> devA devB devC devD
> > With such partitioning, an increase of devA OPP, will increase vdd1 that > will trigger an increase of vdd2 that will then broadcast to devices that > belong to it. devC and devD might or not increase their frequency to reduce > the energy consumption. > Any devices like processors that can run fast and idle must run at the max > frequency allowed by the current voltage. This is a good point, and it's a tough problem to solve. Do you have any numbers on the power savings here? In some cases it may be beneficial, but if you have to disable auto idle on the IP blocks for any reason (HW bug, etc), you end up wasting power. I think you can still implement this with devfreq and the half of OPP I don't dislike. Your example starts and ends with a clock, but passes through voltages. Why not take the voltages out of the equation, and simplify TI's OPP table to: CPU frequency 1GHz, set minimum GPU frequency high. You could also implement this by having devfreq register notifiers with the regulator API - when the voltage increases on a regulator, increase the rates of the clocks that can benefit from the higher voltage. On the surface, this seems like duplicating code, but with the right exports from the dvfs api, devfreq could easily query the best voltage to use without duplicating the voltage table. The end result is that the decision making is split - the decision on the voltage to use for the clock, which is a hard requirement of the silicon, is made based on the clock, but the decision on the best clock to use for the voltage, which is an optimization, is made at a higher level by devfreq.
The biggest advantage of this split is that all the existing APIs continue to work - setting a clock with clk_set_rate will get the right voltage, setting a voltage directly with regulator_set_voltage will trigger a clock frequency increase. Until a case appears that requires increasing clock frequencies based on a voltage change that was not triggered by another clock, I don't think the complexity of the regulator notifier solution is necessary, and OPPs would be trivial to use for now. >>> Another important point is that in order to trigger a DVFS sequence you >>> have >>> to do some voting to take into accountn shared clock and shared voltage >>> domains. >> >> This is conflating frequency selection with voltage selection. ?The >> voltage only depends on the maximum clock that is voted, and the >> voltage is always a minimum voltage, so other clocks in the same >> voltage domain can request a higher voltage, which needs to be handled >> by the regulator api. >> >>> Moreover, playing directly with a clock rate is not necessarily >>> appropriate >>> or sufficient for some devices. For example, the interconnect should >>> expose >>> a BW knob instead of a clock rate one. >>> In general, some more abstract information like BW, latency or >>> performance >>> level (P-state) should be the ones to be exposed at driver level. >> >> Yes, but again you are conflating frequency selection with voltage >> selection. ?BW, latency, and performance are all knobs that will >> determine one or more clock frequencies, but the voltage is determined >> only from those final clock frequencies. > > Not I'm not, I do agree with your point. the final frequency will indeed > allow to chose the proper voltage. I do not have any confusion about that. > > My whole point is that the freq <-> voltage dependency is bi-directional as > explained before, that's why you do need an intermediate layer that will > select both freq and voltage depending of the various constraints. 
OK, I see your point on the bidirectional relationship. But I don't think it is a symmetric bidirectional relationship - one direction is a requirement for correct operation, the other is an optimization, and an optimization that will not apply equally to all platforms. >> I agree there is a need for >> some sort of governor above the clock api, but that governor generally >> does not need to know voltages. > > It is not necessarily a governor but more some kind of QoS at device level. > Exposing a clock set_rate on a input clock to a driver is, in general, not > very good since it might make the driver platform dependent. Whereas > exposing some abstract QoS APIs will avoid a driver to use directly a low > level clock set_rate API. Yes - a perfect job for devfreq. >> It may be useful to expose power >> numbers for the different clock frequencies to it, so it knows what >> the best clock frequencies to select are based on power vs. >> performance. >> >>> By exposing such knobs, the underlying DVFS fmwk will be able to do >>> voting >>> based on all the system constraints and then set the proper clock rate >>> using >>> clock fmwk if the divider is exposed as a clock node or let the driver >>> convert the final device recommendation using whatever register that will >>> adjust the critical clock path rate. >> >> Note that you only referred to setting clock registers - the governor >> has no need to directly modify voltages. > > You're right, let's rephrase: > ...using whatever register that will adjust the critical clock path rate and > then change the voltage if needed. > > I do not have any disagreement with you on that point. A freq change might > trigger a voltage change. But a voltage change might trigger as well a > frequency change to another clock. > That's why a parent-child relationship does not seems appropriate here for > my point of view. > > > Regards, > Benoit > > ^ permalink raw reply [flat|nested] 22+ messages in thread
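The clock-centred scheme defended in the message above (per-clock voltage tables, aggregation across the clocks on a rail, voltage raised before a frequency increase and lowered after a decrease) can be sketched as a small self-contained model. Everything here is an assumption for illustration: struct dvfs_step, struct dvfs_clk, struct dvfs_rail and the helper names are hypothetical, the rail's uv field merely stands in for a regulator call, and none of it is the Tegra implementation or the kernel clk/regulator API.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a hypothetical manufacturer table: rates up to rate_hz need min_uv. */
struct dvfs_step {
    unsigned long rate_hz;
    int min_uv;
};

/* A clock node with its own table; rate_hz and enabled mirror the hardware. */
struct dvfs_clk {
    const struct dvfs_step *steps;  /* sorted by ascending rate_hz */
    size_t num_steps;
    unsigned long rate_hz;
    int enabled;
};

/* A voltage rail shared by several clock nodes; uv stands in for the regulator. */
struct dvfs_rail {
    struct dvfs_clk **clks;
    size_t num_clks;
    int uv;
};

/* Minimum voltage for one clock at rate_hz, or -1 if the rate is above the table. */
static int clk_required_uv(const struct dvfs_clk *c, unsigned long rate_hz)
{
    for (size_t i = 0; i < c->num_steps; i++)
        if (rate_hz <= c->steps[i].rate_hz)
            return c->steps[i].min_uv;
    return -1;
}

/*
 * Aggregate: the rail needs the highest minimum among the enabled clocks.
 * If "changing" is non-NULL, that clock is evaluated at new_rate instead of
 * its current rate, so the result can be applied before the clock moves.
 */
static int rail_required_uv(const struct dvfs_rail *r,
                            const struct dvfs_clk *changing,
                            unsigned long new_rate)
{
    int max_uv = 0;

    for (size_t i = 0; i < r->num_clks; i++) {
        const struct dvfs_clk *c = r->clks[i];
        int uv;

        if (!c->enabled)
            continue;               /* gated clocks place no requirement */
        uv = clk_required_uv(c, c == changing ? new_rate : c->rate_hz);
        if (uv > max_uv)
            max_uv = uv;
    }
    return max_uv;
}

/* Change a clock rate while keeping the rail within spec at every step. */
static int dvfs_clk_set_rate(struct dvfs_rail *r, struct dvfs_clk *c,
                             unsigned long new_rate)
{
    assert(r != NULL && c != NULL);
    if (clk_required_uv(c, new_rate) < 0)
        return -1;                                  /* rate not in the table */
    if (new_rate > c->rate_hz) {
        r->uv = rail_required_uv(r, c, new_rate);   /* raise voltage first */
        c->rate_hz = new_rate;
    } else {
        c->rate_hz = new_rate;                      /* lower the clock first */
        r->uv = rail_required_uv(r, NULL, 0);
    }
    return 0;
}

/*
 * In this model "enabled" is only the clock's vote on the rail. Real code
 * would raise the rail before ungating and lower it after gating.
 */
static void dvfs_clk_enable(struct dvfs_rail *r, struct dvfs_clk *c, int on)
{
    assert(r != NULL && c != NULL);
    c->enabled = on;
    r->uv = rail_required_uv(r, NULL, 0);
}
```

In the clock 1 / clock 2 example from the thread, enabling clock 1 makes the rail follow whichever of the two clocks needs more voltage, and gating it again lets the rail fall back to the requirement of clock 2 alone. No struct device appears anywhere in the lookup, which is the property argued for above; the device-centred view from the other side of the thread would key the same table on an IP block instead.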
* Common clock and dvfs 2011-05-05 23:15 ` Colin Cross @ 2011-05-06 17:36 ` Paul Walmsley 0 siblings, 0 replies; 22+ messages in thread From: Paul Walmsley @ 2011-05-06 17:36 UTC (permalink / raw) To: linux-arm-kernel Not that this is particularly related to DVFS, but: On Thu, 5 May 2011, Colin Cross wrote: > On Thu, May 5, 2011 at 2:08 PM, Cousson, Benoit <b-cousson@ti.com> wrote: > > > Colin Cross wrote: > > >> omap_hwmod is entirely omap specific, and any generic solution cannot > >> be based on it. > > > > For the moment, because it is a fairly new design, but nothing should > > prevent us to make it generic if this abstraction is relevant for other SoC. > > That's not how you design abstractions. Oh really? > You can't abstract one case, without considering other SoCs, and then > make it generic if it fits other SoCs - it will never fit other SoCs. > You have to consider all the cases you want it to cover, and design an > abstraction that makes sense for the superset. In actual practice, one often does not know in advance the entire universe of cases that one needs to cover. Even just for one SoC. Consider that you mentioned earlier that you had to rewrite the Tegra clock code several times. Now, add several other families of SoCs to the requirements. If the documentation for these chips is even available at all, it is often misleading or wrong. Attempting to create an abstraction before one knows the underlying requirements of what one is actually trying to abstract is a plan for intense suffering. There's little glory in it. ... In the specific case of omap_hwmod, the core of the omap_hwmod data structures were designed such that they could apply to any SoC with a complex interconnect. The design was based on hardware principles common to any SoC: interconnects, IP blocks, reset lines, etc. There are OMAP-specific parts, but if others found omap_hwmod useful, they're trivial to abstract. We haven't sought to force it on others. 
- Paul ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-05-05 21:08 ` Cousson, Benoit
2011-05-05 23:15 ` Colin Cross
@ 2011-05-06 8:13 ` MyungJoo Ham
1 sibling, 0 replies; 22+ messages in thread
From: MyungJoo Ham @ 2011-05-06 8:13 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 6, 2011 at 6:08 AM, Cousson, Benoit <b-cousson@ti.com> wrote:
[]
>
> Devices will indeed never care about voltage directly, but that will happen
> indirectly because of:
> - voltage domains dependency: Changing the MPU or IVA voltage domain might
> force the CORE voltage to increase its voltage due to HW limitation. We
> cannot have the CPU at 1GHz while the interconnect is at the lowest OPP.
> - voltage domain increase due to one device frequency increase might force
> the other voltage domain devices to increase their frequency.
> - Thermal management might be a good example as well, but in general
> changing the main contributors frequency (MPU, GPU) should be enough.
>
> In both cases, the indirect voltage change will trigger potentially
> frequency change.
>
> vdd1 <--> vdd2
>  |         |
>  +----+    +----+
>  |    |    |    |
> devA devB devC devD
>
> With such partitioning, an increase of devA OPP, will increase vdd1 that
> will trigger an increase of vdd2 that will then broadcast to devices that
> belong to it. devC and devD might or not increase their frequency to reduce
> the energy consumption.
> Any devices like processors that can run fast and idle must run at the max
> frequency allowed by the current voltage.
As long as the voltage change in vdd1, which changes vdd2 (vdd1 and 2
are consumers of the same regulator, right?), can update OPP entries
related (enable/disable entries), devfreq can handle this.
If the clocks and devices (A~D) related are using devfreq, disabling,
enabling, and adding OPPs will instantly affect devfreq and adjust
clock frequency based on the enabled OPP entries only.
Thus, if a module is increasing the voltage, it just needs to disable some low-voltage OPP entries although some set_min/max APIs mentioned by Colin will be more useful. -- MyungJoo Ham (???), Ph.D. Mobile Software Platform Lab, Digital Media and Communications (DMC) Business Samsung Electronics cell: 82-10-6714-2858 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs 2011-05-05 5:08 ` Common clock and dvfs Cousson, Benoit 2011-05-05 6:11 ` Colin Cross @ 2011-05-05 6:25 ` Paul Walmsley 1 sibling, 0 replies; 22+ messages in thread From: Paul Walmsley @ 2011-05-05 6:25 UTC (permalink / raw) To: linux-arm-kernel On Thu, 5 May 2011, Cousson, Benoit wrote: > Those kinds of exceptions are somehow the rules for an OMAP4 device. > Most scalable devices are using some internal dividers or even internal > PLL to control the scalable clock rate (DSS, HSI, MMC, McBSP... the > OMAP4430 Data Manual [1] is providing the various clock rate limitation > depending of the OPP). And none of these internal dividers are handled > by the clock fmwk today. That's mostly because no one has taken the time to implement them, not really for any technical reason. > For sure, it should be possible to extend the clock data with internal > devices clock nodes (like the UART baud rate divider for example), but > then we will have to handle a bunch of nodes that may not be always > available depending of device state. In order to do that, you have to > tie these clocks node to the device that contains them. It's only necessary to do that for the device where the clock's control registers are located. In many cases (almost all on OMAP), this is a different device from the device that the clock actually drives. > And for the clocks that do not belong to any device, like most PRCM > source clocks or DPLL inside OMAP, we can easily define a PRCM device or > several CM (Clock Manager) devices that will handle all these clock > nodes. > > > The proposed OMAP4 way (I believe, correct me if I am wrong) is to > > create a new api outside the clock api that calls into both the clock > > api and the regulator api in the correct order for each operation, > > using OPP to determine the voltage. 
This has a few disadvantages > > (obviously, I am biased, having written the Tegra code) - clocks and > > voltages are tied to a device, which is not always the case for > > platforms outside of OMAP, and drivers must know if their hardware > > requires voltage scaling. The clock api becomes unsafe to use on any > > device that requires dvfs, as it could change the frequency higher > > than the supported voltage. > > You have to tie clock and voltage to a device. As you mentioned above, there are several clocks that aren't associated with any specific "device" outside of the clock itself, or which are associated with multiple devices. > Most of the time a clock does not have any clear relation with a voltage > domain. It can even cross power / voltage domain without any issue. Each instance of a clock signal -- a conductor on a chip that carries an AC signal that is used to drive some gates -- can only be driven by one voltage rail. How could it be otherwise? In the unusual instances where a clock crosses voltage rails (by virtue of some gates between the rails that handle the translation) and it is important for Linux to know this, then in the Linux-OMAP code, the intention is for separate struct clks to be used for the clock signals on either side of the voltage rail crossing. > The clock node itself does not know anything about the device and that's > why it should not be the proper structure to do DVFS. What aspects of the device are you referring to that the clock node would need to know? > OMAP moved away from using the clock nodes to represent IP blocks > because the clock abstraction was not enough to represent the way an IP > is interacting with clocks. That's why omap_hwmod was introduced to > represent an IP block. omap_hwmod was introduced to represent IP blocks and their interconnection. Separating IP block gating from individual clock gating was one part of this, but not the only one; and gating isn't really related to DVFS. 
> Because the clock is not the central piece of the DVFS sequence, I don't
> think it deserves to handle the whole sequence including voltage
> scaling.
>
> A change to a clock rate might trigger a voltage change, but the
> opposite is true as well. A reduction of the voltage could trigger the
> clock rate change inside all the devices that belong to the voltage
> domain. Because of that, both fmwks are siblings. This is not a
> parent-child relationship.
What's the use case for voltage reduction that isn't triggered by a
clock rate reduction?
> Another important point is that in order to trigger a DVFS sequence you
> have to do some voting to take into account shared clock and shared
> voltage domains.
>
> Moreover, playing directly with a clock rate is not necessarily
> appropriate or sufficient for some devices. For example, the
> interconnect should expose a BW knob instead of a clock rate one. In
> general, some more abstract information like BW, latency or performance
> level (P-state) should be the ones to be exposed at driver level.
It's definitely true that, say, the SDMA driver should not specify its
interconnect bandwidth requirements in terms of an interconnect clock
frequency. It should specify some variant of bytes per second. But
that's only possible because the goal is to provide the interconnect
driver with enough information to convert the bandwidth constraint to a
clock rate constraint.
The core code is not capable of translating bandwidth constraints for
non-interconnect devices to clock rates in the general case. That must
be done by the device driver and/or device subsystem itself, which would
provide a clock rate constraint to the core code.
> By exposing such knobs, the underlying DVFS fmwk will be able to do voting > based on all the system constraints and then set the proper clock rate using > clock fmwk if the divider is exposed as a clock node or let the driver convert > the final device recommendation using whatever register that will adjust the > critical clock path rate. - Paul ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
@ 2011-04-22 18:15 Colin Cross
2011-04-22 19:37 ` Thomas Gleixner
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Colin Cross @ 2011-04-22 18:15 UTC (permalink / raw)
To: linux-arm-kernel
Now that we are approaching a common clock management implementation,
I was thinking it might be the right place to put a common dvfs
implementation as well.
It is very common for SoC manufacturers to provide a table of the
minimum voltage required on a voltage rail for a clock to run at a
given frequency. There may be multiple clocks in a voltage rail that
each can specify their own minimum voltage, and one clock may affect
multiple voltage rails. I have seen two ways to handle keeping the
clocks and voltages within spec:
The Tegra way is to put everything dvfs related under the clock
framework. Enabling (or preparing, in the new clock world) or raising
the frequency calls dvfs_set_rate before touching the clock, which
looks up the required voltage on a voltage rail, aggregates it with
the other voltage requests, and passes the minimum voltage required to
the regulator api. Disabling or unpreparing, or lowering the
frequency changes the clock first, and then calls dvfs_set_rate. For
a generic implementation, an SoC would provide the clock/dvfs
framework with a list of clocks, the voltages required for each
frequency step on the clock, and the regulator name to change. The
frequency/voltage tables are similar to OPP, except that OPP gets
voltages for a device instead of a clock. In a few odd cases (Tegra
always has a few odd cases), a clock that is internal to a device and
not exposed to the clock framework (pclk output on the display, for
example) has a voltage requirement, which requires some devices to
manually call dvfs_set_rate directly, but with a common clock
framework it would probably be possible for the display driver to
export pclk as a real clock.
The proposed OMAP4 way (I believe, correct me if I am wrong) is to
create a new api outside the clock api that calls into both the clock
api and the regulator api in the correct order for each operation,
using OPP to determine the voltage. This has a few disadvantages
(obviously, I am biased, having written the Tegra code) - clocks and
voltages are tied to a device, which is not always the case for
platforms outside of OMAP, and drivers must know if their hardware
requires voltage scaling. The clock api becomes unsafe to use on any
device that requires dvfs, as it could change the frequency higher
than the supported voltage.
Is the clock api the right place to do dvfs, or should the clock api
be kept simple, and more complicated operations like dvfs be kept
outside?
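The ordering rule described for the Tegra approach can be sketched as
below. This is only an illustration under simplifying assumptions: one
clock on one rail, no aggregation with other voltage requests, and
stand-in functions instead of the real clock and regulator APIs. All
names here (dvfs_step, dvfs_clk, rail_set_mv, hw_set_rate,
dvfs_clk_set_rate) are hypothetical and are not taken from the Tegra
code.

```c
#include <stddef.h>

/* Hypothetical table entry: rates up to max_hz need at least min_mv. */
struct dvfs_step {
    unsigned long max_hz;
    int min_mv;
};

struct dvfs_clk {
    const struct dvfs_step *steps;  /* sorted by ascending max_hz */
    size_t num_steps;
    unsigned long rate_hz;          /* current rate */
};

/* Stand-ins for the regulator and the clock hardware (assumptions). */
static int rail_mv;
static char dvfs_trace[16];         /* 'V' = voltage write, 'C' = clock write */
static size_t dvfs_trace_len;

static void dvfs_trace_op(char op)
{
    if (dvfs_trace_len + 1 < sizeof(dvfs_trace))
        dvfs_trace[dvfs_trace_len++] = op;
}

static void rail_set_mv(int mv)
{
    rail_mv = mv;
    dvfs_trace_op('V');
}

static void hw_set_rate(struct dvfs_clk *c, unsigned long hz)
{
    c->rate_hz = hz;
    dvfs_trace_op('C');
}

/* Look up the minimum voltage for a rate; -1 if the rate is off the table. */
static int dvfs_required_mv(const struct dvfs_clk *c, unsigned long hz)
{
    for (size_t i = 0; i < c->num_steps; i++)
        if (hz <= c->steps[i].max_hz)
            return c->steps[i].min_mv;
    return -1;
}

static int dvfs_clk_set_rate(struct dvfs_clk *c, unsigned long hz)
{
    int mv = dvfs_required_mv(c, hz);

    if (mv < 0)
        return -1;
    if (hz > c->rate_hz) {
        rail_set_mv(mv);        /* raising: voltage first, then clock */
        hw_set_rate(c, hz);
    } else {
        hw_set_rate(c, hz);     /* lowering: clock first, then voltage */
        rail_set_mv(mv);
    }
    return 0;
}
```

The point of the ordering is that the clock never runs faster than the
current voltage allows, in either direction of the transition.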
^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-04-22 18:15 Colin Cross
@ 2011-04-22 19:37 ` Thomas Gleixner
2011-04-23 1:21 ` Saravana Kannan
2011-04-22 19:40 ` Mark Brown
2011-04-25 8:33 ` Paul Walmsley
2 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2011-04-22 19:37 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 22 Apr 2011, Colin Cross wrote:
> Now that we are approaching a common clock management implementation,
> I was thinking it might be the right place to put a common dvfs
> implementation as well.
Hehe, that would have been my next stupid question :)
> Is the clock api the right place to do dvfs, or should the clock api
> be kept simple, and more complicated operations like dvfs be kept
> outside?
I think it's an orthogonal issue which can be solved once we have
generic implementations in place for clk_ functions which result in
DVFS relevant changes.
clk_prepare()
{
    lock_tree();
    if (dvfs_validate())
        return -ECRAP;
    prepare();
    unlock_tree();
}
clk_unprepare()
{
    lock_tree();
    unprepare();
    dvfs_recalc();
    unlock_tree();
}
clk_set_rate()
{
    lock_tree();
    if (rate > clk->rate && dvfs_prevalidate(rate))
        return -ECRAP;
    set_and_propagate_rate();
    if (rate < old->rate)
        dvfs_recalc();
    unlock_tree();
}
We can put the dvfs functions into the propagation code as well and
back out there when dvfs tells us. That probably works nicely for
prepare and unprepare, but might be complex for set_rate in the actual
set_rate propagation (except when the rate is less than the original
one).
I was wondering already whether we might need to do a pre check for
set_rate when we get called from random drivers. For two reasons:
1) Is the change valid for all children of the clock
2) Be more clever than saying no, when the requested rate cannot be
provided. Someone mentioned this elsewhere in the clk thread, that
it would be nice to be able to figure out whether changing one of
the parent clocks would allow satisfying all children up the tree.
Again, that's an extension and optimization and fits nicely into the
basic feature set I'm trying to come up with.
Those functions want to be implemented in drivers/clk/ so they can
share internals with the clock interface. Then you can also integrate
other DVFS users by locking the clk tree for validation or propagation
purposes.
So yes, we want integration at the interface and concept level, but
the actual implementation wants to be outside. That keeps the clk code
nice and simple. A few extra checks/calls here and there with proper
inline replacements for the !DVFS case are not making the code a big
mess. Of course you need at least a pointer to the OPP tables in the
clock structure, but that's the least of our worries.
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-04-22 19:37 ` Thomas Gleixner
@ 2011-04-23 1:21 ` Saravana Kannan
2011-04-23 1:35 ` Saravana Kannan
2011-04-23 13:14 ` Thomas Gleixner
0 siblings, 2 replies; 22+ messages in thread
From: Saravana Kannan @ 2011-04-23 1:21 UTC (permalink / raw)
To: linux-arm-kernel
On 04/22/2011 12:37 PM, Thomas Gleixner wrote:
> On Fri, 22 Apr 2011, Colin Cross wrote:
>
>> Is the clock api the right place to do dvfs, or should the clock api
>> be kept simple, and more complicated operations like dvfs be kept
>> outside?
>
> I think it's an orthogonal issue which can be solved once we have
> generic implementations in place for clk_ functions which result in
> DVFS relevant changes.
>
> clk_prepare()
> {
>     lock_tree();
>     if (dvfs_validate())
>         return -ECRAP;
>     prepare();
>     unlock_tree();
> }
>
> clk_unprepare()
> {
>     lock_tree();
>     unprepare();
>     dvfs_recalc();
>     unlock_tree();
> }
>
> clk_set_rate()
> {
>     lock_tree();
>
>     if (rate > clk->rate && dvfs_prevalidate(rate))
>         return -ECRAP;
>
>     set_and_propagate_rate();
>
>     if (rate < old->rate)
>         dvfs_recalc();
>
>     unlock_tree();
> }
I understand that this is just some example code, but I'm not sure how
accurate you were trying to be. So, I will assume the worst.
I really don't like this lock the whole clock tree approach for DVFS. I
think a better approach would be something like this:
struct dvfs_tuple {
    long max_rate; /* Max rate for this DVFS level. */
    int dvfs_level;
};
/* Yeah, bad name. pick what you like. */
struct dvfs_class {
    int *level_cnt;
    int num_levels;
    int (*dvfs_update) (int level);
    struct mutex lock;
};
struct clk {
    ...
    /* Last entry would have {LONG_MAX, highest level in dvfs}, */
    struct dvfs_tuple *dvfs_list;
    struct dvfs_class *dvfs_class;
};
Then
clk_prepare()
{
    for each dvfs tuple {
        if (current_rate <= max_rate) {
            level = dvfs_level;
            break;
        }
    }
    lock(clk->dvfs_class->lock);
    clk->dvfs_class->level_cnt[level]++;
    /* Find highest level in this dvfs class with non-zero count */
    if (clk->dvfs_class->dvfs_update(highest_level))
        return -ECRAP; /* Yes, I see locking err, but you get the point */
    unlock(clk->dvfs_class->lock);
    clk->ops->prepare(clk);
}
I did some crappy factorization of functions, returning -ECRAP without
unlocking, etc, but you get the point.
Whether the "level" translates to controlling power supply A or B or A &
B is up to the implementation of dvfs_class->dvfs_update() which is per
SOC/mach/arch.
This will keep the dvfs data structure common for all clocks, remove the
need to traverse the entire tree and keep the generic code agnostic of
1-1 vs. many-1 vs many-many.
> We can put the dvfs functions into the propagation code as well and
> back out there when dvfs tells us. That probably works nicely for
> prepare and unprepare, but might be complex for set_rate in the actual
> set_rate propagation (except when the rate is less than the original
> one).
For set rate, you lock the dvfs class, increment the count for the new
rate's dvfs_level, decrement it for the previous dvfs level, and then
call dvfs_update() with the max level in that class, unlock dvfs_class.
Thanks,
Saravana
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 22+ messages in thread
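The per-class level counting described in the message above could look
roughly like the sketch below. It is an illustration only: locking is
left out (a real version would hold the class mutex around each
operation), the applied_level field stands in for the per-SoC
dvfs_update() callback, and the fixed-size array and the helper names
(dvfs_vote, dvfs_unvote, dvfs_move_vote) are assumptions made for the
example.

```c
#define DVFS_MAX_LEVELS 8

struct dvfs_class {
    int level_cnt[DVFS_MAX_LEVELS]; /* number of votes per level */
    int num_levels;                 /* must be <= DVFS_MAX_LEVELS */
    int applied_level;              /* stand-in for dvfs_update(); -1 = no votes */
};

static int dvfs_level_valid(const struct dvfs_class *dc, int level)
{
    return level >= 0 && level < dc->num_levels;
}

/* Re-evaluate the class: apply the highest level that still has votes. */
static void dvfs_reeval(struct dvfs_class *dc)
{
    int level;

    for (level = dc->num_levels - 1; level >= 0; level--)
        if (dc->level_cnt[level] > 0)
            break;
    dc->applied_level = level;      /* -1 when no level has any vote */
}

/* prepare: add one vote for the level the clock's current rate needs. */
static int dvfs_vote(struct dvfs_class *dc, int level)
{
    if (!dvfs_level_valid(dc, level))
        return -1;
    dc->level_cnt[level]++;
    dvfs_reeval(dc);
    return 0;
}

/* unprepare: drop a vote that was added earlier. */
static int dvfs_unvote(struct dvfs_class *dc, int level)
{
    if (!dvfs_level_valid(dc, level) || dc->level_cnt[level] == 0)
        return -1;
    dc->level_cnt[level]--;
    dvfs_reeval(dc);
    return 0;
}

/* set_rate: add the new vote, drop the old one, then update once. */
static int dvfs_move_vote(struct dvfs_class *dc, int old_level, int new_level)
{
    if (!dvfs_level_valid(dc, old_level) ||
        !dvfs_level_valid(dc, new_level) ||
        dc->level_cnt[old_level] == 0)
        return -1;
    dc->level_cnt[new_level]++;
    dc->level_cnt[old_level]--;
    dvfs_reeval(dc);
    return 0;
}
```

Because each class only keeps counters, no clock tree walk is needed to
find the level to apply.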
* Common clock and dvfs 2011-04-23 1:21 ` Saravana Kannan @ 2011-04-23 1:35 ` Saravana Kannan 2011-04-23 13:14 ` Thomas Gleixner 1 sibling, 0 replies; 22+ messages in thread From: Saravana Kannan @ 2011-04-23 1:35 UTC (permalink / raw) To: linux-arm-kernel On 04/22/2011 06:21 PM, Saravana Kannan wrote: > On 04/22/2011 12:37 PM, Thomas Gleixner wrote: >> On Fri, 22 Apr 2011, Colin Cross wrote: >> <snipped out the code since thunderbird is messing it up for some reason>. > I did some crappy factorization of functions, returning -ECRAP without > unlocking, etc, but you get the point. > > Whether the "level" translates to controlling power supply A or B or A & > B is upto the implementation of dvfs_class->dvfs_update() which is per > SOC/mach/arch. Not sure if this was obvious from my previous email. But each SOC/mach/arch could have more than one dvfs_class depending on the no. of power supply combinations that are manipulated for dvfs. -Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-04-23 1:21 ` Saravana Kannan
2011-04-23 1:35 ` Saravana Kannan
@ 2011-04-23 13:14 ` Thomas Gleixner
1 sibling, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2011-04-23 13:14 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 22 Apr 2011, Saravana Kannan wrote:
> On 04/22/2011 12:37 PM, Thomas Gleixner wrote:
> > On Fri, 22 Apr 2011, Colin Cross wrote:
> I understand that this is just some example code, but I'm not sure how
> accurate you were trying to be. So, I will assume the worst.
Not at all. I merely wanted to show that it can be integrated in some
way. I let you DVFS folks churn the details out. :)
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs 2011-04-22 18:15 Colin Cross 2011-04-22 19:37 ` Thomas Gleixner @ 2011-04-22 19:40 ` Mark Brown 2011-04-22 19:48 ` Colin Cross 2011-04-25 8:33 ` Paul Walmsley 2 siblings, 1 reply; 22+ messages in thread From: Mark Brown @ 2011-04-22 19:40 UTC (permalink / raw) To: linux-arm-kernel On Fri, Apr 22, 2011 at 11:15:34AM -0700, Colin Cross wrote: > Now that we are approaching a common clock management implementation, > I was thinking it might be the right place to put a common dvfs > implementation as well. I looked at this a bit when doing the S3C64xx stuff. > frequency changes the clock first, and then calls dvfs_set_rate. For > a generic implementation, an SoC would provide the clock/dvfs > framework with a list of clocks, the voltages required for each > frequency step on the clock, and the regulator name to change. The > frequency/voltage tables are similar to OPP, except that OPP gets > voltages for a device instead of a clock. In a few odd cases (Tegra This sounds like it assumes a 1:1 mapping between clocks and supplies which is going to break at some point. It should be handlable but will add complexity. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-04-22 19:40 ` Mark Brown
@ 2011-04-22 19:48 ` Colin Cross
2011-04-22 20:35 ` Mark Brown
0 siblings, 1 reply; 22+ messages in thread
From: Colin Cross @ 2011-04-22 19:48 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Apr 22, 2011 at 12:40 PM, Mark Brown
<broonie@opensource.wolfsonmicro.com> wrote:
> On Fri, Apr 22, 2011 at 11:15:34AM -0700, Colin Cross wrote:
>
>> Now that we are approaching a common clock management implementation,
>> I was thinking it might be the right place to put a common dvfs
>> implementation as well.
>
> I looked at this a bit when doing the S3C64xx stuff.
>
>> frequency changes the clock first, and then calls dvfs_set_rate. For
>> a generic implementation, an SoC would provide the clock/dvfs
>> framework with a list of clocks, the voltages required for each
>> frequency step on the clock, and the regulator name to change. The
>> frequency/voltage tables are similar to OPP, except that OPP gets
>> voltages for a device instead of a clock. In a few odd cases (Tegra
>
> This sounds like it assumes a 1:1 mapping between clocks and supplies
> which is going to break at some point. It should be handlable but will
> add complexity.
>
Almost every platform requires a many-to-one mapping between clocks
and supplies (many clocks fed off one supply), and I bet at least one
platform has one clock that requires changing two supplies, so a
many-to-many mapping is probably required. It's not hard, I
implemented it for Tegra before moving to a many-to-one - each
relationship between a clock and a supply can be treated
independently, so the pointers in the clock and dvfs structs can just
be converted to lists of an intermediate struct.
^ permalink raw reply [flat|nested] 22+ messages in thread
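One way to picture the "lists of an intermediate struct" idea from the
message above is sketched below. This is not the Tegra implementation;
the struct and function names (dvfs_rel, dvfs_rail, dvfs_clk_links,
dvfs_link, dvfs_rail_required_mv) and the fixed-size arrays are
assumptions chosen to keep the example short.

```c
#include <stddef.h>

#define DVFS_MAX_RELS 4

struct dvfs_rail;

/* One clock <-> one rail relationship, handled independently of the others. */
struct dvfs_rel {
    struct dvfs_rail *rail;
    int requested_mv;       /* what this clock currently needs from this rail */
};

struct dvfs_rail {
    struct dvfs_rel *rels[DVFS_MAX_RELS];   /* every relationship on this rail */
    size_t num_rels;
};

struct dvfs_clk_links {
    struct dvfs_rel rels[DVFS_MAX_RELS];    /* one entry per rail this clock needs */
    size_t num_rels;
};

/* Link a clock to a rail; returns the new relationship, or NULL if full. */
static struct dvfs_rel *dvfs_link(struct dvfs_clk_links *clk,
                                  struct dvfs_rail *rail)
{
    struct dvfs_rel *rel;

    if (clk->num_rels == DVFS_MAX_RELS || rail->num_rels == DVFS_MAX_RELS)
        return NULL;
    rel = &clk->rels[clk->num_rels++];
    rel->rail = rail;
    rel->requested_mv = 0;
    rail->rels[rail->num_rels++] = rel;
    return rel;
}

/* A rail must satisfy the largest request among its relationships. */
static int dvfs_rail_required_mv(const struct dvfs_rail *rail)
{
    int mv = 0;

    for (size_t i = 0; i < rail->num_rels; i++)
        if (rail->rels[i]->requested_mv > mv)
            mv = rail->rels[i]->requested_mv;
    return mv;
}
```

With this layout a clock that needs two supplies simply has two
relationships, and a supply shared by many clocks takes the maximum of
their requests.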
* Common clock and dvfs
2011-04-22 19:48 ` Colin Cross
@ 2011-04-22 20:35 ` Mark Brown
2011-04-22 21:18 ` Colin Cross
0 siblings, 1 reply; 22+ messages in thread
From: Mark Brown @ 2011-04-22 20:35 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Apr 22, 2011 at 12:48:37PM -0700, Colin Cross wrote:
> On Fri, Apr 22, 2011 at 12:40 PM, Mark Brown
> > This sounds like it assumes a 1:1 mapping between clocks and supplies
> > which is going to break at some point. It should be handlable but will
> > add complexity.
> Almost every platform requires a many-to-one mapping between clocks
> and supplies (many clocks fed off one supply), and I bet at least one
> platform has one clock that requires changing two supplies, so a
In most of the platforms I've looked at the supported configurations are
specified en masse as operating points so it definitely ends up being
the case, you get a set of frequencies and a set of voltages specified
as a block.
> many-to-many mapping is probably required. It's not hard, I
> implemented it for Tegra before moving to a many-to-one - each
> relationship between a clock and a supply can be treated
> independently, so the pointers in the clock and dvfs structs can just
> be converted to lists of an intermediate struct.
Right, but like I say it does add complexity. I'm not sure it's
noticeably more than what you get from having to link some of the clock
rates, though.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-04-22 20:35 ` Mark Brown
@ 2011-04-22 21:18 ` Colin Cross
2011-04-25 11:03 ` Mark Brown
0 siblings, 1 reply; 22+ messages in thread
From: Colin Cross @ 2011-04-22 21:18 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Apr 22, 2011 at 1:35 PM, Mark Brown
<broonie@opensource.wolfsonmicro.com> wrote:
> On Fri, Apr 22, 2011 at 12:48:37PM -0700, Colin Cross wrote:
>> On Fri, Apr 22, 2011 at 12:40 PM, Mark Brown
>
>> > This sounds like it assumes a 1:1 mapping between clocks and supplies
>> > which is going to break at some point. It should be handlable but will
>> > add complexity.
>
>> Almost every platform requires a many-to-one mapping between clocks
>> and supplies (many clocks fed off one supply), and I bet at least one
>> platform has one clock that requires changing two supplies, so a
>
> In most of the platforms I've looked at the supported configurations are
> specified en masse as operating points so it definitely ends up being
> the case, you get a set of frequencies and a set of voltages specified
> as a block.
I see. Do you happen to know if there are any max voltage
requirements for clocks? Would running one clock in a group at a low
operating point and another at a high operating point cause a problem,
assuming the voltage is set to what is required by the high operating
point?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs 2011-04-22 21:18 ` Colin Cross @ 2011-04-25 11:03 ` Mark Brown 0 siblings, 0 replies; 22+ messages in thread From: Mark Brown @ 2011-04-25 11:03 UTC (permalink / raw) To: linux-arm-kernel On Fri, Apr 22, 2011 at 02:18:06PM -0700, Colin Cross wrote: > I see. Do you happen to know if there are any max voltage > requirements for clocks? Would running one clock in a group at a low > operating point and another at a high operating point cause a problem, > assuming the voltage is set to what is required by the high operating > point? I'm not aware of any such requirements, and my understanding of the electrical engineering issues suggests that they'd be unlikely. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs
2011-04-22 18:15 Colin Cross
2011-04-22 19:37 ` Thomas Gleixner
2011-04-22 19:40 ` Mark Brown
@ 2011-04-25 8:33 ` Paul Walmsley
2011-04-25 18:26 ` Turquette, Mike
2 siblings, 1 reply; 22+ messages in thread
From: Paul Walmsley @ 2011-04-25 8:33 UTC (permalink / raw)
To: linux-arm-kernel
(cc Tony, Benoît)
Hi,
By way of brief introduction, I'm currently the maintainer of the OMAP
clock code and data, as well as some other low-level OMAP kernel pieces.
On Fri, 22 Apr 2011, Colin Cross wrote:
> The Tegra way is to put everything dvfs related under the clock
> framework. Enabling (or preparing, in the new clock world) or raising
> the frequency calls dvfs_set_rate before touching the clock, which
> looks up the required voltage on a voltage rail, aggregates it with
> the other voltage requests, and passes the minimum voltage required to
> the regulator api. Disabling or unpreparing, or lowering the
> frequency changes the clock first, and then calls dvfs_set_rate. For
> a generic implementation, an SoC would provide the clock/dvfs
> framework with a list of clocks, the voltages required for each
> frequency step on the clock, and the regulator name to change. The
> frequency/voltage tables are similar to OPP, except that OPP gets
> voltages for a device instead of a clock. In a few odd cases (Tegra
> always has a few odd cases), a clock that is internal to a device and
> not exposed to the clock framework (pclk output on the display, for
> example) has a voltage requirement, which requires some devices to
> manually call dvfs_set_rate directly, but with a common clock
> framework it would probably be possible for the display driver to
> export pclk as a real clock.
>
> The proposed OMAP4 way (I believe, correct me if I am wrong) is to
> create a new api outside the clock api that calls into both the clock
> api and the regulator api in the correct order for each operation,
> using OPP to determine the voltage.
Some people may have proposed this approach, but that's definitely not my perspective. I don't think it's a good design, and have so far declined to merge any DVFS code that doesn't use clk_set_rate() as its interface (from the device driver's perspective), at least until the proponents of the separate-API camp can explain why it's needed. > This has a few disadvantages (obviously, I am biased, having written > the Tegra code) - clocks and voltages are tied to a device, which is not > always the case for platforms outside of OMAP, It's not the case for OMAP either. > and drivers must know if their hardware requires voltage scaling. The > clock api becomes unsafe to use on any device that requires dvfs, as it > could change the frequency higher than the supported voltage. > > Is the clock api the right place to do dvfs, or should the clock api > be kept simple, and more complicated operations like dvfs be kept > outside? My personal opinion is that the clock framework is the right place for this, since it's a defined interface that is already exposed to drivers. However, since the current clock interface doesn't anticipate that some code (e.g. CPUFreq) may need to change a clock's rate while some other code (e.g. a device driver) is currently using that clock, the clock interface will need to be expanded somewhat to handle this safely. Clock notifiers are needed, plus the ability for clock users to indicate when it is safe for an in-use clock's rate/parent to change. I'd been planning to post patches for that stuff for 2.6.40 until all of the recent drama started. I guess I should post them anyway... - Paul ^ permalink raw reply [flat|nested] 22+ messages in thread
* Common clock and dvfs 2011-04-25 8:33 ` Paul Walmsley @ 2011-04-25 18:26 ` Turquette, Mike 0 siblings, 0 replies; 22+ messages in thread From: Turquette, Mike @ 2011-04-25 18:26 UTC (permalink / raw) To: linux-arm-kernel On Mon, Apr 25, 2011 at 3:33 AM, Paul Walmsley <paul@pwsan.com> wrote: > (cc Tony, Beno?t) > > Hi, > > By way of brief introduction, I'm currently the maintainer of the OMAP > clock code and data, as well as some other low-level OMAP kernel pieces. > > On Fri, 22 Apr 2011, Colin Cross wrote: > >> The Tegra way is to put everything dvfs related under the clock >> framework. ?Enabling (or preparing, in the new clock world) or raising >> the frequency calls dvfs_set_rate before touching the clock, which >> looks up the required voltage on a voltage rail, aggregates it with >> the other voltage requests, and passes the minimum voltage required to >> the regulator api. ?Disabling or unpreparing, or lowering the >> frequency changes the clock first, and then calls dvfs_set_rate. ?For >> a generic implementation, an SoC would provide the clock/dvfs >> framework with a list of clocks, the voltages required for each >> frequency step on the clock, and the regulator name to change. ?The >> frequency/voltage tables are similar to OPP, except that OPP gets >> voltages for a device instead of a clock. ?In a few odd cases (Tegra >> always has a few odd cases), a clock that is internal to a device and >> not exposed to the clock framework (pclk output on the display, for >> example) has a voltage requirement, which requires some devices to >> manually call dvfs_set_rate directly, but with a common clock >> framework it would probably be possible for the display driver to >> export pclk as a real clock. >> >> The proposed OMAP4 way (I believe, correct me if I am wrong) is to >> create a new api outside the clock api that calls into both the clock >> api and the regulator api in the correct order for each operation, >> using OPP to determine the voltage. 
> > Some people may have proposed this approach, but that's definitely not my > perspective. I don't think it's a good design, and have so far declined > to merge any DVFS code that doesn't use clk_set_rate() as its interface > (from the device driver's perspective), at least until the proponents of > the separate-API camp can explain why it's needed. > >> This has a few disadvantages (obviously, I am biased, having written >> the Tegra code) - clocks and voltages are tied to a device, which is not >> always the case for platforms outside of OMAP, > > It's not the case for OMAP either. > >> and drivers must know if their hardware requires voltage scaling. The >> clock api becomes unsafe to use on any device that requires dvfs, as it >> could change the frequency higher than the supported voltage. >> >> Is the clock api the right place to do dvfs, or should the clock api >> be kept simple, and more complicated operations like dvfs be kept >> outside? > > My personal opinion is that the clock framework is the right place for > this, since it's a defined interface that is already exposed to drivers. I'm concerned that the clock framework will grow far larger than any of us expect it to right now. We need to consider the intersection points between a basic clock framework API, some constraints framework that manages multiple "users" of the clock that have frequency requirements (probably specified in a higher level "throughput-style" constraint) as well as the child-parent "arbitration" issues that some in this thread have referenced already. > However, since the current clock interface doesn't anticipate that some > code (e.g. CPUFreq) may need to change a clock's rate while some other > code (e.g. a device driver) is currently using that clock, the clock > interface will need to be expanded somewhat to handle this safely. Clock > notifiers are needed, plus the ability for clock users to indicate when it > is safe for an in-use clock's rate/parent to change.
Agreed. If there are multiple users of a clock that are using a higher-level abstraction to manage rates then some other driver shouldn't be able to blindly change the clock rate with clk_set_rate() without notifying/handling the other users first. CPUfreq is one example of a clock management API co-existing with the clock framework (and certainly making use of the clock fwk under the hood). But a constraint framework that exports some throughput request will also want to use the clock framework. The problem here is that now there are two different levels of APIs that are trying to achieve the same thing and arbitration must occur (clk fwk VS constraints API VS CPUfreq VS whatever). Sounds messy. Maybe everyone (drivers) can just use the single higher level API that sits on top of the simple clock framework? It is easier to arbitrate these requests within a single API level than across API layers. There is also talk of propagating rates along the tree and arbitrating intelligently when some bad issue crops up. Take the following example: device A wants its clock X to run faster. Clock X has a divisor of 1, so to go faster its parent clock P must run faster. Device B has a fixed-divisor clock Y which is also parented by P. Due to some limitation (external SD card is crappy and can't handle fast rates) it is invalid for clock Y to run faster than it is already running even though the hardware supports it. In this case device A's request could go through the pre-change clock notifiers that Paul mentions and fail with some -ECRAP, preventing the transition. This is probably easy enough to do in the clock framework since only two levels of clocks are involved. However imagine a PLL driving a 192MHz clock that drives 3 child clocks at 96MHz, 48MHz and 12MHz respectively.
Now each of these last 3 clocks gets divided into module-specific functional clocks with unique dividers, and in some cases some final dividers are present which are internal to the device and which may not even be represented in the clock tree (though they probably could be represented). If an arbitration issue happens at the very bottom of this tree (per the simple example above) then there are at least 4 levels of clocks to go through while trying to find a valid combination. The simplest solution when a problem gets hit is to throw -ECRAP. However the optimal solution would require that drivers specify "most-desired" rates along with "rates that I can live with" and the tree gets walked until either the first valid combo is found or no valid combo is found and the request is rejected with -ECRAP. This is similar to a classic travelling salesman problem for our clock tree, but is it something we want inside the clock framework? Perhaps the list of "most-desired" rates and "rates I can live with" and "rates that the HW supports but are invalid for me right now due to constraints" should not be tracked in the clock framework but somewhere higher up. No solutions to this problem for now, but food for thought before making a long-term decision. Maybe the parent-child rate change arbitration is over-engineered or too generic for practical use. Let me know what you think. Regards, Mike > I'd been planning to post patches for that stuff for 2.6.40 until all of > the recent drama started. I guess I should post them anyway... > > > - Paul ^ permalink raw reply [flat|nested] 22+ messages in thread
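The "most-desired" versus "rates I can live with" arbitration that Mike sketches in words can be made concrete with a small, deliberately simplified model. What follows is a hedged illustration, not a proposed kernel interface. It covers only one parent with fixed-divider children (one level of the tree, not the four-level case from the message), and struct toy_child, child_accepts(), pick_parent_rate() and ERR_NO_COMBO are hypothetical names invented here. The assumption is that each child lists its acceptable rates in order of preference, and the requester's list is tried from most-desired downwards until every sibling accepts the resulting parent rate.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NO_COMBO (-1) /* hypothetical stand-in for the thread's "-ECRAP" */

/* A child clock fed by a shared parent through a fixed divider. */
struct toy_child {
    unsigned int div;              /* child rate = parent rate / div */
    const unsigned long *ok_rates; /* acceptable rates, most-desired first */
    size_t n_ok;
};

/* Does this child tolerate the rate it would get from parent_rate? */
static int child_accepts(const struct toy_child *c, unsigned long parent_rate)
{
    size_t i;

    if (parent_rate % c->div != 0)
        return 0; /* divider would not yield an exact rate */
    for (i = 0; i < c->n_ok; i++) {
        if (c->ok_rates[i] == parent_rate / c->div)
            return 1;
    }
    return 0;
}

/*
 * Try the requester's rates in preference order and return the first parent
 * rate that every child (requester included) accepts.
 * Returns 0 and stores the parent rate in *out, or ERR_NO_COMBO.
 */
static int pick_parent_rate(const struct toy_child *children, size_t n,
                            size_t requester, unsigned long *out)
{
    const struct toy_child *req;
    size_t i, j;

    assert(children != NULL && out != NULL && requester < n);
    req = &children[requester];

    for (i = 0; i < req->n_ok; i++) {
        unsigned long parent_rate = req->ok_rates[i] * req->div;

        for (j = 0; j < n; j++) {
            if (!child_accepts(&children[j], parent_rate))
                break;
        }
        if (j == n) { /* nobody objected */
            *out = parent_rate;
            return 0;
        }
    }
    return ERR_NO_COMBO;
}
```

With clock X (divider 1, prefers 192MHz, accepts 96MHz) and clock Y (divider 2, accepts 48MHz or 24MHz), the first candidate would push Y to 96MHz and is rejected, while the second gives a 96MHz parent that both accept. Extending this to several tree levels is where the search space grows, which is the concern the message raises about doing it inside the clock framework.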
end of thread, other threads:[~2011-05-06 17:36 UTC | newest]
Thread overview: 22+ messages
[not found] <4DC07F10.6010305@ti.com>
2011-05-05 5:08 ` Common clock and dvfs Cousson, Benoit
2011-05-05 6:11 ` Colin Cross
2011-05-05 6:35 ` Paul Walmsley
2011-05-05 6:50 ` Colin Cross
2011-05-05 13:59 ` Mark Brown
2011-05-05 21:08 ` Cousson, Benoit
2011-05-05 23:15 ` Colin Cross
2011-05-06 17:36 ` Paul Walmsley
2011-05-06 8:13 ` MyungJoo Ham
2011-05-05 6:25 ` Paul Walmsley
2011-04-22 18:15 Colin Cross
2011-04-22 19:37 ` Thomas Gleixner
2011-04-23 1:21 ` Saravana Kannan
2011-04-23 1:35 ` Saravana Kannan
2011-04-23 13:14 ` Thomas Gleixner
2011-04-22 19:40 ` Mark Brown
2011-04-22 19:48 ` Colin Cross
2011-04-22 20:35 ` Mark Brown
2011-04-22 21:18 ` Colin Cross
2011-04-25 11:03 ` Mark Brown
2011-04-25 8:33 ` Paul Walmsley
2011-04-25 18:26 ` Turquette, Mike