From mboxrd@z Thu Jan 1 00:00:00 1970
From: Guenter Roeck
Date: Wed, 24 Oct 2012 20:01:44 +0000
Subject: Re: [lm-sensors] [RFC] Energy/power monitoring within the kernel
Message-Id: <20121024200144.GA21137@roeck-us.net>
References: <1351013449.9070.5.camel@hornet> <20121023220240.GA25895@roeck-us.net> <1351096647.23327.64.camel@hornet>
In-Reply-To: <1351096647.23327.64.camel@hornet>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Pawel Moll
Cc: Amit Daniel Kachhap, Zhang Rui, Viresh Kumar, Daniel Lezcano,
 Jean Delvare, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
 Jesper Juhl, Thomas Renninger, Jean Pihet,
 "linux-kernel@vger.kernel.org",
 "linux-arm-kernel@lists.infradead.org",
 "lm-sensors@lm-sensors.org",
 "linaro-dev@lists.linaro.org"

On Wed, Oct 24, 2012 at 05:37:27PM +0100, Pawel Moll wrote:
> On Tue, 2012-10-23 at 23:02 +0100, Guenter Roeck wrote:
> > > Traditionally such data should be exposed to the user via hwmon sysfs
> > > interface, and that's exactly what I did for "my" platform - I have
> > > a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> > > enough to draw pretty graphs in userspace. Everyone was happy...
> > >
> > Only driver supporting "energy" output so far is ibmaem, and the reported energy
> > is supposed to be cumulative, as in energy = power * time. Do you mean power,
> > possibly ?
>
> So the vexpress would be the second one, then :-) as the energy
> "monitor" actually on the latest tiles reports a 64-bit value of
> microJoules consumed (or produced) since the power-up.
>
> Some of the older boards were able to report instant power, but this
> metric is less useful in our case.
>
> > > Now I am getting new requests to do more with this data. In particular
> > > I'm asked how to add such information to ftrace/perf output. The second
> > > most frequent request is about providing it to an "energy aware"
> > > cpufreq governor.
> >
> > Anything energy related would have to be along the line of "do something after a
> > certain amount of work has been performed", which at least on the surface does
> > not make much sense to me, unless you mean something along the line of a
> > process scheduler which schedules a process not based on time slices but based
> > on energy consumed, i.e. if you want to define a time slice not in milliseconds
> > but in Joules.
>
> Actually there is some research being done in this direction, but it's
> way too early to draw any conclusions...
>
> > If so, I would argue that a similar behavior could be achieved by varying the
> > duration of time slices with the current CPU speed, or simply by using cycle
> > count instead of time as time slice parameter. Not that I am sure if such an
> > approach would really be of interest for anyone.
> >
> > Or do you really mean power, not energy, such as in "reduce CPU speed if its
> > power consumption is above X Watt" ?
>
> Uh. To be completely honest I must answer: I'm not sure how the "energy
> aware" cpufreq governor is supposed to work. I have been simply asked to
> provide the data in some standard way, if possible.
>
> > I am not sure how this would be expected to work. hwmon is, by its very nature,
> > a passive subsystem: It doesn't do anything unless data is explicitly requested
> > from it. It does not update an attribute unless that attribute is read.
> > That does not seem to fit well with the idea of tracing - which assumes
> > that some activity is happening, ultimately, all by itself, presumably
> > periodically. The idea to have a user space application read hwmon data only
> > for it to trigger trace events does not seem to be very compelling to me.
>
> What I had in mind was similar to what the adt7470 driver does. The driver
> would automatically access the device every now and then to update its
> internal state and generate the trace event on the way. This
> auto-refresh "feature" is particularly appealing for me, as on some of
> "my" platforms it can take up to 500 microseconds to actually get the data.
> So doing this in background (and providing users with the last known
> value in the meantime) seems attractive.
>
A bad example doesn't mean it should be used elsewhere.

adt7470 needs up to two seconds for a temperature measurement cycle, and it
cannot perform automatic cycles all by itself. In this context, executing
temperature measurement cycles in the background makes a lot of sense,
especially since one does not want to wait for two seconds when reading
a sysfs attribute.

But that only means that the chip is most likely not a good choice when selecting
a temperature sensor, not that the code necessary to get it working should be used
as an example for other drivers.

Guenter

> > An exception is if a monitoring device supports interrupts, and if its driver
> > actually implements those interrupts. This is, however, not the case for most of
> > the current drivers (if any), mostly because interrupt support for hardware
> > monitoring devices is very platform dependent and thus difficult to implement.
>
> Interestingly enough the newest version of our platform control micro
> (doing the energy monitoring as well) can generate an interrupt when a
> transaction is finished, so I was planning to periodically update
> all sorts of values. And again, generating a trace event on this
> opportunity would be trivial.
>
> > > Of course a particular driver could register its own perf PMU on its
> > > own. It's certainly an option, just very suboptimal in my opinion.
> > > Or maybe not? Maybe the task is so specialized that it makes sense?
> > >
> > We had a couple of attempts to provide an in-kernel API. Unfortunately,
> > the result was, at least so far, more complexity on the driver side.
> > So the difficulty is really to define an API which is really simple, and does
> > not just complicate driver development for a (presumably) rare use case.
>
> Yes, I appreciate this. That's why this option is actually my least
> favourite. Anyway, what I was thinking about was just a thin shim that
> *can* be used by a driver to register some particular value with the
> core (so it can be enumerated and accessed by in-kernel clients) and the
> core could (or not) create a sysfs attribute for this value on behalf of
> the driver. Seems lightweight enough, unless previous experience
> suggests otherwise?
>
> Cheers!
>
> Paweł
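
The energy/power distinction raised in this thread (a cumulative counter in microjoules versus an instantaneous power reading) can be sketched from user space. The following is only an illustration, assuming a counter exposed under the glob pattern quoted above and reporting microjoules; the function names are hypothetical, and counter wrap-around is not handled.

```python
import time


def read_energy_uj(path):
    """Read one cumulative energy counter (assumed to be in microjoules)."""
    with open(path) as f:
        return int(f.read().strip())


def average_power_watts(e_start_uj, e_end_uj, dt_seconds):
    """Average power over an interval: P = (E_end - E_start) / dt."""
    if dt_seconds <= 0:
        raise ValueError("interval must be positive")
    # Convert microjoules to joules, then divide by seconds to get watts.
    return (e_end_uj - e_start_uj) / 1e6 / dt_seconds


def sample_average_power(path, interval_s=1.0):
    """Take two readings interval_s apart and return the average power in W."""
    e0 = read_energy_uj(path)
    t0 = time.monotonic()
    time.sleep(interval_s)
    e1 = read_energy_uj(path)
    t1 = time.monotonic()
    return average_power_watts(e0, e1, t1 - t0)
```

A caller would pass each path matched by glob.glob("/sys/class/hwmon/hwmon*/device/energy*_input") to sample_average_power(). Note that every sample is triggered by an explicit read, which is the passive behaviour described above.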
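
The "refresh in the background, serve the last known value" pattern discussed for slow devices can be sketched roughly as below. This is a user-space Python analogy, not kernel code and not the adt7470 driver's implementation; the class, the read_hw callable, and the on_update hook (standing in for emitting a trace event) are all hypothetical names introduced for illustration.

```python
import threading


class CachedSensor:
    """Keeps the last known reading so callers never wait on slow hardware."""

    def __init__(self, read_hw, on_update=None):
        self._read_hw = read_hw        # slow hardware access (hypothetical callable)
        self._on_update = on_update    # stand-in for generating a trace event
        self._lock = threading.Lock()
        self._last = None
        self._stop = threading.Event()
        self._thread = None

    def refresh_once(self):
        """Perform one (possibly slow) hardware read and cache the result."""
        value = self._read_hw()        # done outside the lock so read() stays fast
        with self._lock:
            self._last = value
        if self._on_update is not None:
            self._on_update(value)
        return value

    def read(self):
        """Return the last cached value without touching the hardware."""
        with self._lock:
            return self._last

    def start(self, interval_s):
        """Refresh every interval_s seconds until stop() is called."""
        def loop():
            # Event.wait() returns False on timeout, True once stop() sets it.
            while not self._stop.wait(interval_s):
                self.refresh_once()
        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

The trade-off is the one argued in the thread: the cache hides read latency and gives a natural place to emit events, but it adds a periodic worker to what is otherwise a passive, read-driven interface.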