From: Guenter Roeck <linux@roeck-us.net>
To: Pawel Moll <pawel.moll@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@linaro.org>,
Zhang Rui <rui.zhang@intel.com>,
Viresh Kumar <viresh.kumar@linaro.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Jean Delvare <khali@linux-fr.org>,
Steven Rostedt <rostedt@goodmis.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Ingo Molnar <mingo@elte.hu>, Jesper Juhl <jj@chaosbits.net>,
Thomas Renninger <trenn@suse.de>,
Jean Pihet <jean.pihet@newoldbits.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"lm-sensors@lm-sensors.org" <lm-sensors@lm-sensors.org>,
"linaro-dev@lists.linaro.org" <linaro-dev@lists.linaro.org>
Subject: Re: [lm-sensors] [RFC] Energy/power monitoring within the kernel
Date: Wed, 24 Oct 2012 20:01:44 +0000
Message-ID: <20121024200144.GA21137@roeck-us.net>
In-Reply-To: <1351096647.23327.64.camel@hornet>
On Wed, Oct 24, 2012 at 05:37:27PM +0100, Pawel Moll wrote:
> On Tue, 2012-10-23 at 23:02 +0100, Guenter Roeck wrote:
> > > Traditionally such data should be exposed to the user via hwmon sysfs
> > > interface, and that's exactly what I did for "my" platform - I have
> > > a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
> > > enough to draw pretty graphs in userspace. Everyone was happy...
> > >
> > The only driver supporting "energy" output so far is ibmaem, and the reported energy
> > is supposed to be cumulative, as in energy = power * time. Do you mean power,
> > possibly?
>
> So the vexpress would be the second one, then :-), as the energy
> "monitor" on the latest tiles actually reports a 64-bit value of
> microjoules consumed (or produced) since power-up.
>
> Some of the older boards were able to report instantaneous power, but this
> metric is less useful in our case.
>
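Because the hwmon energy attribute is cumulative (energy = power * time), a consumer that wants power has to difference two readings. The following is a minimal userspace sketch, assuming each read of a microjoule counter is paired with a monotonic timestamp in nanoseconds; `struct energy_sample` and `avg_power_mw()` are hypothetical names, not an existing hwmon API.

```c
#include <stdint.h>

/* One reading of a cumulative energy counter (such as an hwmon
 * energy*_input attribute, in microjoules) plus the monotonic time,
 * in nanoseconds, at which it was taken. Hypothetical type. */
struct energy_sample {
	uint64_t energy_uj;
	uint64_t time_ns;
};

/*
 * Average power between two samples, in milliwatts.
 * energy = power * time, so power = delta_energy / delta_time:
 *   1 uJ / 1 ns = 1e-6 J / 1e-9 s = 1000 W = 1e6 mW,
 * hence mW = delta_uj * 1e6 / delta_ns.
 * Assumes delta_uj * 1e6 fits in 64 bits (delta below ~1.8e7 J).
 * Returns 0 if time did not advance or the counter went backwards.
 */
static uint64_t avg_power_mw(const struct energy_sample *a,
			     const struct energy_sample *b)
{
	if (b->time_ns <= a->time_ns || b->energy_uj < a->energy_uj)
		return 0;

	return (b->energy_uj - a->energy_uj) * 1000000ULL /
	       (b->time_ns - a->time_ns);
}
```

For example, 1,000,000 µJ consumed over one second yields 1000 mW.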
> > > Now I am getting new requests to do more with this data. In particular
> > > I'm asked how to add such information to ftrace/perf output. The second
> > > most frequent request is about providing it to an "energy aware"
> > > cpufreq governor.
> >
> > Anything energy related would have to be along the lines of "do something after a
> > certain amount of work has been performed", which, at least on the surface, does
> > not make much sense to me, unless you mean something along the lines of a
> > process scheduler which schedules a process not based on time slices but based
> > on energy consumed, i.e. if you want to define a time slice not in milliseconds
> > but in joules.
>
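An energy-based time slice of the kind described above might look like the following minimal sketch: the slice ends once a cumulative microjoule counter has advanced by a fixed budget. The names are hypothetical, and it assumes a counter that, if it wraps at all, wraps at 2^64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical energy-based time slice: a task may run until a
 * cumulative energy counter (microjoules) has advanced by budget_uj. */
struct energy_slice {
	uint64_t start_uj;	/* counter value when the slice started */
	uint64_t budget_uj;	/* energy allowed per slice */
};

static void energy_slice_start(struct energy_slice *s, uint64_t now_uj,
			       uint64_t budget_uj)
{
	s->start_uj = now_uj;
	s->budget_uj = budget_uj;
}

/* Unsigned subtraction yields the right delta even if the 64-bit
 * counter wrapped once since the slice started. */
static bool energy_slice_expired(const struct energy_slice *s,
				 uint64_t now_uj)
{
	return now_uj - s->start_uj >= s->budget_uj;
}
```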
> Actually there is some research being done in this direction, but it's
> way too early to draw any conclusions...
>
> > If so, I would argue that a similar behavior could be achieved by varying the
> > duration of time slices with the current CPU speed, or simply by using cycle
> > count instead of time as the time slice parameter. Not that I am sure such an
> > approach would really be of interest to anyone.
> >
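The cycle-count alternative, a fixed budget of cycles whose wall-clock length varies with CPU speed, reduces to a unit conversion. A minimal sketch, assuming the frequency is given in kHz (the unit cpufreq reports); `slice_ns_for_cycles()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Length of a time slice, in nanoseconds, that corresponds to a fixed
 * budget of CPU cycles at the current frequency (in kHz). A task then
 * gets roughly the same number of cycles per slice whatever frequency
 * is selected. Hypothetical helper.
 *
 * cycles / (freq_khz * 1000) s  =  cycles * 1e6 / freq_khz ns
 * Assumes cycles * 1e6 fits in 64 bits.
 */
static uint64_t slice_ns_for_cycles(uint64_t cycles, uint32_t freq_khz)
{
	if (freq_khz == 0)
		return 0;
	return cycles * 1000000ULL / freq_khz;
}
```

At 1 GHz, a budget of 1,000,000 cycles gives a 1 ms slice; at 500 MHz the same budget gives 2 ms.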
> > Or do you really mean power, not energy, such as in "reduce CPU speed if its
> > power consumption is above X watts"?
>
> Uh. To be completely honest I must answer: I'm not sure how the "energy
> aware" cpufreq governor is supposed to work. I have simply been asked to
> provide the data in some standard way, if possible.
>
> > I am not sure how this would be expected to work. hwmon is, by its very nature,
> > a passive subsystem: It doesn't do anything unless data is explicitly requested
> > from it. It does not update an attribute unless that attribute is read.
> > That does not seem to fit well with the idea of tracing - which assumes
> > that some activity is happening, ultimately, all by itself, presumably
> > periodically. The idea of having a user space application read hwmon data only
> > to trigger trace events does not seem very compelling to me.
>
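The passive behavior described above corresponds to a pattern used by many hwmon drivers: hardware is accessed only from the read path, and a cached value is reused for a short validity interval. The following userspace model is a sketch under that assumption; in a kernel driver the timestamp would typically come from jiffies with `time_after()` and the update would be serialized by a mutex. All names, including the fake counter standing in for hardware, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of an on-demand hwmon read with a cache. */
struct sensor_cache {
	uint64_t value;			/* last value read from hardware */
	uint64_t last_updated_ms;	/* when it was read */
	uint64_t interval_ms;		/* how long the value stays valid */
	bool valid;
	uint64_t (*read_hw)(void *ctx);	/* slow hardware access */
	void *ctx;
	unsigned int hw_reads;		/* hardware accesses, for illustration */
};

/* Called only when userspace reads the attribute; nothing runs otherwise. */
static uint64_t sensor_read(struct sensor_cache *c, uint64_t now_ms)
{
	if (!c->valid || now_ms - c->last_updated_ms >= c->interval_ms) {
		c->value = c->read_hw(c->ctx);
		c->last_updated_ms = now_ms;
		c->valid = true;
		c->hw_reads++;
	}
	return c->value;
}

/* Fake hardware for illustration: a counter that grows by 100 per access. */
static uint64_t fake_counter(void *ctx)
{
	uint64_t *v = ctx;

	*v += 100;
	return *v;
}
```

Two reads within the validity interval touch the hardware only once; with no reads at all, no hardware access (and hence no trace event) ever happens.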
> What I had in mind was similar to what the adt7470 driver does. The driver
> would automatically access the device every now and then to update its
> internal state and generate the trace event on the way. This
> auto-refresh "feature" is particularly appealing to me, as on some of
> "my" platforms it can take up to 500 microseconds to actually get the data.
> So doing this in background (and providing users with the last known
> value in the meantime) seems attractive.
>
A bad example doesn't mean it should be used elsewhere.

adt7470 needs up to two seconds for a temperature measurement cycle, and it
cannot perform automatic cycles all by itself. In this context, executing
temperature measurement cycles in the background makes a lot of sense,
especially since one does not want to wait for two seconds when reading
a sysfs attribute.

But that only means that the chip is most likely not a good choice when selecting
a temperature sensor, not that the code necessary to get it working should be used
as an example for other drivers.

Guenter

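For comparison, the background-refresh scheme Pawel proposes (which Guenter considers justified for adt7470 only because of its two-second measurement cycle) can be modelled as follows. This is a userspace sketch, not the adt7470 driver's code: `bg_sensor_tick()` stands in for a periodic timer or work item, the `trace` callback stands in for a tracepoint, and all names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for a tracepoint; hypothetical. */
typedef void (*trace_fn)(uint64_t value, void *arg);

struct bg_sensor {
	uint64_t last_value;		/* last known value, served to readers */
	uint64_t (*read_hw)(void *ctx);	/* slow access (e.g. ~500 us) */
	void *hw_ctx;
	trace_fn trace;			/* optional, may be NULL */
	void *trace_arg;
};

/* Would run from a periodic timer or work item, never from the read path. */
static void bg_sensor_tick(struct bg_sensor *s)
{
	s->last_value = s->read_hw(s->hw_ctx);
	if (s->trace)
		s->trace(s->last_value, s->trace_arg);
}

/* Read path: returns immediately with the last known value. */
static uint64_t bg_sensor_read(const struct bg_sensor *s)
{
	return s->last_value;
}

/* Fake hardware and trace consumer for illustration. */
static uint64_t fake_hw(void *ctx)
{
	uint64_t *v = ctx;

	*v += 10;
	return *v;
}

static void record_trace(uint64_t value, void *arg)
{
	*(uint64_t *)arg = value;
}
```

The cost of this design is that the hardware is polled whether or not anyone reads the value, which is the point of Guenter's objection.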
> > An exception is if a monitoring device supports interrupts, and if its driver
> > actually implements those interrupts. This is, however, not the case for most of
> > the current drivers (if any), mostly because interrupt support for hardware
> > monitoring devices is very platform dependent and thus difficult to implement.
>
> Interestingly enough the newest version of our platform control micro
> (doing the energy monitoring as well) can generate an interrupt when a
> transaction is finished, so I was planning to periodically update
> all sorts of values. And again, generating a trace event at that
> point would be trivial.
>
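If the controller signals completion with an interrupt, the slow transaction can be split into a start half and a completion half, so that no caller ever waits for the hardware. A minimal sketch with hypothetical names; a real driver would also need locking (for example a spinlock shared with the interrupt handler), which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

struct async_sensor {
	bool busy;			/* a transaction is in flight */
	uint64_t last_value;		/* result of the last completed one */
	unsigned int completions;
	void (*trace)(uint64_t value);	/* stand-in for a tracepoint, may be NULL */
};

/* Start a transaction; returns false if one is already pending. */
static bool async_sensor_start(struct async_sensor *s)
{
	if (s->busy)
		return false;
	s->busy = true;
	/* A real driver would send the request to the controller here. */
	return true;
}

/* Would be called from the completion interrupt with the result. */
static void async_sensor_complete(struct async_sensor *s, uint64_t result)
{
	s->last_value = result;
	s->busy = false;
	s->completions++;
	if (s->trace)
		s->trace(result);
}
```

A periodic timer would call `async_sensor_start()`, and readers would simply use `last_value`.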
> > > Of course a particular driver could register its own perf PMU on its
> > > own. It's certainly an option, just very suboptimal in my opinion.
> > > Or maybe not? Maybe the task is so specialized that it makes sense?
> > >
> > We had a couple of attempts to provide an in-kernel API. Unfortunately,
> > the result was, at least so far, more complexity on the driver side.
> > So the difficulty is really to define an API which is really simple, and does
> > not just complicate driver development for a (presumably) rare use case.
>
> Yes, I appreciate this. That's why this option is actually my least
> favourite. Anyway, what I was thinking about was just a thin shim that
> *can* be used by a driver to register some particular value with the
> core (so it can be enumerated and accessed by in-kernel clients) and the
> core could (or not) create a sysfs attribute for this value on behalf of
> the driver. Seems lightweight enough, unless previous experience
> suggests otherwise?
>
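A thin shim of the kind described above could be as small as a registry of named values with read callbacks. The following is a hypothetical userspace sketch, not an existing kernel API: `shim_register()`, `shim_get()` and `shim_read()` are invented names, the fixed-size table and missing locking are simplifications, and the `create_sysfs` flag only records the driver's wish without creating anything.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHIM_MAX_VALUES 16

/* One value a driver exposes through the hypothetical shim. */
struct shim_value {
	const char *name;			/* e.g. "energy1" */
	int (*read)(void *drvdata, int64_t *val);
	void *drvdata;
	int create_sysfs;			/* ask the core for a sysfs attribute */
};

static struct shim_value shim_table[SHIM_MAX_VALUES];
static size_t shim_count;

/* Driver side: register a value; returns 0 or -ENOSPC. */
static int shim_register(const struct shim_value *v)
{
	if (shim_count == SHIM_MAX_VALUES)
		return -ENOSPC;
	shim_table[shim_count++] = *v;
	/* The core could create a sysfs attribute here if v->create_sysfs. */
	return 0;
}

/* Client side: enumerate registered values by index. */
static const struct shim_value *shim_get(size_t i)
{
	return i < shim_count ? &shim_table[i] : NULL;
}

/* Client side: read by name; returns 0, -ENOENT or the driver's error. */
static int shim_read(const char *name, int64_t *val)
{
	for (size_t i = 0; i < shim_count; i++)
		if (strcmp(shim_table[i].name, name) == 0)
			return shim_table[i].read(shim_table[i].drvdata, val);
	return -ENOENT;
}

/* Example driver callback: reports a stored microjoule count. */
static int demo_read_energy(void *drvdata, int64_t *val)
{
	*val = *(const int64_t *)drvdata;
	return 0;
}
```

A driver would fill in a `struct shim_value` (for example named `"energy1"` with `demo_read_energy` as its callback) and register it; an in-kernel client could then walk the table with `shim_get()` or look a value up by name with `shim_read()`.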
> Cheers!
>
> Paweł