From: Ingo Molnar <mingo@elte.hu>
To: Thomas Renninger <trenn@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Frank Eigler <fche@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Kevin Hilman <khilman@deeprootsystems.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-omap@vger.kernel.org, rjw@sisk.pl,
linux-pm@lists.linux-foundation.org,
linux-trace-users@vger.kernel.org,
Jean Pihet <jean.pihet@newoldbits.com>,
Pierre Tardy <tardyp@gmail.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Tejun Heo <tj@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Arjan van de Ven <arjan@linux.intel.com>
Subject: Re: [PATCH] PERF(kernel): Cleanup power events V2
Date: Tue, 26 Oct 2010 13:19:20 +0200 [thread overview]
Message-ID: <20101026111920.GA27258@elte.hu> (raw)
In-Reply-To: <201010261237.49386.trenn@suse.de>
* Thomas Renninger <trenn@suse.de> wrote:
> On Tuesday 26 October 2010 09:10:20 Ingo Molnar wrote:
> >
> > * Thomas Renninger <trenn@suse.de> wrote:
> >
> > > Changes in V2:
> > > - Introduce PWR_EVENT_EXIT instead of 0 to mark non-power state
> > > - Use u32 instead of u64 for cpuid, state which is by far enough
> > >
> > > New power trace events:
> > > power:processor_idle
> > > power:processor_frequency
> > > power:machine_suspend
> > >
> > >
> > > C-state/idle accounting events:
> > > power:power_start
> > > power:power_end
> > > are replaced with:
> > > power:processor_idle
> > >
> > > and
> > > power:power_frequency
> > > is replaced with:
> > > power:processor_frequency
> >
> > Could you please name it power:cpu_idle and power:cpu_frequency instead, for
> > shortness? We generally use 'cpu' in the kernel and for events.
> Sure.
> >
> > > power:machine_suspend
> >
> > How will future PCI (or other device) power saving tracepoints be called?
> >
> > Might be more consistent to use:
> >
> > power:cpu_idle
> > power:machine_idle
> > power:device_idle
>
> device idle is not true. Those may be low power modes
> like reduced network throughput, reduced wlan range, the device
> needs not to be idle.
> Device power states is probably the most complex area, if such
> a thing gets introduced, it should makes sense to state
> the interface experimental for some time until a wider range of
> devices uses it (in contrast to these new ones
> which should not change that soon anymore...).
Ok.
> Also machine_idle may be true, but machine_suspend sounds more
> familiar and everyone immediately knows what the event is about.
Ok - fair enough.
> > > +#define PWR_EVENT_EXIT 0xFFFFFFFF
> >
> > Shouldnt this be part of the POWER_ enum? (and you can write -1 there)
>
> No, below enum will vanish, but -1 is nicer.
When it vanishes what will replace it?
> ...
>
> > Plus:
> >
> > > +DECLARE_EVENT_CLASS(processor,
> > > +
> > > + TP_PROTO(unsigned int state, unsigned int cpu_id),
> > > +
> > > + TP_ARGS(state, cpu_id),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field( u32, state )
> > > + __field( u32, cpu_id )
> >
> > Trace entries can carry a cpu_id of the current processor already. Can this cpu_id
> > ever be different from that CPU id?
>
> Yes. A core's frequency can depend on another one which
> will get switched as well (by one command/MSR).
> Compare with commit 6f4f2723d08534fd4e407e1e.
>
> This can theoretically also be the case for sleep states.
> Afaik such HW does not exist yet, but ACPI spec already
> provides interfaces to pass these dependency from BIOS to OS.
> -> We want a stable ABI and should be prepared for such stuff.
>
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->state = state;
> > > + __entry->cpu_id = cpu_id;
> > > + ),
> > > +
> > > + TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
> > > + (unsigned long)__entry->cpu_id)
> > > +);
> > > +
> > > +DEFINE_EVENT(processor, processor_idle,
> > > +
> > > + TP_PROTO(unsigned int state, unsigned int cpu_id),
> > > +
> > > + TP_ARGS(state, cpu_id)
> > > +);
> > > +
> > > +#define PWR_EVENT_EXIT 0xFFFFFFFF
> > > +
> > > +DEFINE_EVENT(processor, processor_frequency,
> > > +
> > > + TP_PROTO(unsigned int frequency, unsigned int cpu_id),
> > > +
> > > + TP_ARGS(frequency, cpu_id)
> > > +);
> >
> > So, we have a 'state' field in the class, which is used as 'state' by the
> > power::cpu_idle event, and as 'frequency' by the power::cpu_freq event?
>
> Yes, is this a problem?
>
> Definitions are a bit shorter having one power processor class.
> As "frequency" is stated in frequency event definition everything should
> be obvious and this one looks like the more elegant way to me.
>
> > Are there any architectures that track frequency in Hz, not in kHz? If yes, might
> > there ever be a need for the frequency value to be larger than 4.29 GHz? If yes,
> > then it wont fit into u32.
>
> drivers/cpufreq subsystem is fixed to unsigned int (cmp. include/linux/cpufreq.h):
> unsigned int min; /* in kHz */
> unsigned int max; /* in kHz */
> unsigned int cur; /* in kHz,
> ...
> that should be fine.
ok, good - so we should be fine up to 4 THz CPUs.
> > Also, might there be a future need to express different types of frequencies?
> > For example, should we decide to track turbo frequencies in Intel CPUs, how
> > would that be expressed via these events? Are there any architectures and CPUs
> > that somehow have some extra attribute to the frequency value?
>
> I wonder whether this ever can/will work in a sane way.
> Userspace can compare with:
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
> everything above is turbo. So I do not think it's ever needed.
> But adding an additional value at the end does not violate
> userspace compatibility. This has been done with the cpuid
> as well.
>
> > > +TRACE_EVENT(machine_suspend,
> > > +
> > > + TP_PROTO(unsigned int state),
> > > +
> > > + TP_ARGS(state),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field( u32, state )
> > > + ),
> >
> > Hm, this event is not used anywhere in the submitted patches. Where is the patch
> > that adds usage, and what are the possible values for 'state'?
>
> Jean wants to make use of it on ARM.
> I also had patch for x86, I can have another look at it, Rafael
> already gave me a comment on it. But on X86 you typically realize
> when you suspend the machine (could imagine this is more useful on
> ARM driven mobile phones and similar), still I can add it..
>
> Values probably should be (include/linux/suspend.h):
> #define PM_SUSPEND_ON 0
> #define PM_SUSPEND_STANDBY 1
> #define PM_SUSPEND_MEM 3
> #define PM_SUSPEND_MAX 4
>
> How this strange state Arjan talked about is passed is up
> to these guys. Instead of using 0 and above pre-defined such
> arch specific special states better should get passed by:
>
> #define X86_MOORESTOWN_STANDBY_S0 0x100
> .. 0x101
> #define ARM_WHATEVER_STRANGE_THING 0x200
> ...
I'd rather like to see a meaningful name to be given to these states and them being
passed, instead of weird platform specific things. Tooling will try to be as generic
as possible.
I dont know this stuff, but making a distinction between s2ram and s2disk events
would seem meaningful.
Thanks,
Ingo
next prev parent reply other threads:[~2010-10-26 11:19 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1287488171-25303-1-git-send-email-trenn@suse.de>
2010-10-19 11:36 ` [PATCH 1/3] PERF: Do not export power_frequency, but power_start event Thomas Renninger
2010-10-19 11:36 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
2010-10-25 6:54 ` Arjan van de Ven
2010-10-25 9:41 ` Thomas Renninger
2010-10-25 13:55 ` Arjan van de Ven
2010-10-25 14:36 ` Thomas Renninger
2010-10-25 14:45 ` Arjan van de Ven
2010-10-25 14:56 ` Ingo Molnar
2010-10-25 15:48 ` Thomas Renninger
2010-10-25 16:00 ` Arjan van de Ven
2010-10-25 23:32 ` Thomas Renninger
2010-10-25 6:58 ` Arjan van de Ven
2010-10-25 10:04 ` Ingo Molnar
2010-10-25 11:03 ` Thomas Renninger
2010-10-25 11:55 ` Ingo Molnar
2010-10-25 12:55 ` Thomas Renninger
2010-10-25 14:11 ` Arjan van de Ven
2010-10-25 14:51 ` Thomas Renninger
2010-10-25 12:58 ` Mathieu Desnoyers
2010-10-25 20:29 ` Rafael J. Wysocki
2010-10-25 13:58 ` Arjan van de Ven
2010-10-25 20:33 ` Rafael J. Wysocki
2010-10-25 23:33 ` [PATCH] PERF(kernel): Cleanup power events V2 Thomas Renninger
2010-10-26 1:09 ` Arjan van de Ven
2010-10-26 7:10 ` Ingo Molnar
2010-10-26 8:08 ` Jean Pihet
2010-10-26 11:21 ` Ingo Molnar
2010-10-26 11:48 ` Thomas Renninger
2010-10-26 11:54 ` Ingo Molnar
2010-10-26 13:17 ` Thomas Renninger
2010-10-26 13:35 ` Thomas Renninger
2010-10-26 18:57 ` Rafael J. Wysocki
2010-10-27 0:00 ` Thomas Renninger
2010-10-27 9:16 ` Rafael J. Wysocki
2010-10-26 9:58 ` Arjan van de Ven
2010-10-26 10:19 ` Ingo Molnar
2010-10-26 10:37 ` Thomas Renninger
2010-10-26 11:19 ` Ingo Molnar [this message]
2010-10-26 19:01 ` Rafael J. Wysocki
2010-10-26 15:32 ` Pierre Tardy
2010-10-26 16:04 ` Arjan van de Ven
2010-10-26 16:56 ` Pierre Tardy
2010-10-26 17:58 ` Peter Zijlstra
2010-10-26 18:14 ` Mathieu Desnoyers
2010-10-26 18:50 ` [linux-pm] " Alan Stern
2010-10-26 21:33 ` Mathieu Desnoyers
2010-10-26 22:20 ` Rafael J. Wysocki
2010-10-26 22:39 ` Rafael J. Wysocki
2010-10-27 0:46 ` Mathieu Desnoyers
2010-10-27 10:22 ` Rafael J. Wysocki
2010-10-27 12:21 ` Mathieu Desnoyers
2010-10-27 21:43 ` Rafael J. Wysocki
2010-10-26 19:04 ` Rafael J. Wysocki
2010-10-26 21:38 ` Mathieu Desnoyers
2010-10-26 22:22 ` Rafael J. Wysocki
2010-10-26 18:15 ` Pierre Tardy
2010-10-26 19:08 ` Rafael J. Wysocki
2010-10-26 20:23 ` Pierre Tardy
2010-10-26 20:38 ` Rafael J. Wysocki
2010-10-26 20:52 ` Arjan van de Ven
2010-10-26 21:17 ` Rafael J. Wysocki
2010-10-26 7:59 ` Jean Pihet
2010-10-26 18:52 ` Rafael J. Wysocki
2010-10-19 11:36 ` [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new power events Thomas Renninger
2010-10-26 0:18 ` [PATCH] PERF(userspace): Adjust perf timechart to the new power events V2 Thomas Renninger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101026111920.GA27258@elte.hu \
--to=mingo@elte.hu \
--cc=akpm@linux-foundation.org \
--cc=arjan@linux.intel.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=jean.pihet@newoldbits.com \
--cc=khilman@deeprootsystems.com \
--cc=linux-omap@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=linux-trace-users@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=peterz@infradead.org \
--cc=rjw@sisk.pl \
--cc=rostedt@goodmis.org \
--cc=tardyp@gmail.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=trenn@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox