From: "Li, Aubrey" <aubrey.li@linux.intel.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>, Mike Galbraith <efault@gmx.de>
Cc: Aubrey Li <aubrey.li@intel.com>,
	tglx@linutronix.de, peterz@infradead.org, len.brown@intel.com,
	ak@linux.intel.com, tim.c.chen@linux.intel.com,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 2/8] cpuidle: record the overhead of idle entry
Date: Tue, 17 Oct 2017 15:04:11 +0800
Message-ID: <fd85c52f-98f2-e847-b534-30f7907ecf11@linux.intel.com>
In-Reply-To: <10643458.iWc7GTROAz@aspire.rjw.lan>

On 2017/10/17 8:05, Rafael J. Wysocki wrote:
> On Monday, October 16, 2017 5:11:57 AM CEST Li, Aubrey wrote:
>> On 2017/10/14 8:35, Rafael J. Wysocki wrote:
>>> On Saturday, September 30, 2017 9:20:28 AM CEST Aubrey Li wrote:
>>>> Record the overhead of idle entry in microseconds
>>>>
>>>
>>> What is this needed for?
>>
>> We need to figure out how long an idle period counts as a short idle, and
>> recording the overhead is for this purpose. The short idle threshold is
>> based on this overhead.
> 
> I don't really understand this statement.
> 
> Pretend I'm not familiar with this stuff and try to explain it to me. :-)
> 

Okay, let me try. :-)

Today, the idle loop works as follows:

do_idle {
	idle_entry {
	- deferrable stuff like quiet_vmstat
	- turn off the tick (without looking at the historical/predicted idle interval)
	- rcu idle enter, c-state selection, etc
	}

	idle_call {
	- poll or halt or mwait
	}

	idle_exit {
	- rcu idle exit
	- restore the tick if it was stopped before entering idle
	}
}

We have already measured that idle_entry and idle_exit cost several
microseconds, say 10us.

Now if idle_call is 1000us, much larger than idle_entry and idle_exit, we can
ignore the time spent in idle_entry and idle_exit.

But for some workloads with a short idle pattern, like netperf, idle_call is
only 2us, so idle_entry and idle_exit start to dominate. If we can reduce the
time spent in idle_entry and idle_exit, workload performance improves
significantly.
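
To put rough numbers on this, here is a minimal user-space sketch. The helper
name overhead_permille() is hypothetical and not part of the patch, and it
assumes the 10us figure covers idle_entry plus idle_exit. The values are the
illustrative ones above, not new measurements.

```c
#include <stdint.h>

/*
 * Share of one idle cycle spent in idle_entry + idle_exit, in parts
 * per thousand (integer division, so the result is rounded down).
 *
 * overhead_us: assumed combined entry/exit cost
 * idle_us:     time spent in idle_call
 *
 * Hypothetical helper for illustration only.
 */
static unsigned int overhead_permille(uint64_t overhead_us, uint64_t idle_us)
{
	uint64_t total = overhead_us + idle_us;

	if (total == 0)
		return 0;

	return (unsigned int)(overhead_us * 1000 / total);
}
```

With these inputs, overhead_permille(10, 1000) gives 9 (about 1% of the
cycle), while overhead_permille(10, 2) gives 833 (about 83% of the cycle).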

Modern high-speed networks and low-latency I/O such as NVMe disks have this
requirement. Mike's patch was made several years ago, though I don't know the
details. Here is a related article:
https://cacm.acm.org/magazines/2017/4/215032-attack-of-the-killer-microseconds/fulltext

>>>
>>>> +void cpuidle_entry_end(void)
>>>> +{
>>>> +	struct cpuidle_device *dev = cpuidle_get_device();
>>>> +	u64 overhead;
>>>> +	s64 diff;
>>>> +
>>>> +	if (dev) {
>>>> +		dev->idle_stat.entry_end = local_clock();
>>>> +		overhead = div_u64(dev->idle_stat.entry_end -
>>>> +				dev->idle_stat.entry_start, NSEC_PER_USEC);
>>>
>>> Is the conversion really necessary?
>>>
>>> If so, then why?
>>
>> We can choose either nanoseconds or microseconds. Given that the workload
>> results in the short idle pattern, I think microseconds are good enough for
>> the real workload.
>>
>> Another reason is that the prediction from the idle governor is in
>> microseconds, so I convert it for comparison.
>>>
>>> And if there is a good reason, what about using right shift to do
>>> an approximate conversion to avoid the extra division here?
>>
>> Sure, >> 10 works for me, as I don't think precision is a big deal here.
>>
>>>
>>>> +		diff = overhead - dev->idle_stat.overhead;
>>>> +		dev->idle_stat.overhead += diff >> 3;
>>>
>>> Can you please explain what happens in the two lines above?
>>
>> It is an online averaging algorithm, borrowed from update_avg() in kernel/sched/core.c.
> 
> OK
> 
> Maybe care to add a comment to that effect?

Sure, I'll add one in the next version.
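
For reference, a rough user-space sketch of how the >> 10 conversion and a
commented running average might look. This is not the code for the next
version: the helper names ns_to_us_approx() and update_overhead_avg() are
made up for illustration, and plain stdint types stand in for u64/s64.

```c
#include <stdint.h>

/*
 * Approximate ns -> us conversion without a division.
 * ns >> 10 divides by 1024, so the result is about 2.3% lower than
 * ns / 1000.
 */
static inline uint64_t ns_to_us_approx(uint64_t ns)
{
	return ns >> 10;
}

/*
 * Online (exponentially weighted) average, same idea as update_avg()
 * in kernel/sched/core.c: move the average 1/8 of the way towards
 * each new sample, so no sample history has to be stored.
 *
 * Note: right-shifting a negative value is implementation-defined in
 * ISO C; like the kernel, this relies on the arithmetic shift that
 * gcc and clang provide.
 */
static inline void update_overhead_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff >> 3;
}
```

Starting from an average of 0, a first sample of 80us moves the average to
10us; from an average of 16us, a sample of 0us moves it down to 14us.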

Thanks,
-Aubrey
