linux-kernel.vger.kernel.org archive mirror
From: Venki Pallipadi <venki@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Aaron Durbin <adurbin@google.com>, Paul Turner <pjt@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] Extend mwait idle to optimize away IPIs when possible
Date: Mon, 6 Feb 2012 13:26:19 -0800
Message-ID: <CABeCy1ayUHxpyqwYQKUZcGCBpiqDZyMcaB4bz7JuEycrZV6CEw@mail.gmail.com>
In-Reply-To: <1328562166.2482.40.camel@laptop>

On Mon, Feb 6, 2012 at 1:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2012-02-06 at 12:42 -0800, Venkatesh Pallipadi wrote:
>> smp_call_function_single and ttwu_queue_remote send an unconditional IPI
>> to the target CPU. However, if the target CPU is in mwait based idle, we can
>> do IPI-less wakeups using the magical powers of monitor-mwait.
>> Doing this has certain advantages:
>> * Lower overhead on Async IPI send path. Measurements on Westmere based
>>   systems show savings on "no wait" smp_call_function_single with idle
>>   target CPU (as measured on the sender side).
>>   local socket smp_call_func cost goes from ~1600 to ~1200 cycles
>>   remote socket smp_call_func cost goes from ~2000 to ~1800 cycles
>> * Avoiding actual interrupts shows a measurable reduction (10%) in system
>>   non-idle cycles and cache-references with micro-benchmark sending IPI from
>>   one CPU to all the other mostly idle CPUs in the system.
>> * On a mostly idle system, turbostat shows a tiny decrease in C0(active) time
>>   and a corresponding increase in C6 state (Each row being 10min avg)
>>           %c0   %c1   %c6
>>   Before
>>   Run 1  1.51  2.93 95.55
>>   Run 2  1.48  2.86 95.65
>>   Run 3  1.46  2.78 95.74
>>   After
>>   Run 1  1.35  2.63 96.00
>>   Run 2  1.46  2.78 95.74
>>   Run 3  1.37  2.63 95.98
>>
>> * As a bonus, we can avoid sched/call IPI overhead altogether in a special case:
>>   when CPU Y has woken up CPU X (which can take 50-100us to actually wake up
>>   from a deep idle state) and CPU Z wants to send an IPI to CPU X in this period,
>>   Z can get it for free.
>>
>> We started looking at this with one of our workloads where system is partially
>> busy and we noticed some kernel hotspots in find_next_bit and
>> default_send_IPI_mask_sequence_phys coming from sched wakeup (futex wakeups)
>> and networking call functions. So, this change addresses those two specific
>> IPI types. This could be extended to nohz_kick, etc.
>>
>> Note:
>> * This only helps when target CPU is idle. When it is busy we will still send
>>   IPI as before.
>> * Only for X86_64 and mwait_idle_with_hints for now, with limited testing.
>> * Will need some accounting for these wakeups exported for powertop and friends.
>>
>> Comments?
>
> Curiously you avoided the existing tsk_is_polling() magic, which IIRC is
> doing something similar for waking from the idle loop.
>

Yes. That needs the remote CPU's current task, which extends onto the rq
lock, which I was trying to avoid. So I went with a conditional wait on
idle exit for the small window of the WAKING to WOKEN state change, as we
know we are always polling in the mwait loop.
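
To make the WAKING to WOKEN window concrete, here is a minimal userspace
sketch of one way such a handshake could look. It is not the code from the
patch: the state names other than WAKING/WOKEN, struct cpu_slot, and the
helpers idle_enter(), need_ipi() and idle_exit() are all hypothetical, and
C11 atomics stand in for the kernel's atomic_t and the monitored cache line.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU states; only WAKING/WOKEN are named in the mail. */
enum { CPU_RUNNING, CPU_MWAIT, CPU_WAKING, CPU_WOKEN };

struct cpu_slot {
    atomic_int state;       /* word the idle CPU would monitor/mwait on */
    atomic_int ipi_pending; /* "run the IPI handlers on idle exit" flag */
};

/* Idle entry: advertise that this CPU is about to poll in the mwait loop. */
static void idle_enter(struct cpu_slot *c)
{
    assert(atomic_load(&c->state) == CPU_RUNNING);
    atomic_store(&c->state, CPU_MWAIT);
}

/*
 * Sender side, called after the work has been queued for the target.
 * Returns true if a real IPI is still needed.
 */
static bool need_ipi(struct cpu_slot *c)
{
    int old = CPU_MWAIT;

    if (atomic_compare_exchange_strong(&c->state, &old, CPU_WAKING)) {
        /* We own the wakeup; the store above would trigger the monitor. */
        atomic_store(&c->ipi_pending, 1);
        atomic_store(&c->state, CPU_WOKEN);
        return false;
    }
    /*
     * WAKING or WOKEN: another sender already started the wakeup and the
     * target has not left idle yet, so this request rides along for free.
     * RUNNING: the target is busy, fall back to a normal IPI.
     */
    return old == CPU_RUNNING;
}

/* Idle exit: returns true if the IPI handlers must be run by hand. */
static bool idle_exit(struct cpu_slot *c)
{
    int old = CPU_MWAIT;

    if (!atomic_compare_exchange_strong(&c->state, &old, CPU_RUNNING)) {
        /* A sender is mid-handoff: wait out the short WAKING window. */
        while (atomic_load(&c->state) == CPU_WAKING)
            ;
        atomic_store(&c->state, CPU_RUNNING);
    }
    return atomic_exchange(&c->ipi_pending, 0) != 0;
}
```

The target only spins if it lost the cmpxchg on idle exit, i.e. a sender
has claimed the wakeup but not yet published ipi_pending. No rq lock and no
look at the remote CPU's current task is involved, only the one state word.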

Thread overview: 22+ messages
2012-02-06 20:42 [RFC] Extend mwait idle to optimize away IPIs when possible Venkatesh Pallipadi
2012-02-06 21:02 ` Peter Zijlstra
2012-02-06 21:26   ` Venki Pallipadi [this message]
2012-02-07  0:26 ` David Daney
2012-02-07  1:24   ` H. Peter Anvin
2012-02-07  1:34     ` David Daney
2012-02-07  1:45       ` H. Peter Anvin
2012-02-07  2:03         ` Venki Pallipadi
2012-02-07  2:24 ` Suresh Siddha
2012-02-07 21:39   ` Venki Pallipadi
2012-02-08  6:51 ` Yong Zhang
2012-02-08 23:28   ` Venki Pallipadi
2012-02-09  2:18     ` Yong Zhang
2012-02-10  2:17       ` Venki Pallipadi
2012-02-13  5:27         ` Yong Zhang
2012-02-10 19:19 ` Peter Zijlstra
2012-02-11  2:11   ` Venki Pallipadi
2012-02-11  3:09     ` Peter Zijlstra
2012-02-13  5:34     ` Yong Zhang
2012-02-14 13:52       ` Peter Zijlstra
2012-02-15  1:39         ` Yong Zhang
2012-02-15  2:32         ` Venki Pallipadi
