From: Kaushik Barde <kbarde@huawei.com>
To: "'Jeremy Fitzhardinge'" <jeremy@goop.org>
Cc: "'Avi Kivity'" <avi@redhat.com>,
"'Jan Beulich'" <JBeulich@novell.com>,
"'Xiaowei Yang'" <xiaowei.yang@huawei.com>,
"'Nick Piggin'" <npiggin@kernel.dk>,
"'Peter Zijlstra'" <a.p.zijlstra@chello.nl>,
fanhenglong@huawei.com, "'Kenneth Lee'" <liguozhu@huawei.com>,
"'linqaingmin'" <linqiangmin@huawei.com>,
wangzhenguo@huawei.com, "'Wu Fengguang'" <fengguang.wu@intel.com>,
xen-devel@lists.xensource.com, linux-kernel@vger.kernel.org,
"'Marcelo Tosatti'" <mtosatti@redhat.com>
Subject: RE: One (possible) x86 get_user_pages bug
Date: Mon, 31 Jan 2011 12:10:04 -0800 [thread overview]
Message-ID: <003301cbc182$da3affc0$8eb0ff40$@com> (raw)
In-Reply-To: <4D46F9AE.80606@goop.org>
<< I'm not sure I follow you here. The issue with TLB flush IPIs is that
the hypervisor doesn't know the purpose of the IPI and ends up
(potentially) waking up a sleeping VCPU just to flush its tlb - but
since it was sleeping there were no stale TLB entries to flush.>>
That's what I was trying to understand: what is "sleep" here? Is it ACPI sleep
or some internal scheduling state? If vCPUs are asynchronous to pCPUs in
terms of ACPI sleep state, then they need to be synced up. That's where the
entire ACPI modeling needs to be considered, and that's where KVM may not see
this issue. Maybe I am missing something here.
<< A "few hundred uSecs" is really very slow - that's nearly a
millisecond. It's worth spending some effort to avoid those kinds of
delays.>>
Actually, I just checked: IPIs are usually 1000-1500 cycles long (comparable
to a VMEXIT). My point is that the ideal solution should be one where virtual
platform behavior is closer to bare metal (interrupts, memory, CPU state,
etc.). How to do it? Well, that's what needs to be figured out :-)
-Kaushik
-----Original Message-----
From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
Sent: Monday, January 31, 2011 10:05 AM
To: Kaushik Barde
Cc: 'Avi Kivity'; 'Jan Beulich'; 'Xiaowei Yang'; 'Nick Piggin'; 'Peter
Zijlstra'; fanhenglong@huawei.com; 'Kenneth Lee'; 'linqiangmin';
wangzhenguo@huawei.com; 'Wu Fengguang'; xen-devel@lists.xensource.com;
linux-kernel@vger.kernel.org; 'Marcelo Tosatti'
Subject: Re: One (possible) x86 get_user_pages bug
On 01/30/2011 02:21 PM, Kaushik Barde wrote:
> I agree, i.e. deviating from the underlying architecture is not a good
> idea.
>
> Also, agreed, hypervisor knows which page entries are ready for TLB flush
> across vCPUs.
>
> But using the above knowledge along with an IPI-based TLB flush is a better
> solution. Its ability to synchronize pCPU-based IPIs and TLB flushes
> across vCPUs is key.
I'm not sure I follow you here. The issue with TLB flush IPIs is that
the hypervisor doesn't know the purpose of the IPI and ends up
(potentially) waking up a sleeping VCPU just to flush its tlb - but
since it was sleeping there were no stale TLB entries to flush.
Xen's TLB flush hypercalls can optimise that case by only sending a real
IPI to PCPUs which are actually running target VCPUs. In other cases,
where a PCPU is known to have stale entries but it isn't running a
relevant VCPU, it can just mark a deferred TLB flush which gets executed
before the VCPU runs again.
In other words, Xen can take significant advantage of getting a
higher-level call ("flush these TLBs") compared to just a simple IPI.
Are you suggesting that the hypervisor should export some kind of "known
dirty TLB" table to the guest, and have the guest work out which VCPUs
need IPIs sent to them? How would that work?
> IPIs themselves should be a few hundred uSecs in terms of latency. Also,
> why should a pCPU be in a sleep state for an active vCPU's scheduled page
> workload?
A "few hundred uSecs" is really very slow - that's nearly a
millisecond. It's worth spending some effort to avoid those kinds of
delays.
J
> -Kaushik
>
> -----Original Message-----
> From: Avi Kivity [mailto:avi@redhat.com]
> Sent: Sunday, January 30, 2011 5:02 AM
> To: Jeremy Fitzhardinge
> Cc: Jan Beulich; Xiaowei Yang; Nick Piggin; Peter Zijlstra;
> fanhenglong@huawei.com; Kaushik Barde; Kenneth Lee; linqiangmin;
> wangzhenguo@huawei.com; Wu Fengguang; xen-devel@lists.xensource.com;
> linux-kernel@vger.kernel.org; Marcelo Tosatti
> Subject: Re: One (possible) x86 get_user_pages bug
>
> On 01/27/2011 08:27 PM, Jeremy Fitzhardinge wrote:
>> And even just considering virtualization, having non-IPI-based tlb
>> shootdown is a measurable performance win, since a hypervisor can
>> optimise away a cross-VCPU shootdown if it knows no physical TLB
>> contains the target VCPU's entries. I can imagine the KVM folks could
>> get some benefit from that as well.
> It's nice to avoid the IPI (and waking up a cpu if it happens to be
> asleep) but I think the risk of deviating too much from the baremetal
> arch is too large, as demonstrated by this bug.
>
> (well, async page faults is a counterexample, I wonder if/when it will
> bite us)
>
Thread overview: 19+ messages
2011-01-27 13:05 One (possible) x86 get_user_pages bug Xiaowei Yang
2011-01-27 13:56 ` Peter Zijlstra
2011-01-27 14:30 ` Jan Beulich
2011-01-28 10:51 ` Peter Zijlstra
2011-01-27 14:49 ` Jan Beulich
2011-01-27 15:01 ` Peter Zijlstra
2011-01-27 18:27 ` Jeremy Fitzhardinge
2011-01-27 19:27 ` Peter Zijlstra
2011-01-30 13:01 ` Avi Kivity
2011-01-30 22:21 ` Kaushik Barde
2011-01-31 18:04 ` Jeremy Fitzhardinge
2011-01-31 20:10 ` Kaushik Barde [this message]
2011-01-31 22:10 ` Jeremy Fitzhardinge
2011-01-27 16:07 ` Jan Beulich
2011-01-27 16:25 ` Peter Zijlstra
2011-01-27 16:41 ` Jan Beulich
2011-01-27 16:56 ` Peter Zijlstra
2011-01-27 21:24 ` Nick Piggin
2011-01-28 7:17 ` Xiaowei Yang