Re: Question about x86/mm/gup.c's use of disabled interrupts


From: Avi Kivity <avi@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Xen-devel <xen-devel@lists.xensource.com>,
	Jan Beulich <jbeulich@novell.com>, Ingo Molnar <mingo@elte.hu>
Subject: Re: Question about x86/mm/gup.c's use of disabled interrupts
Date: Thu, 19 Mar 2009 11:46:27 +0200
Message-ID: <49C21473.2000702@redhat.com>
In-Reply-To: <49C18487.1020703@goop.org>

Jeremy Fitzhardinge wrote:
>>>
>>> Well, no, not deferring.  Making xen_flush_tlb_others() spin waiting 
>>> for "doing_gup" to clear on the target cpu.  Or add an explicit 
>>> notion of a "pte update barrier" rather than implicitly relying on 
>>> the tlb IPI (which is extremely convenient when available...).
>>
>> Pick up a percpu flag from all cpus and spin on each?  Nasty.
>
> Yeah, not great.  Each of those flag fetches is likely to be cold, so 
> a bunch of cache misses.  The only mitigating factor is that cross-cpu 
> tlb flushes are expected to be expensive, but some workloads are 
> apparently very sensitive to extra latency in that path.  

Right, and they'll do a bunch more cache misses, so in comparison it 
isn't too bad.
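
To make the cost concrete, here is a minimal user-space sketch of the
per-cpu flag idea, using C11 atomics rather than real kernel per-cpu
primitives.  The names gup_enter(), gup_exit() and count_busy(), the
NR_CPUS value and the 64-byte padding are all assumptions made for
illustration; none of this is existing kernel or Xen code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* assumption: small fixed cpu count for the sketch */

/* Hypothetical per-cpu flag, padded so each one sits in its own cache line. */
struct gup_flag {
	_Alignas(64) atomic_bool doing_gup;
};

static struct gup_flag gup_flags[NR_CPUS];

/* A cpu sets its flag before starting a lockless page-table walk... */
static void gup_enter(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	atomic_store(&gup_flags[cpu].doing_gup, true);
}

/* ...and clears it once the walk is finished. */
static void gup_exit(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	atomic_store(&gup_flags[cpu].doing_gup, false);
}

/*
 * Flusher side: one load per target cpu.  Each load touches a cache line
 * last written by another cpu, so it is likely to miss.
 * Returns how many target cpus are still inside a walk.
 */
static unsigned int count_busy(uint32_t targets)
{
	unsigned int busy = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((targets & (1u << cpu)) &&
		    atomic_load(&gup_flags[cpu].doing_gup))
			busy++;
	return busy;
}
```

The flusher would have to repeat count_busy() until it returns zero,
which is where the extra misses come from.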

> And the hypercall could result in no Xen-level IPIs at all, so it 
> could be very quick by comparison to an IPI-based Linux 
> implementation, in which case the flag polling would be particularly 
> harsh.

Maybe we could bring these optimizations into Linux as well.  The only 
thing Xen knows that Linux doesn't is if a vcpu is not scheduled; all 
other information is shared.

>
> Also, the straightforward implementation of "poll until all target 
> cpu's flags are clear" may never make progress, so you'd have to "scan 
> flags, remove busy cpus from set, repeat until all cpus done".
>
> All annoying because this race is pretty unlikely, and it seems a 
> shame to slow down all tlb flushes to deal with it.  Some kind of 
> global "doing gup_fast" counter would let flush_tlb_others bypass the 
> check, at the cost of putting a couple of atomic ops around the 
> outside of gup_fast.

The nice thing about local_irq_disable() is that it scales so well.
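
For comparison, a rough user-space sketch of the "scan flags, remove
busy cpus from set, repeat" loop, with the global counter as a fast
path.  It uses C11 atomics; gup_fast_begin(), gup_fast_end(),
scan_once() and wait_for_gup() are hypothetical names, and the sketch
assumes the pte update has already been made visible before the
flusher starts scanning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8 /* assumption: small fixed cpu count for the sketch */

static atomic_int gup_fast_active;     /* global count of cpus in gup_fast */
static atomic_bool doing_gup[NR_CPUS]; /* hypothetical per-cpu flags */

/* The two atomic ops "around the outside of gup_fast". */
static void gup_fast_begin(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	atomic_fetch_add(&gup_fast_active, 1);
	atomic_store(&doing_gup[cpu], true);
}

static void gup_fast_end(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	atomic_store(&doing_gup[cpu], false);
	atomic_fetch_sub(&gup_fast_active, 1);
}

/* One pass: drop every cpu whose flag is clear from the pending set. */
static uint32_t scan_once(uint32_t pending)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((pending & (1u << cpu)) && !atomic_load(&doing_gup[cpu]))
			pending &= ~(1u << cpu);
	return pending;
}

/*
 * Flusher side.  A cpu seen idle once never needs rechecking: any walk
 * it starts later will see the new pte.  That is what lets the loop make
 * progress even if some cpu is always busy.  Returns the number of scan
 * passes, 0 when the global counter allowed skipping the check entirely.
 */
static unsigned int wait_for_gup(uint32_t targets)
{
	unsigned int passes = 0;

	if (atomic_load(&gup_fast_active) == 0)
		return 0; /* nobody is in gup_fast: bypass the scan */

	while (targets) {
		targets = scan_once(targets);
		passes++;
	}
	return passes;
}
```

Note that the global counter is a single cache line bounced between
every cpu that enters gup_fast, which is exactly the scaling cost that
disabling interrupts locally avoids.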

>
>> You could use the irq enabled flag; it's available and what native 
>> spins on (but also means I'll need to add one if I implement this).
>
> Yes, but then we'd end up spuriously polling on cpus which happened to 
> disable interrupts for any reason.  And if the vcpu is not running 
> then we could end up polling for a long time.  (Same applies for 
> things in gup_fast, but I'm assuming that's a lot less common than 
> disabling interrupts in general).

Right.

-- 
error compiling committee.c: too many arguments to function



