From: Eric Dumazet <dada1@cosmosbay.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Avi Kivity <avi@redhat.com>, Nick Piggin <npiggin@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [PATCH 1/2] x86/mm: maintain a percpu "in get_user_pages_fast" flag
Date: Sat, 28 Mar 2009 08:54:28 +0100
Message-ID: <49CDD7B4.4020701@cosmosbay.com>
In-Reply-To: <49CDAF17.5060207@goop.org>

Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
>> Jeremy Fitzhardinge wrote:
>>> get_user_pages_fast() relies on cross-cpu tlb flushes being a barrier
>>> between clearing and setting a pte, and before freeing a pagetable page.
>>> It usually does this by disabling interrupts to hold off IPIs, but
>>> some tlb flush implementations don't use IPIs for tlb flushes, and
>>> must use another mechanism.
>>>
>>> In this change, add in_gup_cpumask, which is a cpumask of cpus currently
>>> performing a get_user_pages_fast traversal of a pagetable.  A cross-cpu
>>> tlb flush function can use this to determine whether it should hold-off
>>> on the flush until the gup_fast has finished.
>>>
>>> @@ -255,6 +260,10 @@ int get_user_pages_fast(unsigned long start, int
>>> nr_pages, int write,
>>>      * address down to the the page and take a ref on it.
>>>      */
>>>     local_irq_disable();
>>> +
>>> +    cpu = smp_processor_id();
>>> +    cpumask_set_cpu(cpu, in_gup_cpumask);
>>> +
>>
>> This will bounce a cacheline, every time.  Please wrap in CONFIG_XEN
>> and skip at runtime if Xen is not enabled.
> 
> Every time?  Only when running successive gup_fasts on different cpus,
> and only twice per gup_fast. (What's the typical page count?  I see that
> kvm and lguest are page-at-a-time users, but presumably direct IO has
> larger batches.)

If I am not mistaken, shared futexes were hitting the mm semaphore hard.
gup_fast was then introduced in kernel/futex.c to remove this contention point.
Yet that contention point was process-specific, not a global one :)

And now you want to add a global hot spot that would slow down
unrelated processes, only because they use shared futexes thousands of
times per second...

> 
> Alternatively, it could have per-cpu flags and the other side could
> construct the mask (I originally had that, but this was simpler).

Simpler, but it would be a regression for legacy applications still using
shared futexes (because they are statically linked with an old libc).
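
For illustration, the per-cpu-flag alternative mentioned above could be
sketched roughly as follows. This is a userspace C11 model, not the patch and
not kernel code: every name in it (the sketch_ prefix, SKETCH_NR_CPUS,
SKETCH_CACHELINE) is hypothetical, the 64-byte cache line is an assumption,
and it does not model the interrupt disabling or the barriers the real
get_user_pages_fast() path would need. It only shows the shape of the idea:
each cpu writes a flag on its own cache line, and the flushing side pays the
cost of reading all flags to build the mask.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS   8   /* hypothetical cpu count, illustration only */
#define SKETCH_CACHELINE 64  /* assumed cache-line size in bytes */

/*
 * One flag per cpu, each aligned to its own cache line, so a cpu entering
 * or leaving the page-table walk writes only a line that no other cpu
 * writes to.
 */
struct sketch_gup_flag {
    _Alignas(SKETCH_CACHELINE) atomic_bool in_gup;
};

static struct sketch_gup_flag sketch_gup_flags[SKETCH_NR_CPUS];

/* Fast path: mark this cpu as being inside a gup_fast-style walk. */
static void sketch_gup_enter(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    atomic_store(&sketch_gup_flags[cpu].in_gup, true);
}

/* Fast path: mark this cpu as having finished the walk. */
static void sketch_gup_exit(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    atomic_store(&sketch_gup_flags[cpu].in_gup, false);
}

/*
 * Slow path: the flushing side builds the mask itself by reading every
 * per-cpu flag. Bit N is set if cpu N is currently inside a walk.
 */
static uint32_t sketch_build_in_gup_mask(void)
{
    uint32_t mask = 0;

    for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (atomic_load(&sketch_gup_flags[cpu].in_gup))
            mask |= UINT32_C(1) << cpu;
    }
    return mask;
}
```

The trade-off is that the cost moves from the walker, which no longer writes
a globally shared line, to the flusher, which has to read one line per cpu.
Whether that is acceptable depends on how often flushes happen relative to
gup_fast calls, which this sketch does not measure.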



