Re: [PATCH 0/3] mm,vdso: preallocate new vmas


From: Andy Lutomirski <luto@amacapital.net>
To: Michel Lespinasse <walken@google.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	aswin@hp.com, linux-mm <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] mm,vdso: preallocate new vmas
Date: Wed, 23 Oct 2013 14:42:34 -0700
Message-ID: <CALCETrVemOctXwA8Waa1bOWew7eW5fU_gAcBUvmuyL7-qK-uRg@mail.gmail.com>
In-Reply-To: <CANN689GGTnkG1+=aH1PDxkEyN3VdCfHLjDfA3VErpOpT84rZTg@mail.gmail.com>

On Wed, Oct 23, 2013 at 3:13 AM, Michel Lespinasse <walken@google.com> wrote:
> On Tue, Oct 22, 2013 at 10:54 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On 10/22/2013 08:48 AM, walken@google.com wrote:
>>> Generally the problems I see with mmap_sem are related to long latency
>>> operations. Specifically, the mmap_sem write side is currently held
>>> during the entire munmap operation, which iterates over user pages to
>>> free them, and can take hundreds of milliseconds for large VMAs.
>>
>> This is the leading cause of my "egads, something that should have been
>> fast got delayed for several ms" detector firing.
>
> Yes, I'm seeing such issues relatively frequently as well.
>
>>  I've been wondering:
>>
>> Could we replace mmap_sem with some kind of efficient range lock?  The
>> operations would be:
>>
>>  - mm_lock_all_write (drop-in replacement for down_write(&...->mmap_sem))
>>  - mm_lock_all_read (same for down_read)
>>  - mm_lock_write_range(mm, start, end)
>>  - mm_lock_read_range(mm, start, end)
>>
>> and corresponding unlock functions (that maybe take a cookie that the
>> lock functions return or that take a pointer to some small on-stack data
>> structure).
>
> That seems doable, however I believe we can get rid of the latencies
> in the first place which seems to be a better direction. As I briefly
> mentioned, I would like to tackle the munmap problem sometime; Jan
> Kara also has a project to remove places where blocking FS functions
> are called with mmap_sem held (he's doing it for lock ordering
> purposes, so that FS can call in to MM functions that take mmap_sem,
> but there are latency benefits as well if we can avoid blocking in FS
> with mmap_sem held).

There will still be scalability issues if there are enough threads,
but maybe this isn't so bad.  (My workload may also have priority
inversion problems -- there's a thread that runs on its own core and
needs the mmap_sem read lock and a thread that runs on a highly
contended core that needs the write lock.)

>
>> The easiest way to implement this that I can think of is a doubly-linked
>> list or even just an array, which should be fine for a handful of
>> threads.  Beyond that, I don't really know.  Creating a whole trie for
>> these things would be expensive, and fine-grained locking on rbtree-like
>> things isn't so easy.
>
> Jan also had an implementation of range locks using interval trees. To
> take a range lock, you'd add the range you want to the interval tree,
> count the conflicting range lock requests that were there before you,
> and (if nonzero) block until that count goes to 0. When releasing the
> range lock, you look for any conflicting requests in the interval tree
> and decrement their conflict count, waking them up if the count goes
> to 0.
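
For readers following along, the conflict-counting scheme described above can be sketched roughly as below. This is a single-threaded userspace illustration, not Jan's implementation: a plain linked list stands in for the interval tree, "blocking" and "waking" are modeled only by the blocked_by counter reaching zero, and the read/write distinction is an assumption borrowed from the range-lock operations proposed earlier in the thread. All names (range_req, range_req_add, range_req_del) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One pending or held range lock request covering [start, end). */
struct range_req {
	unsigned long start, end;
	bool write;
	unsigned int blocked_by;   /* conflicting requests queued before us */
	struct range_req *next;
};

/* Two requests conflict if they overlap and at least one is a writer. */
static bool req_conflict(const struct range_req *a, const struct range_req *b)
{
	bool overlap = a->start < b->end && b->start < a->end;

	return overlap && (a->write || b->write);
}

/*
 * Queue req and count the conflicting requests already present.
 * The caller may proceed once req->blocked_by == 0; a real
 * implementation would sleep until then.
 */
static void range_req_add(struct range_req **head, struct range_req *req)
{
	req->blocked_by = 0;
	for (struct range_req *r = *head; r; r = r->next)
		if (req_conflict(r, req))
			req->blocked_by++;
	req->next = *head;
	*head = req;
}

/*
 * Remove req and decrement the count of every remaining request that
 * conflicts with it. Returns how many requests became runnable, which
 * is where a real implementation would issue wakeups.
 */
static unsigned int range_req_del(struct range_req **head, struct range_req *req)
{
	unsigned int woken = 0;

	assert(req->blocked_by == 0);   /* only a lock holder may release */
	for (struct range_req **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			break;
		}
	}
	for (struct range_req *r = *head; r; r = r->next)
		if (req_conflict(r, req) && --r->blocked_by == 0)
			woken++;
	return woken;
}
```

Because a holder's own count is already zero, every conflicting request still queued at release time must have arrived later and counted the holder, so the decrement never underflows. The list walk here is O(n); an interval tree would make the conflict lookup logarithmic at the cost of the tree maintenance discussed in the reply.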

Yuck.  Now we're taking a per-mm lock on the rbtree, doing some
cacheline-bouncing rbtree operations, and dropping the lock to
serialize access to something that probably only has a small handful
of accessors at a time.  I bet that an O(num locks) array or linked
list will end up being faster in practice.
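
To make the O(num locks) idea concrete, here is a minimal userspace sketch of the array variant. It is only the bookkeeping: it assumes something else (say, a spinlock) protects the table, it offers try-lock semantics instead of sleeping, and the slot index doubles as the "cookie" mentioned earlier in the thread. The names (range_lock_table, range_trylock, range_unlock) and the MAX_RANGE_LOCKS cap are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGE_LOCKS 16   /* hypothetical cap: "a handful" of holders */

/* One held range lock covering [start, end). */
struct range_slot {
	bool in_use;
	bool write;
	unsigned long start;
	unsigned long end;
};

struct range_lock_table {
	struct range_slot slots[MAX_RANGE_LOCKS];
};

/* A held slot conflicts if it overlaps and either side is a writer. */
static bool slot_conflicts(const struct range_slot *s, unsigned long start,
			   unsigned long end, bool write)
{
	if (!s->in_use)
		return false;
	if (end <= s->start || s->end <= start)
		return false;   /* disjoint ranges never conflict */
	return write || s->write;
}

/*
 * Scan every slot once: fail on the first conflict, otherwise claim
 * the first free slot. Returns the slot index as a cookie, or -1 if
 * the range conflicts or the table is full.
 */
static int range_trylock(struct range_lock_table *t, unsigned long start,
			 unsigned long end, bool write)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_RANGE_LOCKS; i++) {
		if (slot_conflicts(&t->slots[i], start, end, write))
			return -1;
		if (!t->slots[i].in_use && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;
	t->slots[free_slot] = (struct range_slot){
		.in_use = true, .write = write, .start = start, .end = end,
	};
	return free_slot;
}

/* Release the lock identified by the cookie from range_trylock(). */
static void range_unlock(struct range_lock_table *t, int cookie)
{
	assert(cookie >= 0 && cookie < MAX_RANGE_LOCKS);
	assert(t->slots[cookie].in_use);
	t->slots[cookie].in_use = false;
}
```

Locking the whole address space is just the range [0, ULONG_MAX) in this model. The scan touches a small, contiguous array, which is the property the argument above relies on; whether it actually beats an interval tree would have to be measured.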

I think the ideal solution would be to shove these things into the page
tables somehow, but that seems impossibly complicated.

--Andy

>
> But as I said earlier, I would prefer if we could avoid holding
> mmap_sem during long-latency operations rather than working around
> this issue with range locks.
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



-- 
Andy Lutomirski
AMA Capital Management, LLC



