From: Andy Lutomirski <luto@amacapital.net>
To: Michel Lespinasse <walken@google.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
aswin@hp.com, linux-mm <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] mm,vdso: preallocate new vmas
Date: Wed, 23 Oct 2013 14:42:34 -0700 [thread overview]
Message-ID: <CALCETrVemOctXwA8Waa1bOWew7eW5fU_gAcBUvmuyL7-qK-uRg@mail.gmail.com> (raw)
In-Reply-To: <CANN689GGTnkG1+=aH1PDxkEyN3VdCfHLjDfA3VErpOpT84rZTg@mail.gmail.com>
On Wed, Oct 23, 2013 at 3:13 AM, Michel Lespinasse <walken@google.com> wrote:
> On Tue, Oct 22, 2013 at 10:54 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On 10/22/2013 08:48 AM, walken@google.com wrote:
>>> Generally the problems I see with mmap_sem are related to long latency
>>> operations. Specifically, the mmap_sem write side is currently held
>>> during the entire munmap operation, which iterates over user pages to
>>> free them, and can take hundreds of milliseconds for large VMAs.
>>
>> This is the leading cause of my "egads, something that should have been
>> fast got delayed for several ms" detector firing.
>
> Yes, I'm seeing such issues relatively frequently as well.
>
>> I've been wondering:
>>
>> Could we replace mmap_sem with some kind of efficient range lock? The
>> operations would be:
>>
>> - mm_lock_all_write (drop-in replacement for down_write(&...->mmap_sem))
>> - mm_lock_all_read (same for down_read)
>> - mm_lock_write_range(mm, start, end)
>> - mm_lock_read_range(mm, start_end)
>>
>> and corresponding unlock functions (that maybe take a cookie that the
>> lock functions return or that take a pointer to some small on-stack data
>> structure).
>
> That seems doable, however I believe we can get rid of the latencies
> in the first place which seems to be a better direction. As I briefly
> mentioned, I would like to tackle the munmap problem sometime; Jan
> Kara also has a project to remove places where blocking FS functions
> are called with mmap_sem held (he's doing it for lock ordering
> purposes, so that FS can call in to MM functions that take mmap_sem,
> but there are latency benefits as well if we can avoid blocking in FS
> with mmap_sem held).
There will still be scalability issues if there are enough threads,
but maybe this isn't so bad. (My workload may also have priority
inversion problems -- there's a thread that runs on its own core and
needs the mmap_sem read lock and a thread that runs on a highly
contended core that needs the write lock.)
>
>> The easiest way to implement this that I can think of is a doubly-linked
>> list or even just an array, which should be fine for a handful of
>> threads. Beyond that, I don't really know. Creating a whole trie for
>> these things would be expensive, and fine-grained locking on rbtree-like
>> things isn't so easy.
>
> Jan also had an implementation of range locks using interval trees. To
> take a range lock, you'd add the range you want to the interval tree,
> count the conflicting range lock requests that were there before you,
> and (if nonzero) block until that count goes to 0. When releasing the
> range lock, you look for any conflicting requests in the interval tree
> and decrement their conflict count, waking them up if the count goes
> to 0.
Yuck. Now we're taking a per-mm lock on the rbtree, doing some
cacheline-bouncing rbtree operations, and dropping the lock to
serialize access to something that probably only has a small handful
of accessors at a time. I bet that an O(num locks) array or linked
list will end up being faster in practice.
I think the idea solution would be to shove these things into the page
tables somehow, but that seems impossibly complicated.
--Andy
>
> But as I said earlier, I would prefer if we could avoid holding
> mmap_sem during long-latency operations rather than working around
> this issue with range locks.
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.
--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-10-23 21:43 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-18 0:50 [PATCH 0/3] mm,vdso: preallocate new vmas Davidlohr Bueso
2013-10-18 0:50 ` [PATCH 1/3] mm: add mlock_future_check helper Davidlohr Bueso
2013-10-23 9:30 ` walken
2013-10-18 0:50 ` [PATCH 2/3] mm/mlock: prepare params outside critical region Davidlohr Bueso
2013-10-23 9:33 ` walken
2013-10-23 9:46 ` Vlastimil Babka
2013-10-18 0:50 ` [PATCH 3/3] vdso: preallocate new vmas Davidlohr Bueso
2013-10-18 1:17 ` Linus Torvalds
2013-10-18 5:59 ` Richard Weinberger
2013-10-18 6:05 ` [PATCH 4/3] x86/vdso: Optimize setup_additional_pages() Ingo Molnar
2013-10-21 3:52 ` Davidlohr Bueso
2013-10-21 5:27 ` Ingo Molnar
2013-10-21 3:26 ` [PATCH 3/3] vdso: preallocate new vmas Davidlohr Bueso
2013-10-23 9:53 ` walken
2013-10-25 0:55 ` Davidlohr Bueso
2013-10-22 15:48 ` [PATCH 0/3] mm,vdso: " walken
2013-10-22 16:20 ` Linus Torvalds
2013-10-22 17:04 ` Michel Lespinasse
2013-10-22 17:54 ` Andy Lutomirski
2013-10-23 10:13 ` Michel Lespinasse
2013-10-23 21:42 ` Andy Lutomirski [this message]
2013-10-23 2:46 ` Davidlohr Bueso
2013-11-05 0:39 ` Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALCETrVemOctXwA8Waa1bOWew7eW5fU_gAcBUvmuyL7-qK-uRg@mail.gmail.com \
--to=luto@amacapital.net \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=aswin@hp.com \
--cc=davidlohr@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@kernel.org \
--cc=riel@redhat.com \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).