From: Michel Lespinasse <walken@google.com>
To: Roman Dubtsov <dubtsov@gmail.com>
Cc: linux-kernel@vger.kernel.org,
Andy Lutomirski <luto@amacapital.net>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>
Subject: Re: mmap() scalability in the presence of the MAP_POPULATE flag
Date: Fri, 4 Jan 2013 03:57:48 -0800 [thread overview]
Message-ID: <20130104115748.GA8830@google.com> (raw)
In-Reply-To: <1357232977.1886.17.camel@mesosphere.localdomain>
On Fri, Jan 04, 2013 at 12:09:37AM +0700, Roman Dubtsov wrote:
> On Wed, 2013-01-02 at 16:09 -0800, Michel Lespinasse wrote:
> > > Is there an interest in fixing this or concurrent mmaps() from the same
> > > process are too much of a corner case to worry about it?
> >
> > Funny this comes up again. I actually have a patch series that is
> > supposed to do that:
> > [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
> >
> > However, the patches are still pending, didn't get much review
> > (probably not enough for Andrew to take them at this point), and I
> > think everyone forgot about them during the winter break.
> >
> > Care to have a look at that thread and see if it works for you ?
> >
> > (caveat: you will possibly also need "[PATCH 10/9] mm: make
> > do_mmap_pgoff return populate as a size in bytes, not as a bool" to
> > make the series actually work for you)
>
> I applied the patches on top of 3.7.1. Here're the results for 4 threads
> concurrently mmap()-ing 10 64MB buffers in a loop without munmap()-s.
> The data is from a Nehalem i7-920 single-socket 4-core CPU. I've also
> added the older data I have for the 3.6.11 (patched and not) for
> reference.
>
> 3.6.11 vanilla, do not populate: 0.001 seconds
> 3.6.11 vanilla, populate via a loop: 0.216 seconds
> 3.6.11 vanilla, populate via MAP_POPULATE: 0.358 seconds
>
> 3.6.11 + crude patch, do not populate: 0.002 seconds
> 3.6.11 + crude patch, populate via loop: 0.215 seconds
> 3.6.11 + crude patch, populate via MAP_POPULATE: 0.217 seconds
>
> 3.7.1 vanilla, do not populate: 0.001 seconds
> 3.7.1 vanilla, populate via a loop: 0.216 seconds
> 3.7.1 vanilla, populate via MAP_POPULATE: 0.411 seconds
>
> 3.7.1 + patch series, do not populate: 0.001 seconds
> 3.7.1 + patch series, populate via loop: 0.216 seconds
> 3.7.1 + patch series, populate via MAP_POPULATE: 0.273 seconds
>
> So, the patch series mentioned above do improve performance but as far
> as I can read the benchmarking data there's still some performance left
> on the table.
Interesting. I expect you are using anon memory, so it's likely that
mm_populate() holds the mmap_sem read side for the entire duration of
the 64MB populate.
Just curious, does the following help ?
diff --git a/mm/memory.c b/mm/memory.c
index e4ab66b94bb8..f65a4b3b2141 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1627,6 +1627,12 @@ static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long add
stack_guard_page_end(vma, addr+PAGE_SIZE);
}
+/* not upstreamable as is, just for the sake of testing */
+static inline int rwsem_is_contended(struct rw_semaphore *sem)
+{
+ return (sem->count < 0);
+}
+
/**
* __get_user_pages() - pin user pages in memory
* @tsk: task_struct of target task
@@ -1854,6 +1860,11 @@ next_page:
i++;
start += PAGE_SIZE;
nr_pages--;
+ if (nonblocking && rwsem_is_contended(&mm->mmap_sem)) {
+ up_read(&mm->mmap_sem);
+ *nonblocking = 0;
+ return i;
+ }
} while (nr_pages && start < vma->vm_end);
} while (nr_pages);
return i;
Linus didn't like rwsem_is_contended() when I implemented the mlock
side of this a couple years ago, but maybe we can change his mind now.
If this doesn't help, could you please send me your test case ? I
think you described enough of it that I would be able to reproduce it
given some time, but it's just easier if you send me a short C file :)
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
next prev parent reply other threads:[~2013-01-04 11:57 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-02 16:50 mmap() scalability in the presence of the MAP_POPULATE flag Roman Dubtsov
2013-01-03 0:09 ` Michel Lespinasse
2013-01-03 17:09 ` Roman Dubtsov
2013-01-04 11:57 ` Michel Lespinasse [this message]
2013-01-05 6:40 ` Roman Dubtsov
2013-01-05 7:43 ` Michel Lespinasse
2013-01-31 0:31 ` Michel Lespinasse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130104115748.GA8830@google.com \
--to=walken@google.com \
--cc=akpm@linux-foundation.org \
--cc=dubtsov@gmail.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox