public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michel Lespinasse <walken@google.com>
To: Roman Dubtsov <dubtsov@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	Andy Lutomirski <luto@amacapital.net>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>
Subject: Re: mmap() scalability in the presence of the MAP_POPULATE flag
Date: Fri, 4 Jan 2013 03:57:48 -0800	[thread overview]
Message-ID: <20130104115748.GA8830@google.com> (raw)
In-Reply-To: <1357232977.1886.17.camel@mesosphere.localdomain>

On Fri, Jan 04, 2013 at 12:09:37AM +0700, Roman Dubtsov wrote:
> On Wed, 2013-01-02 at 16:09 -0800, Michel Lespinasse wrote:
> > > Is there an interest in fixing this or concurrent mmaps() from the same
> > > process are too much of a corner case to worry about it?
> > 
> > Funny this comes up again. I actually have a patch series that is
> > supposed to do that:
> > [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held
> > 
> > However, the patches are still pending, didn't get much review
> > (probably not enough for Andrew to take them at this point), and I
> > think everyone forgot about them during the winter break.
> > 
> > Care to have a look at that thread and see if it works for you ?
> > 
> > (caveat: you will possibly also need "[PATCH 10/9] mm: make
> > do_mmap_pgoff return populate as a size in bytes, not as a bool" to
> > make the series actually work for you)
> 
> I applied the patches on top of 3.7.1. Here're the results for 4 threads
> concurrently mmap()-ing 10 64MB buffers in a loop without munmap()-s.
> The data is from a Nehalem i7-920 single-socket 4-core CPU. I've also
> added the older data I have for the 3.6.11 (patched and not) for
> reference.
> 
> 3.6.11 vanilla, do not populate: 0.001 seconds
> 3.6.11 vanilla, populate via a loop: 0.216 seconds
> 3.6.11 vanilla, populate via MAP_POPULATE: 0.358 seconds 
> 
> 3.6.11 + crude patch, do not populate: 0.002 seconds
> 3.6.11 + crude patch, populate via loop: 0.215 seconds
> 3.6.11 + crude patch, populate via MAP_POPULATE: 0.217 seconds
> 
> 3.7.1 vanilla, do not populate: 0.001 seconds
> 3.7.1 vanilla, populate via a loop: 0.216 seconds
> 3.7.1 vanilla, populate via MAP_POPULATE: 0.411 seconds
> 
> 3.7.1 + patch series, do not populate: 0.001 seconds
> 3.7.1 + patch series, populate via loop: 0.216 seconds
> 3.7.1 + patch series, populate via MAP_POPULATE: 0.273 seconds
> 
> So, the patch series mentioned above do improve performance but as far
> as I can read the benchmarking data there's still some performance left
> on the table.

Interesting. I expect you are using anon memory, so it's likely that
mm_populate() holds the mmap_sem read side for the entire duration of
the 64MB populate.

Just curious, does the following help ?

diff --git a/mm/memory.c b/mm/memory.c
index e4ab66b94bb8..f65a4b3b2141 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1627,6 +1627,12 @@ static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long add
 	       stack_guard_page_end(vma, addr+PAGE_SIZE);
 }
 
+/* not upstreamable as is, just for the sake of testing */
+static inline int rwsem_is_contended(struct rw_semaphore *sem)
+{
+	return (sem->count < 0);
+}
+
 /**
  * __get_user_pages() - pin user pages in memory
  * @tsk:	task_struct of target task
@@ -1854,6 +1860,11 @@ next_page:
 			i++;
 			start += PAGE_SIZE;
 			nr_pages--;
+			if (nonblocking && rwsem_is_contended(&mm->mmap_sem)) {
+				up_read(&mm->mmap_sem);
+				*nonblocking = 0;
+				return i;
+			}
 		} while (nr_pages && start < vma->vm_end);
 	} while (nr_pages);
 	return i;

Linus didn't like rwsem_is_contended() when I implemented the mlock
side of this a couple years ago, but maybe we can change his mind now.

If this doesn't help, could you please send me your test case ? I
think you described enough of it that I would be able to reproduce it
given some time, but it's just easier if you send me a short C file :)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

  reply	other threads:[~2013-01-04 11:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-02 16:50 mmap() scalability in the presence of the MAP_POPULATE flag Roman Dubtsov
2013-01-03  0:09 ` Michel Lespinasse
2013-01-03 17:09   ` Roman Dubtsov
2013-01-04 11:57     ` Michel Lespinasse [this message]
2013-01-05  6:40       ` Roman Dubtsov
2013-01-05  7:43         ` Michel Lespinasse
2013-01-31  0:31           ` Michel Lespinasse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130104115748.GA8830@google.com \
    --to=walken@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=dubtsov@gmail.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox