[PATCH 18/25] Downgrade mmap sem while populating mlocked regions

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Lee Schermerhorn <lee.schermerhorn@hp.com>
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-kernel@vger.kernel.org
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Eric Whitney <eric.whitney@hp.com>,
	linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 18/25] Downgrade mmap sem while populating mlocked regions
Date: Thu, 29 May 2008 15:51:08 -0400	[thread overview]
Message-ID: <20080529195108.27159.5746.sendpatchset@lts-notebook> (raw)
In-Reply-To: <20080529195030.27159.66161.sendpatchset@lts-notebook>

Against:  2.6.26-rc2-mm1

V2 -> V3:
+ rebase to 23-mm1 atop RvR's split lru series [no change]
+ fix function return types [void -> int] to fix build when
  not configured.

New in V2.

We need to hold the mmap_sem for write to initiatate mlock()/munlock()
because we may need to merge/split vmas.  However, this can lead to
very long lock hold times attempting to fault in a large memory region
to mlock it into memory.   This can hold off other faults against the
mm [multithreaded tasks] and other scans of the mm, such as via /proc.
To alleviate this, downgrade the mmap_sem to read mode during the 
population of the region for locking.  This is especially the case 
if we need to reclaim memory to lock down the region.  We [probably?]
don't need to do this for unlocking as all of the pages should be
resident--they're already mlocked.

Now, the caller's of the mlock functions [mlock_fixup() and 
mlock_vma_pages_range()] expect the mmap_sem to be returned in write
mode.  Changing all callers appears to be way too much effort at this
point.  So, restore write mode before returning.  Note that this opens
a window where the mmap list could change in a multithreaded process.
So, at least for mlock_fixup(), where we could be called in a loop over
multiple vmas, we check that a vma still exists at the start address
and that vma still covers the page range [start,end).  If not, we return
an error, -EAGAIN, and let the caller deal with it.

Return -EAGAIN from mlock_vma_pages_range() function and mlock_fixup()
if the vma at 'start' disappears or changes so that the page range
[start,end) is no longer contained in the vma.  Again, let the caller
deal with it.  Looks like only sys_remap_file_pages() [via mmap_region()]
should actually care.

With this patch, I no longer see processes like ps(1) blocked for seconds
or minutes at a time waiting for a large [multiple gigabyte] region to be
locked down.  However, I occassionally see delays while unlocking or
unmapping a large mlocked region.  Should we also downgrade the mmap_sem
for the unlock path?

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>

 mm/mlock.c |   43 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

Index: linux-2.6.26-rc2-mm1/mm/mlock.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/mlock.c	2008-05-21 13:58:40.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/mlock.c	2008-05-21 13:58:42.000000000 -0400
@@ -309,6 +309,7 @@ static void __munlock_vma_pages_range(st
 int mlock_vma_pages_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	int nr_pages = (end - start) / PAGE_SIZE;
 	BUG_ON(!(vma->vm_flags & VM_LOCKED));

@@ -323,7 +324,17 @@ int mlock_vma_pages_range(struct vm_area
 			vma == get_gate_vma(current))
 		goto make_present;

-	return __mlock_vma_pages_range(vma, start, end);
+	downgrade_write(&mm->mmap_sem);
+	nr_pages = __mlock_vma_pages_range(vma, start, end);
+
+	up_read(&mm->mmap_sem);
+	/* vma can change or disappear */
+	down_write(&mm->mmap_sem);
+	vma = find_vma(mm, start);
+	/* non-NULL vma must contain @start, but need to check @end */
+	if (!vma ||  end > vma->vm_end)
+		return -EAGAIN;
+	return nr_pages;

 make_present:
 	/*
@@ -418,13 +429,41 @@ success:
 	vma->vm_flags = newflags;

 	if (lock) {
+		/*
+		 * mmap_sem is currently held for write.  Downgrade the write
+		 * lock to a read lock so that other faults, mmap scans, ...
+		 * while we fault in all pages.
+		 */
+		downgrade_write(&mm->mmap_sem);
+
 		ret = __mlock_vma_pages_range(vma, start, end);
 		if (ret > 0) {
 			mm->locked_vm -= ret;
 			ret = 0;
 		}
-	} else
+		/*
+		 * Need to reacquire mmap sem in write mode, as our callers
+		 * expect this.  We have no support for atomically upgrading
+		 * a sem to write, so we need to check for ranges while sem
+		 * is unlocked.
+		 */
+		up_read(&mm->mmap_sem);
+		/* vma can change or disappear */
+		down_write(&mm->mmap_sem);
+		*prev = find_vma(mm, start);
+		/* non-NULL *prev must contain @start, but need to check @end */
+		if (!(*prev) || end > (*prev)->vm_end)
+			ret = -EAGAIN;
+	} else {
+		/*
+		 * TODO:  for unlocking, pages will already be resident, so
+		 * we don't need to wait for allocations/reclaim/pagein, ...
+		 * However, unlocking a very large region can still take a
+		 * while.  Should we downgrade the semaphore for both lock
+		 * AND unlock ?
+		 */
 		__munlock_vma_pages_range(vma, start, end);
+	}

 out:
 	*prev = vma;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-05-29 19:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-29 19:50 [PATCH 00/25] Vm Pageout Scalability Improvements (V8) - continued Lee Schermerhorn
2008-05-29 19:50 ` [PATCH 13/25] Noreclaim LRU Infrastructure Lee Schermerhorn
2008-05-29 19:50 ` [PATCH 14/25] Noreclaim LRU Page Statistics Lee Schermerhorn
2008-05-29 19:50 ` [PATCH 15/25] Ramfs and Ram Disk pages are non-reclaimable Lee Schermerhorn
2008-05-29 19:50 ` [PATCH 16/25] SHM_LOCKED " Lee Schermerhorn, Lee Schermerhorn
2008-05-29 19:51 ` [PATCH 17/25] Mlocked Pages " Lee Schermerhorn
2008-05-29 19:51 ` Lee Schermerhorn, Lee Schermerhorn [this message]
2008-05-29 19:51 ` [PATCH 19/25] Handle mlocked pages during map, remap, unmap Lee Schermerhorn
2008-05-29 19:51 ` [PATCH 20/25] Mlocked Pages statistics Lee Schermerhorn, Nick Piggin
2008-05-29 19:51 ` [PATCH 21/25] Cull non-reclaimable pages in fault path Lee Schermerhorn, Lee Schermerhorn
2008-05-29 19:51 ` [PATCH 22/25] Noreclaim and Mlocked pages vm events Lee Schermerhorn, Lee Schermerhorn
2008-05-29 19:51 ` [PATCH 23/25] Noreclaim LRU scan sysctl Lee Schermerhorn, Lee Schermerhorn
2008-05-29 19:51 ` [PATCH 24/25] Mlocked Pages: count attempts to free mlocked page Lee Schermerhorn
2008-05-29 19:51 ` [PATCH 25/25] Noreclaim LRU and Mlocked Pages Documentation Lee Schermerhorn
2008-05-29 20:16 ` [PATCH 00/25] Vm Pageout Scalability Improvements (V8) - continued Andrew Morton
2008-05-29 20:20   ` Rik van Riel
2008-05-30  1:56     ` MinChan Kim
2008-05-30 13:52     ` John Stoffel
2008-05-30 14:29       ` Rik van Riel
2008-05-30 14:36         ` John Stoffel
2008-05-30 15:27           ` Rik van Riel
2008-05-30  9:27 ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080529195108.27159.5746.sendpatchset@lts-notebook \
    --to=lee.schermerhorn@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=eric.whitney@hp.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).