From: Rik van Riel <riel@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: [PATCH -mm 17/24] mlock: downgrade mmap sem while populating mlocked regions
Date: Wed, 11 Jun 2008 14:42:31 -0400 [thread overview]
Message-ID: <20080611184339.899722381@redhat.com> (raw)
In-Reply-To: 20080611184214.605110868@redhat.com
[-- Attachment #1: vmscan-downgrade-mmap-sem-while-populating-mlocked-regions.patch --]
[-- Type: text/plain, Size: 4836 bytes --]
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
We need to hold the mmap_sem for write to initiatate mlock()/munlock()
because we may need to merge/split vmas. However, this can lead to
very long lock hold times attempting to fault in a large memory region
to mlock it into memory. This can hold off other faults against the
mm [multithreaded tasks] and other scans of the mm, such as via /proc.
To alleviate this, downgrade the mmap_sem to read mode during the
population of the region for locking. This is especially the case
if we need to reclaim memory to lock down the region. We [probably?]
don't need to do this for unlocking as all of the pages should be
resident--they're already mlocked.
Now, the caller's of the mlock functions [mlock_fixup() and
mlock_vma_pages_range()] expect the mmap_sem to be returned in write
mode. Changing all callers appears to be way too much effort at this
point. So, restore write mode before returning. Note that this opens
a window where the mmap list could change in a multithreaded process.
So, at least for mlock_fixup(), where we could be called in a loop over
multiple vmas, we check that a vma still exists at the start address
and that vma still covers the page range [start,end). If not, we return
an error, -EAGAIN, and let the caller deal with it.
Return -EAGAIN from mlock_vma_pages_range() function and mlock_fixup()
if the vma at 'start' disappears or changes so that the page range
[start,end) is no longer contained in the vma. Again, let the caller
deal with it. Looks like only sys_remap_file_pages() [via mmap_region()]
should actually care.
With this patch, I no longer see processes like ps(1) blocked for seconds
or minutes at a time waiting for a large [multiple gigabyte] region to be
locked down. However, I occassionally see delays while unlocking or
unmapping a large mlocked region. Should we also downgrade the mmap_sem
for the unlock path?
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
V2 -> V3:
+ rebase to 23-mm1 atop RvR's split lru series [no change]
+ fix function return types [void -> int] to fix build when
not configured.
New in V2.
mm/mlock.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)
Index: linux-2.6.26-rc5-mm2/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm2.orig/mm/mlock.c 2008-06-10 22:28:01.000000000 -0400
+++ linux-2.6.26-rc5-mm2/mm/mlock.c 2008-06-10 22:28:05.000000000 -0400
@@ -308,6 +308,7 @@ static void __munlock_vma_pages_range(st
int mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
+ struct mm_struct *mm = vma->vm_mm;
int nr_pages = (end - start) / PAGE_SIZE;
BUG_ON(!(vma->vm_flags & VM_LOCKED));
@@ -319,8 +320,19 @@ int mlock_vma_pages_range(struct vm_area
if (!((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
is_vm_hugetlb_page(vma) ||
- vma == get_gate_vma(current)))
- return __mlock_vma_pages_range(vma, start, end);
+ vma == get_gate_vma(current))) {
+ downgrade_write(&mm->mmap_sem);
+ nr_pages = __mlock_vma_pages_range(vma, start, end);
+
+ up_read(&mm->mmap_sem);
+ /* vma can change or disappear */
+ down_write(&mm->mmap_sem);
+ vma = find_vma(mm, start);
+ /* non-NULL vma must contain @start, but need to check @end */
+ if (!vma || end > vma->vm_end)
+ return -EAGAIN;
+ return nr_pages;
+ }
/*
* User mapped kernel pages or huge pages:
@@ -414,13 +426,41 @@ success:
vma->vm_flags = newflags;
if (lock) {
+ /*
+ * mmap_sem is currently held for write. Downgrade the write
+ * lock to a read lock so that other faults, mmap scans, ...
+ * while we fault in all pages.
+ */
+ downgrade_write(&mm->mmap_sem);
+
ret = __mlock_vma_pages_range(vma, start, end);
if (ret > 0) {
mm->locked_vm -= ret;
ret = 0;
}
- } else
+ /*
+ * Need to reacquire mmap sem in write mode, as our callers
+ * expect this. We have no support for atomically upgrading
+ * a sem to write, so we need to check for ranges while sem
+ * is unlocked.
+ */
+ up_read(&mm->mmap_sem);
+ /* vma can change or disappear */
+ down_write(&mm->mmap_sem);
+ *prev = find_vma(mm, start);
+ /* non-NULL *prev must contain @start, but need to check @end */
+ if (!(*prev) || end > (*prev)->vm_end)
+ ret = -EAGAIN;
+ } else {
+ /*
+ * TODO: for unlocking, pages will already be resident, so
+ * we don't need to wait for allocations/reclaim/pagein, ...
+ * However, unlocking a very large region can still take a
+ * while. Should we downgrade the semaphore for both lock
+ * AND unlock ?
+ */
__munlock_vma_pages_range(vma, start, end);
+ }
out:
*prev = vma;
--
All Rights Reversed
next prev parent reply other threads:[~2008-06-11 18:48 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-11 18:42 [PATCH -mm 00/24] VM pageout scalability improvements (V12) Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 01/24] vmscan: move isolate_lru_page() to vmscan.c Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 02/24] vmscan: Use an indexed array for LRU variables Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 03/24] swap: use an array for the LRU pagevecs Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 04/24] vmscan: free swap space on swap-in/activation Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 05/24] define page_file_cache() function Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 06/24] vmscan: split LRU lists into anon & file sets Rik van Riel
2008-06-13 0:39 ` Hiroshi Shimamoto
2008-06-13 17:48 ` [PATCH] fix printk in show_free_areas Rik van Riel
2008-06-13 20:21 ` [PATCH] collect lru meminfo statistics from correct offset Lee Schermerhorn
2008-06-15 15:07 ` [PATCH] fix printk in show_free_areas KOSAKI Motohiro
2008-06-11 18:42 ` [PATCH -mm 07/24] vmscan: second chance replacement for anonymous pages Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 08/24] vmscan: fix pagecache reclaim referenced bit check Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 09/24] vmscan: add newly swapped in pages to the inactive list Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 10/24] more aggressively use lumpy reclaim Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 11/24] pageflag helpers for configed-out flags Rik van Riel
2008-06-11 20:01 ` Andrew Morton
2008-06-11 20:08 ` Rik van Riel
2008-06-11 20:23 ` Lee Schermerhorn
2008-06-11 20:30 ` Rik van Riel
2008-06-11 20:28 ` Lee Schermerhorn
2008-06-11 20:32 ` Rik van Riel
2008-06-11 20:43 ` Lee Schermerhorn
2008-06-11 20:48 ` Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 12/24] Unevictable LRU Infrastructure Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 13/24] Unevictable LRU Page Statistics Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 14/24] Ramfs and Ram Disk pages are unevictable Rik van Riel
2008-06-12 0:54 ` Nick Piggin
2008-06-12 17:29 ` Rik van Riel
2008-06-12 17:37 ` Nick Piggin
2008-06-12 17:50 ` Rik van Riel
2008-06-12 17:57 ` Nick Piggin
2008-06-11 18:42 ` [PATCH -mm 15/24] SHM_LOCKED " Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 16/24] mlock: mlocked " Rik van Riel
2008-06-11 18:42 ` Rik van Riel [this message]
2008-06-11 18:42 ` [PATCH -mm 18/24] mmap: handle mlocked pages during map, remap, unmap Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 19/24] vmstat: mlocked pages statistics Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 20/24] swap: cull unevictable pages in fault path Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 21/24] vmstat: unevictable and mlocked pages vm events Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 22/24] vmscan: unevictable LRU scan sysctl Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 23/24] mlock: count attempts to free mlocked page Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 24/24] doc: unevictable LRU and mlocked pages documentation Rik van Riel
2008-06-12 5:34 ` [PATCH -mm 00/24] VM pageout scalability improvements (V12) Andrew Morton
2008-06-12 13:31 ` Rik van Riel
2008-06-16 5:32 ` KOSAKI Motohiro
2008-06-16 6:20 ` Andrew Morton
2008-06-16 6:22 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080611184339.899722381@redhat.com \
--to=riel@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=lee.schermerhorn@hp.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox