From: Rik van Riel <riel@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: [PATCH -mm 17/24] mlock: downgrade mmap sem while populating mlocked regions
Date: Wed, 11 Jun 2008 14:42:31 -0400 [thread overview]
Message-ID: <20080611184339.899722381@redhat.com> (raw)
In-Reply-To: 20080611184214.605110868@redhat.com
[-- Attachment #1: vmscan-downgrade-mmap-sem-while-populating-mlocked-regions.patch --]
[-- Type: text/plain, Size: 4836 bytes --]
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
We need to hold the mmap_sem for write to initiatate mlock()/munlock()
because we may need to merge/split vmas. However, this can lead to
very long lock hold times attempting to fault in a large memory region
to mlock it into memory. This can hold off other faults against the
mm [multithreaded tasks] and other scans of the mm, such as via /proc.
To alleviate this, downgrade the mmap_sem to read mode during the
population of the region for locking. This is especially the case
if we need to reclaim memory to lock down the region. We [probably?]
don't need to do this for unlocking as all of the pages should be
resident--they're already mlocked.
Now, the caller's of the mlock functions [mlock_fixup() and
mlock_vma_pages_range()] expect the mmap_sem to be returned in write
mode. Changing all callers appears to be way too much effort at this
point. So, restore write mode before returning. Note that this opens
a window where the mmap list could change in a multithreaded process.
So, at least for mlock_fixup(), where we could be called in a loop over
multiple vmas, we check that a vma still exists at the start address
and that vma still covers the page range [start,end). If not, we return
an error, -EAGAIN, and let the caller deal with it.
Return -EAGAIN from mlock_vma_pages_range() function and mlock_fixup()
if the vma at 'start' disappears or changes so that the page range
[start,end) is no longer contained in the vma. Again, let the caller
deal with it. Looks like only sys_remap_file_pages() [via mmap_region()]
should actually care.
With this patch, I no longer see processes like ps(1) blocked for seconds
or minutes at a time waiting for a large [multiple gigabyte] region to be
locked down. However, I occassionally see delays while unlocking or
unmapping a large mlocked region. Should we also downgrade the mmap_sem
for the unlock path?
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
V2 -> V3:
+ rebase to 23-mm1 atop RvR's split lru series [no change]
+ fix function return types [void -> int] to fix build when
not configured.
New in V2.
mm/mlock.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)
Index: linux-2.6.26-rc5-mm2/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm2.orig/mm/mlock.c 2008-06-10 22:28:01.000000000 -0400
+++ linux-2.6.26-rc5-mm2/mm/mlock.c 2008-06-10 22:28:05.000000000 -0400
@@ -308,6 +308,7 @@ static void __munlock_vma_pages_range(st
int mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
+ struct mm_struct *mm = vma->vm_mm;
int nr_pages = (end - start) / PAGE_SIZE;
BUG_ON(!(vma->vm_flags & VM_LOCKED));
@@ -319,8 +320,19 @@ int mlock_vma_pages_range(struct vm_area
if (!((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
is_vm_hugetlb_page(vma) ||
- vma == get_gate_vma(current)))
- return __mlock_vma_pages_range(vma, start, end);
+ vma == get_gate_vma(current))) {
+ downgrade_write(&mm->mmap_sem);
+ nr_pages = __mlock_vma_pages_range(vma, start, end);
+
+ up_read(&mm->mmap_sem);
+ /* vma can change or disappear */
+ down_write(&mm->mmap_sem);
+ vma = find_vma(mm, start);
+ /* non-NULL vma must contain @start, but need to check @end */
+ if (!vma || end > vma->vm_end)
+ return -EAGAIN;
+ return nr_pages;
+ }
/*
* User mapped kernel pages or huge pages:
@@ -414,13 +426,41 @@ success:
vma->vm_flags = newflags;
if (lock) {
+ /*
+ * mmap_sem is currently held for write. Downgrade the write
+ * lock to a read lock so that other faults, mmap scans, ...
+ * while we fault in all pages.
+ */
+ downgrade_write(&mm->mmap_sem);
+
ret = __mlock_vma_pages_range(vma, start, end);
if (ret > 0) {
mm->locked_vm -= ret;
ret = 0;
}
- } else
+ /*
+ * Need to reacquire mmap sem in write mode, as our callers
+ * expect this. We have no support for atomically upgrading
+ * a sem to write, so we need to check for ranges while sem
+ * is unlocked.
+ */
+ up_read(&mm->mmap_sem);
+ /* vma can change or disappear */
+ down_write(&mm->mmap_sem);
+ *prev = find_vma(mm, start);
+ /* non-NULL *prev must contain @start, but need to check @end */
+ if (!(*prev) || end > (*prev)->vm_end)
+ ret = -EAGAIN;
+ } else {
+ /*
+ * TODO: for unlocking, pages will already be resident, so
+ * we don't need to wait for allocations/reclaim/pagein, ...
+ * However, unlocking a very large region can still take a
+ * while. Should we downgrade the semaphore for both lock
+ * AND unlock ?
+ */
__munlock_vma_pages_range(vma, start, end);
+ }
out:
*prev = vma;
--
All Rights Reversed
next prev parent reply other threads:[~2008-06-11 18:48 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-11 18:42 [PATCH -mm 00/24] VM pageout scalability improvements (V12) Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 01/24] vmscan: move isolate_lru_page() to vmscan.c Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 02/24] vmscan: Use an indexed array for LRU variables Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 03/24] swap: use an array for the LRU pagevecs Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 04/24] vmscan: free swap space on swap-in/activation Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 05/24] define page_file_cache() function Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 06/24] vmscan: split LRU lists into anon & file sets Rik van Riel
2008-06-13 0:39 ` Hiroshi Shimamoto
2008-06-13 17:48 ` [PATCH] fix printk in show_free_areas Rik van Riel
2008-06-13 20:21 ` [PATCH] collect lru meminfo statistics from correct offset Lee Schermerhorn
2008-06-13 20:21 ` Lee Schermerhorn
2008-06-15 15:07 ` [PATCH] fix printk in show_free_areas KOSAKI Motohiro
2008-06-11 18:42 ` [PATCH -mm 07/24] vmscan: second chance replacement for anonymous pages Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 08/24] vmscan: fix pagecache reclaim referenced bit check Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 09/24] vmscan: add newly swapped in pages to the inactive list Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 10/24] more aggressively use lumpy reclaim Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 11/24] pageflag helpers for configed-out flags Rik van Riel
2008-06-11 20:01 ` Andrew Morton
2008-06-11 20:08 ` Rik van Riel
2008-06-11 20:23 ` Lee Schermerhorn
2008-06-11 20:30 ` Rik van Riel
2008-06-11 20:28 ` Lee Schermerhorn
2008-06-11 20:32 ` Rik van Riel
2008-06-11 20:43 ` Lee Schermerhorn
2008-06-11 20:48 ` Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 12/24] Unevictable LRU Infrastructure Rik van Riel
2008-06-11 18:42 ` Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 13/24] Unevictable LRU Page Statistics Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 14/24] Ramfs and Ram Disk pages are unevictable Rik van Riel
2008-06-11 18:42 ` Rik van Riel
2008-06-12 0:54 ` Nick Piggin
2008-06-12 0:54 ` Nick Piggin
2008-06-12 17:29 ` Rik van Riel
2008-06-12 17:29 ` Rik van Riel
2008-06-12 17:37 ` Nick Piggin
2008-06-12 17:37 ` Nick Piggin
2008-06-12 17:50 ` Rik van Riel
2008-06-12 17:50 ` Rik van Riel
2008-06-12 17:57 ` Nick Piggin
2008-06-12 17:57 ` Nick Piggin
2008-06-11 18:42 ` [PATCH -mm 15/24] SHM_LOCKED " Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 16/24] mlock: mlocked " Rik van Riel
2008-06-11 18:42 ` Rik van Riel, Nick Piggin
2008-06-11 18:42 ` Rik van Riel [this message]
2008-06-11 18:42 ` [PATCH -mm 18/24] mmap: handle mlocked pages during map, remap, unmap Rik van Riel
2008-06-11 18:42 ` Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 19/24] vmstat: mlocked pages statistics Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 20/24] swap: cull unevictable pages in fault path Rik van Riel
2008-06-11 18:42 ` Rik van Riel, Lee Schermerhorn
2008-06-11 18:42 ` [PATCH -mm 21/24] vmstat: unevictable and mlocked pages vm events Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 22/24] vmscan: unevictable LRU scan sysctl Rik van Riel
2008-06-11 18:42 ` Rik van Riel, Lee Schermerhorn
2008-06-11 18:42 ` [PATCH -mm 23/24] mlock: count attempts to free mlocked page Rik van Riel
2008-06-11 18:42 ` [PATCH -mm 24/24] doc: unevictable LRU and mlocked pages documentation Rik van Riel
2008-06-11 18:42 ` Rik van Riel
2008-06-12 5:34 ` [PATCH -mm 00/24] VM pageout scalability improvements (V12) Andrew Morton
2008-06-12 13:31 ` Rik van Riel
2008-06-16 5:32 ` KOSAKI Motohiro
2008-06-16 6:20 ` Andrew Morton
2008-06-16 6:22 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080611184339.899722381@redhat.com \
--to=riel@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=lee.schermerhorn@hp.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.