From: Michel Lespinasse <walken@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Nick Piggin <npiggin@kernel.dk>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/6] mlock: only hold mmap_sem in shared mode when faulting in pages
Date: Mon, 13 Dec 2010 16:51:40 -0800
Message-ID: <20101214005140.GA29904@google.com>
In-Reply-To: <AANLkTinY0pcTcd+OxPLyvsJgHgh=cTaB1-8VbEA2tstb@mail.gmail.com>
On Thu, Dec 9, 2010 at 10:11 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wednesday, December 8, 2010, Michel Lespinasse <walken@google.com> wrote:
>>
>> Yes, patch 1/6 changes the long hold time to be in read mode instead
>> of write mode, which is only a band-aid. But, this prepares for patch
>> 5/6, which releases mmap_sem whenever there is contention on it or
>> when blocking on disk reads.
>
> I have to say that I'm not a huge fan of that horribly kludgy
> contention check case.
>
> The "move page-in to read-locked sequence" and the changes to
> get_user_pages look fine, but the contention thing is just disgusting.
> I'd really like to see some other approach if at all possible.

Andrew, should I amend my patches to remove the rwsem_is_contended() code?

This would involve:

- remove rwsem-implement-rwsem_is_contended.patch and
  x86-rwsem-more-precise-rwsem_is_contended-implementation.patch

- in mlock-do-not-hold-mmap_sem-for-extended-periods-of-time.patch,
  drop the one hunk making use of rwsem_is_contended (rest of the patch
  would still work without it)

- optionally, follow up patch to limit batch size to a constant
  in do_mlock_pages():

diff --git a/mm/mlock.c b/mm/mlock.c
index 569ae6a..a505a7e 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -457,15 +457,23 @@ static int do_mlock_pages(unsigned long start, size_t len)
 			continue;
 		if (nstart < vma->vm_start)
 			nstart = vma->vm_start;
+		/*
+		 * Constrain batch size to limit mmap_sem hold time.
+		 */
+		if (nend > nstart + 1024 * PAGE_SIZE)
+			nend = nstart + 1024 * PAGE_SIZE;
 		/*
 		 * Now fault in a range of pages. __mlock_vma_pages_range()
 		 * double checks the vma flags, so that it won't mlock pages
 		 * if the vma was already munlocked.
 		 */
 		ret = __mlock_vma_pages_range(vma, nstart, nend, &locked);
 		if (ret < 0) {
 			ret = __mlock_posix_error_return(ret);
 			break;
+		} else if (locked) {
+			locked = 0;
+			up_read(&mm->mmap_sem);
 		}
 		nend = nstart + ret * PAGE_SIZE;
 		ret = 0;

I would rather not use a constant, but I'm not sure how else to make
Linus happy :)

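For readers who want to see the effect of the constant outside the kernel, here is a minimal userspace sketch of the same batching idea. It is not the kernel code: every sketch_* name, the 4096-byte page size, and the counters are assumptions made for illustration, the lock is a stub rather than a real rwsem, and the stand-in for __mlock_vma_pages_range() always reports that the whole batch was faulted in (the real function may return fewer pages).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096UL  /* assumed page size, illustration only */
#define SKETCH_BATCH_PAGES 1024UL  /* same constant as in the patch above */

/* Bookkeeping so the effect of batching can be observed. */
static unsigned long sketch_lock_acquisitions;
static unsigned long sketch_max_pages_per_hold;

/* Hypothetical stubs standing in for down_read()/up_read() on mmap_sem. */
static void sketch_down_read(void) { sketch_lock_acquisitions++; }
static void sketch_up_read(void) { }

/*
 * Hypothetical stand-in for __mlock_vma_pages_range(): pretends every
 * page in [start, end) was faulted in and returns the page count.
 */
static unsigned long sketch_fault_in(unsigned long start, unsigned long end)
{
	return (end - start) / SKETCH_PAGE_SIZE;
}

/*
 * Walk [start, start + len) in batches of at most SKETCH_BATCH_PAGES
 * pages, taking and dropping the (stub) read lock once per batch.
 * Returns the total number of pages processed.
 */
unsigned long sketch_mlock_pages(unsigned long start, size_t len)
{
	unsigned long end = start + len;
	unsigned long nstart, nend, done, total = 0;

	assert(start % SKETCH_PAGE_SIZE == 0 && len % SKETCH_PAGE_SIZE == 0);

	for (nstart = start; nstart < end; nstart = nend) {
		/* Constrain batch size to bound the lock hold time. */
		nend = end;
		if (nend - nstart > SKETCH_BATCH_PAGES * SKETCH_PAGE_SIZE)
			nend = nstart + SKETCH_BATCH_PAGES * SKETCH_PAGE_SIZE;

		sketch_down_read();
		done = sketch_fault_in(nstart, nend);
		sketch_up_read();

		if (done > sketch_max_pages_per_hold)
			sketch_max_pages_per_hold = done;
		total += done;
	}
	return total;
}
```

With these assumptions, a 2500-page range is handled in three lock acquisitions (1024 + 1024 + 452 pages), and no single hold covers more than 1024 pages. The cost is that the bound is fixed regardless of whether anyone is actually waiting on the lock, which is the tradeoff against the rwsem_is_contended() approach.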
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.