From: Andrew Morton <akpm@linux-foundation.org>
To: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Nick Piggin <npiggin@kernel.dk>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/6] mlock: only hold mmap_sem in shared mode when faulting in pages
Date: Mon, 13 Dec 2010 17:05:26 -0800
Message-ID: <20101213170526.3b010058.akpm@linux-foundation.org>
In-Reply-To: <20101214005140.GA29904@google.com>

On Mon, 13 Dec 2010 16:51:40 -0800
Michel Lespinasse <walken@google.com> wrote:

> On Thu, Dec 9, 2010 at 10:11 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Wednesday, December 8, 2010, Michel Lespinasse <walken@google.com> wrote:
> >>
> >> Yes, patch 1/6 changes the long hold time to be in read mode instead
> >> of write mode, which is only a band-aid. But, this prepares for patch
> >> 5/6, which releases mmap_sem whenever there is contention on it or
> >> when blocking on disk reads.
> >
> > I have to say that I'm not a huge fan of that horribly kludgy
> > contention check case.
> >
> > The "move page-in to read-locked sequence" and the changes to
> > get_user_pages look fine, but the contention thing is just disgusting.
> > I'd really like to see some other approach if at all possible.
> 
> Andrew, should I amend my patches to remove the rwsem_is_contended() code ?
> This would involve:
> - remove rwsem-implement-rwsem_is_contended.patch and
>   x86-rwsem-more-precise-rwsem_is_contended-implementation.patch
> - in mlock-do-not-hold-mmap_sem-for-extended-periods-of-time.patch,
>   drop the one hunk making use of rwsem_is_contended (rest of the patch
>   would still work without it)

I think I fixed all that up.

> - optionally, follow up patch to limit batch size to a constant
>   in do_mlock_pages():
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 569ae6a..a505a7e 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -457,15 +457,23 @@ static int do_mlock_pages(unsigned long start, size_t len)
>  			continue;
>  		if (nstart < vma->vm_start)
>  			nstart = vma->vm_start;
> +		/*
> +		 * Constrain batch size to limit mmap_sem hold time.
> +		 */
> +		if (nend > nstart + 1024 * PAGE_SIZE)
> +			nend = nstart + 1024 * PAGE_SIZE;
>  		/*
>  		 * Now fault in a range of pages. __mlock_vma_pages_range()
>  		 * double checks the vma flags, so that it won't mlock pages
>  		 * if the vma was already munlocked.
>  		 */
>  		ret = __mlock_vma_pages_range(vma, nstart, nend, &locked);
>  		if (ret < 0) {
>  			ret = __mlock_posix_error_return(ret);
>  			break;
> +		} else if (locked) {
> +			locked = 0;
> +			up_read(&mm->mmap_sem);
>  		}
>  		nend = nstart + ret * PAGE_SIZE;
>  		ret = 0;
> 
> 
> I don't really prefer using a constant, but I'm not sure how else to make
> Linus happy :)

rwsem_is_contended() didn't seem so bad to me.

Reading 1024 pages can still take a long time.  I can't immediately
think of a better approach though.
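
For readers following the batch-size discussion, here is a minimal user-space
sketch of the clamping idea in the quoted hunk. It is not kernel code and not
the patch itself: struct sketch_lock, clamp_batch_end(), fault_in_range() and
the SKETCH_* constants are hypothetical names, the 4096-byte page size is an
assumption, and the "lock" is only a bookkeeping stand-in for mmap_sem so that
the hold pattern can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096UL  /* assumed page size, illustration only */
#define SKETCH_BATCH_PAGES 1024UL  /* the constant used in the quoted hunk */

/* Hypothetical stand-in for mmap_sem: it only records how it was held. */
struct sketch_lock {
	int held;
	unsigned long acquisitions;
	unsigned long max_pages_per_hold;
};

/* Clamp nend so that [nstart, nend) spans at most SKETCH_BATCH_PAGES pages. */
static unsigned long clamp_batch_end(unsigned long nstart, unsigned long nend)
{
	unsigned long limit = nstart + SKETCH_BATCH_PAGES * SKETCH_PAGE_SIZE;

	return nend > limit ? limit : nend;
}

/*
 * Walk [start, end) one bounded batch at a time, taking the lock for each
 * batch and dropping it before the next. Returns the pages covered.
 */
static unsigned long fault_in_range(struct sketch_lock *lock,
				    unsigned long start, unsigned long end)
{
	unsigned long nstart = start;
	unsigned long total = 0;

	while (nstart < end) {
		unsigned long nend = clamp_batch_end(nstart, end);
		unsigned long pages = (nend - nstart + SKETCH_PAGE_SIZE - 1) /
				      SKETCH_PAGE_SIZE;

		assert(!lock->held);	/* never carried across batches */
		lock->held = 1;		/* stands in for down_read() */
		lock->acquisitions++;
		if (pages > lock->max_pages_per_hold)
			lock->max_pages_per_hold = pages;

		total += pages;		/* the page faulting would happen here */

		lock->held = 0;		/* stands in for up_read() */
		nstart = nend;
	}
	return total;
}
```

With a zero-initialized struct sketch_lock, calling fault_in_range() over 2500
pages takes the lock three times and never covers more than 1024 pages per
hold. Note that this bounds the number of pages per hold, not the time: if
each page needs a disk read, a single batch can still be slow.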

