From: Nick Piggin <npiggin@suse.de>
To: "Török Edwin" <edwintorok@gmail.com>
Cc: Mike Waychison <mikew@google.com>, Ying Han <yinghan@google.com>,
Ingo Molnar <mingo@elte.hu>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Rohit Seth <rohitseth@google.com>,
Hugh Dickins <hugh@veritas.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY
Date: Mon, 1 Dec 2008 09:58:44 +0100 [thread overview]
Message-ID: <20081201085844.GC4926@wotan.suse.de> (raw)
In-Reply-To: <4932EF90.9070601@gmail.com>
On Sun, Nov 30, 2008 at 09:54:56PM +0200, Török Edwin wrote:
> On 2008-11-29 01:02, Mike Waychison wrote:
> > Nick Piggin wrote:
> >> On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
> >>> Nick Piggin wrote:
> >>>> On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
> >>>>> Török however identified mmap taking on the order of several
> >>>>> milliseconds due to this exact problem:
> >>>>>
> >>>>> http://lkml.org/lkml/2008/9/12/185
> >>>> Turns out to be a different problem.
> >>>>
> >>> What do you mean?
> >>
> >> His is just contending on the write side. The retry patch doesn't help.
> >>
> >
> > I disagree. How do you get 'write contention' from the following
> > paragraph:
> >
> > "Just to confirm that the problem is with pagefaults and mmap, I dropped
> > the mmap_sem in filemap_fault, and then
> > I got same performance in my testprogram for mmap and read. Of course
> > this is totally unsafe, because the mapping could change at any time."
> >
> > It reads to me that the writers were held off by the readers sleeping
> > in IO.
>
> It is true that I have a write/write contention too, but do_page_fault
> shows up too on lock_stat.
>
> This is my guess at what happens:
> * filemap_fault used to sleep with mmap_sem held while waiting for the
> page lock.
> * the google patch avoids that, which is fine: if page lock can't be
> taken, it drops mmap_sem, waits, then retries the fault once
> * however after we acquired the page lock, mapping->a_ops->readpage is
> invoked, mmap_sem is NOT dropped here:
>
> error = mapping->a_ops->readpage(file, page);
> if (!error) {
> wait_on_page_locked(page);
>
> If my understanding is correct ->readpage does the actual disk I/O, and
> it keeps the page locked, when the lock is released we know it has finished.
> So wait_on_page_locked(page) holds mmap_sem locked for read during the
> disk I/O, preventing sys_mmap/sys_munmap from making progress.
Yes that's exactly right. Ahh, the google patch doesn't solve this
case? Interesting...
> I don't know how to prove/disprove my guess above, suggestions welcome.
>
> Could the patch be changed to also release the mmap_sem after readpage,
> and before wait_on_page_locked?
It should be possible somehow, but it is difficult because after
dropping mmap_sem, then we have to basically retry the whole fault
because the vma might have gone away.
WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <npiggin@suse.de>
To: "Török Edwin" <edwintorok@gmail.com>
Cc: Mike Waychison <mikew@google.com>, Ying Han <yinghan@google.com>,
Ingo Molnar <mingo@elte.hu>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Rohit Seth <rohitseth@google.com>,
Hugh Dickins <hugh@veritas.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY
Date: Mon, 1 Dec 2008 09:58:44 +0100 [thread overview]
Message-ID: <20081201085844.GC4926@wotan.suse.de> (raw)
In-Reply-To: <4932EF90.9070601@gmail.com>
On Sun, Nov 30, 2008 at 09:54:56PM +0200, Torok Edwin wrote:
> On 2008-11-29 01:02, Mike Waychison wrote:
> > Nick Piggin wrote:
> >> On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
> >>> Nick Piggin wrote:
> >>>> On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
> >>>>> Torok however identified mmap taking on the order of several
> >>>>> milliseconds due to this exact problem:
> >>>>>
> >>>>> http://lkml.org/lkml/2008/9/12/185
> >>>> Turns out to be a different problem.
> >>>>
> >>> What do you mean?
> >>
> >> His is just contending on the write side. The retry patch doesn't help.
> >>
> >
> > I disagree. How do you get 'write contention' from the following
> > paragraph:
> >
> > "Just to confirm that the problem is with pagefaults and mmap, I dropped
> > the mmap_sem in filemap_fault, and then
> > I got same performance in my testprogram for mmap and read. Of course
> > this is totally unsafe, because the mapping could change at any time."
> >
> > It reads to me that the writers were held off by the readers sleeping
> > in IO.
>
> It is true that I have a write/write contention too, but do_page_fault
> shows up too on lock_stat.
>
> This is my guess at what happens:
> * filemap_fault used to sleep with mmap_sem held while waiting for the
> page lock.
> * the google patch avoids that, which is fine: if page lock can't be
> taken, it drops mmap_sem, waits, then retries the fault once
> * however after we acquired the page lock, mapping->a_ops->readpage is
> invoked, mmap_sem is NOT dropped here:
>
> error = mapping->a_ops->readpage(file, page);
> if (!error) {
> wait_on_page_locked(page);
>
> If my understanding is correct ->readpage does the actual disk I/O, and
> it keeps the page locked, when the lock is released we know it has finished.
> So wait_on_page_locked(page) holds mmap_sem locked for read during the
> disk I/O, preventing sys_mmap/sys_munmap from making progress.
Yes that's exactly right. Ahh, the google patch doesn't solve this
case? Interesting...
> I don't know how to prove/disprove my guess above, suggestions welcome.
>
> Could the patch be changed to also release the mmap_sem after readpage,
> and before wait_on_page_locked?
It should be possible somehow, but it is difficult because after
dropping mmap_sem, then we have to basically retry the whole fault
because the vma might have gone away.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-12-01 8:58 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-22 6:47 [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY Ying Han
2008-11-22 6:47 ` Ying Han
2008-11-22 7:15 ` Andrew Morton
2008-11-22 7:15 ` Andrew Morton
2008-11-23 9:18 ` Ingo Molnar
2008-11-23 9:18 ` Ingo Molnar
2008-11-23 18:24 ` Andrew Morton
2008-11-23 18:24 ` Andrew Morton
2008-11-25 18:42 ` Ying Han
2008-11-25 18:42 ` Ying Han
2008-11-26 12:32 ` Nick Piggin
2008-11-26 12:32 ` Nick Piggin
2008-11-26 19:57 ` Mike Waychison
2008-11-26 19:57 ` Mike Waychison
2008-11-27 8:55 ` Nick Piggin
2008-11-27 8:55 ` Nick Piggin
2008-11-27 9:28 ` Mike Waychison
2008-11-27 9:28 ` Mike Waychison
2008-11-27 10:00 ` Peter Zijlstra
2008-11-27 10:00 ` Peter Zijlstra
2008-11-27 10:14 ` Nick Piggin
2008-11-27 10:14 ` Nick Piggin
2008-11-27 19:22 ` Mike Waychison
2008-11-27 19:22 ` Mike Waychison
2008-11-28 9:41 ` Nick Piggin
2008-11-28 9:41 ` Nick Piggin
2008-11-28 22:46 ` Mike Waychison
2008-11-28 22:46 ` Mike Waychison
2008-11-27 11:08 ` KOSAKI Motohiro
2008-11-27 11:08 ` KOSAKI Motohiro
2008-11-27 19:10 ` Mike Waychison
2008-11-27 19:10 ` Mike Waychison
2008-11-27 11:39 ` Török Edwin
2008-11-27 11:39 ` Török Edwin
2008-11-27 12:03 ` Nick Piggin
2008-11-27 12:03 ` Nick Piggin
2008-11-27 12:21 ` Török Edwin
2008-11-27 12:21 ` Török Edwin
2008-11-27 12:32 ` Peter Zijlstra
2008-11-27 12:32 ` Peter Zijlstra
2008-11-27 12:39 ` Nick Piggin
2008-11-27 12:39 ` Nick Piggin
2008-11-27 12:52 ` Török Edwin
2008-11-27 12:52 ` Török Edwin
2008-11-27 13:05 ` Nick Piggin
2008-11-27 13:05 ` Nick Piggin
2008-11-27 13:10 ` Török Edwin
2008-11-27 13:10 ` Török Edwin
2008-11-27 13:12 ` Nick Piggin
2008-11-27 13:12 ` Nick Piggin
2008-11-27 13:23 ` Török Edwin
2008-11-27 13:23 ` Török Edwin
2008-11-28 12:10 ` Nick Piggin
2008-11-28 12:10 ` Nick Piggin
2008-11-30 19:38 ` Török Edwin
2008-11-30 19:38 ` Török Edwin
2008-12-01 8:52 ` Nick Piggin
2008-12-01 8:52 ` Nick Piggin
2008-12-01 11:13 ` Nick Piggin
2008-12-01 11:13 ` Nick Piggin
2008-12-01 11:37 ` Török Edwin
2008-12-01 11:37 ` Török Edwin
2008-12-04 22:27 ` Ying Han
2008-12-04 22:27 ` Ying Han
2008-12-05 6:50 ` Török Edwin
2008-12-05 6:50 ` Török Edwin
2008-11-27 13:08 ` Nick Piggin
2008-11-27 13:08 ` Nick Piggin
2008-11-27 19:03 ` Mike Waychison
2008-11-27 19:03 ` Mike Waychison
2008-11-28 9:37 ` Nick Piggin
2008-11-28 9:37 ` Nick Piggin
2008-11-28 23:02 ` Mike Waychison
2008-11-28 23:02 ` Mike Waychison
2008-11-30 19:54 ` Török Edwin
2008-11-30 19:54 ` Török Edwin
2008-12-01 4:50 ` Mike Waychison
2008-12-01 4:50 ` Mike Waychison
2008-12-01 8:58 ` Nick Piggin [this message]
2008-12-01 8:58 ` Nick Piggin
2008-12-01 11:45 ` Nick Piggin
2008-12-01 11:45 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081201085844.GC4926@wotan.suse.de \
--to=npiggin@suse.de \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=edwintorok@gmail.com \
--cc=hpa@zytor.com \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mikew@google.com \
--cc=mingo@elte.hu \
--cc=rientjes@google.com \
--cc=rohitseth@google.com \
--cc=yinghan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.