From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
Michal Hocko <mhocko@kernel.org>,
Ebru Akagunduz <ebru.akagunduz@gmail.com>,
linux-mm@kvack.org, hughd@google.com, akpm@linux-foundation.org,
n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
iamjoonsoo.kim@lge.com, gorcunov@openvz.org,
linux-kernel@vger.kernel.org, mgorman@suse.de,
rientjes@google.com, vbabka@suse.cz,
aneesh.kumar@linux.vnet.ibm.com, hannes@cmpxchg.org,
boaz@plexistor.com
Subject: Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
Date: Tue, 24 May 2016 00:49:42 +0300
Message-ID: <20160523214942.GA79646@black.fi.intel.com>
In-Reply-To: <1464034383.16365.70.camel@redhat.com>

On Mon, May 23, 2016 at 04:13:03PM -0400, Rik van Riel wrote:
> On Mon, 2016-05-23 at 23:02 +0300, Kirill A. Shutemov wrote:
> > On Mon, May 23, 2016 at 03:26:47PM -0400, Rik van Riel wrote:
> > >
> > > On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote:
> > > >
> > > > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> > > > >
> > > > >
> > > > > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > > > > >
> > > > > >
> > > > > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Currently khugepaged does swapin readahead under
> > > > > > > down_write. This patch makes swapin readahead run
> > > > > > > under down_read instead of down_write.
> > > > > > You are still keeping down_write. Can we do without it
> > > > > > altogether?
> > > > > > Blocking mmap_sem of a remote process for write is certainly
> > > > > > not nice.
> > > > > Maybe Andrea can explain why khugepaged requires
> > > > > a down_write of mmap_sem?
> > > > >
> > > > > If it were possible to have just down_read that
> > > > > would make the code a lot simpler.
> > > > You need a down_write() to retract the page table. We need to make
> > > > sure that nobody sees the page table before we can replace it with
> > > > a huge pmd.
> > > Good point.
> > >
> > > I guess the alternative is to have the page_table_lock
> > > taken by a helper function (everywhere) that can return
> > > failure if the page table was changed while the caller
> > > was waiting for the lock.
> > Not that the page table was changed, but that the pmd is now pointing
> > to something else.
> > Basically, we would need to nest all pte-ptl's within pmd_lock().
> > That's not good for scalability.
>
> I can see a few alternatives here:
>
> 1) huge pmd collapsing takes both the pmd lock and the pte lock,
> preventing pte updates from happening simultaneously
That's what we do now and that's not enough.
We would need to serialize against pmd_lock() in the normal page-fault
path (and other pte manipulation), which we don't do now if the pmd points
to a page table.
That's a huge hit on scalability.
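
To put a rough number on the scalability concern, here is a small user-space
sketch. It assumes x86-64 with 4 KiB pages and split page-table locks, neither
of which is stated in this thread: one pte lock then covers one page table
(2 MiB of address space) and one pmd lock covers one pmd table (1 GiB). All
names are hypothetical and only model which faults would share a lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 geometry with 4 KiB pages (not taken from the thread). */
#define MODEL_PMD_SHIFT 21	/* one pte table maps 2 MiB */
#define MODEL_PUD_SHIFT 30	/* one pmd table maps 1 GiB */

/* Identifies the pte-level lock a fault at addr would take. */
static uint64_t model_pte_lock_id(uint64_t addr)
{
	return addr >> MODEL_PMD_SHIFT;
}

/* Identifies the pmd-level lock covering addr. */
static uint64_t model_pmd_lock_id(uint64_t addr)
{
	return addr >> MODEL_PUD_SHIFT;
}

/* Today: two faults contend only if they hit the same pte table. */
bool model_contend_pte_only(uint64_t a, uint64_t b)
{
	return model_pte_lock_id(a) == model_pte_lock_id(b);
}

/* If every pte lock nested inside pmd_lock(): same pmd table is enough. */
bool model_contend_nested(uint64_t a, uint64_t b)
{
	return model_pmd_lock_id(a) == model_pmd_lock_id(b);
}
```

Under these assumptions, faults at 2 MiB and 4 MiB do not contend today but
would contend once pte locks nest inside pmd_lock(), because both addresses
fall in the same 1 GiB range.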
>
> 2) code that (re-)acquires the pte lock can read a sequence number
> at the pmd level, check that it did not change after the
> pte lock has been acquired, and abort if it has - I believe most
> of the code that re-acquires the pte lock already knows how to
> abort if somebody else touched the pte while it was looking
> elsewhere
So, every pmd_lock() (and any other way we take the lock) would have to bump
the sequence number, and we need to be able to read a stable result outside
pmd_lock(), meaning it should be atomic_t or something similar.
Not exactly free.
And I'm not convinced the hassle is worth the gain.
> That way the (uncommon) thp collapse code should still exclude
> pte level operations, at the cost of potentially teaching a few
> more pte level operations to abort (chances are most already do,
> considering a race with other pte-level manipulations requires that).
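
For illustration only, the sequence-number scheme discussed above could be
modelled in user space roughly like this. It is a sketch, not kernel code:
struct pmd_model and every function below are hypothetical, pthread mutexes
stand in for the pmd and pte spinlocks, and a C11 atomic stands in for the
"atomic_t or something similar" counter.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pmd entry and the page table behind it. */
struct pmd_model {
	pthread_mutex_t pmd_ptl;	/* stands in for pmd_lock() */
	pthread_mutex_t pte_ptl;	/* stands in for the pte lock */
	atomic_uint seq;		/* bumped on every pmd lock acquisition */
	int pte_val;			/* stands in for a pte */
};

void pmd_model_init(struct pmd_model *m)
{
	pthread_mutex_init(&m->pmd_ptl, NULL);
	pthread_mutex_init(&m->pte_ptl, NULL);
	atomic_init(&m->seq, 0);
	m->pte_val = 0;
}

/* Every pmd-level lock acquisition bumps the sequence number. */
void model_pmd_lock(struct pmd_model *m)
{
	pthread_mutex_lock(&m->pmd_ptl);
	atomic_fetch_add(&m->seq, 1);
}

void model_pmd_unlock(struct pmd_model *m)
{
	pthread_mutex_unlock(&m->pmd_ptl);
}

/* Collapse side: takes both locks, so it still excludes pte updates. */
void model_collapse(struct pmd_model *m)
{
	model_pmd_lock(m);
	pthread_mutex_lock(&m->pte_ptl);
	/* the page table would be retracted here */
	pthread_mutex_unlock(&m->pte_ptl);
	model_pmd_unlock(m);
}

/* PTE side: sample the sequence number without holding pmd_ptl. */
unsigned int model_seq_read(struct pmd_model *m)
{
	return atomic_load(&m->seq);
}

/* Re-take the pte lock; abort if the pmd level changed in between. */
bool model_pte_update(struct pmd_model *m, unsigned int seen, int val)
{
	bool ok;

	pthread_mutex_lock(&m->pte_ptl);
	ok = atomic_load(&m->seq) == seen;
	if (ok)
		m->pte_val = val;
	pthread_mutex_unlock(&m->pte_ptl);
	return ok;
}
```

The pte path never takes pmd_ptl, but it pays for an atomic read on every lock
re-acquisition, and every pmd-level lock pays for an atomic increment. That is
the "not exactly free" cost mentioned above.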
--
Kirill A. Shutemov