From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: linux-mm@kvack.org, hughd@google.com, riel@redhat.com,
akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
iamjoonsoo.kim@lge.com, gorcunov@openvz.org,
linux-kernel@vger.kernel.org, mgorman@suse.de,
rientjes@google.com, vbabka@suse.cz,
aneesh.kumar@linux.vnet.ibm.com, hannes@cmpxchg.org,
mhocko@suse.cz, boaz@plexistor.com
Subject: Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
Date: Mon, 23 May 2016 20:44:37 +0300
Message-ID: <20160523174437.GA3317@node.shutemov.name>
In-Reply-To: <1464023651-19420-4-git-send-email-ebru.akagunduz@gmail.com>
On Mon, May 23, 2016 at 08:14:11PM +0300, Ebru Akagunduz wrote:
> Currently khugepaged performs swapin readahead under
> down_write. This patch makes it perform swapin
> readahead under down_read instead.
>
> The patch was tested with a test program that allocates
> 800MB of memory, writes to it, and then sleeps. The system
> was then forced to swap all of it out. Afterwards, the test
> program touches the area by writing, skipping one page in
> every 20 pages of the area.
>
> Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> ---
> mm/huge_memory.c | 33 +++++++++++++++++++++++++++------
> 1 file changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index feee44c..668bc07 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2386,13 +2386,14 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
> * but with mmap_sem held to protect against vma changes.
> */
>
> -static void __collapse_huge_page_swapin(struct mm_struct *mm,
> +static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> struct vm_area_struct *vma,
> unsigned long address, pmd_t *pmd)
> {
> unsigned long _address;
> pte_t *pte, pteval;
> int swapped_in = 0, ret = 0;
> + struct vm_area_struct *vma_orig = vma;
>
> pte = pte_offset_map(pmd, address);
> for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
> @@ -2402,11 +2403,19 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
> continue;
> swapped_in++;
> ret = do_swap_page(mm, vma, _address, pte, pmd,
> - FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
> + FAULT_FLAG_ALLOW_RETRY,
> pteval);
> + /* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
> + if (ret & VM_FAULT_RETRY) {
> + down_read(&mm->mmap_sem);
> + vma = find_vma(mm, address);
> + /* vma is no longer available, don't continue to swapin */
> + if (vma != vma_orig)
> + return false;
> + }
> if (ret & VM_FAULT_ERROR) {
> trace_mm_collapse_huge_page_swapin(mm, swapped_in, 0);
> - return;
> + return false;
> }
> /* pte is unmapped now, we need to map it */
> pte = pte_offset_map(pmd, _address);
> @@ -2414,6 +2423,7 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
> pte--;
> pte_unmap(pte);
> trace_mm_collapse_huge_page_swapin(mm, swapped_in, 1);
> + return true;
> }
>
> static void collapse_huge_page(struct mm_struct *mm,
> @@ -2459,7 +2469,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> * gup_fast later hanlded by the ptep_clear_flush and the VM
> * handled by the anon_vma lock + PG_lock.
> */
> - down_write(&mm->mmap_sem);
> + down_read(&mm->mmap_sem);
> if (unlikely(khugepaged_test_exit(mm))) {
> result = SCAN_ANY_PROCESS;
> goto out;
> @@ -2490,9 +2500,20 @@ static void collapse_huge_page(struct mm_struct *mm,
> * Don't perform swapin readahead when the system is under pressure,
> * to avoid unnecessary resource consumption.
> */
> - if (allocstall == curr_allocstall && swap != 0)
> - __collapse_huge_page_swapin(mm, vma, address, pmd);
> + if (allocstall == curr_allocstall && swap != 0) {
> + /*
> + * __collapse_huge_page_swapin always returns with mmap_sem
> + * locked. If it fails, release mmap_sem and jump directly
> + * label out. Continuing to collapse causes inconsistency.
> + */
> + if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
> + up_read(&mm->mmap_sem);
> + goto out;
> + }
> + }
>
> + up_read(&mm->mmap_sem);
> + down_write(&mm->mmap_sem);
That's the critical point.
How do you guarantee that the vma will not be destroyed (or changed)
between up_read() and down_write()?
At a minimum, you need to call find_vma() again once mmap_sem is held
for write.
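To illustrate the concern, here is a minimal userspace sketch of the "drop the read lock, take the write lock, look up again" pattern. It is a single-threaded toy model, not kernel code and not the actual fix: struct space, struct region, find_region(), covers(), simulate_munmap() and collapse() are all hypothetical names, the lock is a plain state variable standing in for mmap_sem, and a callback stands in for another thread running in the unlocked window. The only kernel behaviour it assumes is that find_vma() returns the first vma with addr < vm_end, which may start above addr, so the range has to be checked as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical stand-ins, not kernel APIs. */
enum lock_state { UNLOCKED, READ_LOCKED, WRITE_LOCKED };

struct region {                 /* stands in for vm_area_struct */
    unsigned long start, end;   /* covers [start, end) */
    bool mapped;
};

struct space {                  /* stands in for mm_struct */
    enum lock_state lock;       /* stands in for mmap_sem */
    struct region regions[4];   /* kept sorted by address */
    size_t nr;
};

static void lock_read(struct space *s)
{
    assert(s->lock == UNLOCKED);
    s->lock = READ_LOCKED;
}

static void lock_write(struct space *s)
{
    assert(s->lock == UNLOCKED);
    s->lock = WRITE_LOCKED;
}

static void unlock(struct space *s)
{
    assert(s->lock != UNLOCKED);
    s->lock = UNLOCKED;
}

/* Like find_vma(): first mapped region with addr < end, or NULL. */
static struct region *find_region(struct space *s, unsigned long addr)
{
    assert(s->lock != UNLOCKED);    /* lookups are only valid under the lock */
    for (size_t i = 0; i < s->nr; i++)
        if (s->regions[i].mapped && addr < s->regions[i].end)
            return &s->regions[i];
    return NULL;
}

/* The lookup may return a region that starts above addr, so check the range. */
static bool covers(const struct region *r, unsigned long addr, unsigned long len)
{
    return r && r->start <= addr && addr + len <= r->end;
}

/* Plays the role of another thread calling munmap() in the unlocked window. */
static void simulate_munmap(struct space *s)
{
    lock_write(s);
    s->regions[0].mapped = false;
    unlock(s);
}

static bool collapse(struct space *s, unsigned long addr, unsigned long len,
                     void (*unlocked_window)(struct space *))
{
    lock_read(s);
    if (!covers(find_region(s, addr), addr, len)) {
        unlock(s);
        return false;
    }
    /* ... slow work that only needs the read lock, e.g. swapin ... */
    unlock(s);

    if (unlocked_window)
        unlocked_window(s);         /* anything may change here */

    lock_write(s);
    /* Nothing learned under the read lock is trusted: look up again. */
    if (!covers(find_region(s, addr), addr, len)) {
        unlock(s);
        return false;
    }
    /* ... the region is known to be valid; modifying it is safe ... */
    unlock(s);
    return true;
}
```

With one mapped region covering the range, collapse() with a NULL callback succeeds, while passing simulate_munmap makes the second lookup fail and the function bail out with the lock released. The sketch deliberately does not carry a pointer across the unlocked window: it repeats the lookup and the range check instead of comparing against a saved pointer.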
> anon_vma_lock_write(vma->anon_vma);
>
> pte = pte_offset_map(pmd, address);
> --
> 1.9.1
--
Kirill A. Shutemov