Re: [RFC 3/3] mm: make swapin readahead to improve thp collapse rate

From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, kirill.shutemov@linux.intel.com,
	n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
	iamjoonsoo.kim@lge.com, xiexiuqi@huawei.com, gorcunov@openvz.org,
	linux-kernel@vger.kernel.org, mgorman@suse.de,
	rientjes@google.com, vbabka@suse.cz,
	aneesh.kumar@linux.vnet.ibm.com, hughd@google.com,
	hannes@cmpxchg.org, mhocko@suse.cz, boaz@plexistor.com,
	raindel@mellanox.com
Subject: Re: [RFC 3/3] mm: make swapin readahead to improve thp collapse rate
Date: Wed, 17 Jun 2015 20:38:56 +0300
Message-ID: <20150617173856.GA3970@debian>
In-Reply-To: <5580E774.3070307@redhat.com>

On Tue, Jun 16, 2015 at 11:20:20PM -0400, Rik van Riel wrote:
> On 06/16/2015 05:15 PM, Andrew Morton wrote:
> > On Sun, 14 Jun 2015 18:04:43 +0300 Ebru Akagunduz <ebru.akagunduz@gmail.com> wrote:
> > 
> > >> This patch adds swapin readahead to improve the THP collapse rate.
> > >> When khugepaged scans pages, a few of them may be in the
> > >> swap area.
> >>
> > >> With the patch, 4kB pages can be collapsed into a THP when
> >> there are up to max_ptes_swap swap ptes in a 2MB range.
> >>
> >> The patch was tested with a test program that allocates
> > >> 800MB of memory, writes to it, and then sleeps. I force
> > >> the system to swap all of it out. Afterwards, the test program
> > >> touches the area by writing, skipping one page in every
> > >> 20 pages of the area.
> >>
> > >> Without the patch, the system did not do swapin readahead.
> > >> The THP rate was 47% of the program's memory, and it
> > >> did not change over time.
> >>
> > >> With this patch, after 10 minutes of waiting, khugepaged had
> > >> collapsed 99% of the program's memory.
> >>
> >> ...
> >>
> >> +/*
> >> + * Bring missing pages in from swap, to complete THP collapse.
> >> + * Only done if khugepaged_scan_pmd believes it is worthwhile.
> >> + *
> >> + * Called and returns without pte mapped or spinlocks held,
> >> + * but with mmap_sem held to protect against vma changes.
> >> + */
> >> +
> >> +static void __collapse_huge_page_swapin(struct mm_struct *mm,
> >> +					struct vm_area_struct *vma,
> >> +					unsigned long address, pmd_t *pmd,
> >> +					pte_t *pte)
> >> +{
> >> +	unsigned long _address;
> >> +	pte_t pteval = *pte;
> >> +	int swap_pte = 0;
> >> +
> >> +	pte = pte_offset_map(pmd, address);
> >> +	for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
> >> +	     pte++, _address += PAGE_SIZE) {
> >> +		pteval = *pte;
> >> +		if (is_swap_pte(pteval)) {
> >> +			swap_pte++;
> >> +			do_swap_page(mm, vma, _address, pte, pmd, 0x0, pteval);
> >> +			/* pte is unmapped now, we need to map it */
> >> +			pte = pte_offset_map(pmd, _address);
> >> +		}
> >> +	}
> >> +	pte--;
> >> +	pte_unmap(pte);
> >> +	trace_mm_collapse_huge_page_swapin(mm, vma->vm_start, swap_pte);
> >> +}
> > 
> > This is doing a series of synchronous reads.  That will be sloooow on
> > spinning disks.
> >
> > This function should be significantly faster if it first gets all the
> > necessary I/O underway.  I don't think we have a function which exactly
> > does this.  Perhaps generalise swapin_readahead() or open-code
> > something like
> 
> Looking at do_swap_page() and __lock_page_or_retry(), I guess
> there already is a way to do the above.
> 
> Passing a "flags" of FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT
> to do_swap_page() should result in do_swap_page() returning with
> the pte unmapped and the mmap_sem still held if the page was not
> immediately available to map into the pte (trylock_page fails).
> 
> Ebru, can you try passing the above as the flags argument to
> do_swap_page(), and see what happens?

I will try that and resend the patch series.

Thanks for the suggestions.

Ebru
