All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: hughd@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCHv9-rebased2 01/37] mm, thp: make swapin readahead under down_read of mmap_sem
Date: Wed, 22 Jun 2016 14:24:26 +0300	[thread overview]
Message-ID: <20160622112426.GA2028@gmail.com> (raw)
In-Reply-To: <05f001d1ca9e$a3ac5a00$eb050e00$@alibaba-inc.com>

On Mon, Jun 20, 2016 at 10:51:25AM +0800, Hillf Danton wrote:
> > > > > @@ -2401,11 +2430,18 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
> > > > >  			continue;
> > > > >  		swapped_in++;
> > > > >  		ret = do_swap_page(mm, vma, _address, pte, pmd,
> > > > > -				   FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
> > > > > +				   FAULT_FLAG_ALLOW_RETRY,
> > > >
> > > > Add a description in change log for it please.
> > >
> > > Ebru, would you address it?
> > >
> > This changelog really seems poor.
> > Is there a way to update only changelog of the commit?
> > I tried to use git rebase to amend commit, however
> > I could not rebase. This patch only needs better changelog.
> > 
> > I would like to update it as follows, if you would like to too:
> > 
> > "
> > Currently khugepaged makes swapin readahead under down_write.  This patch
> > supplies to make swapin readahead under down_read instead of down_write.
> > 
> > Along swapin, we can need to drop and re-take mmap_sem. Therefore we
> > have to be sure vma is consistent. This patch adds a helper function
> > to validate vma and also supplies that async swapin should not be
> > performed without waiting.
> > 
> > The patch was tested with a test program that allocates 800MB of memory,
> > writes to it, and then sleeps.  The system was forced to swap out all.
> > Afterwards, the test program touches the area by writing, it skips a page
> > in each 20 pages of the area.
> > "
> > 
> I like to ask again, why is FAULT_FLAG_RETRY_NOWAIT dropped?
> 
I dropped it regarding to Hugh's concern. If I understood correctly,
he said async swapin without waiting can cause waste. When I looked the
code path, it seemed subtle to me. There was a large discussion,
below is Hugh's concern:

"
Doesn't this imply that __collapse_huge_page_swapin() will initiate all
the necessary swapins for a THP, then (given the FAULT_FLAG_ALLOW_RETRY)
not wait for them to complete, so khugepaged will give up on that extent
and move on to another; then after another full circuit of all the mms
it needs to examine, it will arrive back at this extent and build a THP
from the swapins it arranged last time.

Which may work well when a system transitions from busy+swappingout
to idle+swappingin, but isn't that rather a special case?  It feels
(meaning, I've not measured at all) as if the inbetween busyish case
will waste a lot of I/O and memory on swapins that have to be discarded
again before khugepaged has made its sedate way back to slotting them in.
"

Cc'ed Hugh. Maybe I misunderstood him.

> > Could you please suggest me a way to replace above changelog with the old?
> > 
> We can ask Andrew for some advices.
> 
> > > >
> > > > They are cleaned up in subsequent darns?
> > >
> > Yes, that is reported and fixed here:
> > http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=fc7038a69cee6b817261f7cd805e9663fdc1075c
> > 
> > However, the above comment inconsistency still there.
> > I've added a fix patch:
> > 
> > From 404438ff1b0617cbf7434cba0c5a08f79ccb8a5d Mon Sep 17 00:00:00 2001
> > From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> > Date: Sat, 18 Jun 2016 21:07:22 +0300
> > Subject: [PATCH] mm, thp: fix comment inconsistency for swapin readahead
> >  functions
> >
> Fill in change log please.
> 
I filled it and sent as part of a series.
Cc'ed you in that patch.
> thanks
> Hillf
>  
> > Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> > ---
> >  mm/huge_memory.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index acd374e..f0d528e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2436,9 +2436,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >  		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
> >  		if (ret & VM_FAULT_RETRY) {
> >  			down_read(&mm->mmap_sem);
> > -			/* vma is no longer available, don't continue to swapin */
> > -			if (hugepage_vma_revalidate(mm, address))
> > +			if (hugepage_vma_revalidate(mm, address)) {
> > +				/* vma is no longer available, don't continue to swapin */
> >  				return false;
> > +			}
> >  		}
> >  		if (ret & VM_FAULT_ERROR) {
> >  			trace_mm_collapse_huge_page_swapin(mm, swapped_in, 0);
> > @@ -2513,8 +2514,8 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	if (allocstall == curr_allocstall && swap != 0) {
> >  		/*
> >  		 * __collapse_huge_page_swapin always returns with mmap_sem
> > -		 * locked.  If it fails, release mmap_sem and jump directly
> > -		 * out.  Continuing to collapse causes inconsistency.
> > +		 * locked. If it fails, we release mmap_sem and jump out_nolock.
> > +		 * Continuing to collapse causes inconsistency.
> >  		 */
> >  		if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
> >  			mem_cgroup_cancel_charge(new_page, memcg, true);
> > --
> > 1.9.1
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: hughd@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCHv9-rebased2 01/37] mm, thp: make swapin readahead under down_read of mmap_sem
Date: Wed, 22 Jun 2016 14:24:26 +0300	[thread overview]
Message-ID: <20160622112426.GA2028@gmail.com> (raw)
In-Reply-To: <05f001d1ca9e$a3ac5a00$eb050e00$@alibaba-inc.com>

On Mon, Jun 20, 2016 at 10:51:25AM +0800, Hillf Danton wrote:
> > > > > @@ -2401,11 +2430,18 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
> > > > >  			continue;
> > > > >  		swapped_in++;
> > > > >  		ret = do_swap_page(mm, vma, _address, pte, pmd,
> > > > > -				   FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
> > > > > +				   FAULT_FLAG_ALLOW_RETRY,
> > > >
> > > > Add a description in change log for it please.
> > >
> > > Ebru, would you address it?
> > >
> > This changelog really seems poor.
> > Is there a way to update only changelog of the commit?
> > I tried to use git rebase to amend commit, however
> > I could not rebase. This patch only needs better changelog.
> > 
> > I would like to update it as follows, if you would like to too:
> > 
> > "
> > Currently khugepaged makes swapin readahead under down_write.  This patch
> > supplies to make swapin readahead under down_read instead of down_write.
> > 
> > Along swapin, we can need to drop and re-take mmap_sem. Therefore we
> > have to be sure vma is consistent. This patch adds a helper function
> > to validate vma and also supplies that async swapin should not be
> > performed without waiting.
> > 
> > The patch was tested with a test program that allocates 800MB of memory,
> > writes to it, and then sleeps.  The system was forced to swap out all.
> > Afterwards, the test program touches the area by writing, it skips a page
> > in each 20 pages of the area.
> > "
> > 
> I like to ask again, why is FAULT_FLAG_RETRY_NOWAIT dropped?
> 
I dropped it regarding to Hugh's concern. If I understood correctly,
he said async swapin without waiting can cause waste. When I looked the
code path, it seemed subtle to me. There was a large discussion,
below is Hugh's concern:

"
Doesn't this imply that __collapse_huge_page_swapin() will initiate all
the necessary swapins for a THP, then (given the FAULT_FLAG_ALLOW_RETRY)
not wait for them to complete, so khugepaged will give up on that extent
and move on to another; then after another full circuit of all the mms
it needs to examine, it will arrive back at this extent and build a THP
from the swapins it arranged last time.

Which may work well when a system transitions from busy+swappingout
to idle+swappingin, but isn't that rather a special case?  It feels
(meaning, I've not measured at all) as if the inbetween busyish case
will waste a lot of I/O and memory on swapins that have to be discarded
again before khugepaged has made its sedate way back to slotting them in.
"

Cc'ed Hugh. Maybe I misunderstood him.

> > Could you please suggest me a way to replace above changelog with the old?
> > 
> We can ask Andrew for some advices.
> 
> > > >
> > > > They are cleaned up in subsequent darns?
> > >
> > Yes, that is reported and fixed here:
> > http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=fc7038a69cee6b817261f7cd805e9663fdc1075c
> > 
> > However, the above comment inconsistency still there.
> > I've added a fix patch:
> > 
> > From 404438ff1b0617cbf7434cba0c5a08f79ccb8a5d Mon Sep 17 00:00:00 2001
> > From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> > Date: Sat, 18 Jun 2016 21:07:22 +0300
> > Subject: [PATCH] mm, thp: fix comment inconsistency for swapin readahead
> >  functions
> >
> Fill in change log please.
> 
I filled it and sent as part of a series.
Cc'ed you in that patch.
> thanks
> Hillf
>  
> > Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> > ---
> >  mm/huge_memory.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index acd374e..f0d528e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2436,9 +2436,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >  		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
> >  		if (ret & VM_FAULT_RETRY) {
> >  			down_read(&mm->mmap_sem);
> > -			/* vma is no longer available, don't continue to swapin */
> > -			if (hugepage_vma_revalidate(mm, address))
> > +			if (hugepage_vma_revalidate(mm, address)) {
> > +				/* vma is no longer available, don't continue to swapin */
> >  				return false;
> > +			}
> >  		}
> >  		if (ret & VM_FAULT_ERROR) {
> >  			trace_mm_collapse_huge_page_swapin(mm, swapped_in, 0);
> > @@ -2513,8 +2514,8 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	if (allocstall == curr_allocstall && swap != 0) {
> >  		/*
> >  		 * __collapse_huge_page_swapin always returns with mmap_sem
> > -		 * locked.  If it fails, release mmap_sem and jump directly
> > -		 * out.  Continuing to collapse causes inconsistency.
> > +		 * locked. If it fails, we release mmap_sem and jump out_nolock.
> > +		 * Continuing to collapse causes inconsistency.
> >  		 */
> >  		if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
> >  			mem_cgroup_cancel_charge(new_page, memcg, true);
> > --
> > 1.9.1
> 

  reply	other threads:[~2016-06-22 11:24 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <04f701d1c797$1ebe6b80$5c3b4280$@alibaba-inc.com>
2016-06-16  6:52 ` [PATCHv9-rebased2 01/37] mm, thp: make swapin readahead under down_read of mmap_sem Hillf Danton
2016-06-16  6:52   ` Hillf Danton
2016-06-16 10:08   ` Kirill A. Shutemov
2016-06-16 10:08     ` Kirill A. Shutemov
2016-06-18 19:09     ` Ebru Akagunduz
2016-06-18 19:09       ` Ebru Akagunduz
2016-06-20  2:51       ` Hillf Danton
2016-06-20  2:51         ` Hillf Danton
2016-06-22 11:24         ` Ebru Akagunduz [this message]
2016-06-22 11:24           ` Ebru Akagunduz
2016-06-20 11:15       ` Michal Hocko
2016-06-20 11:15         ` Michal Hocko
2016-06-06 14:06 [PATCHv9 00/32] THP-enabled tmpfs/shmem using compound pages Kirill A. Shutemov
2016-06-15 20:06 ` [PATCHv9-rebased2 00/37] " Kirill A. Shutemov
2016-06-15 20:06   ` [PATCHv9-rebased2 01/37] mm, thp: make swapin readahead under down_read of mmap_sem Kirill A. Shutemov
2016-06-15 20:06     ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160622112426.GA2028@gmail.com \
    --to=ebru.akagunduz@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.