Re: [RFC v5 0/3] mm: make swapin readahead to gain more thp performance

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	kirill.shutemov@linux.intel.com, n-horiguchi@ah.jp.nec.com,
	aarcange@redhat.com, riel@redhat.com, iamjoonsoo.kim@lge.com,
	xiexiuqi@huawei.com, gorcunov@openvz.org,
	linux-kernel@vger.kernel.org, mgorman@suse.de,
	rientjes@google.com, vbabka@suse.cz,
	aneesh.kumar@linux.vnet.ibm.com, hannes@cmpxchg.org,
	mhocko@suse.cz, boaz@plexistor.com, raindel@mellanox.com
Subject: Re: [RFC v5 0/3] mm: make swapin readahead to gain more thp performance
Date: Fri, 26 Feb 2016 01:16:44 +0200	[thread overview]
Message-ID: <20160225231644.GA13782@debian> (raw)
In-Reply-To: <alpine.LSU.2.11.1602242301040.6947@eggly.anvils>

On Wed, Feb 24, 2016 at 11:36:30PM -0800, Hugh Dickins wrote:
> On Mon, 14 Sep 2015, Andrew Morton wrote:
> > On Mon, 14 Sep 2015 22:31:42 +0300 Ebru Akagunduz <ebru.akagunduz@gmail.com> wrote:
> > 
> > > This patch series makes swapin readahead up to a
> > > certain number to gain more thp performance and adds
> > > tracepoint for khugepaged_scan_pmd, collapse_huge_page,
> > > __collapse_huge_page_isolate.
> > 
> > I'll merge this series for testing.  Hopefully Andrea and/or Hugh will
> > find time for a quality think about the issue before 4.3 comes around.
> > 
> > It would be much better if we didn't have that sysfs knob - make the
> > control automatic in some fashion.
> > 
> > If we can't think of a way of doing that then at least let's document
> > max_ptes_swap very carefully.  Explain to our users what it does, why
> > they should care about it, how they should set about determining (ie:
> > measuring) its effect upon their workloads.
> 
> Ebru, I don't know whether you realize, but your THP swapin work has
> been languishing in mmotm for five months now, without getting any
> nearer to Linus's tree.
> 
> That's partly my fault - sorry - for not responding to Andrew's nudge
> above.  But I think you also got caught up in conference, and in the
> end did not get around to answering outstanding issues: please take a
> look at your mailbox from last September, to see what more is needed.
> 
I've seen my patch series in mmotm mails but I thought,
other parts of thp have problem so those are going to be
forwarded to Linus's tree when other parts fixed.

I did not know the file: http://www.ozlabs.org/~akpm/mmotm/series
It shows explicitly the problem of patches.
Thank you for summarizing it below.

> Here's what mmotm's series file says...
> 
> #mm-add-tracepoint-for-scanning-pages.patch+2: Andrea/Hugh review?. 2 Fengguang warnings, one "kernel test robot" oops
> #mm-make-optimistic-check-for-swapin-readahead.patch: TBU (docs)
I've sent doc patch: http://lkml.iu.edu/hypermail/linux/kernel/1509.2/01783.html
> mm-make-optimistic-check-for-swapin-readahead.patch
> mm-make-optimistic-check-for-swapin-readahead-fix-2.patch
> #mm-make-swapin-readahead-to-improve-thp-collapse-rate.patch: Hugh/Kirill want collapse_huge_page() rework
> mm-make-swapin-readahead-to-improve-thp-collapse-rate.patch
> mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix.patch
> mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-2.patch
> #mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-3.patch: Ebru to test?
I've tested my whole patch series and could not produce the fault
again. I've also seen Tested-by tag from Sergey so I did not sent
the tag.
> mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-3.patch
> 
> ...but I think some of that is stale.  There were a few little bugs
> when it first went into mmotm, which Kirill very swiftly fixed up,
> and I don't think it has given anybody any trouble since then.
> 
> But do I want to see this work go in?  Yes and no.  The problem it
> fixes (that although we give out a THP to someone who faults a single
> page of it, after swapout the THP cannot be recovered until they have
> faulted in every page of it) is real and embarrassing; the code is good;
> and I don't mind the max_ptes_swap tunable that concerns Andrew above;
> but Kirill and Vlastimil made important points that still trouble me.
> 
> I can't locate Kirill's mail right now, perhaps I'm misremembering:
> but wasn't he concerned by your __collapse_huge_page_swapin() (likely
> to be allocating many small pages) being called under down_write of
> mmap_sem?  That's usually something we soon regret, and even down_read
> of mmap_sem across many memory allocations would be unfortunate
> (khugepaged used to allocate its THP that way, but we have
> Vlastimil to thank for stopping that in his 8b1645685acf).
> 
> And didn't Vlastimil (9/4/15) make some other unanswered
> observations about the call to __collapse_huge_page_swapin():
> 
> > Hmm it seems rather wasteful to call this when no swap entries were detected.
> > Also it seems pointless to try continue collapsing when we have just only issued
> > async swap-in? What are the chances they would finish in time?
> > 
> > I'm less sure about the relation vs khugepaged_alloc_page(). At this point, we
> > have already succeeded the hugepage allocation. It makes sense not to swap-in if
> > we can't allocate a hugepage. It makes also sense not to allocate a hugepage if
> > we will just issue async swap-ins and then free the hugepage back. Swap-in means
> > disk I/O that's best avoided if not useful. But the reclaim for hugepage
> > allocation might also involve disk I/O. At worst, it could be creating new swap
> > pte's in the very pmd we are scanning... Thoughts?
> 
I did not take enough responsibility, you're right. I should have
asked regarding the patch at least.
> Doesn't this imply that __collapse_huge_page_swapin() will initiate all
> the necessary swapins for a THP, then (given the FAULT_FLAG_ALLOW_RETRY)
> not wait for them to complete, so khugepaged will give up on that extent
> and move on to another; then after another full circuit of all the mms
> it needs to examine, it will arrive back at this extent and build a THP
> from the swapins it arranged last time.
> 
> Which may work well when a system transitions from busy+swappingout
> to idle+swappingin, but isn't that rather a special case?  It feels
> (meaning, I've not measured at all) as if the inbetween busyish case
> will waste a lot of I/O and memory on swapins that have to be discarded
> again before khugepaged has made its sedate way back to slotting them in.
> 
> So I wonder how useful this is in its present form.  The problem being,
> not with your code as such, but the whole nature of khugepaged.  When
> I had to solve a similar problem with recovering huge tmpfs pages (not
> yet posted), I did briefly consider whether to hook in to use khugepaged;
> but rejected that, and have never regretted using a workqueue item for
> the extent instead.  Did Vlastimil (argh, him again!) propose something
> similar to replace khugepaged?  Or should khugepaged fire off workqueue
> items for THP extents needing swapin?
> 
> Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

     prev parent reply	other threads:[~2016-02-25 23:16 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-14 19:31 [RFC v5 0/3] mm: make swapin readahead to gain more thp performance Ebru Akagunduz
2015-09-14 19:31 ` [RFC v5 1/3] mm: add tracepoint for scanning pages Ebru Akagunduz
2015-09-14 19:31 ` [RFC v5 2/3] mm: make optimistic check for swapin readahead Ebru Akagunduz
2015-09-14 19:47   ` Rik van Riel
2015-09-14 21:33   ` Andrew Morton
2015-09-15 20:08     ` Ebru Akagunduz
2015-09-14 19:31 ` [RFC v5 3/3] mm: make swapin readahead to improve thp collapse rate Ebru Akagunduz
2015-09-17 13:28   ` Kirill A. Shutemov
2015-09-17 15:13   ` Kirill A. Shutemov
2015-09-14 21:41 ` [RFC v5 0/3] mm: make swapin readahead to gain more thp performance Andrew Morton
2016-02-25  7:36   ` Hugh Dickins
2016-02-25 22:35     ` Rik van Riel
2016-02-25 23:30       ` Ebru Akagunduz
2016-02-26  6:17         ` Hugh Dickins
2016-02-26 14:51           ` Rik van Riel
2016-03-03 22:08             ` Ebru Akagunduz
2016-02-25 23:16     ` Ebru Akagunduz [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160225231644.GA13782@debian \
    --to=ebru.akagunduz@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=boaz@plexistor.com \
    --cc=gorcunov@openvz.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=raindel@mellanox.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=xiexiuqi@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).