From: Ebru Akagunduz <ebru.akagunduz@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
kirill.shutemov@linux.intel.com, n-horiguchi@ah.jp.nec.com,
aarcange@redhat.com, riel@redhat.com, iamjoonsoo.kim@lge.com,
xiexiuqi@huawei.com, gorcunov@openvz.org,
linux-kernel@vger.kernel.org, mgorman@suse.de,
rientjes@google.com, vbabka@suse.cz,
aneesh.kumar@linux.vnet.ibm.com, hannes@cmpxchg.org,
mhocko@suse.cz, boaz@plexistor.com, raindel@mellanox.com
Subject: Re: [RFC v5 0/3] mm: make swapin readahead to gain more thp performance
Date: Fri, 26 Feb 2016 01:16:44 +0200 [thread overview]
Message-ID: <20160225231644.GA13782@debian> (raw)
In-Reply-To: <alpine.LSU.2.11.1602242301040.6947@eggly.anvils>
On Wed, Feb 24, 2016 at 11:36:30PM -0800, Hugh Dickins wrote:
> On Mon, 14 Sep 2015, Andrew Morton wrote:
> > On Mon, 14 Sep 2015 22:31:42 +0300 Ebru Akagunduz <ebru.akagunduz@gmail.com> wrote:
> >
> > > This patch series makes swapin readahead up to a
> > > certain number to gain more thp performance and adds
> > > tracepoint for khugepaged_scan_pmd, collapse_huge_page,
> > > __collapse_huge_page_isolate.
> >
> > I'll merge this series for testing. Hopefully Andrea and/or Hugh will
> > find time for a quality think about the issue before 4.3 comes around.
> >
> > It would be much better if we didn't have that sysfs knob - make the
> > control automatic in some fashion.
> >
> > If we can't think of a way of doing that then at least let's document
> > max_ptes_swap very carefully. Explain to our users what it does, why
> > they should care about it, how they should set about determining (ie:
> > measuring) its effect upon their workloads.
>
> Ebru, I don't know whether you realize, but your THP swapin work has
> been languishing in mmotm for five months now, without getting any
> nearer to Linus's tree.
>
> That's partly my fault - sorry - for not responding to Andrew's nudge
> above. But I think you also got caught up in conference, and in the
> end did not get around to answering outstanding issues: please take a
> look at your mailbox from last September, to see what more is needed.
>
I've seen my patch series in mmotm mails but I thought,
other parts of thp have problem so those are going to be
forwarded to Linus's tree when other parts fixed.
I did not know the file: http://www.ozlabs.org/~akpm/mmotm/series
It shows explicitly the problem of patches.
Thank you for summarizing it below.
> Here's what mmotm's series file says...
>
> #mm-add-tracepoint-for-scanning-pages.patch+2: Andrea/Hugh review?. 2 Fengguang warnings, one "kernel test robot" oops
> #mm-make-optimistic-check-for-swapin-readahead.patch: TBU (docs)
I've sent doc patch: http://lkml.iu.edu/hypermail/linux/kernel/1509.2/01783.html
> mm-make-optimistic-check-for-swapin-readahead.patch
> mm-make-optimistic-check-for-swapin-readahead-fix-2.patch
> #mm-make-swapin-readahead-to-improve-thp-collapse-rate.patch: Hugh/Kirill want collapse_huge_page() rework
> mm-make-swapin-readahead-to-improve-thp-collapse-rate.patch
> mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix.patch
> mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-2.patch
> #mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-3.patch: Ebru to test?
I've tested my whole patch series and could not produce the fault
again. I've also seen Tested-by tag from Sergey so I did not sent
the tag.
> mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-3.patch
>
> ...but I think some of that is stale. There were a few little bugs
> when it first went into mmotm, which Kirill very swiftly fixed up,
> and I don't think it has given anybody any trouble since then.
>
> But do I want to see this work go in? Yes and no. The problem it
> fixes (that although we give out a THP to someone who faults a single
> page of it, after swapout the THP cannot be recovered until they have
> faulted in every page of it) is real and embarrassing; the code is good;
> and I don't mind the max_ptes_swap tunable that concerns Andrew above;
> but Kirill and Vlastimil made important points that still trouble me.
>
> I can't locate Kirill's mail right now, perhaps I'm misremembering:
> but wasn't he concerned by your __collapse_huge_page_swapin() (likely
> to be allocating many small pages) being called under down_write of
> mmap_sem? That's usually something we soon regret, and even down_read
> of mmap_sem across many memory allocations would be unfortunate
> (khugepaged used to allocate its THP that way, but we have
> Vlastimil to thank for stopping that in his 8b1645685acf).
>
> And didn't Vlastimil (9/4/15) make some other unanswered
> observations about the call to __collapse_huge_page_swapin():
>
> > Hmm it seems rather wasteful to call this when no swap entries were detected.
> > Also it seems pointless to try continue collapsing when we have just only issued
> > async swap-in? What are the chances they would finish in time?
> >
> > I'm less sure about the relation vs khugepaged_alloc_page(). At this point, we
> > have already succeeded the hugepage allocation. It makes sense not to swap-in if
> > we can't allocate a hugepage. It makes also sense not to allocate a hugepage if
> > we will just issue async swap-ins and then free the hugepage back. Swap-in means
> > disk I/O that's best avoided if not useful. But the reclaim for hugepage
> > allocation might also involve disk I/O. At worst, it could be creating new swap
> > pte's in the very pmd we are scanning... Thoughts?
>
I did not take enough responsibility, you're right. I should have
asked regarding the patch at least.
> Doesn't this imply that __collapse_huge_page_swapin() will initiate all
> the necessary swapins for a THP, then (given the FAULT_FLAG_ALLOW_RETRY)
> not wait for them to complete, so khugepaged will give up on that extent
> and move on to another; then after another full circuit of all the mms
> it needs to examine, it will arrive back at this extent and build a THP
> from the swapins it arranged last time.
>
> Which may work well when a system transitions from busy+swappingout
> to idle+swappingin, but isn't that rather a special case? It feels
> (meaning, I've not measured at all) as if the inbetween busyish case
> will waste a lot of I/O and memory on swapins that have to be discarded
> again before khugepaged has made its sedate way back to slotting them in.
>
> So I wonder how useful this is in its present form. The problem being,
> not with your code as such, but the whole nature of khugepaged. When
> I had to solve a similar problem with recovering huge tmpfs pages (not
> yet posted), I did briefly consider whether to hook in to use khugepaged;
> but rejected that, and have never regretted using a workqueue item for
> the extent instead. Did Vlastimil (argh, him again!) propose something
> similar to replace khugepaged? Or should khugepaged fire off workqueue
> items for THP extents needing swapin?
>
> Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2016-02-25 23:16 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-14 19:31 [RFC v5 0/3] mm: make swapin readahead to gain more thp performance Ebru Akagunduz
2015-09-14 19:31 ` [RFC v5 1/3] mm: add tracepoint for scanning pages Ebru Akagunduz
2015-09-14 19:31 ` [RFC v5 2/3] mm: make optimistic check for swapin readahead Ebru Akagunduz
2015-09-14 19:47 ` Rik van Riel
2015-09-14 21:33 ` Andrew Morton
2015-09-15 20:08 ` Ebru Akagunduz
2015-09-14 19:31 ` [RFC v5 3/3] mm: make swapin readahead to improve thp collapse rate Ebru Akagunduz
2015-09-17 13:28 ` Kirill A. Shutemov
2015-09-17 15:13 ` Kirill A. Shutemov
2015-09-14 21:41 ` [RFC v5 0/3] mm: make swapin readahead to gain more thp performance Andrew Morton
2016-02-25 7:36 ` Hugh Dickins
2016-02-25 22:35 ` Rik van Riel
2016-02-25 23:30 ` Ebru Akagunduz
2016-02-26 6:17 ` Hugh Dickins
2016-02-26 14:51 ` Rik van Riel
2016-03-03 22:08 ` Ebru Akagunduz
2016-02-25 23:16 ` Ebru Akagunduz [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160225231644.GA13782@debian \
--to=ebru.akagunduz@gmail.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=boaz@plexistor.com \
--cc=gorcunov@openvz.org \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=raindel@mellanox.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=vbabka@suse.cz \
--cc=xiexiuqi@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).