From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, ebru.akagunduz@gmail.com,
aarcange@redhat.com, aneesh.kumar@linux.vnet.ibm.com,
boaz@plexistor.com, gorcunov@openvz.org, hannes@cmpxchg.org,
hughd@google.com, iamjoonsoo.kim@lge.com,
kirill.shutemov@linux.intel.com, mgorman@suse.de,
n-horiguchi@ah.jp.nec.com, riel@redhat.com, rientjes@google.com,
vbabka@suse.cz, mm-commits@vger.kernel.org, linux-mm@kvack.org
Subject: Re: + mm-thp-avoid-unnecessary-swapin-in-khugepaged.patch added to -mm tree
Date: Fri, 20 May 2016 09:21:55 +0900
Message-ID: <20160520002155.GA2224@bbox>
In-Reply-To: <20160519073957.GE26110@dhcp22.suse.cz>
On Thu, May 19, 2016 at 09:39:57AM +0200, Michal Hocko wrote:
> On Thu 19-05-16 16:27:51, Minchan Kim wrote:
> > On Thu, May 19, 2016 at 09:03:57AM +0200, Michal Hocko wrote:
> > > On Thu 19-05-16 14:00:38, Minchan Kim wrote:
> > > > On Tue, May 17, 2016 at 11:02:54AM +0200, Michal Hocko wrote:
> > > > > On Tue 17-05-16 09:58:15, Michal Hocko wrote:
> > > > > > On Thu 28-04-16 17:19:21, Michal Hocko wrote:
> > > > > > > On Wed 27-04-16 14:17:20, Andrew Morton wrote:
> > > > > > > [...]
> > > > > > > > @@ -2484,7 +2485,14 @@ static void collapse_huge_page(struct mm
> > > > > > > > goto out;
> > > > > > > > }
> > > > > > > >
> > > > > > > > - __collapse_huge_page_swapin(mm, vma, address, pmd);
> > > > > > > > + swap = get_mm_counter(mm, MM_SWAPENTS);
> > > > > > > > + curr_allocstall = sum_vm_event(ALLOCSTALL);
> > > > > > > > + /*
> > > > > > > > + * When system under pressure, don't swapin readahead.
> > > > > > > > + * So that avoid unnecessary resource consuming.
> > > > > > > > + */
> > > > > > > > + if (allocstall == curr_allocstall && swap != 0)
> > > > > > > > + __collapse_huge_page_swapin(mm, vma, address, pmd);
> > > > > > > >
> > > > > > > > anon_vma_lock_write(vma->anon_vma);
> > > > > > > >
> > > > > > >
> > > > > > > I have mentioned that before already but this seems like a rather weak
> > > > > > > heuristic. Don't we really rather teach __collapse_huge_page_swapin
> > > > > > > > (resp. do_swap_page) to do optimistic GFP_NOWAIT allocations and
> > > > > > > back off under the memory pressure?
> > > > > >
> > > > > > I gave it a try and it doesn't seem really bad. Untested and I might
> > > > > > have missed something really obvious but what do you think about this
> > > > > > approach rather than relying on ALLOCSTALL which is really weak
> > > > > > heuristic:
> > > >
> > > > I like this approach rather than playing with allocstall diff of vmevent
> > > > which can be disabled in some configuration and it's not a good indicator
> > > > to represent current memory pressure situation.
> > >
> > > Not only that it won't work for e.g. memcg configurations because we
> > > would end up reclaiming that memcg as the gfp mask tells us to do so and
> > > ALLOCSTALL would be quiet about that.
> >
> > Right you are. I didn't consider memcg. Thanks for pointing out.
> >
> > >
> > > > However, I agree with Rik's requirement which doesn't want to turn over
> > > > page cache for collapsing THP page via swapin. So, your suggestion cannot
> > > > prevent it because khugepaged can consume memory through this swapin
> > > > operation continuously while kswapd is doing aging of LRU list in parallel.
> > > > IOW, fluctuation between HIGH and LOW watermark.
> > >
> > > I am not sure this is actually a problem. We have other sources of
> > > opportunistic allocations with some fallback and those wake up kswapd
> > > (they only clear __GFP_DIRECT_RECLAIM). Also this swapin should happen
> > > only when a certain portion of the huge page is already populated so
> >
> > I can't find any logic you mentioned "a certain portion of the huge page
> > is already populated" in next-20160517. What am I missing now?
>
> khugepaged_max_ptes_swap. I didn't look closer but from a quick glance
> this is the threshold for the optimistic swapin.
Thanks. I see it now.
>
> > > it won't happen all the time and sounds like we would benefit from the
> > > reclaimed page cache in favor of the THP.
> >
> > It depends on storage speed. If a page is swapped out, it means it's not a
> > workingset so we might read a cold page at the cost of evicting a warm page.
> > Additionally, if the huge page was swapped out, it is likely to swap out
> > again because it's not a hot * 512 page. For those pages, shouldn't we
> > evict page cache? I think it's not a good tradeoff.
>
> This is exactly the problem of the optimistic THP swap in. We just do
> not know whether it is worth it. But I guess that a reasonable threshold
> would solve this. It is really ineffective to keep small pages when only
> few holes are swapped out (for what ever reasons). HPAGE_PMD_NR/8 which
> we use right now is not documented but I guess 64 pages sounds like a
> reasonable value which shouldn't cause way too much of reclaim.
I don't know what the best default value is. Anyway, I agree we should
introduce a threshold for collapsing a THP through swapin I/O.

I think another important thing to consider is how likely the THP is to
stay hot without being split again shortly afterwards, much like KSM
double-checks a page before merging it as a stable page. Of course, it
wouldn't be easy to implement, but I think the algorithm should be based
primarily on such *hotness* and only then consider max_ptes_swap. IOW, I
think we should be conservative rather than optimistic, because the page
I/O overhead of a wrong choice could be bigger than the benefit of a few
TLB hits.

If we go that way, maybe we don't need to detect memory pressure at all.
To that end, how about raising the bar for allowing swapin?
I mean changing the condition under which we allow swapin

from
	up to 64 swap ptes and 1 young pte out of 512 ptes
to
	up to 64 swap ptes and 256 young ptes out of 512 ptes
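
To make the proposal concrete, here is a rough userspace sketch of the
check. It is only an illustration under assumptions: swapin_allowed()
and MIN_PTES_YOUNG are made-up names, not existing kernel symbols, and I
am reading "below 64" as "at most 64" swap ptes. The constants just
mirror the numbers above (512 ptes per huge page, HPAGE_PMD_NR/8 = 64).

```c
#include <stdbool.h>

/* Illustrative constants only; the values mirror the numbers in this thread. */
#define HPAGE_PMD_NR    512                 /* ptes covered by one huge page */
#define MAX_PTES_SWAP   (HPAGE_PMD_NR / 8)  /* 64: current swap pte threshold */
#define MIN_PTES_YOUNG  (HPAGE_PMD_NR / 2)  /* 256: proposed bar (hypothetical name) */

/*
 * Hypothetical helper, not an existing kernel function.
 * Allow swapin for collapse only if few ptes are swapped out and
 * at least min_young of the ptes in the range were recently referenced.
 */
static bool swapin_allowed(int nr_swap, int nr_young, int min_young)
{
	return nr_swap <= MAX_PTES_SWAP && nr_young >= min_young;
}
```

With min_young = 1 this models today's behaviour; passing MIN_PTES_YOUNG
models the raised bar, where a range with 64 swap ptes and only one
young pte would no longer trigger swapin.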
>
> --
> Michal Hocko
> SUSE Labs