Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	tim.c.chen@intel.com, dave.hansen@intel.com,
	andi.kleen@intel.com, aaron.lu@intel.com,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out
Date: Thu, 18 Aug 2016 17:39:55 +0900	[thread overview]
Message-ID: <20160818083955.GA12296@bbox> (raw)
In-Reply-To: <1471454696.2888.94.camel@linux.intel.com>

Hi Tim,

On Wed, Aug 17, 2016 at 10:24:56AM -0700, Tim Chen wrote:
> On Wed, 2016-08-17 at 14:07 +0900, Minchan Kim wrote:
> > On Tue, Aug 16, 2016 at 07:06:00PM -0700, Huang, Ying wrote:
> > > 
> > > 
> > > > 
> > > > I think Tim and me discussed about that a few weeks ago.
> > > I work closely with Tim on swap optimization.  This patchset is the part
> > > of our swap optimization plan.
> > > 
> > > > 
> > > > Please search below topics.
> > > > 
> > > > [1] mm: Batch page reclamation under shink_page_list
> > > > [2] mm: Cleanup - Reorganize the shrink_page_list code into smaller functions
> > > > 
> > > > It's different with yours which focused on THP swapping while the suggestion
> > > > would be more general if we can do so it's worth to try it, I think.
> > > I think the general optimization above will benefit both normal pages
> > > and THP at least for now.  And I think there are no hard conflict
> > > between those two patchsets.
> > If we could do general optimzation, I guess THP swap without splitting
> > would be more straight forward.
> > 
> > If we can reclaim batch a certain of pages all at once, it helps we can
> > do scan_swap_map(si, SWAP_HAS_CACHE, nr_pages). The nr_pages could be
> > greater or less than 512 pages. With that, scan_swap_map effectively
> > search empty swap slots from scan_map or free cluser list.
> > Then, needed part from your patchset is to just delay splitting of THP.
> > 
> > > 
> > > 
> > > The THP swap has more opportunity to be optimized, because we can batch
> > > 512 operations together more easily.  For full THP swap support, unmap a
> > > THP could be more efficient with only one swap count operation instead
> > > of 512, so do many other operations, such as add/remove from swap cache
> > > with multi-order radix tree etc.  And it will help memory fragmentation.
> > > THP can be kept after swapping out/in, need not to rebuild THP via
> > > khugepaged.
> > It seems you increased cluster size to 512 and search a empty cluster
> > for a THP swap. With that approach, I have a concern that once clusters
> > will be fragmented, THP swap support doesn't take benefit at all.
> > 
> > Why do we need a empty cluster for swapping out 512 pages?
> > IOW, below case could work for the goal.
> > 
> > A : Allocated slot
> > F : Free slot
> > 
> > cluster A   cluster B
> > AAAAFFFF  -  FFFFAAAA
> > 
> > That's one of the reason I suggested batch reclaim work first and
> > support THP swap based on it. With that, scan_swap_map can be aware of nr_pages
> > and selects right clusters.
> > 
> > With the approach, justfication of THP swap support would be easier, too.
> > IOW, I'm not sure how only THP swap support is valuable in real workload.
> > 
> > Anyways, that's just my two cents.
> 
> Minchan,
> 
> Scanning for contiguous slots that span clusters may take quite a
> long time under fragmentation, and may eventually fail.  In that case the addition scan
> time overhead may go to waste and defeat the purpose of fast swapping of large page.
> 
> The empty cluster lookup on the other hand is very fast.
> We treat the empty cluster available case as an opportunity for fast path
> swap out of large page.  Otherwise, we'll revert to the current
> slow path behavior of breaking into normal pages so there's no
> regression, and we may get speed up.  We can be considerably faster when a lot of large
> pages are used.  

I didn't mean we should search scan_swap_map firstly without peeking
free cluster but what I wanted was we might abstract it into
scan_swap_map.

For example, if nr_pages is greather than the size of cluster, we can
get empty cluster first and nr_pages - sizeof(cluster) for other free
cluster or scanning of current CPU per-cpu cluster. If we cannot find
used slot during scanning, we can bail out simply. Then, although we
fail to get all* contiguous slots, we get a certain of contiguous slots
so it would be benefit for seq write and lock batching point of view
at the cost of a little scanning. And it's not specific to THP algorighm.

My point is that once we optimize normal page batch for swap, THP
swap support would be more straight forward. But I should admit I didn't
look into code in detail so it might have clear hurdle to implement it
so I will rely on you guys's decision whether which one is more urgent/
benefit/making good code quality for the goal.

Thanks.

> 
> 
> > 
> > > 
> > > 
> > > But not all pages are huge, so normal pages swap optimization is
> > > necessary and good anyway.
> > > 
> 
> Yes, optimizing the normal swap pages is still an important goal
> for us.  THP swap optimization is complementary component.  
> 
> We have seen system with THP spend significant cpu cycles breaking up the
> pages on swap out and then compacting the pages for THP again after
> swap in.  So if we can avoid this, that will be helpful.
> 
> Thanks for your valuable comments.

Thanks for good works.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2016-08-18  8:39 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-09 16:37 [RFC 00/11] THP swap: Delay splitting THP during swapping out Huang, Ying
2016-08-09 16:37 ` [RFC 01/11] swap: Add swap_cluster_list Huang, Ying
2016-08-09 16:37 ` [RFC 02/11] swap: Change SWAPFILE_CLUSTER to 512 Huang, Ying
2016-08-09 16:37 ` [RFC 03/11] mm, memcg: Add swap_cgroup_iter iterator Huang, Ying
2016-08-09 16:37 ` [RFC 04/11] mm, memcg: Support to charge/uncharge multiple swap entries Huang, Ying
2016-08-09 16:37 ` [RFC 05/11] mm, THP, swap: Add swap cluster allocate/free functions Huang, Ying
2016-08-09 16:37 ` [RFC 06/11] mm, THP, swap: Add get_huge_swap_page() Huang, Ying
2016-08-09 16:37 ` [RFC 07/11] mm, THP, swap: Support to clear SWAP_HAS_CACHE for huge page Huang, Ying
2016-08-09 16:37 ` [RFC 08/11] mm, THP, swap: Support to add/delete THP to/from swap cache Huang, Ying
2016-08-09 16:37 ` [RFC 09/11] mm, THP: Add can_split_huge_page() Huang, Ying
2016-08-09 16:37 ` [RFC 10/11] mm, THP, swap: Support to split THP in swap cache Huang, Ying
2016-08-09 16:37 ` [RFC 11/11] mm, THP, swap: Delay splitting THP during swap out Huang, Ying
2016-08-09 17:25 ` [RFC 00/11] THP swap: Delay splitting THP during swapping out Huang, Ying
2016-08-17  0:59 ` Minchan Kim
2016-08-17  2:06   ` Huang, Ying
2016-08-17  5:07     ` Minchan Kim
2016-08-17 17:24       ` Tim Chen
2016-08-18  8:39         ` Minchan Kim [this message]
2016-08-18 17:19           ` Huang, Ying
2016-08-19  0:49             ` Minchan Kim
2016-08-19  3:44               ` Huang, Ying
2016-08-19  6:44                 ` Minchan Kim
2016-08-19  6:47                   ` Minchan Kim
2016-08-19 23:43                   ` Huang, Ying
2016-08-22 21:33   ` Huang, Ying
2016-08-24  1:00     ` Minchan Kim
2016-08-24  2:18       ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160818083955.GA12296@bbox \
    --to=minchan@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi.kleen@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tim.c.chen@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).