From: Minchan Kim <minchan@kernel.org>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>, Shaohua Li <shli@kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH -mm -v2 0/6] mm, swap: VMA based swap readahead
Date: Fri, 30 Jun 2017 11:26:26 +0900
Message-ID: <20170630022626.GA25190@bbox>
In-Reply-To: <20170630014443.23983-1-ying.huang@intel.com>

Hi Huang,

Ccing Johannes:

I haven't read this patchset yet, but I remember Johannes tried a
VMA-based readahead approach a long time ago, so he might have good
comments.

On Fri, Jun 30, 2017 at 09:44:37AM +0800, Huang, Ying wrote:
> Swap readahead is an important mechanism for reducing swap-in
> latency.  Although a purely sequential memory access pattern isn't
> very common for anonymous memory, spatial locality is still
> considered valid.
> 
> In the original swap readahead implementation, consecutive blocks
> on the swap device are read ahead based on a global spatial locality
> estimate.  But consecutive blocks on the swap device only reflect
> the order of page reclaim; they don't necessarily reflect the access
> pattern in virtual memory space.  Also, different tasks in the
> system may have different access patterns, which makes the global
> spatial locality estimate inaccurate.
> 
> In this patchset, when a page fault occurs, the virtual pages near the
> fault address are read ahead instead of the swap slots near the
> faulting swap slot on the swap device.  This avoids reading ahead
> unrelated swap slots.  At the same time, swap readahead is changed to
> work per-VMA instead of globally, so that the different access
> patterns of different VMAs can be distinguished and a different
> readahead policy applied to each.  The original core readahead
> detection and scaling algorithm is reused, because it is an effective
> algorithm for detecting spatial locality.
> 
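
To make the contrast in the paragraph above concrete, here is a minimal
userspace sketch in Python. It is not code from the patchset. The function
names, the 4 KiB page size, the power-of-two window, and the symmetric
centering on the fault address are all assumptions made for illustration;
the real detection and scaling logic that picks the window size is not
modeled.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def slot_window(fault_slot, win):
    """Slot-based scheme (simplified): the aligned block of swap slots
    that contains the faulting slot. `win` is assumed to be a power of two."""
    mask = win - 1
    start = fault_slot & ~mask
    end = fault_slot | mask
    return list(range(start, end + 1))


def vma_window(fault_addr, vma_start, vma_end, win):
    """VMA-based scheme (simplified): page-aligned virtual addresses around
    the fault address, clamped to the VMA. `vma_end` is exclusive."""
    fault_page = fault_addr // PAGE_SIZE
    first = max(vma_start // PAGE_SIZE, fault_page - win // 2)
    last = min((vma_end - 1) // PAGE_SIZE, first + win - 1)
    return [page * PAGE_SIZE for page in range(first, last + 1)]
```

The first function picks neighbors by position on the swap device, which
reflects reclaim order. The second picks neighbors by virtual address and
never crosses the VMA boundary, so pages belonging to an unrelated mapping
are not candidates.
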
> In addition to the swap readahead changes, new sysfs interfaces are
> added to show the efficiency of the readahead algorithm and some other
> swap statistics.
> 
> This new implementation will incur more small random reads.  On SSD,
> the improved accuracy of the estimation and of the readahead target
> should outweigh the potentially increased overhead; this is also
> illustrated in the test results below.  But on HDD, the overhead may
> outweigh the benefit, so the original implementation is used by
> default there.
> 
> The tests and results are as follows.
> 
> Common test condition
> =====================
> 
> Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM)
> Swap device: NVMe disk
> 
> Micro-benchmark with combined access pattern
> ============================================
> 
> vm-scalability, sequential swap test case: 4 processes eat 50G of
> virtual memory space and repeat sequential memory writes for 300
> seconds.  The first round of writes triggers swap-out; the following
> rounds trigger sequential swap-in and swap-out.
> 
> At the same time, the vm-scalability random swap test case runs in
> the background: 8 processes eat 30G of virtual memory space and
> repeat random memory writes for 300 seconds.  This triggers random
> swap-in in the background.
> 
> This is a combined workload with sequential and random memory
> accesses at the same time.  The results (for the sequential
> workload) are as follows:
> 
> 			Base		Optimized
> 			----		---------
> throughput		345413 KB/s	414029 KB/s (+19.9%)
> latency.average		97.14 us	61.06 us (-37.1%)
> latency.50th		2 us		1 us
> latency.60th		2 us		1 us
> latency.70th		98 us		2 us
> latency.80th		160 us		2 us
> latency.90th		260 us		217 us
> latency.95th		346 us		369 us
> latency.99th		1.34 ms		1.09 ms
> ra_hit%			52.69%		99.98%
> 
> The original swap readahead algorithm is confused by the background
> random access workload, so its readahead hit rate is lower.  The
> VMA-based readahead algorithm works much better.
> 
> Linpack
> =======
> 
> The test memory size is bigger than RAM to trigger swapping.
> 
> 			Base		Optimized
> 			----		---------
> elapsed_time		393.49 s	329.88 s (-16.2%)
> ra_hit%			86.21%		98.82%
> 
> The Linpack score shows no visible change between the base and
> optimized kernels.  But the elapsed time is reduced and the readahead
> hit rate improved, so the optimized kernel runs better in the startup
> and teardown stages.  The high absolute readahead hit rate also shows
> that spatial locality is still valid in some practical workloads.
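
The percentage deltas quoted in the two result tables can be re-derived
from the base and optimized columns. The short Python sketch below does
that. The helper names are made up for illustration, and the `hit_rate`
definition (readahead pages later hit, divided by pages read ahead) is an
assumption about how ra_hit% is computed rather than something stated in
the cover letter.

```python
def pct_change(base, optimized):
    """Relative change of the optimized value against the base, in percent."""
    return (optimized - base) / base * 100.0


def hit_rate(ra_hits, ra_pages):
    """Assumed definition of ra_hit%: share of readahead pages that were used."""
    return ra_hits / ra_pages * 100.0 if ra_pages else 0.0


# Figures taken from the tables in the quoted cover letter.
throughput = round(pct_change(345413, 414029), 1)   # KB/s
avg_latency = round(pct_change(97.14, 61.06), 1)    # us
linpack_time = round(pct_change(393.49, 329.88), 1) # s
print(throughput, avg_latency, linpack_time)
```

The three rounded values agree with the +19.9%, -37.1% and -16.2% reported
in the tables.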


