Re: [PATCH -mm -v2 0/6] mm, swap: VMA based swap readahead

From: Minchan Kim <minchan@kernel.org>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>, Shaohua Li <shli@kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH -mm -v2 0/6] mm, swap: VMA based swap readahead
Date: Fri, 30 Jun 2017 11:26:26 +0900
Message-ID: <20170630022626.GA25190@bbox>
In-Reply-To: <20170630014443.23983-1-ying.huang@intel.com>

Hi Huang,

Ccing Johannes:

I haven't read this patchset yet, but I remember Johannes tried a
VMA-based readahead approach a long time ago, so he might have good
comments.

On Fri, Jun 30, 2017 at 09:44:37AM +0800, Huang, Ying wrote:
> Swap readahead is an important mechanism for reducing swap-in
> latency.  Although purely sequential memory access patterns are not
> very common for anonymous memory, spatial locality is still
> considered valid.
> 
> In the original swap readahead implementation, consecutive blocks on
> the swap device are read ahead based on a global spatial locality
> estimate.  But consecutive blocks on the swap device only reflect the
> order of page reclaim; they do not necessarily reflect the access
> pattern in virtual memory space.  Moreover, different tasks in the
> system may have different access patterns, which makes the global
> spatial locality estimate inaccurate.
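
To make the point above concrete, here is a small sketch (not from the
patchset). It assumes two hypothetical tasks, A and B, whose pages are
reclaimed in interleaved order, and a slot-based readahead that reads
the aligned cluster of 4 swap slots around the faulting slot. The names
assign_slots and slot_cluster are made up for illustration.

```python
def assign_slots(reclaim_order):
    """Assign consecutive swap slots in the order pages are reclaimed."""
    return {slot: page for slot, page in enumerate(reclaim_order)}


def slot_cluster(slots, fault_slot, cluster=4):
    """Return the pages stored in the aligned slot cluster around fault_slot."""
    start = fault_slot - fault_slot % cluster
    return [slots[s] for s in range(start, start + cluster) if s in slots]


# Hypothetical tasks: A touches virtual pages 0..3 in order,
# B touches scattered virtual pages. Reclaim interleaves them.
task_a = [("A", vpn) for vpn in range(4)]
task_b = [("B", vpn) for vpn in (40, 7, 93, 15)]
reclaim_order = [page for pair in zip(task_a, task_b) for page in pair]

slots = assign_slots(reclaim_order)

# Task A faults on its virtual page 1; find the slot that holds it.
fault_slot = next(s for s, page in slots.items() if page == ("A", 1))
neighbours = slot_cluster(slots, fault_slot)

for page in neighbours:
    print(page)
```

In this toy setup half of the slot cluster belongs to task B, so a
slot-based readahead on behalf of task A would bring in pages that A
never asked for.
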
> 
> In this patchset, when a page fault occurs, the virtual pages near the
> fault address are read ahead instead of the swap slots near the
> faulting swap slot on the swap device.  This avoids reading ahead
> unrelated swap slots.  At the same time, swap readahead is changed to
> work per-VMA instead of globally, so that the different access
> patterns of different VMAs can be distinguished and a different
> readahead policy applied to each accordingly.  The original core
> readahead detection and scaling algorithm is reused, because it is an
> effective algorithm for detecting spatial locality.
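
As a rough illustration of picking readahead candidates by virtual
address, the sketch below computes a window of virtual page numbers
around the fault address and clamps it to the VMA. It is only a sketch:
the aligned-window policy, the 4 KiB page size, and the name
vma_readahead_window are assumptions, and the patches may choose the
window differently.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def vma_readahead_window(fault_addr, vma_start, vma_end, win_pages):
    """Return virtual page numbers to consider for readahead.

    The window is the win_pages-aligned block containing the faulting
    page, clamped to the VMA range [vma_start, vma_end).
    """
    if not vma_start <= fault_addr < vma_end:
        raise ValueError("fault address is outside the VMA")
    fault_vpn = fault_addr >> PAGE_SHIFT
    first_vpn = vma_start >> PAGE_SHIFT
    # Exclusive end, rounded up in case vma_end is not page aligned.
    end_vpn = (vma_end + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
    aligned = fault_vpn - fault_vpn % win_pages
    start = max(aligned, first_vpn)
    end = min(aligned + win_pages, end_vpn)
    return list(range(start, end))


# A large VMA: the whole aligned window fits.
print(vma_readahead_window(0x11234, 0x10000, 0x18000, 8))
# A small VMA: the window is clamped to the VMA boundaries.
print(vma_readahead_window(0x13000, 0x12000, 0x15000, 8))
```

Clamping to the VMA is what keeps the candidates within one mapping, so
pages of unrelated mappings or tasks never enter the window.
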
> 
> In addition to the swap readahead changes, new sysfs interfaces are
> added to show the efficiency of the readahead algorithm and some other
> swap statistics.
> 
> The new implementation will incur more small random reads.  On SSD,
> the improved accuracy of the estimation and of the readahead target
> should outweigh the potentially increased overhead; this is also
> illustrated by the test results below.  But on HDD, the overhead may
> outweigh the benefit, so the original implementation will be used by
> default in that case.
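
For readers who want to see what the SSD/HDD distinction looks like
from userspace, here is a minimal sketch. It reads the existing
/sys/block/<dev>/queue/rotational attribute (1 for rotational devices,
0 otherwise). The policy function use_vma_readahead is hypothetical and
only mirrors the rule described above; it is not the kernel logic from
the patches.

```python
from pathlib import Path


def is_rotational(dev, sysfs_root="/sys/block"):
    """Return True if the block device reports itself as rotational (HDD)."""
    flag = Path(sysfs_root) / dev / "queue" / "rotational"
    return flag.read_text().strip() == "1"


def use_vma_readahead(rotational_flags):
    """Hypothetical policy: use VMA-based readahead only when no swap
    device is rotational; otherwise fall back to the original scheme."""
    return not any(rotational_flags)


# Example with assumed swap devices: one NVMe disk, then an added HDD.
print(use_vma_readahead([False]))        # NVMe only
print(use_vma_readahead([False, True]))  # NVMe plus one HDD
```
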
> 
> The tests and results are as follows.
> 
> Common test conditions
> ======================
> 
> Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM)
> Swap device: NVMe disk
> 
> Micro-benchmark with combined access pattern
> ============================================
> 
> vm-scalability, sequential swap test case: 4 processes eat 50G of
> virtual memory space, repeating sequential memory writes for 300
> seconds.  The first round of writes triggers swap-out; the following
> rounds trigger sequential swap-in and swap-out.
> 
> At the same time, the vm-scalability random swap test case runs in
> the background: 8 processes eat 30G of virtual memory space,
> repeating random memory writes for 300 seconds.  This triggers random
> swap-in in the background.
> 
> This is a combined workload with sequential and random memory
> accesses at the same time.  The results (for the sequential workload)
> are as follows:
> 
> 			Base		Optimized
> 			----		---------
> throughput		345413 KB/s	414029 KB/s (+19.9%)
> latency.average		97.14 us	61.06 us (-37.1%)
> latency.50th		2 us		1 us
> latency.60th		2 us		1 us
> latency.70th		98 us		2 us
> latency.80th		160 us		2 us
> latency.90th		260 us		217 us
> latency.95th		346 us		369 us
> latency.99th		1.34 ms		1.09 ms
> ra_hit%			52.69%		99.98%
> 
> The original swap readahead algorithm is confused by the background
> random access workload, so its readahead hit rate is lower.  The
> VMA-based readahead algorithm works much better.
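
The percentages in the table can be re-derived with a few lines of
arithmetic. The sketch below uses only the numbers quoted above. The
"wasted" figure assumes that ra_hit% is the share of readahead pages
that were later accessed, which is my reading of the name rather than
something stated in the cover letter.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


throughput = pct_change(345413, 414029)   # KB/s
latency_avg = pct_change(97.14, 61.06)    # us
latency_95th = pct_change(346, 369)       # us

# Assumption: ra_hit% = readahead pages later accessed / pages read ahead.
wasted_base = 100 - 52.69
wasted_optimized = 100 - 99.98

print(f"throughput   {throughput:+.1f}%")
print(f"latency.avg  {latency_avg:+.1f}%")
print(f"latency.95th {latency_95th:+.1f}%")
print(f"wasted readahead: {wasted_base:.2f}% -> {wasted_optimized:.2f}%")
```

Note that the 95th percentile latency is slightly higher with the
optimized kernel, while the average and the other listed percentiles
improve or stay the same.
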
> 
> Linpack
> =======
> 
> The test memory size is bigger than RAM to trigger swapping.
> 
> 			Base		Optimized
> 			----		---------
> elapsed_time		393.49 s	329.88 s (-16.2%)
> ra_hit%			86.21%		98.82%
> 
> The Linpack score shows no visible change between the base and
> optimized kernels.  But the elapsed time is reduced and the readahead
> hit rate is improved, so the optimized kernel runs better in the
> startup and teardown stages.  And the absolute readahead hit rate is
> high, which shows that spatial locality is still valid in some
> practical workloads.


Thread overview: 22+ messages
2017-06-30  1:44 [PATCH -mm -v2 0/6] mm, swap: VMA based swap readahead Huang, Ying
2017-06-30  1:44 ` Huang, Ying
2017-06-30  1:44 ` [PATCH -mm -v2 1/6] mm, swap: Add swap cache statistics sysfs interface Huang, Ying
2017-06-30  1:44   ` Huang, Ying
2017-06-30  1:44 ` [PATCH -mm -v2 2/6] mm, swap: Add swap readahead hit statistics Huang, Ying
2017-06-30  1:44   ` Huang, Ying
2017-07-11 18:25   ` Dave Hansen
2017-07-11 18:25     ` Dave Hansen
2017-07-12  2:22     ` Huang, Ying
2017-07-12  2:22       ` Huang, Ying
2017-06-30  1:44 ` [PATCH -mm -v2 3/6] mm, swap: Fix swap readahead marking Huang, Ying
2017-06-30  1:44   ` Huang, Ying
2017-06-30  1:44 ` [PATCH -mm -v2 4/6] mm, swap: VMA based swap readahead Huang, Ying
2017-06-30  1:44   ` Huang, Ying
2017-06-30  1:44 ` [PATCH -mm -v2 5/6] mm, swap: Add sysfs interface for " Huang, Ying
2017-06-30  1:44   ` Huang, Ying
2017-06-30  1:44 ` [PATCH -mm -v2 6/6] mm, swap: Don't use VMA based swap readahead if HDD is used as swap Huang, Ying
2017-06-30  1:44   ` Huang, Ying
2017-06-30  2:26 ` Minchan Kim [this message]
2017-06-30  2:26   ` [PATCH -mm -v2 0/6] mm, swap: VMA based swap readahead Minchan Kim
2017-06-30  7:53   ` Huang, Ying
2017-06-30  7:53     ` Huang, Ying
