Re: [LSF/MM TOPIC]swap improvements for fast SSD

From: Simon Jeons <simon.jeons@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@kernel.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Rik van Riel <riel@redhat.com>
Subject: Re: [LSF/MM TOPIC]swap improvements for fast SSD
Date: Fri, 05 Apr 2013 08:17:00 +0800
Message-ID: <515E17FC.9050008@gmail.com>
In-Reply-To: <20130123075808.GH2723@blaptop>

Hi Minchan,
On 01/23/2013 03:58 PM, Minchan Kim wrote:
> On Tue, Jan 22, 2013 at 02:53:41PM +0800, Shaohua Li wrote:
>> Hi,
>>
>> Because of high density, low power and low price, flash storage (SSD) is a good
>> candidate to partially replace DRAM. A quick answer for this is using SSD as
>> swap. But Linux swap is designed for slow hard disk storage. There are a lot of
>> challenges to efficiently use SSD for swap:
> Many of below item could be applied in in-memory swap like zram, zcache.
>
>> 1. Lock contentions (swap_lock, anon_vma mutex, swap address space lock)
>> 2. TLB flush overhead. To reclaim one page, we need at least 2 TLB flush. This
>> overhead is very high even in a normal 2-socket machine.
>> 3. Better swap IO pattern. Both direct and kswapd page reclaim can do swap,
>> which makes swap IO pattern is interleave. Block layer isn't always efficient
>> to do request merge. Such IO pattern also makes swap prefetch hard.
> Agreed.
>
>> 4. Swap map scan overhead. Swap in-memory map scan scans an array, which is
>> very inefficient, especially if swap storage is fast.
> Agreed.
>
>> 5. SSD related optimization, mainly discard support
>> 6. Better swap prefetch algorithm. Besides item 3, sequentially accessed pages
>> aren't always in LRU list adjacently, so page reclaim will not swap such pages
>> in adjacent storage sectors. This makes swap prefetch hard.
> One of problem is LRU churning and I wanted to try to fix it.
> http://marc.info/?l=linux-mm&m=130978831028952&w=4

I'm interested in this feature. Why wasn't it merged? Was there a fatal
issue with your patchset?
http://lwn.net/Articles/449866/
You mentioned a test script and an all-at-once patch, but I can't get them
from the URL. Could you tell me how to get them?

>
>> 7. Alternative page reclaim policy to bias reclaiming anonymous page.
>> Currently reclaim anonymous page is considering harder than reclaim file pages,
>> so we bias reclaiming file pages. If there are high speed swap storage, we are
>> considering doing swap more aggressively.
> Yeb. We need it. I tried it with extending vm_swappiness to 200.
>
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 3 Dec 2012 16:21:00 +0900
> Subject: [PATCH] mm: increase swappiness to 200
>
> We have thought swap out cost is very high but it's not true
> if we use fast device like swap-over-zram. Nonetheless, we can
> swap out 1:1 ratio of anon and page cache at most.
> It's not enough to use swap device fully so we encounter OOM kill
> while there are many free space in zram swap device. It's never
> what we want.
>
> This patch makes swap out aggressively.
>
> Cc: Luigi Semenzato <semenzato@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>   kernel/sysctl.c |    3 ++-
>   mm/vmscan.c     |    6 ++++--
>   2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 693e0ed..f1dbd9d 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -130,6 +130,7 @@ static int __maybe_unused two = 2;
>   static int __maybe_unused three = 3;
>   static unsigned long one_ul = 1;
>   static int one_hundred = 100;
> +extern int max_swappiness;
>   #ifdef CONFIG_PRINTK
>   static int ten_thousand = 10000;
>   #endif
> @@ -1157,7 +1158,7 @@ static struct ctl_table vm_table[] = {
>                  .mode           = 0644,
>                  .proc_handler   = proc_dointvec_minmax,
>                  .extra1         = &zero,
> -               .extra2         = &one_hundred,
> +               .extra2         = &max_swappiness,
>          },
>   #ifdef CONFIG_HUGETLB_PAGE
>          {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 53dcde9..64f3c21 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -53,6 +53,8 @@
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/vmscan.h>
>   
> +int max_swappiness = 200;
> +
>   struct scan_control {
>          /* Incremented by the number of inactive pages that were scanned */
>          unsigned long nr_scanned;
> @@ -1626,6 +1628,7 @@ static int vmscan_swappiness(struct scan_control *sc)
>          return mem_cgroup_swappiness(sc->target_mem_cgroup);
>   }
>   
> +
>   /*
>    * Determine how aggressively the anon and file LRU lists should be
>    * scanned.  The relative value of each set of LRU lists is determined
> @@ -1701,11 +1704,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>          }
>   
>          /*
> -        * With swappiness at 100, anonymous and file have the same priority.
>           * This scanning priority is essentially the inverse of IO cost.
>           */
>          anon_prio = vmscan_swappiness(sc);
> -       file_prio = 200 - anon_prio;
> +       file_prio = max_swappiness - anon_prio;
>   
>          /*
>           * OK, so we have swap space and a fair amount of page cache

