Re: [PATCH 4/5] mm:swap: respect page_cluster for readahead

From: "Huang, Ying" <ying.huang@intel.com>
To: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team <kernel-team@lge.com>,
	Ilya Dryomov <idryomov@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [PATCH 4/5] mm:swap: respect page_cluster for readahead
Date: Tue, 12 Sep 2017 16:07:01 +0800
Message-ID: <87mv60nxwa.fsf@yhuang-dev.intel.com>
In-Reply-To: <20170912075645.GA2837@bbox> (Minchan Kim's message of "Tue, 12 Sep 2017 16:56:45 +0900")

Minchan Kim <minchan@kernel.org> writes:

> On Tue, Sep 12, 2017 at 03:29:45PM +0800, Huang, Ying wrote:
>> Minchan Kim <minchan@kernel.org> writes:
>> 
>> > On Tue, Sep 12, 2017 at 02:44:36PM +0800, Huang, Ying wrote:
>> >> Minchan Kim <minchan@kernel.org> writes:
>> >> 
>> >> > On Tue, Sep 12, 2017 at 01:23:01PM +0800, Huang, Ying wrote:
>> >> >> Minchan Kim <minchan@kernel.org> writes:
>> >> >> 
>> >> >> > page_cluster 0 means "we don't want readahead", so in that case,
>> >> >> > let's skip the readahead detection logic.
>> >> >> >
>> >> >> > Cc: "Huang, Ying" <ying.huang@intel.com>
>> >> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >> >> > ---
>> >> >> >  include/linux/swap.h | 3 ++-
>> >> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >> >> >
>> >> >> > diff --git a/include/linux/swap.h b/include/linux/swap.h
>> >> >> > index 0f54b491e118..739d94397c47 100644
>> >> >> > --- a/include/linux/swap.h
>> >> >> > +++ b/include/linux/swap.h
>> >> >> > @@ -427,7 +427,8 @@ extern bool has_usable_swap(void);
>> >> >> >  
>> >> >> >  static inline bool swap_use_vma_readahead(void)
>> >> >> >  {
>> >> >> > -	return READ_ONCE(swap_vma_readahead) && !atomic_read(&nr_rotate_swap);
>> >> >> > +	return page_cluster > 0 && READ_ONCE(swap_vma_readahead)
>> >> >> > +				&& !atomic_read(&nr_rotate_swap);
>> >> >> >  }
>> >> >> >  
>> >> >> >  /* Swap 50% full? Release swapcache more aggressively.. */
>> >> >> 
>> >> >> Now the readahead window size of the VMA based swap readahead is
>> >> >> controlled by /sys/kernel/mm/swap/vma_ra_max_order, while that of the
>> >> >> original swap readahead is controlled by sysctl page_cluster.  It is
>> >> >> possible for anonymous memory to use VMA based swap readahead and tmpfs
>> >> >> to use the original swap readahead algorithm at the same time.  So I
>> >> >> think it is necessary to use different control knobs for these two
>> >> >> algorithms.  If we want to disable readahead for tmpfs, but keep it
>> >> >> for VMA based readahead, we can set page_cluster to 0 and set
>> >> >> /sys/kernel/mm/swap/vma_ra_max_order to a non-zero value.  With your
>> >> >> change, this will be impossible.
>> >> >
>> >> > For a long time, page-cluster has been used to control swap readahead.
>> >> > For example, zram users have disabled readahead by setting page-cluster to 0.
>> >> > However, with your change, that would regress unless they also set
>> >> > vma_ra_max_order to 0.
>> >> >
>> >> > Also, all swap users would need to be aware of vma_ra_max_order as well as
>> >> > page-cluster to control swap readahead, but I didn't see any documentation
>> >> > about that. Actually, I don't like it and want to unify it with page-cluster.
>> >> 
>> >> The documentation is in
>> >> 
>> >> Documentation/ABI/testing/sysfs-kernel-mm-swap
>> >> 
>> >> My concern about unifying it with page-cluster is as follows.
>> >> 
>> >> Original swap readahead on tmpfs may not work well because a combined
>> >> workload is running, so we want to disable or constrain it.  But at the
>> >> same time, the VMA based swap readahead may work better.  So I think it
>> >> may be necessary to control them separately.
>> >
>> > My concern is that users who have disabled swap readahead via page-cluster
>> > would see a regression. Please take care of them.
>> 
>> How about disabling VMA based swap readahead if zram is used as swap?  Like
>> we have done for hard disks?
>
> It could be done with the SWP_SYNCHRONOUS_IO flag, which indicates super-fast,
> no-seek-cost swap devices, if this patchset is merged, so the VM automatically
> disables readahead. It is on my TODO list, but it's orthogonal work.
>
> The problem I raised is "Why shouldn't we obey the user's decision?",
> not a zram-specific issue.
>
> Suppose a user who uses an SSD as a swap device decided to disable swap
> readahead for some reason (e.g., a small-memory system). It has worked
> via page-cluster for several years, but with VMA-based swap readahead
> it doesn't work any more.

Can they add one more line to their configuration scripts?

echo 0 > /sys/kernel/mm/swap/vma_ra_max_order

Best Regards,
Huang, Ying
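
For readers who want to see what "one more line" could look like in a
configuration script, here is a minimal sketch that turns off both readahead
paths discussed in this thread. The function name disable_swap_readahead and
the SYSROOT variable are hypothetical, introduced only so the function can be
dry-run against a scratch directory. The two knob paths are the ones named in
the thread (sysctl page-cluster and /sys/kernel/mm/swap/vma_ra_max_order).
The existence check is an assumption that the sysfs knob may be absent on
some kernels.

```shell
#!/bin/sh
# Sketch only: disable both the original and the VMA based swap readahead.
# SYSROOT is a hypothetical prefix for dry runs; leave it unset on a real
# system so the real /proc and /sys files are written (requires root).

disable_swap_readahead() {
    root="${SYSROOT:-}"

    # Original swap readahead window, controlled by sysctl page-cluster.
    echo 0 > "$root/proc/sys/vm/page-cluster"

    # VMA based swap readahead window; write only if the knob is present.
    knob="$root/sys/kernel/mm/swap/vma_ra_max_order"
    if [ -e "$knob" ]; then
        echo 0 > "$knob"
    fi
}
```

Calling disable_swap_readahead from an init or tuning script would then set
both knobs to 0 in one place, which is the behaviour users previously got
from page-cluster alone.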
