* [RFC][PATCH 1/2] Add explanation about min_free_kbytes to clarify its effect
2011-01-07 22:03 [RFC][PATCH 0/2] Tunable watermark Satoru Moriya
@ 2011-01-07 22:04 ` Satoru Moriya
2011-01-07 22:27 ` David Rientjes
2011-01-07 22:07 ` [RFC][PATCH 2/2] Make watermarks tunable separately Satoru Moriya
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Satoru Moriya @ 2011-01-07 22:04 UTC (permalink / raw)
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
akpm@linux-foundation.org, mel@csn.ul.ie,
kosaki.motohiro@jp.fujitsu.com, rdunlap@xenotime.net,
dle-develop@lists.sourceforge.net, Seiji Aguchi
Document that changing min_free_kbytes affects not only watermark[min]
but also watermark[low,high].
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
---
Documentation/sysctl/vm.txt | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 30289fa..e10b279 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -349,7 +349,8 @@ min_free_kbytes:
This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a
-watermark[WMARK_MIN] value for each lowmem zone in the system.
+watermark[WMARK_MIN] for each lowmem zone and
+watermark[WMARK_LOW/WMARK_HIGH] for each zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.
--
1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC][PATCH 1/2] Add explanation about min_free_kbytes to clarify its effect
2011-01-07 22:04 ` [RFC][PATCH 1/2] Add explanation about min_free_kbytes to clarify its effect Satoru Moriya
@ 2011-01-07 22:27 ` David Rientjes
0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2011-01-07 22:27 UTC (permalink / raw)
To: Satoru Moriya
Cc: linux-mm, linux-kernel, linux-doc, Andrew Morton, Mel Gorman,
KOSAKI Motohiro, Randy Dunlap, dle-develop, Seiji Aguchi
On Fri, 7 Jan 2011, Satoru Moriya wrote:
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 30289fa..e10b279 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -349,7 +349,8 @@ min_free_kbytes:
>
> This is used to force the Linux VM to keep a minimum number
> of kilobytes free. The VM uses this number to compute a
> -watermark[WMARK_MIN] value for each lowmem zone in the system.
> +watermark[WMARK_MIN] for each lowmem zone and
> +watermark[WMARK_LOW/WMARK_HIGH] for each zone in the system.
> Each lowmem zone gets a number of reserved free pages based
> proportionally on its size.
>
WMARK_MIN is changed for all zones.
* [RFC][PATCH 2/2] Make watermarks tunable separately
2011-01-07 22:03 [RFC][PATCH 0/2] Tunable watermark Satoru Moriya
2011-01-07 22:04 ` [RFC][PATCH 1/2] Add explanation about min_free_kbytes to clarify its effect Satoru Moriya
@ 2011-01-07 22:07 ` Satoru Moriya
2011-01-07 22:23 ` [RFC][PATCH 0/2] Tunable watermark David Rientjes
2011-01-21 0:16 ` Rik van Riel
3 siblings, 0 replies; 13+ messages in thread
From: Satoru Moriya @ 2011-01-07 22:07 UTC (permalink / raw)
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
akpm@linux-foundation.org, mel@csn.ul.ie,
kosaki.motohiro@jp.fujitsu.com, rdunlap@xenotime.net,
dle-develop@lists.sourceforge.net, Seiji Aguchi
This patch introduces three new sysctls to /proc/sys/vm:
wmark_min_kbytes, wmark_low_kbytes and wmark_high_kbytes.
Each entry is used to compute watermark[min], watermark[low]
and watermark[high] for each zone.
These parameters are also updated when min_free_kbytes is
changed, because they are initially derived from min_free_kbytes.
Conversely, min_free_kbytes is updated when wmark_min_kbytes
changes.
By using the parameters one can adjust the difference among
watermark[min], watermark[low] and watermark[high] and as a result
one can tune the kernel reclaim behaviour to fit their requirement.
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
---
Documentation/sysctl/vm.txt | 37 +++++++++++++++
include/linux/mmzone.h | 6 ++
kernel/sysctl.c | 28 +++++++++++-
mm/page_alloc.c | 109 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 179 insertions(+), 1 deletions(-)
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index e10b279..674681d 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -55,6 +55,9 @@ Currently, these files are in /proc/sys/vm:
- stat_interval
- swappiness
- vfs_cache_pressure
+- wmark_high_kbytes
+- wmark_low_kbytes
+- wmark_min_kbytes
- zone_reclaim_mode
==============================================================
@@ -360,6 +363,8 @@ become subtly broken, and prone to deadlock under high loads.
Setting this too high will OOM your machine instantly.
+This is also updated when wmark_min_kbytes changes.
+
=============================================================
min_slab_ratio:
@@ -664,6 +669,38 @@ causes the kernel to prefer to reclaim dentries and inodes.
==============================================================
+wmark_high_kbytes
+
+Contains the amount of free memory above which kswapd stops reclaiming pages.
+
+The Linux VM uses this number to compute a watermark[WMARK_HIGH] value for
+each zone in the system. This is also updated when min_free_kbytes is updated.
+The minimum is wmark_low_kbytes.
+
+==============================================================
+
+wmark_low_kbytes
+
+Contains the amount of free memory below which kswapd starts to reclaim pages.
+
+The Linux VM uses this number to compute a watermark[WMARK_LOW] value for
+each zone in the system. This is also updated when min_free_kbytes changes
* Re: [RFC][PATCH 0/2] Tunable watermark
2011-01-07 22:03 [RFC][PATCH 0/2] Tunable watermark Satoru Moriya
2011-01-07 22:04 ` [RFC][PATCH 1/2] Add explanation about min_free_kbytes to clarify its effect Satoru Moriya
2011-01-07 22:07 ` [RFC][PATCH 2/2] Make watermarks tunable separately Satoru Moriya
@ 2011-01-07 22:23 ` David Rientjes
2011-01-07 22:35 ` Ying Han
2011-01-13 22:05 ` Satoru Moriya
2011-01-21 0:16 ` Rik van Riel
3 siblings, 2 replies; 13+ messages in thread
From: David Rientjes @ 2011-01-07 22:23 UTC (permalink / raw)
To: Satoru Moriya
Cc: linux-mm, linux-kernel, linux-doc, Andrew Morton, Mel Gorman,
KOSAKI Motohiro, Randy Dunlap, dle-develop, Seiji Aguchi
On Fri, 7 Jan 2011, Satoru Moriya wrote:
> This patchset introduces a new knob to control each watermark
> separately.
>
> [Purpose]
> To control the timing at which kswapd/direct reclaim starts(ends)
> based on memory pressure and/or application characteristics
> because direct reclaim makes a memory alloc/access latency worse.
> (We'd like to avoid direct reclaim to keep latency low even if
> under the high memory pressure.)
>
> [Problem]
> The thresholds kswapd/direct reclaim starts(ends) depend on
> watermark[min,low,high] and currently all watermarks are set
> based on min_free_kbytes. min_free_kbytes is the amount of
> free memory that Linux VM should keep at least.
>
Not completely, it also depends on the amount of lowmem (because of the
reserve setup next) and the amount of memory in each zone.
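A simplified sketch of the dependence described in this reply: the global min_free_kbytes value is split across zones in proportion to each zone's share of lowmem pages, so a zone's min watermark depends on its size as well as on the sysctl. The zone names and page counts are hypothetical, a 4 KB page size is assumed, and the highmem special case is left out. This is an illustration, not the kernel's code.

```python
PAGE_KB = 4  # assumed page size in KB


def per_zone_min_pages(min_free_kbytes, zone_pages):
    """Split the global reserve across zones in proportion to zone size."""
    pages_min = min_free_kbytes // PAGE_KB
    lowmem_pages = sum(zone_pages.values())
    return {
        name: pages_min * pages // lowmem_pages
        for name, pages in zone_pages.items()
    }


# Hypothetical machine with two lowmem zones (sizes in pages).
zones = {"DMA": 4096, "DMA32": 258048}
wmark_min = per_zone_min_pages(5752, zones)
print(wmark_min)  # {'DMA': 22, 'DMA32': 1415}
```

The same min_free_kbytes therefore yields very different per-zone reserves, which is why the sysctl alone does not determine the thresholds.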
> This means the difference between thresholds at which kswapd
> starts and direct reclaim starts depends on the amount of free
> memory.
>
> On the other hand, the amount of required memory depends on
> applications. Therefore when it allocates/access memory more
> than the difference between watermark[low] and watermark[min],
> kernel sometimes runs direct reclaim before allocation and
> it makes application latency bigger.
>
> [Solution]
> To avoid the situation above, this patch set introduces new
> tunables /proc/sys/vm/wmark_min_kbytes, wmark_low_kbytes and
> wmark_high_kbytes. Each entry controls watermark[min],
> watermark[low] and watermark[high] separately.
> By using these parameters one can make the difference between
> min and low bigger than the amount of memory which applications
> require.
>
I really dislike this because it adds additional tunables that should
already be handled correctly by the VM and it's very difficult for users
to know what to tune these values to; these watermarks (with the exception
of min) are supposed to be internal to the VM implementation.
You didn't mention why it wouldn't be possible to modify
setup_per_zone_wmarks() in some way for your configuration so this happens
automatically. If you can find a deterministic way to set these
watermarks from userspace, you should be able to do it in the kernel as
well based on the configuration.
I think we should invest time in making sure the VM works for any type of
workload thrown at it instead of relying on userspace making lots of
adjustments.
* Re: [RFC][PATCH 0/2] Tunable watermark
2011-01-07 22:23 ` [RFC][PATCH 0/2] Tunable watermark David Rientjes
@ 2011-01-07 22:35 ` Ying Han
2011-01-07 22:39 ` David Rientjes
2011-01-13 22:05 ` Satoru Moriya
1 sibling, 1 reply; 13+ messages in thread
From: Ying Han @ 2011-01-07 22:35 UTC (permalink / raw)
To: David Rientjes
Cc: Satoru Moriya, linux-mm, linux-kernel, linux-doc, Andrew Morton,
Mel Gorman, KOSAKI Motohiro, Randy Dunlap, dle-develop,
Seiji Aguchi, Ying Han
On Fri, Jan 7, 2011 at 2:23 PM, David Rientjes <rientjes@google.com> wrote:
> On Fri, 7 Jan 2011, Satoru Moriya wrote:
>
>> This patchset introduces a new knob to control each watermark
>> separately.
>>
>> [Purpose]
>> To control the timing at which kswapd/direct reclaim starts(ends)
>> based on memory pressure and/or application characteristics
>> because direct reclaim makes a memory alloc/access latency worse.
>> (We'd like to avoid direct reclaim to keep latency low even if
>> under the high memory pressure.)
>>
>> [Problem]
>> The thresholds kswapd/direct reclaim starts(ends) depend on
>> watermark[min,low,high] and currently all watermarks are set
>> based on min_free_kbytes. min_free_kbytes is the amount of
>> free memory that Linux VM should keep at least.
>>
>
> Not completely, it also depends on the amount of lowmem (because of the
> reserve setup next) and the amount of memory in each zone.
>
>> This means the difference between thresholds at which kswapd
>> starts and direct reclaim starts depends on the amount of free
>> memory.
>>
>> On the other hand, the amount of required memory depends on
>> applications. Therefore when it allocates/access memory more
>> than the difference between watermark[low] and watermark[min],
>> kernel sometimes runs direct reclaim before allocation and
>> it makes application latency bigger.
>>
>> [Solution]
>> To avoid the situation above, this patch set introduces new
>> tunables /proc/sys/vm/wmark_min_kbytes, wmark_low_kbytes and
>> wmark_high_kbytes. Each entry controls watermark[min],
>> watermark[low] and watermark[high] separately.
>> By using these parameters one can make the difference between
>> min and low bigger than the amount of memory which applications
>> require.
>>
>
> I really dislike this because it adds additional tunables that should
> already be handled correctly by the VM and it's very difficult for users
> to know what to tune these values to; these watermarks (with the exception
> of min) are supposed to be internal to the VM implementation.
>
> You didn't mention why it wouldn't be possible to modify
> setup_per_zone_wmarks() in some way for your configuration so this happens
> automatically. If you can find a deterministic way to set these
> watermarks from userspace, you should be able to do it in the kernel as
> well based on the configuration.
>
> I think we should invest time in making sure the VM works for any type of
> workload thrown at it instead of relying on userspace making lots of
> adjustments.
I agree in general that adding APIs for each wmark sounds like overkill,
and hard for users to configure most of the time.
On the other hand, having the low/high wmarks consider more characteristics
than just the size of the zone sounds useful. But I am not sure how to
approach that entirely in the kernel if we want the reclaim behavior to
reflect different workloads.
--Ying
* Re: [RFC][PATCH 0/2] Tunable watermark
2011-01-07 22:35 ` Ying Han
@ 2011-01-07 22:39 ` David Rientjes
2011-01-13 22:05 ` Satoru Moriya
0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2011-01-07 22:39 UTC (permalink / raw)
To: Ying Han
Cc: Satoru Moriya, linux-mm, linux-kernel, linux-doc, Andrew Morton,
Mel Gorman, KOSAKI Motohiro, Randy Dunlap, dle-develop,
Seiji Aguchi
On Fri, 7 Jan 2011, Ying Han wrote:
> On the other hand, having the low/high wmarks consider more characteristics
> than just the size of the zone sounds useful.
The semantics of any watermark is to trigger events to happen at a
specific level, so they should be static with respect to a frame of
reference (which in the VM case is the min watermark with respect to the
size of the zone). If you're going to adjust the min watermark, it's then
_mandatory_ to adjust the others to that frame of reference, you shouldn't
need to tune them independently.
The problem that Satoru is reporting probably has nothing to do with the
watermarks themselves but probably requires more aggressive action by
kswapd and/or memory compaction.
* RE: [RFC][PATCH 0/2] Tunable watermark
2011-01-07 22:39 ` David Rientjes
@ 2011-01-13 22:05 ` Satoru Moriya
2011-01-13 22:24 ` David Rientjes
0 siblings, 1 reply; 13+ messages in thread
From: Satoru Moriya @ 2011-01-13 22:05 UTC (permalink / raw)
To: David Rientjes, Ying Han
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, Andrew Morton, Mel Gorman,
KOSAKI Motohiro, Randy Dunlap, dle-develop@lists.sourceforge.net,
Seiji Aguchi
On 01/07/2011 05:39 PM, David Rientjes wrote:
> The semantics of any watermark is to trigger events to happen at a
> specific level, so they should be static with respect to a frame of
> reference (which in the VM case is the min watermark with respect to the
> size of the zone). If you're going to adjust the min watermark, it's then
> _mandatory_ to adjust the others to that frame of reference, you shouldn't
> need to tune them independently.
Currently watermark[low,high] are set by the following calculation (lowmem case):
  watermark[low]  = watermark[min] * 1.25
  watermark[high] = watermark[min] * 1.5
So the differences between the watermarks are as follows:
  min <-- min/4 --> low <-- min/4 --> high
I think the differences, "min/4", are too small in my case.
Of course I can make them bigger by setting min_free_kbytes to a bigger value,
but that means the kernel unnecessarily keeps more free memory for the PF_MEMALLOC case.
So I suggest changing the coefficients (1.25, 1.5). It would also be better
to make them accessible from user space so they can be tuned in response to
application requirements.
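As a worked example of the calculation above, the following sketch derives low and high from min using integer arithmetic. The input 5752 KB is the wmark_min_kbytes value from the benchmark table quoted elsewhere in this thread; the function name is made up for illustration and is not a kernel interface.

```python
def watermarks_from_min(wmark_min):
    """Return (low, high), where low = min * 1.25 and high = min * 1.5."""
    wmark_low = wmark_min + wmark_min // 4
    wmark_high = wmark_min + wmark_min // 2
    return wmark_low, wmark_high


wmark_min_kb = 5752
low_kb, high_kb = watermarks_from_min(wmark_min_kb)
print(low_kb, high_kb)        # 7190 8628
print(low_kb - wmark_min_kb)  # gap between min and low: 1438
```

With these defaults, an allocation burst of more than about 1.4 MB between kswapd wakeup and kswapd making progress is enough to reach the min watermark.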
> The problem that Satoru is reporting probably has nothing to do with the
> watermarks themselves but probably requires more aggressive action by
> kswapd and/or memory compaction.
More aggressive action may reduce the possibility of the problem reported.
But we can't avoid the problem completely because applications may
allocate/access faster than reclaiming/compaction.
* RE: [RFC][PATCH 0/2] Tunable watermark
2011-01-13 22:05 ` Satoru Moriya
@ 2011-01-13 22:24 ` David Rientjes
0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2011-01-13 22:24 UTC (permalink / raw)
To: Satoru Moriya
Cc: Ying Han, linux-mm, linux-kernel, linux-doc, Andrew Morton,
Mel Gorman, KOSAKI Motohiro, Randy Dunlap, dle-develop,
Seiji Aguchi
On Thu, 13 Jan 2011, Satoru Moriya wrote:
> Currently watermark[low,high] are set by the following calculation (lowmem case):
>
>   watermark[low]  = watermark[min] * 1.25
>   watermark[high] = watermark[min] * 1.5
>
> So the differences between the watermarks are as follows:
>
>   min <-- min/4 --> low <-- min/4 --> high
>
> I think the differences, "min/4", are too small in my case.
> Of course I can make them bigger by setting min_free_kbytes to a bigger value,
> but that means the kernel unnecessarily keeps more free memory for the PF_MEMALLOC case.
>
> So I suggest changing the coefficients (1.25, 1.5). It would also be better
> to make them accessible from user space so they can be tuned in response to
> application requirements.
>
Userspace can't possibly be held responsible for tuning internal VM
parameters in response to certain workloads like this; if you have
evidence that different coefficients work better in different
circumstances, then present the criteria for which you intend to change
them from the command line via your new tunables and let's work to make
the VM more extendable to serve those workloads well. This should be done
by showing how background reclaim is ineffective, we enter direct
compaction or reclaim too aggressively, we don't wait for writeout long
enough, we prematurely kill applications when unnecessary, etc, which
you would undoubtedly have if you're going to make any sane adjustments via
these new tunables.
* RE: [RFC][PATCH 0/2] Tunable watermark
2011-01-07 22:23 ` [RFC][PATCH 0/2] Tunable watermark David Rientjes
2011-01-07 22:35 ` Ying Han
@ 2011-01-13 22:05 ` Satoru Moriya
2011-01-13 22:20 ` David Rientjes
1 sibling, 1 reply; 13+ messages in thread
From: Satoru Moriya @ 2011-01-13 22:05 UTC (permalink / raw)
To: David Rientjes
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, Andrew Morton, Mel Gorman,
KOSAKI Motohiro, Randy Dunlap, dle-develop@lists.sourceforge.net,
Seiji Aguchi
Hi David,
Thank you for your comments.
On 01/07/2011 05:23 PM, David Rientjes wrote:
> On Fri, 7 Jan 2011, Satoru Moriya wrote:
>>
>> [Problem]
>> The thresholds kswapd/direct reclaim starts(ends) depend on
>> watermark[min,low,high] and currently all watermarks are set
>> based on min_free_kbytes. min_free_kbytes is the amount of
>> free memory that Linux VM should keep at least.
>>
>
> Not completely, it also depends on the amount of lowmem (because of the
> reserve setup next) and the amount of memory in each zone.
Right. Thanks.
>> [Solution]
>> To avoid the situation above, this patch set introduces new
>> tunables /proc/sys/vm/wmark_min_kbytes, wmark_low_kbytes and
>> wmark_high_kbytes. Each entry controls watermark[min],
>> watermark[low] and watermark[high] separately.
>> By using these parameters one can make the difference between
>> min and low bigger than the amount of memory which applications
>> require.
>>
>
> I really dislike this because it adds additional tunables that should
> already be handled correctly by the VM and it's very difficult for users
> to know what to tune these values to; these watermarks (with the exception
> of min) are supposed to be internal to the VM implementation.
The patchset targets enterprise systems, and in that area users expect
to be able to tune the system themselves to meet their requirements.
> You didn't mention why it wouldn't be possible to modify
> setup_per_zone_wmarks() in some way for your configuration so this happens
> automatically. If you can find a deterministic way to set these
> watermarks from userspace, you should be able to do it in the kernel as
> well based on the configuration.
Do you mean that we should introduce a mechanism into kernel that changes
watermarks dynamically depending on its loads (such as cpu frequency control)
or we should change the calculation method in setup_per_zone_wmarks()?
I think it is difficult to control watermarks automatically in kernel because
required memory varies widely among applications. On the other hand, sysctl
parameters help us fit the kernel to each system's requirement flexibly.
> I think we should invest time in making sure the VM works for any type of
> workload thrown at it instead of relying on userspace making lots of
> adjustments.
* RE: [RFC][PATCH 0/2] Tunable watermark
2011-01-13 22:05 ` Satoru Moriya
@ 2011-01-13 22:20 ` David Rientjes
0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2011-01-13 22:20 UTC (permalink / raw)
To: Satoru Moriya
Cc: linux-mm, linux-kernel, linux-doc, Andrew Morton, Mel Gorman,
KOSAKI Motohiro, Randy Dunlap, dle-develop, Seiji Aguchi
On Thu, 13 Jan 2011, Satoru Moriya wrote:
> > You didn't mention why it wouldn't be possible to modify
> > setup_per_zone_wmarks() in some way for your configuration so this happens
> > automatically. If you can find a deterministic way to set these
> > watermarks from userspace, you should be able to do it in the kernel as
> > well based on the configuration.
>
> Do you mean that we should introduce a mechanism into kernel that changes
> watermarks dynamically depending on its loads (such as cpu frequency control)
> or we should change the calculation method in setup_per_zone_wmarks()?
>
The watermarks you're exposing through this patchset to userspace for the
first time are meant to be internal to the VM. Userspace is not intended
to manipulate them in an effort to cover up deficiencies within the memory
manager itself. If you have actual cases where tuning the watermarks from
userspace is helpful, then it logically means:
- the VM is acting incorrectly in response to situations where it
approaches the tunable min watermark (all watermarks are a function of
the min watermark), which shouldn't be representative of just a handful
of cases, and
- you can deterministically do the same calculation within the kernel
itself.
I'm skeptical that any tuning is actually helpful to your workload that
doesn't also indicate a problem internal to the VM itself. I think what
would be more helpful is if you would show how the watermarks currently
don't trigger fast enough (or aggressive enough) and then address the
issue in the kernel itself so everyone can benefit from your work, whether
that's adjusting where the watermarks are based on external factors or
whether the semantics of those watermarks are to slightly change.
* Re: [RFC][PATCH 0/2] Tunable watermark
2011-01-07 22:03 [RFC][PATCH 0/2] Tunable watermark Satoru Moriya
` (2 preceding siblings ...)
2011-01-07 22:23 ` [RFC][PATCH 0/2] Tunable watermark David Rientjes
@ 2011-01-21 0:16 ` Rik van Riel
2011-02-10 18:30 ` Satoru Moriya
3 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2011-01-21 0:16 UTC (permalink / raw)
To: Satoru Moriya
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, akpm@linux-foundation.org,
mel@csn.ul.ie, kosaki.motohiro@jp.fujitsu.com,
rdunlap@xenotime.net, dle-develop@lists.sourceforge.net,
Seiji Aguchi
On 01/07/2011 05:03 PM, Satoru Moriya wrote:
> The result is following.
>
>                     | default |  case 1 |  case 2 |
> ----------------------------------------------------------
> wmark_min_kbytes    |    5752 |    5752 |    5752 |
> wmark_low_kbytes    |    7190 |   16384 |   32768 | (KB)
> wmark_high_kbytes   |    8628 |   20480 |   40960 |
> ----------------------------------------------------------
> real                |     503 |     364 |     337 |
> user                |       3 |       5 |       4 | (msec)
> sys                 |     153 |     149 |     146 |
> ----------------------------------------------------------
> page fault          |   32768 |   32768 |   32768 |
> kswapd_wakeup       |    1809 |     335 |     228 | (times)
> direct reclaim      |       5 |       0 |       0 |
>
> As you can see, direct reclaim was performed 5 times and
> its exec time was 503 msec in the default case. On the other
> hand, in case 1 (large delta case ) no direct reclaim was
> performed and its exec time was 364 msec.
Saving about 0.15 seconds on a one-off workload is probably not
worth the complexity of giving a system administrator
yet another set of tunables to mess with.
However, I suspect it may be a good idea if the kernel
could adjust these watermarks automatically, since direct
reclaim could lead to quite a big performance penalty.
I do not know which events should be used to increase and
decrease the watermarks, but I have some ideas:
- direct reclaim (increase)
- kswapd has trouble freeing pages (increase)
- kswapd frees enough memory at DEF_PRIORITY (decrease)
- next to no direct reclaim events in the last N (1000?)
reclaim events (decrease)
I guess we will also need to be sure that the watermarks
are never raised above some sane upper threshold. Maybe
4x or 5x the default?
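One way the feedback idea above could be prototyped outside the kernel is sketched below. It models only two of the listed events (a direct reclaim raises the watermark; a full window of reclaim events with no direct reclaim lowers it) and clamps the result between the default and a multiple of the default. The class name, step size, and parameters are all hypothetical; nothing like this exists in the patchset.

```python
from collections import deque


class AdaptiveWatermark:
    """Toy model of a self-adjusting watermark (all names hypothetical)."""

    def __init__(self, default_kb, max_factor=4, window=1000, step_kb=1024):
        self.default_kb = default_kb
        self.value_kb = default_kb
        self.ceiling_kb = default_kb * max_factor  # sane upper threshold
        self.step_kb = step_kb
        self.recent = deque(maxlen=window)  # True marks a direct reclaim

    def record(self, direct_reclaim):
        """Feed one reclaim event and return the updated watermark in KB."""
        self.recent.append(direct_reclaim)
        if direct_reclaim:
            # Direct reclaim happened: raise the watermark, up to the cap.
            self.value_kb = min(self.value_kb + self.step_kb, self.ceiling_kb)
        elif len(self.recent) == self.recent.maxlen and not any(self.recent):
            # A full window without direct reclaim: lower it again.
            self.value_kb = max(self.value_kb - self.step_kb, self.default_kb)
            self.recent.clear()
        return self.value_kb


wm = AdaptiveWatermark(default_kb=7190, window=3)
print(wm.record(True))  # 8214
```

Whether a fixed step and a fixed window are reasonable choices is an open question; the sketch only shows that the increase, decrease, and cap rules fit together.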
--
All rights reversed
* RE: [RFC][PATCH 0/2] Tunable watermark
2011-01-21 0:16 ` Rik van Riel
@ 2011-02-10 18:30 ` Satoru Moriya
0 siblings, 0 replies; 13+ messages in thread
From: Satoru Moriya @ 2011-02-10 18:30 UTC (permalink / raw)
To: Rik van Riel
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, akpm@linux-foundation.org,
mel@csn.ul.ie, kosaki.motohiro@jp.fujitsu.com,
rdunlap@xenotime.net, dle-develop@lists.sourceforge.net,
Seiji Aguchi
On 01/20/2011 07:16 PM, Rik van Riel wrote:
> On 01/07/2011 05:03 PM, Satoru Moriya wrote:
>
> > The result is following.
> >
> >                     | default |  case 1 |  case 2 |
> > ----------------------------------------------------------
> > wmark_min_kbytes    |    5752 |    5752 |    5752 |
> > wmark_low_kbytes    |    7190 |   16384 |   32768 | (KB)
> > wmark_high_kbytes   |    8628 |   20480 |   40960 |
> > ----------------------------------------------------------
> > real                |     503 |     364 |     337 |
> > user                |       3 |       5 |       4 | (msec)
> > sys                 |     153 |     149 |     146 |
> > ----------------------------------------------------------
> > page fault          |   32768 |   32768 |   32768 |
> > kswapd_wakeup       |    1809 |     335 |     228 | (times)
> > direct reclaim      |       5 |       0 |       0 |
> >
> > As you can see, direct reclaim was performed 5 times and
> > its exec time was 503 msec in the default case. On the other
> > hand, in case 1 (large delta case ) no direct reclaim was
> > performed and its exec time was 364 msec.
>
> Saving about 0.15 seconds on a one-off workload is probably not
> worth the complexity of giving a system administrator
> yet another set of tunables to mess with.
The table above shows average data, but averages may not be enough.
In a low-latency enterprise system, worst-case latency matters most.
I recorded the worst latency per single-page allocation,
and here it is.
                    | default |  case 1 |  case 2 |
----------------------------------------------------------
worst latency       |     223 |      75 |      50 | (usec)
per one page alloc  |         |         |         |
In the default case, the worst latency is 223 usec, and direct reclaim
occurred at that time. OTOH our target latency is under 100 usec.
So I'd like to ensure that direct reclaim is never executed in certain
situations.
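To make the comparison explicit, this small sketch checks each configuration from the tables in this thread against the 100 usec target and lists the gap between the low and min watermarks next to it. The variable names are illustrative only; the numbers are the ones reported above.

```python
TARGET_USEC = 100  # target worst-case latency stated above

# Values copied from the tables in this thread.
wmark_min_kb = 5752
wmark_low_kb = {"default": 7190, "case 1": 16384, "case 2": 32768}
worst_latency_usec = {"default": 223, "case 1": 75, "case 2": 50}

# Gap between watermark[low] and watermark[min] for each configuration.
gap_kb = {name: low - wmark_min_kb for name, low in wmark_low_kb.items()}
meets_target = {
    name: latency < TARGET_USEC
    for name, latency in worst_latency_usec.items()
}

for name in wmark_low_kb:
    print(name, gap_kb[name], worst_latency_usec[name], meets_target[name])
```

Only the two configurations with the enlarged low/min gap stay under the target in this measurement.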
> However, I suspect it may be a good idea if the kernel
> could adjust these watermarks automatically, since direct
> reclaim could lead to quite a big performance penalty.
>
> I do not know which events should be used to increase and
> decrease the watermarks, but I have some ideas:
> - direct reclaim (increase)
> - kswapd has trouble freeing pages (increase)
> - kswapd frees enough memory at DEF_PRIORITY (decrease)
> - next to no direct reclaim events in the last N (1000?)
> reclaim events (decrease)
I think it might be a good idea, but it is not enough because we can't avoid
direct reclaim completely. So what do you think of adding a learning
mode to your idea? In the learning mode, the kernel calculates appropriate
watermarks, and users apply them from the next boot onward.
This is useful for an enterprise system because we normally run performance/stress
tests and tune the system before release. If we run stress tests in learning mode,
we can obtain appropriate watermarks for that system. By using them we can avoid
direct reclaim and keep latency low enough on a production system.
> I guess we will also need to be sure that the watermarks
> are never raised above some sane upper threshold. Maybe
> 4x or 5x the default?
>
>
> --
> All rights reversed