From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: David Rientjes <rientjes@google.com>,
Pekka Enberg <penberg@kernel.org>, Mel Gorman <mgorman@suse.de>,
Glauber Costa <glommer@parallels.com>,
Michal Hocko <mhocko@suse.cz>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Luiz Capitulino <lcapitulino@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Greg Thelen <gthelen@google.com>,
Leonid Moiseichuk <leonid.moiseichuk@nokia.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
Minchan Kim <minchan@kernel.org>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
John Stultz <john.stultz@linaro.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linaro-kernel@lists.linaro.org, patches@linaro.org,
kernel-team@android.com
Subject: Re: [PATCH 1/2] Add mempressure cgroup
Date: Tue, 08 Jan 2013 17:24:32 +0900
Message-ID: <50EBD7C0.4010100@jp.fujitsu.com>
In-Reply-To: <20130108072935.GA15431@lizard.gateway.2wire.net>
(2013/01/08 16:29), Anton Vorontsov wrote:
> On Mon, Jan 07, 2013 at 05:51:46PM +0900, Kamezawa Hiroyuki wrote:
> [...]
>> I'm just curious..
>
> Thanks for taking a look! :)
>
> [...]
>>> +/*
>>> + * The window size is the number of scanned pages before we try to analyze
>>> + * the scanned/reclaimed ratio (or difference).
>>> + *
>>> + * It is used as a rate-limit tunable for the "low" level notification,
>>> + * and for averaging medium/oom levels. Using small window sizes can cause
>>> + * lot of false positives, but too big window size will delay the
>>> + * notifications.
>>> + */
>>> +static const uint vmpressure_win = SWAP_CLUSTER_MAX * 16;
>>> +static const uint vmpressure_level_med = 60;
>>> +static const uint vmpressure_level_oom = 99;
>>> +static const uint vmpressure_level_oom_prio = 4;
>>> +
>>
>> Hmm... isn't this window size too small ?
>> If vmscan cannot find a reclaimable page while scanning 2M of pages in a zone,
>> oom notify will be returned. Right ?
>
> Yup, you are right, if we were not able to find anything within the window
> size (which is 2M, but see below), then it is effectively the "OOM level".
> The thing is, the vmpressure reports... the pressure. :) Or, the
> allocation cost, and if the cost becomes high, it is no good.
>
> The 2M is, of course, not ideal. And the "ideal" depends on many factors,
> alike to vmstat. And, actually I dream about deriving the window size from
> zone->stat_threshold, which would make the window automatically adjustable
> for different "machine sizes" (as we do in calculate_normal_threshold(),
> in vmstat.c).
>
> But again, this is all "implementation details"; tunable stuff that we can
> either adjust ourselves as needed, or try to be smart, i.e. apply some
> heuristics, again, as in vmstat.
>
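For reference, the arithmetic behind the "2M" figure and the level thresholds quoted above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the patch: it assumes SWAP_CLUSTER_MAX is 32 and pages are 4 KiB, and it assumes the pressure is the percentage of scanned pages that were not reclaimed (the quoted comment only says "scanned/reclaimed ratio (or difference)"). The names window_bytes(), pressure_percent() and classify() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SWAP_CLUSTER_MAX  32U    /* assumed value for this illustration */
#define ASSUMED_PAGE_SIZE 4096U  /* assumption: 4 KiB pages */

/* Constants as quoted from the patch. */
static const unsigned int vmpressure_win = SWAP_CLUSTER_MAX * 16;
static const unsigned int vmpressure_level_med = 60;
static const unsigned int vmpressure_level_oom = 99;

enum pressure_level { PRESSURE_LOW, PRESSURE_MEDIUM, PRESSURE_OOM };

/* Window size in bytes: scanned pages per window times the page size. */
size_t window_bytes(void)
{
    return (size_t)vmpressure_win * ASSUMED_PAGE_SIZE;
}

/*
 * Hypothetical pressure formula: percentage of scanned pages that
 * could NOT be reclaimed. An empty scan reports no pressure.
 */
unsigned int pressure_percent(unsigned int scanned, unsigned int reclaimed)
{
    if (scanned == 0)
        return 0;
    if (reclaimed > scanned)
        reclaimed = scanned;
    return 100U - (unsigned int)((unsigned long long)reclaimed * 100ULL / scanned);
}

/* Map a percentage onto the levels using the quoted thresholds. */
enum pressure_level classify(unsigned int percent)
{
    assert(percent <= 100);
    if (percent >= vmpressure_level_oom)
        return PRESSURE_OOM;
    if (percent >= vmpressure_level_med)
        return PRESSURE_MEDIUM;
    return PRESSURE_LOW;
}
```

Under these assumptions, one full window of 512 scanned pages with nothing reclaimed classifies as the OOM level, which is the case raised in the quoted question.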
Hmm, I like automatic adjustment for things like this (though it may also need to be
tunable by the user). My concern is, for example, that if a qemu-kvm guest with
pci-passthrough is running on a node and using most of the memory on it, the interface
will say "Hey, it's near OOM" to users. We may need complicated heuristics ;)

Anyway, your approach seems interesting to me, but it seems peaky to ordinary users.
Users should know what they should check (vmstat, zoneinfo, malloc latency ??) when they
get a notification, before raising a real alarm. (This is not explained in the doc.)
For example, if the user cares about swap usage, he should check it.

I'd be glad if you explained in the Doc that this interface just gives a hint and reports
the status of _recent_ vmscans over some window. That means the latency of recent memory
allocations. Users should confirm the real status and make the final judgment themselves.

The point is that this notification is important because it's quick and related to ongoing
memory allocation latency. But the kernel cannot be sure that there is long-standing heavy
vm pressure.

I'm sorry if I misunderstand the concept.
Thank you,
-Kame
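
As an illustration of the "confirm the real status" step suggested above, a userspace listener that cares about swap could re-check swap usage after a notification before raising an alarm. The sketch below is hypothetical: it parses /proc/meminfo-style text (lines such as "SwapTotal:  2097148 kB") passed in as a string, and the helper names meminfo_kb() and swap_used_percent() are invented for this example. Reading the actual file and choosing an alarm threshold are left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB that follows `key` (e.g. "SwapTotal:") at the
 * start of a line in /proc/meminfo-style text, or -1 if it is missing.
 */
long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0) {
            long val;
            /* %ld skips the padding spaces after the key */
            if (sscanf(line + klen, "%ld", &val) == 1)
                return val;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}

/*
 * Percentage of swap in use, or -1 if it cannot be determined
 * (fields missing, inconsistent, or no swap configured).
 */
int swap_used_percent(const char *text)
{
    long total = meminfo_kb(text, "SwapTotal:");
    long free_kb = meminfo_kb(text, "SwapFree:");

    if (total <= 0 || free_kb < 0 || free_kb > total)
        return -1;
    return (int)((total - free_kb) * 100 / total);
}
```

A listener would call swap_used_percent() on the contents of /proc/meminfo when a notification arrives and treat the notification only as a hint unless the confirmed figure is also high.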