linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Glauber Costa <glommer@parallels.com>,
	Michal Hocko <mhocko@suse.cz>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Greg Thelen <gthelen@google.com>,
	Leonid Moiseichuk <leonid.moiseichuk@nokia.com>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	John Stultz <john.stultz@linaro.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linaro-kernel@lists.linaro.org, patches@linaro.org,
	kernel-team@android.com
Subject: Re: [RFC] Add mempressure cgroup
Date: Wed, 28 Nov 2012 15:14:32 -0800
Message-ID: <20121128151432.3e29d830.akpm@linux-foundation.org>
In-Reply-To: <20121128102908.GA15415@lizard>

On Wed, 28 Nov 2012 02:29:08 -0800
Anton Vorontsov <anton.vorontsov@linaro.org> wrote:

> The main characteristics are the same to what I've tried to add to vmevent
> API:
> 
>   Internally, it uses Mel Gorman's idea of scanned/reclaimed ratio for
>   pressure index calculation. But we don't expose the index to the
>   userland. Instead, there are three levels of the pressure:
> 
>   o low (just reclaiming, e.g. caches are draining);
>   o medium (allocation cost becomes high, e.g. swapping);
>   o oom (about to oom very soon).
> 
>   The rationale behind exposing levels and not the raw pressure index
>   described here: http://lkml.org/lkml/2012/11/16/675

This rationale is central to the overall design (and is hence central
to the review).  It would be better to include it in the changelogs
where it can be maintained, understood and discussed.


I see a problem with it:


It blurs the question of "who is in control".  We tell userspace "hey,
we're getting a bit tight here, please do something".  And userspace
makes the decision about what "something" is.  So userspace is in
control of part of the reclaim function and the kernel is in control of
another part.  Strange interactions are likely.

Also, the system as a whole is untestable by kernel developers - it
puts the onus onto each and every userspace developer to develop, test
and tune his application against a particular kernel version.

And the more carefully the userspace developer tunes his application,
the more vulnerable he becomes to regressions which were caused by
subtle changes in the kernel's behaviour.


Compare this with the shrink_slab() shrinkers.  With these, the VM can
query and then control the clients.  If something goes wrong or is out
of balance, it's the VM's problem to solve.

So I'm thinking that a better design would be one which puts the kernel
VM in control of userspace scanning and freeing.  Presumably with a
query-and-control interface similar to the slab shrinkers.
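
A minimal sketch of what such a query-and-control interface might look
like, as a plain userspace C analogue of the shrinker idea.  Every name
here (struct reclaim_client, reclaim_from_clients(), the toy cache) is
hypothetical and invented for illustration; none of it is an existing
kernel or libc API.  The controller first queries each client for how
much it could free, then tells each one how much to free, in proportion
to what it reported:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client interface; not an existing kernel or libc API. */
struct reclaim_client {
	long (*count)(void *ctx);         /* query: how many objects are freeable */
	long (*scan)(void *ctx, long nr); /* control: free up to nr, return freed */
	void *ctx;
};

/*
 * The controller stays in charge: it queries every client, then asks each
 * to free a share of 'target' proportional to its freeable count.  Shares
 * are rounded up, so slightly more than 'target' may be freed.
 */
long reclaim_from_clients(struct reclaim_client *clients, size_t n, long target)
{
	long total = 0, freed = 0;
	size_t i;

	assert(target >= 0);

	for (i = 0; i < n; i++)
		total += clients[i].count(clients[i].ctx);
	if (total == 0)
		return 0;

	for (i = 0; i < n; i++) {
		long avail = clients[i].count(clients[i].ctx);
		long share = (target * avail + total - 1) / total;

		if (share > avail)
			share = avail;
		freed += clients[i].scan(clients[i].ctx, share);
	}
	return freed;
}

/* Toy client for illustration: a cache holding droppable objects. */
struct toy_cache {
	long objects;
};

long toy_count(void *ctx)
{
	return ((struct toy_cache *)ctx)->objects;
}

long toy_scan(void *ctx, long nr)
{
	struct toy_cache *c = ctx;
	long n = nr < c->objects ? nr : c->objects;

	c->objects -= n;
	return n;
}
```

The client carries no policy of its own: it only answers "how much
could you free" and "free this many".  All of the balancing lives in the
controller, so it can be changed later without touching the clients.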

IOW, we make the kernel smarter and make userspace dumber.  Userspace
just sits there and does what the kernel tells it to do.

This gives the kernel developers the ability to tune and tweak (i.e.,
alter) userspace's behaviour *years* after that userspace code was
written.

Probably most significantly, this approach has a really big advantage:
we can test it.  Once we have defined that userspace query/control
interface we can write a compliant userspace test application then fire
it up and observe the overall system behaviour.  We can fix bugs and we
can tune it.  This cannot be done with your proposed interface because
we just don't know what userspace will do in response to changes in the
exposed metric.

