Re: What can we do to get ready for memory controller merge in 2.6.25

From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Hugh Dickins <hugh@veritas.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Pavel Emelianov <xemul@sw.ru>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Rik van Riel <riel@redhat.com>,
	Christoph Lameter <clameter@sgi.com>,
	"Martin J. Bligh" <mbligh@google.com>,
	Andy Whitcroft <andyw@uk.ibm.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>
Subject: Re: What can we do to get ready for memory controller merge in 2.6.25
Date: Sat, 01 Dec 2007 15:20:29 +0530
Message-ID: <47512E65.9030803@linux.vnet.ibm.com>
In-Reply-To: <6599ad830711302339v1f92af40v85e89484a8a6575e@mail.gmail.com>

Paul Menage wrote:
> On Nov 29, 2007 6:11 PM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>> And also some
>> results or even anecdotes of where this is going to be used would be
>> interesting...
> 
> We want to be able to run multiple isolated jobs on the same machine.
> So being able to limit how much memory each job can consume, in terms
> of anonymous memory and page cache, is useful. I've not had much time
> to look at the patches in great detail, but they seem to provide a
> sensible way to assign and enforce static limits on a bunch of jobs.
> 
> Some of our requirements are a bit beyond this, though:
> 
> In our experience, users are not good at figuring out how much memory
> they really need. In general they tend to massively over-estimate
> their requirements. So we want some way to determine how much of its
> allocated memory a job is actively using, and how much could be thrown
> away or swapped out without bothering the job too much.
> 

One would prefer that the kernel provide the mechanism and user space
provide the policy. The algorithms to assign limits can exist in user
space and be supported by a good set of statistics.

> Of course, the definition of "active use" is tricky - one possibility
> that we're looking at is "has been accessed within the last N
> seconds", where N can be configured appropriately for different jobs
> depending on the job's latency requirements. Active use should also be
> reported for pages that can't be easily freed quickly, e.g. mlocked or
> dirty pages, or anon pages on a swapless system. Inactive pages should
> be easily freeable, and be the first ones to go in the event of memory
> pressure. (From a scheduling point of view we can treat them as free
> memory, and schedule more jobs on the machine)
> 

This definition of active comes from the mainline kernel, which in turn
is derived from our understanding of the working set.
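
To make the quoted "accessed within the last N seconds" proposal
concrete, here is a minimal user-space sketch. It is only an
illustration under stated assumptions: the Page record, its
last_access timestamp and the function names are hypothetical, and
nothing here corresponds to an existing kernel interface.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical per-page record; not a kernel structure."""
    last_access: float      # time of last access, in seconds
    mlocked: bool = False
    dirty: bool = False
    anon: bool = False


def is_active(page, now, n_seconds, swapless=False):
    """Absolute test: active if touched within the last n_seconds."""
    # Pages that cannot be freed quickly are reported as active
    # regardless of age (mlocked, dirty, or anon without swap).
    if page.mlocked or page.dirty or (page.anon and swapless):
        return True
    return (now - page.last_access) <= n_seconds


def active_count(pages, now, n_seconds, swapless=False):
    """Number of pages a job is 'actively using' under this definition."""
    return sum(1 for p in pages if is_active(p, now, n_seconds, swapless))
```

Unlike the active/inactive lists, the result here depends only on the
configured N and not on how much memory pressure the system is under.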

> The existing active/inactive distinction doesn't really capture this,
> since it's relative rather than absolute.
> 

Not sure I understand why we need absolute use and not relative use.

> We want to be able to overcommit a machine, so the sums of the cgroup
> memory limits can add up to more than the total machine memory. So we
> need control over what happens when there's global memory pressure,
> and a way to ensure that the low-latency jobs don't get bogged down in
> reclaim (or OOM) due to the activity of batch jobs.
> 

I agree, well said. We need Job Isolation.
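
As a small worked example of what overcommit means here, the sketch
below sums per-cgroup limits and compares them with machine memory.
The group names and sizes are made up for illustration, and the
helper is hypothetical rather than part of the patches.

```python
GIB = 1 << 30


def overcommit_ratio(limits_bytes, machine_bytes):
    """Sum of all cgroup memory limits divided by total machine memory.

    A ratio above 1.0 means the limits cannot all be reached at once,
    so global memory pressure is possible even when every group stays
    under its own limit.
    """
    return sum(limits_bytes.values()) / machine_bytes


# Hypothetical machine with 8 GiB and two jobs.
limits = {"batch": 6 * GIB, "low_latency": 4 * GIB}
ratio = overcommit_ratio(limits, 8 * GIB)
overcommitted = ratio > 1.0
```

With these made-up numbers the limits add up to 10 GiB on an 8 GiB
machine, so the ratio is 1.25 and a policy for global pressure is
needed.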

> Paul


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
