From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
lizf@cn.fujitsu.com,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 3/5] Memory controller soft limit organize cgroups (v7)
Date: Wed, 25 Mar 2009 10:59:47 +0530 [thread overview]
Message-ID: <20090325052945.GI24227@balbir.in.ibm.com> (raw)
In-Reply-To: <20090325135900.dc82f133.kamezawa.hiroyu@jp.fujitsu.com>
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-25 13:59:00]:
> On Thu, 19 Mar 2009 22:27:35 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > Feature: Organize cgroups over soft limit in a RB-Tree
> >
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > Changelog v7...v6
> > 1. Refactor the check and update logic. The goal is to allow the
> > check logic to be modular, so that it can be revisited in the future
> > if something more appropriate is found to be useful.
> >
> > Changelog v6...v5
> > 1. Update the key before inserting into RB tree. Without the current change
> > it could take an additional iteration to get the key correct.
> >
> > Changelog v5...v4
> > 1. res_counter_uncharge has an additional parameter to indicate if the
> > counter was over its soft limit, before uncharge.
> >
> > Changelog v4...v3
> > 1. Optimizations to ensure we don't uncessarily get res_counter values
> > 2. Fixed a bug in usage of time_after()
> >
> > Changelog v3...v2
> > 1. Add only the ancestor to the RB-Tree
> > 2. Use css_tryget/css_put instead of mem_cgroup_get/mem_cgroup_put
> >
> > Changelog v2...v1
> > 1. Add support for hierarchies
> > 2. The res_counter that is highest in the hierarchy is returned on soft
> > limit being exceeded. Since we do hierarchical reclaim and add all
> > groups exceeding their soft limits, this approach seems to work well
> > in practice.
> >
> > This patch introduces a RB-Tree for storing memory cgroups that are over their
> > soft limit. The overall goal is to
> >
> > 1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
> > We are careful about updates, updates take place only after a particular
> > time interval has passed
> > 2. We remove the node from the RB-Tree when the usage goes below the soft
> > limit
> >
> > The next set of patches will exploit the RB-Tree to get the group that is
> > over its soft limit by the largest amount and reclaim from it, when we
> > face memory contention.
> >
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> >
> > include/linux/res_counter.h | 6 +-
> > kernel/res_counter.c | 18 +++++
> > mm/memcontrol.c | 149 ++++++++++++++++++++++++++++++++++++++-----
> > 3 files changed, 151 insertions(+), 22 deletions(-)
> >
> >
> > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> > index 5c821fd..5bbf8b1 100644
> > --- a/include/linux/res_counter.h
> > +++ b/include/linux/res_counter.h
> > @@ -112,7 +112,8 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent);
> > int __must_check res_counter_charge_locked(struct res_counter *counter,
> > unsigned long val);
> > int __must_check res_counter_charge(struct res_counter *counter,
> > - unsigned long val, struct res_counter **limit_fail_at);
> > + unsigned long val, struct res_counter **limit_fail_at,
> > + struct res_counter **soft_limit_at);
> >
> > /*
> > * uncharge - tell that some portion of the resource is released
> > @@ -125,7 +126,8 @@ int __must_check res_counter_charge(struct res_counter *counter,
> > */
> >
> > void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > + bool *was_soft_limit_excess);
> >
> > static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> > {
> > diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> > index 4e6dafe..51ec438 100644
> > --- a/kernel/res_counter.c
> > +++ b/kernel/res_counter.c
> > @@ -37,17 +37,27 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> > }
> >
> > int res_counter_charge(struct res_counter *counter, unsigned long val,
> > - struct res_counter **limit_fail_at)
> > + struct res_counter **limit_fail_at,
> > + struct res_counter **soft_limit_fail_at)
> > {
> > int ret;
> > unsigned long flags;
> > struct res_counter *c, *u;
> >
> > *limit_fail_at = NULL;
> > + if (soft_limit_fail_at)
> > + *soft_limit_fail_at = NULL;
> > local_irq_save(flags);
> > for (c = counter; c != NULL; c = c->parent) {
> > spin_lock(&c->lock);
> > ret = res_counter_charge_locked(c, val);
> > + /*
> > + * With soft limits, we return the highest ancestor
> > + * that exceeds its soft limit
> > + */
> > + if (soft_limit_fail_at &&
> > + !res_counter_soft_limit_check_locked(c))
> > + *soft_limit_fail_at = c;
> > spin_unlock(&c->lock);
>
> I'm not sure this works as intended or not. Could you clarify ? (see below)
>
> In following hierarchy,
>
> A/ soft_limit=1G, usage=1.2G.
> B soft_limit=200M, usage=1G
> C soft_limit=800M, usage=200M
>
> This function returns only "A".
> And memory will be reclaimed from B and C, at first.
>
Yes, A will be put on the RB-Tree, earlier we were putting both A and
B, but you were opposed to it (I can't remember why). Memory will be
reclaimed from A, B and C through hierarchical reclaim.
>
>
> > if (ret < 0) {
> > *limit_fail_at = c;
> > @@ -75,7 +85,8 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
> > counter->usage -= val;
> > }
> >
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > + bool *was_soft_limit_excess)
> > {
> > unsigned long flags;
> > struct res_counter *c;
> > @@ -83,6 +94,9 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> > local_irq_save(flags);
> > for (c = counter; c != NULL; c = c->parent) {
> > spin_lock(&c->lock);
> > + if (c == counter && was_soft_limit_excess)
> > + *was_soft_limit_excess =
> > + !res_counter_soft_limit_check_locked(c);
> > res_counter_uncharge_locked(c, val);
> > spin_unlock(&c->lock);
> > }
> Does this work as intended ?
> Assume following hierarchy
>
> A/ softlimit=1G usage=300M
> B/ softlimit=200M usage=300M.
> C/ softlimit=800M usage=0M
>
> *was_soft_limit_excess will be false and no tree update, forever.
>
No.. was_soft_limit_excess checks the soft limit before uncharge to
see if we were over soft limit, when a page gets uncharged from B,
since B is over soft limit and on tree, we will update the tree. Why
do you say that was_soft_limit_excess will return false?
--
Balbir
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-03-25 5:05 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-19 16:57 [PATCH 0/5] Memory controller soft limit patches (v7) Balbir Singh
2009-03-19 16:57 ` [PATCH 1/5] Memory controller soft limit documentation (v7) Balbir Singh
2009-03-19 16:57 ` [PATCH 2/5] Memory controller soft limit interface (v7) Balbir Singh
2009-03-19 16:57 ` [PATCH 3/5] Memory controller soft limit organize cgroups (v7) Balbir Singh
2009-03-20 3:46 ` KAMEZAWA Hiroyuki
2009-03-22 14:21 ` Balbir Singh
2009-03-22 23:53 ` KAMEZAWA Hiroyuki
2009-03-23 3:34 ` Balbir Singh
2009-03-23 3:38 ` KAMEZAWA Hiroyuki
2009-03-23 4:15 ` Balbir Singh
2009-03-23 4:23 ` KAMEZAWA Hiroyuki
2009-03-23 8:22 ` Balbir Singh
2009-03-23 8:47 ` KAMEZAWA Hiroyuki
2009-03-23 9:30 ` Balbir Singh
2009-03-25 4:59 ` KAMEZAWA Hiroyuki
2009-03-25 5:29 ` Balbir Singh [this message]
2009-03-25 5:39 ` KAMEZAWA Hiroyuki
2009-03-25 5:53 ` Balbir Singh
2009-03-25 6:01 ` KAMEZAWA Hiroyuki
2009-03-25 6:21 ` Balbir Singh
2009-03-25 6:38 ` Balbir Singh
2009-03-25 5:07 ` KAMEZAWA Hiroyuki
2009-03-25 5:18 ` Balbir Singh
2009-03-25 5:22 ` KAMEZAWA Hiroyuki
2009-03-19 16:57 ` [PATCH 4/5] Memory controller soft limit refactor reclaim flags (v7) Balbir Singh
2009-03-20 3:47 ` KAMEZAWA Hiroyuki
2009-03-22 14:21 ` Balbir Singh
2009-03-19 16:57 ` [PATCH 5/5] Memory controller soft limit reclaim on contention (v7) Balbir Singh
2009-03-20 4:06 ` KAMEZAWA Hiroyuki
2009-03-22 14:27 ` Balbir Singh
2009-03-23 0:02 ` KAMEZAWA Hiroyuki
2009-03-23 4:12 ` Balbir Singh
2009-03-23 4:20 ` KAMEZAWA Hiroyuki
2009-03-23 8:28 ` Balbir Singh
2009-03-23 8:30 ` KAMEZAWA Hiroyuki
2009-03-23 3:50 ` [PATCH 0/5] Memory controller soft limit patches (v7) KAMEZAWA Hiroyuki
2009-03-23 5:22 ` Balbir Singh
2009-03-23 5:31 ` KAMEZAWA Hiroyuki
2009-03-23 6:12 ` KAMEZAWA Hiroyuki
2009-03-23 6:17 ` KAMEZAWA Hiroyuki
2009-03-23 6:35 ` KOSAKI Motohiro
2009-03-23 8:24 ` Balbir Singh
2009-03-23 9:12 ` KOSAKI Motohiro
2009-03-23 9:23 ` Balbir Singh
2009-03-23 8:35 ` Balbir Singh
2009-03-23 8:52 ` KAMEZAWA Hiroyuki
2009-03-23 9:46 ` Balbir Singh
2009-03-23 9:41 ` Balbir Singh
2009-03-23 8:31 ` KAMEZAWA Hiroyuki
2009-03-24 17:34 ` Balbir Singh
2009-03-24 23:55 ` KAMEZAWA Hiroyuki
2009-03-25 3:42 ` KAMEZAWA Hiroyuki
2009-03-25 4:02 ` Balbir Singh
2009-03-25 4:05 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090325052945.GI24227@balbir.in.ibm.com \
--to=balbir@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=lizf@cn.fujitsu.com \
--cc=riel@redhat.com \
--cc=yamamoto@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).