From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19])
	by kanga.kvack.org (Postfix) with ESMTP id 23D666B003D
	for ; Sun, 22 Mar 2009 09:37:24 -0400 (EDT)
Received: from d23relay01.au.ibm.com (d23relay01.au.ibm.com [202.81.31.243])
	by e23smtp03.au.ibm.com (8.13.1/8.13.1) with ESMTP id n2MEJZMT024170
	for ; Mon, 23 Mar 2009 01:19:35 +1100
Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.234.96])
	by d23relay01.au.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n2MELZjL184738
	for ; Mon, 23 Mar 2009 01:21:38 +1100
Received: from d23av01.au.ibm.com (loopback [127.0.0.1])
	by d23av01.au.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n2MELHTD004322
	for ; Mon, 23 Mar 2009 01:21:18 +1100
Date: Sun, 22 Mar 2009 19:51:05 +0530
From: Balbir Singh
Subject: Re: [PATCH 3/5] Memory controller soft limit organize cgroups (v7)
Message-ID: <20090322142105.GA24227@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <20090319165713.27274.94129.sendpatchset@localhost.localdomain>
	<20090319165735.27274.96091.sendpatchset@localhost.localdomain>
	<20090320124639.83d22726.kamezawa.hiroyu@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <20090320124639.83d22726.kamezawa.hiroyu@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, YAMAMOTO Takashi, lizf@cn.fujitsu.com,
	KOSAKI Motohiro, Rik van Riel, Andrew Morton
List-ID:

* KAMEZAWA Hiroyuki [2009-03-20 12:46:39]:

> On Thu, 19 Mar 2009 22:27:35 +0530
> Balbir Singh wrote:
>
> > Feature: Organize cgroups over soft limit in an RB-Tree
> >
> > From: Balbir Singh
> >
> > Changelog v7...v6
> > 1. Refactor the check and update logic. The goal is to allow the
> >    check logic to be modular, so that it can be revisited in the
> >    future if something more appropriate is found to be useful.
>
> One of my motivations for this was reducing the number of "if"s in the
> res_counter charge path... but please see my comment below.
>
> > Changelog v6...v5
> > 1. Update the key before inserting into the RB tree. Without this
> >    change it could take an additional iteration to get the key correct.
> >
> > Changelog v5...v4
> > 1. res_counter_uncharge has an additional parameter to indicate whether
> >    the counter was over its soft limit before the uncharge.
> >
> > Changelog v4...v3
> > 1. Optimizations to ensure we don't unnecessarily read res_counter values
> > 2. Fixed a bug in the usage of time_after()
> >
> > Changelog v3...v2
> > 1. Add only the ancestor to the RB-Tree
> > 2. Use css_tryget()/css_put() instead of mem_cgroup_get()/mem_cgroup_put()
> >
> > Changelog v2...v1
> > 1. Add support for hierarchies
> > 2. The res_counter that is highest in the hierarchy is returned on the
> >    soft limit being exceeded. Since we do hierarchical reclaim and add
> >    all groups exceeding their soft limits, this approach seems to work
> >    well in practice.
> >
> > This patch introduces an RB-Tree for storing memory cgroups that are
> > over their soft limit. The overall goals are to:
> >
> > 1. Add a memory cgroup to the RB-Tree when its soft limit is exceeded.
> >    We are careful about updates; updates take place only after a
> >    particular time interval has passed.
> > 2. Remove the node from the RB-Tree when the usage goes back below the
> >    soft limit.
> >
> > The next set of patches will use the RB-Tree to get the group that is
> > over its soft limit by the largest amount and reclaim from it when we
> > face memory contention.
> >
> > Signed-off-by: Balbir Singh
> > ---
> >
> >  include/linux/res_counter.h |    6 +-
> >  kernel/res_counter.c        |   18 +++++
> >  mm/memcontrol.c             |  149 ++++++++++++++++++++++++++++++++++++++-----
> >  3 files changed, 151 insertions(+), 22 deletions(-)
> >
> >
> > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> > index 5c821fd..5bbf8b1 100644
> > --- a/include/linux/res_counter.h
> > +++ b/include/linux/res_counter.h
> > @@ -112,7 +112,8 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent);
> >  int __must_check res_counter_charge_locked(struct res_counter *counter,
> >  		unsigned long val);
> >  int __must_check res_counter_charge(struct res_counter *counter,
> > -		unsigned long val, struct res_counter **limit_fail_at);
> > +		unsigned long val, struct res_counter **limit_fail_at,
> > +		struct res_counter **soft_limit_at);
> >
> >  /*
> >   * uncharge - tell that some portion of the resource is released
> > @@ -125,7 +126,8 @@ int __must_check res_counter_charge(struct res_counter *counter,
> >   */
> >
> >  void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > +		bool *was_soft_limit_excess);
> >
> >  static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> >  {
> > diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> > index 4e6dafe..51ec438 100644
> > --- a/kernel/res_counter.c
> > +++ b/kernel/res_counter.c
> > @@ -37,17 +37,27 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> >  }
> >
> >  int res_counter_charge(struct res_counter *counter, unsigned long val,
> > -			struct res_counter **limit_fail_at)
> > +			struct res_counter **limit_fail_at,
> > +			struct res_counter **soft_limit_fail_at)
> >  {
> >  	int ret;
> >  	unsigned long flags;
> >  	struct res_counter *c, *u;
> >
> >  	*limit_fail_at = NULL;
> > +	if (soft_limit_fail_at)
> > +		*soft_limit_fail_at = NULL;
> >  	local_irq_save(flags);
> >  	for (c = counter; c != NULL; c = c->parent) {
> >  		spin_lock(&c->lock);
> >  		ret = res_counter_charge_locked(c, val);
> > +		/*
> > +		 * With soft limits, we return the highest ancestor
> > +		 * that exceeds its soft limit
> > +		 */
> > +		if (soft_limit_fail_at &&
> > +			!res_counter_soft_limit_check_locked(c))
> > +			*soft_limit_fail_at = c;
>
> Is this the correct way to go? In the following situation,
>
> 	A/	softlimit=1G	usage=1.2G
> 	  B1/	softlimit=400M	usage=1G
> 	    C/
> 	  B2/	softlimit=400M	usage=200M
>
> "A" will be the victim and both B1 and B2 will be reclaim targets, right?
>

Yes. You may remember we discussed adding the oldest ancestor in an older
version; it was your suggestion to add the highest ancestor. Have you
changed your mind?

> and I wonder we don't need *softlimit_failed_at*... here.
>

I am not sure I get your point; could you please clarify it?

> >
> > +static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem,
> > +					bool over_soft_limit)
> > +{
> > +	unsigned long next_update;
> > +
> > +	if (!over_soft_limit)
> > +		return false;
> > +
> > +	next_update = mem->last_tree_update + MEM_CGROUP_TREE_UPDATE_INTERVAL;
> > +	if (time_after(jiffies, next_update))
> > +		return true;
> > +
> > +	return false;
> > +}
>
> If I write it, this function will be
>
> static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem,
> 					struct res_counter **failed_at)
> {
> 	struct res_counter *c;
> 	unsigned long next_update;
>
> 	next_update = mem->last_tree_update + MEM_CGROUP_TREE_UPDATE_INTERVAL;
> 	if (!time_after(jiffies, next_update))
> 		return false;
> 	/* check the soft limit up the hierarchy */
> 	*failed_at = NULL;
> 	for (c = &mem->res; c != NULL; c = c->parent) {
> 		if (!res_counter_check_under_soft_limit(c))
> 			*failed_at = c;
> 	}
> 	return *failed_at != NULL;
> }
>
> /*
>  * Insert just the ancestor; we should trickle down to the correct
>  * cgroup for reclaim, since the other nodes will be below their
>  * soft limit
>  */
> if (mem_cgroup_soft_limit_check(mem, &soft_fail_res)) {
> 	mem_over_soft_limit =
> 		mem_cgroup_from_res_counter(soft_fail_res, res);
> 	mem_cgroup_update_tree(mem_over_soft_limit);
> }
>
> Then, we really do the softlimit check only once per interval.

OK, so the trade-off is: once per interval, I need to walk up the
res_counters all over again, holding all the locks, and check. As I
mentioned earlier, with the current approach I've reduced the overhead
significantly for non-users. Earlier I was seeing a small loss in
throughput with reaim, but since I changed res_counter_uncharge to track
soft limits, that difference is negligible now.

The issue I see with your approach is that even if soft limits were not
enabled, we would still need to walk up the hierarchy and run the tests,
whereas with the check embedded in res_counter_charge, one simple check
tells us we have nothing more to do.

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org