linux-mm.kvack.org archive mirror
* [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy
@ 2012-08-16 20:53 Ying Han
  2012-08-17  5:38 ` Glauber Costa
  0 siblings, 1 reply; 6+ messages in thread
From: Ying Han @ 2012-08-16 20:53 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Greg Thelen, Christoph Lameter, KOSAKI Motohiro,
	Glauber Costa
  Cc: linux-mm

This patch moves the main slab shrinking into do_shrink_slab() and restructures
shrink_slab() to walk the memory cgroup hierarchy. The memcg context is embedded
inside the shrink_control. The underlying shrinker respects the new field by
only reclaiming slab objects charged to that memcg.

The hierarchy walk in shrink_slab() is slightly different from the walk in
shrink_zone(), where the latter walks each memcg once per priority level under
concurrent reclaim threads. That model makes less sense for slab objects, since
they are spread across the system rather than organized per-zone. So here each
shrink_slab() call triggers a full walk of every memcg under the sub-tree.

One optimization applies under global reclaim: we skip walking the tree
entirely and instead invoke the shrinkers with mem_cgroup=NULL, so they end up
scanning the full dentry LRU list.
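
For reference, a minimal sketch of the shrink_control layout the walk relies
on; the actual field additions are made elsewhere in the series, so the exact
layout below is only inferred from the code in this patch:

struct shrink_control {
	gfp_t gfp_mask;

	/* How many slab objects shrinker() should scan and try to reclaim */
	unsigned long nr_to_scan;

	/* Root of the reclaim: set by the caller, NULL for global reclaim */
	struct mem_cgroup *target_mem_cgroup;

	/* Current memcg in the hierarchy walk, consumed by the shrinker */
	struct mem_cgroup *mem_cgroup;
};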

Signed-off-by: Ying Han <yinghan@google.com>
---
 mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++--------
 1 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6ffdff6..7a3a1a4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -204,7 +204,7 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
  *
  * Returns the number of slab objects which we shrunk.
  */
-unsigned long shrink_slab(struct shrink_control *shrink,
+static unsigned long do_shrink_slab(struct shrink_control *shrink,
 			  unsigned long nr_pages_scanned,
 			  unsigned long lru_pages)
 {
@@ -214,12 +214,6 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 	if (nr_pages_scanned == 0)
 		nr_pages_scanned = SWAP_CLUSTER_MAX;
 
-	if (!down_read_trylock(&shrinker_rwsem)) {
-		/* Assume we'll be able to shrink next time */
-		ret = 1;
-		goto out;
-	}
-
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
 		long total_scan;
@@ -309,8 +303,41 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 
 		trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
 	}
+
+	return ret;
+}
+
+unsigned long shrink_slab(struct shrink_control *shrink,
+			  unsigned long nr_pages_scanned,
+			  unsigned long lru_pages)
+{
+	unsigned long ret = 0;
+	struct mem_cgroup *root = shrink->target_mem_cgroup;
+	struct mem_cgroup *memcg;
+
+	if (!down_read_trylock(&shrinker_rwsem))
+		/* Assume we'll be able to shrink next time */
+		return 1;
+
+	/*
+	 * In case of a global reclaim, skip walking through the hierarchy and
+	 * invoke the shrinker with mem_cgroup=NULL.
+	 */
+	if (!root) {
+		ret = do_shrink_slab(shrink, nr_pages_scanned, lru_pages);
+		goto done;
+	}
+
+	memcg = mem_cgroup_iter(root, NULL, NULL);
+	do {
+		shrink->mem_cgroup = memcg;
+		ret += do_shrink_slab(shrink, nr_pages_scanned, lru_pages);
+		memcg = mem_cgroup_iter(root, memcg, NULL);
+	} while (memcg);
+
+done:
 	up_read(&shrinker_rwsem);
-out:
+
 	cond_resched();
 	return ret;
 }
-- 
1.7.7.3


* Re: [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy
  2012-08-16 20:53 [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy Ying Han
@ 2012-08-17  5:38 ` Glauber Costa
  2012-08-17  5:46   ` Ying Han
  0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-08-17  5:38 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Greg Thelen, Christoph Lameter, KOSAKI Motohiro,
	linux-mm

On 08/17/2012 12:53 AM, Ying Han wrote:
> This patch moves the main slab shrinking into do_shrink_slab() and restructures
> shrink_slab() to walk the memory cgroup hierarchy. The memcg context is embedded
> inside the shrink_control. The underlying shrinker respects the new field by
> only reclaiming slab objects charged to that memcg.
> 
> The hierarchy walk in shrink_slab() is slightly different from the walk in
> shrink_zone(), where the latter walks each memcg once per priority level under
> concurrent reclaim threads. That model makes less sense for slab objects, since
> they are spread across the system rather than organized per-zone. So here each
> shrink_slab() call triggers a full walk of every memcg under the sub-tree.
> 
> One optimization applies under global reclaim: we skip walking the tree
> entirely and instead invoke the shrinkers with mem_cgroup=NULL, so they end up
> scanning the full dentry LRU list.
> 
> Signed-off-by: Ying Han <yinghan@google.com>
> ---
>  mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++--------
>  1 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6ffdff6..7a3a1a4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -204,7 +204,7 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
>   *
>   * Returns the number of slab objects which we shrunk.
>   */
> -unsigned long shrink_slab(struct shrink_control *shrink,
> +static unsigned long do_shrink_slab(struct shrink_control *shrink,
>  			  unsigned long nr_pages_scanned,
>  			  unsigned long lru_pages)
>  {
> @@ -214,12 +214,6 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>  	if (nr_pages_scanned == 0)
>  		nr_pages_scanned = SWAP_CLUSTER_MAX;
>  
> -	if (!down_read_trylock(&shrinker_rwsem)) {
> -		/* Assume we'll be able to shrink next time */
> -		ret = 1;
> -		goto out;
> -	}
> -
>  	list_for_each_entry(shrinker, &shrinker_list, list) {
>  		unsigned long long delta;
>  		long total_scan;
> @@ -309,8 +303,41 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>  
>  		trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
>  	}
> +
> +	return ret;
> +}
> +
It seems to me this will call all shrinkers, regardless of whether or
not they are memcg-aware. Can't we just skip the ones we know not to be
memcg-aware?  (basically all non-vfs for the moment...)

My fear is that if called, they will shrink. And that may not be what we
want.
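
A minimal sketch of the kind of filtering being suggested here; note that
neither the ->flags field nor SHRINKER_MEMCG_AWARE exists in this series,
both are purely illustrative:

/*
 * Hypothetical predicate do_shrink_slab() could use to skip shrinkers
 * that do not understand memcg during targeted reclaim.
 */
static bool shrinker_eligible(struct shrinker *shrinker,
			      struct shrink_control *shrink)
{
	/* Global reclaim: every registered shrinker may run. */
	if (!shrink->target_mem_cgroup)
		return true;

	/* Targeted reclaim: only call shrinkers that opted in. */
	return shrinker->flags & SHRINKER_MEMCG_AWARE;
}

The shrinker walk would then simply skip entries for which this returns false.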




* Re: [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy
  2012-08-17  5:46   ` Ying Han
@ 2012-08-17  5:45     ` Glauber Costa
  2012-08-17  5:53       ` Ying Han
  0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-08-17  5:45 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Greg Thelen, Christoph Lameter, KOSAKI Motohiro,
	linux-mm

On 08/17/2012 09:46 AM, Ying Han wrote:
> On Thu, Aug 16, 2012 at 10:38 PM, Glauber Costa <glommer@parallels.com> wrote:
>> On 08/17/2012 12:53 AM, Ying Han wrote:
>>> This patch moves the main slab shrinking into do_shrink_slab() and restructures
>>> shrink_slab() to walk the memory cgroup hierarchy. The memcg context is embedded
>>> inside the shrink_control. The underlying shrinker respects the new field by
>>> only reclaiming slab objects charged to that memcg.
>>>
>>> The hierarchy walk in shrink_slab() is slightly different from the walk in
>>> shrink_zone(), where the latter walks each memcg once per priority level under
>>> concurrent reclaim threads. That model makes less sense for slab objects, since
>>> they are spread across the system rather than organized per-zone. So here each
>>> shrink_slab() call triggers a full walk of every memcg under the sub-tree.
>>>
>>> One optimization applies under global reclaim: we skip walking the tree
>>> entirely and instead invoke the shrinkers with mem_cgroup=NULL, so they end up
>>> scanning the full dentry LRU list.
>>>
>>> Signed-off-by: Ying Han <yinghan@google.com>
>>> ---
>>>  mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++--------
>>>  1 files changed, 35 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 6ffdff6..7a3a1a4 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -204,7 +204,7 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
>>>   *
>>>   * Returns the number of slab objects which we shrunk.
>>>   */
>>> -unsigned long shrink_slab(struct shrink_control *shrink,
>>> +static unsigned long do_shrink_slab(struct shrink_control *shrink,
>>>                         unsigned long nr_pages_scanned,
>>>                         unsigned long lru_pages)
>>>  {
>>> @@ -214,12 +214,6 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>       if (nr_pages_scanned == 0)
>>>               nr_pages_scanned = SWAP_CLUSTER_MAX;
>>>
>>> -     if (!down_read_trylock(&shrinker_rwsem)) {
>>> -             /* Assume we'll be able to shrink next time */
>>> -             ret = 1;
>>> -             goto out;
>>> -     }
>>> -
>>>       list_for_each_entry(shrinker, &shrinker_list, list) {
>>>               unsigned long long delta;
>>>               long total_scan;
>>> @@ -309,8 +303,41 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>
>>>               trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
>>>       }
>>> +
>>> +     return ret;
>>> +}
>>> +
>> It seems to me this will call all shrinkers, regardless of whether or
>> not they are memcg-aware. Can't we just skip the ones we know not to be
>> memcg-aware?  (basically all non-vfs for the moment...)
>>
>> My fear is that if called, they will shrink. And that may not be what we
>> want.
> 
> Are you suggesting not shrinking slabs other than the dentry cache? I'm not
> sure that is what we want either. However, maybe we could do that for target
> reclaim, if that is what you meant.
> 

If the other shrinkers are not memcg aware, they will end up discarding
random objects that may or may not have anything to do with the group
under pressure, right?

This sounds dangerous to the point I'd prefer not touching them at all.

Obviously, having more memcg-aware shrinkers would void this concern.
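
For illustration only, the rough shape a memcg-aware shrinker would take under
this scheme with the current shrink() callback API; mycache_lru_for(),
mycache_prune() and mycache_count() are made-up helpers, and only the use of
sc->mem_cgroup reflects what the series proposes:

static int mycache_shrink(struct shrinker *s, struct shrink_control *sc)
{
	/* A NULL memcg means global reclaim: operate on the full LRU. */
	struct mycache_lru *lru = mycache_lru_for(sc->mem_cgroup);

	/* nr_to_scan == 0 is a query; only report the object count. */
	if (sc->nr_to_scan)
		mycache_prune(lru, sc->nr_to_scan);

	return mycache_count(lru);
}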


* Re: [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy
  2012-08-17  5:38 ` Glauber Costa
@ 2012-08-17  5:46   ` Ying Han
  2012-08-17  5:45     ` Glauber Costa
  0 siblings, 1 reply; 6+ messages in thread
From: Ying Han @ 2012-08-17  5:46 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Greg Thelen, Christoph Lameter, KOSAKI Motohiro,
	linux-mm

On Thu, Aug 16, 2012 at 10:38 PM, Glauber Costa <glommer@parallels.com> wrote:
> On 08/17/2012 12:53 AM, Ying Han wrote:
>> This patch moves the main slab shrinking into do_shrink_slab() and restructures
>> shrink_slab() to walk the memory cgroup hierarchy. The memcg context is embedded
>> inside the shrink_control. The underlying shrinker respects the new field by
>> only reclaiming slab objects charged to that memcg.
>>
>> The hierarchy walk in shrink_slab() is slightly different from the walk in
>> shrink_zone(), where the latter walks each memcg once per priority level under
>> concurrent reclaim threads. That model makes less sense for slab objects, since
>> they are spread across the system rather than organized per-zone. So here each
>> shrink_slab() call triggers a full walk of every memcg under the sub-tree.
>>
>> One optimization applies under global reclaim: we skip walking the tree
>> entirely and instead invoke the shrinkers with mem_cgroup=NULL, so they end up
>> scanning the full dentry LRU list.
>>
>> Signed-off-by: Ying Han <yinghan@google.com>
>> ---
>>  mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++--------
>>  1 files changed, 35 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 6ffdff6..7a3a1a4 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -204,7 +204,7 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
>>   *
>>   * Returns the number of slab objects which we shrunk.
>>   */
>> -unsigned long shrink_slab(struct shrink_control *shrink,
>> +static unsigned long do_shrink_slab(struct shrink_control *shrink,
>>                         unsigned long nr_pages_scanned,
>>                         unsigned long lru_pages)
>>  {
>> @@ -214,12 +214,6 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>       if (nr_pages_scanned == 0)
>>               nr_pages_scanned = SWAP_CLUSTER_MAX;
>>
>> -     if (!down_read_trylock(&shrinker_rwsem)) {
>> -             /* Assume we'll be able to shrink next time */
>> -             ret = 1;
>> -             goto out;
>> -     }
>> -
>>       list_for_each_entry(shrinker, &shrinker_list, list) {
>>               unsigned long long delta;
>>               long total_scan;
>> @@ -309,8 +303,41 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>
>>               trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
>>       }
>> +
>> +     return ret;
>> +}
>> +
> It seems to me this will call all shrinkers, regardless of whether or
> not they are memcg-aware. Can't we just skip the ones we know not to be
> memcg-aware?  (basically all non-vfs for the moment...)
>
> My fear is that if called, they will shrink. And that may not be what we
> want.

Are you suggesting not shrinking slabs other than the dentry cache? I'm not
sure that is what we want either. However, maybe we could do that for target
reclaim, if that is what you meant.

--Ying
>
>


* Re: [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy
  2012-08-17  5:53       ` Ying Han
@ 2012-08-17  5:53         ` Glauber Costa
  0 siblings, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-08-17  5:53 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Greg Thelen, Christoph Lameter, KOSAKI Motohiro,
	linux-mm

On 08/17/2012 09:53 AM, Ying Han wrote:
>> > If the other shrinkers are not memcg aware, they will end up discarding
>> > random objects that may or may not have anything to do with the group
>> > under pressure, right?
> The main contributors of accounted and reclaimable slab are VFS objects.
> Also, we know a dentry pins its inode, so I wonder how bad the problem would
> be. Do you have a specific example of a shrinker that could cause the problem?
> 

I don't have any specific shrinkers in mind, but as you said yourself: the
main contributors come from the VFS. So as long as we shrink the VFS objects -
and those will be memcg-aware - why bother with the others?

It seems to me that we're just risking breaking isolation for very
little gain.



* Re: [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy
  2012-08-17  5:45     ` Glauber Costa
@ 2012-08-17  5:53       ` Ying Han
  2012-08-17  5:53         ` Glauber Costa
  0 siblings, 1 reply; 6+ messages in thread
From: Ying Han @ 2012-08-17  5:53 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Greg Thelen, Christoph Lameter, KOSAKI Motohiro,
	linux-mm

On Thu, Aug 16, 2012 at 10:45 PM, Glauber Costa <glommer@parallels.com> wrote:
> On 08/17/2012 09:46 AM, Ying Han wrote:
>> On Thu, Aug 16, 2012 at 10:38 PM, Glauber Costa <glommer@parallels.com> wrote:
>>> On 08/17/2012 12:53 AM, Ying Han wrote:
>>>> This patch moves the main slab shrinking into do_shrink_slab() and restructures
>>>> shrink_slab() to walk the memory cgroup hierarchy. The memcg context is embedded
>>>> inside the shrink_control. The underlying shrinker respects the new field by
>>>> only reclaiming slab objects charged to that memcg.
>>>>
>>>> The hierarchy walk in shrink_slab() is slightly different from the walk in
>>>> shrink_zone(), where the latter walks each memcg once per priority level under
>>>> concurrent reclaim threads. That model makes less sense for slab objects, since
>>>> they are spread across the system rather than organized per-zone. So here each
>>>> shrink_slab() call triggers a full walk of every memcg under the sub-tree.
>>>>
>>>> One optimization applies under global reclaim: we skip walking the tree
>>>> entirely and instead invoke the shrinkers with mem_cgroup=NULL, so they end up
>>>> scanning the full dentry LRU list.
>>>>
>>>> Signed-off-by: Ying Han <yinghan@google.com>
>>>> ---
>>>>  mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++--------
>>>>  1 files changed, 35 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 6ffdff6..7a3a1a4 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -204,7 +204,7 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
>>>>   *
>>>>   * Returns the number of slab objects which we shrunk.
>>>>   */
>>>> -unsigned long shrink_slab(struct shrink_control *shrink,
>>>> +static unsigned long do_shrink_slab(struct shrink_control *shrink,
>>>>                         unsigned long nr_pages_scanned,
>>>>                         unsigned long lru_pages)
>>>>  {
>>>> @@ -214,12 +214,6 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>       if (nr_pages_scanned == 0)
>>>>               nr_pages_scanned = SWAP_CLUSTER_MAX;
>>>>
>>>> -     if (!down_read_trylock(&shrinker_rwsem)) {
>>>> -             /* Assume we'll be able to shrink next time */
>>>> -             ret = 1;
>>>> -             goto out;
>>>> -     }
>>>> -
>>>>       list_for_each_entry(shrinker, &shrinker_list, list) {
>>>>               unsigned long long delta;
>>>>               long total_scan;
>>>> @@ -309,8 +303,41 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>
>>>>               trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
>>>>       }
>>>> +
>>>> +     return ret;
>>>> +}
>>>> +
>>> It seems to me this will call all shrinkers, regardless of whether or
>>> not they are memcg-aware. Can't we just skip the ones we know not to be
>>> memcg-aware?  (basically all non-vfs for the moment...)
>>>
>>> My fear is that if called, they will shrink. And that may not be what we
>>> want.
>>
>> Are you suggesting not shrinking slabs other than the dentry cache? I'm not
>> sure that is what we want either. However, maybe we could do that for target
>> reclaim, if that is what you meant.
>>
>
> If the other shrinkers are not memcg aware, they will end up discarding
> random objects that may or may not have anything to do with the group
> under pressure, right?

The main contributors of accounted and reclaimable slab are VFS objects.
Also, we know a dentry pins its inode, so I wonder how bad the problem would
be. Do you have a specific example of a shrinker that could cause the problem?

--Ying

> This sounds dangerous to the point I'd prefer not touching them at all.
>
> Obviously, having more memcg-aware shrinkers would void this concern.
>



Thread overview: 6+ messages
-- links below jump to the message on this page --
2012-08-16 20:53 [RFC PATCH 3/6] memcg: restructure shrink_slab to walk memcg hierarchy Ying Han
2012-08-17  5:38 ` Glauber Costa
2012-08-17  5:46   ` Ying Han
2012-08-17  5:45     ` Glauber Costa
2012-08-17  5:53       ` Ying Han
2012-08-17  5:53         ` Glauber Costa
