From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, dhillf@gmail.com, rientjes@google.com,
mhocko@suse.cz, akpm@linux-foundation.org, hannes@cmpxchg.org,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH -V7 07/14] mm/page_cgroup: Make page_cgroup point to the cgroup rather than the mem_cgroup
Date: Tue, 05 Jun 2012 12:40:43 +0900
Message-ID: <4FCD7FBB.1000304@jp.fujitsu.com>
In-Reply-To: <87ehpu8o5z.fsf@skywalker.in.ibm.com>
(2012/06/05 11:53), Aneesh Kumar K.V wrote:
> Kamezawa Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com> writes:
>
>> (2012/05/30 23:38), Aneesh Kumar K.V wrote:
>>> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
>>>
>>> We will use it later to make page_cgroup track the hugetlb cgroup information.
>>>
>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>> include/linux/mmzone.h | 2 +-
>>> include/linux/page_cgroup.h | 8 ++++----
>>> init/Kconfig | 4 ++++
>>> mm/Makefile | 3 ++-
>>> mm/memcontrol.c | 42 +++++++++++++++++++++++++-----------------
>>> 5 files changed, 36 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 2427706..2483cc5 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -1052,7 +1052,7 @@ struct mem_section {
>>>
>>> /* See declaration of similar field in struct zone */
>>> unsigned long *pageblock_flags;
>>> -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>>> +#ifdef CONFIG_PAGE_CGROUP
>>> /*
>>> * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
>>> * section. (see memcontrol.h/page_cgroup.h about this.)
>>> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
>>> index a88cdba..7bbfe37 100644
>>> --- a/include/linux/page_cgroup.h
>>> +++ b/include/linux/page_cgroup.h
>>> @@ -12,7 +12,7 @@ enum {
>>> #ifndef __GENERATING_BOUNDS_H
>>> #include<generated/bounds.h>
>>>
>>> -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
>>> +#ifdef CONFIG_PAGE_CGROUP
>>> #include<linux/bit_spinlock.h>
>>>
>>> /*
>>> @@ -24,7 +24,7 @@ enum {
>>> */
>>> struct page_cgroup {
>>> unsigned long flags;
>>> - struct mem_cgroup *mem_cgroup;
>>> + struct cgroup *cgroup;
>>> };
>>>
>>
>> This patch seems very bad.
>
> I had to change that to
>
> struct page_cgroup {
> unsigned long flags;
> struct cgroup_subsys_state *css;
> };
>
> to get memcg to work. We end up changing css.cgroup on cgroupfs mount/umount.
>
Hmm, then the pointer to the memcg can be calculated from this *css.
OK with this.
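A minimal userspace sketch of that calculation, assuming the css is embedded inside struct mem_cgroup (the kernel's container_of() macro performs the same kind of pointer arithmetic). The struct layouts here are simplified stand-ins, and the helper names memcg_from_css() and memcg_from_pc() are hypothetical, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; fields are illustrative. */
struct cgroup_subsys_state {
	int refcnt;
};

struct mem_cgroup {
	unsigned long usage;             /* hypothetical field placed before the css */
	struct cgroup_subsys_state css;  /* embedded in the memcg, not pointed to */
};

struct page_cgroup {
	unsigned long flags;
	struct cgroup_subsys_state *css; /* the layout proposed in the quoted reply */
};

/* Step back from the embedded css to the start of the enclosing mem_cgroup. */
static struct mem_cgroup *memcg_from_css(struct cgroup_subsys_state *css)
{
	if (!css)
		return NULL;
	return (struct mem_cgroup *)((char *)css - offsetof(struct mem_cgroup, css));
}

/* Hypothetical lookup helper: page_cgroup -> css -> mem_cgroup. */
static struct mem_cgroup *memcg_from_pc(const struct page_cgroup *pc)
{
	assert(pc != NULL);
	return memcg_from_css(pc->css);
}
```

The lookup costs one subtraction by a compile-time constant on top of the pointer load, which is the "simple addition" referred to in the quoted text below.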
>>
>> - What is the performance impact to memcg ? Doesn't this add extra overheads
>> to memcg lookup ?
>
> Considering that we are stashing cgroup_subsys_state, it should be a
> simple addition. I haven't measured the exact numbers. Do you have any
> suggestion on the tests I can run ?
>
Copy-on-write, parallel page faults, file creation/deletion, etc.
>> - Hugetlb requires a much smaller amount of tracking information than
>> memcg requires. I guess you can record the information in page->private
>> if you want.
>
> So if we end up tracking page cgroup in struct page, all these extra
> overheads will go away. And in most cases we would have both memcg and hugetlb
> enabled by default.
>
>> - This may prevent us from the work 'reducing size of page_cgroup'
>>
>
> by reducing you mean moving struct page_cgroup info to struct page
> itself ? If so this should not have any impact right ?
I'm not sure, but doesn't this change affect the rules around
(un)lock_page_cgroup() and the pc->memcg overwriting algorithm?
Let me think... but maybe discussing this without a patch was wrong. Sorry.
> Most of the requirements of hugetlb should be similar to memcg.
>
Yes and no. hugetlb requires only 1/HUGEPAGE_SIZE of the tracking information.
So, as Michal pointed out, if the user _really_ wants to avoid the
overheads of memcg, the effect of cgroup_disable=memory should be kept.
If you use page_cgroup, you cannot save memory with the boot option.
This makes the point of 'creating a hugetlb-only subsys to avoid memcg overheads'
unclear. You don't need tracking information per page, and it can be dynamically
allocated. Or please use range tracking as Michal proposed.
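To put rough numbers on the 1/HUGEPAGE_SIZE point, here is a small sketch that counts tracking records for a given amount of memory. It assumes 4 KiB base pages and 2 MiB huge pages; the macro and function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 4 KiB base pages and 2 MiB huge pages. */
#define BASE_PAGE_SIZE (4ULL << 10)
#define HUGE_PAGE_SIZE (2ULL << 20)

/* One record per base page, as with a per-page page_cgroup array. */
static uint64_t per_page_records(uint64_t mem_bytes)
{
	return mem_bytes / BASE_PAGE_SIZE;
}

/* One record per huge page, which is all hugetlb accounting needs. */
static uint64_t per_hugepage_records(uint64_t mem_bytes)
{
	return mem_bytes / HUGE_PAGE_SIZE;
}

/* Ratio between the two record counts. */
static uint64_t reduction_factor(void)
{
	assert(HUGE_PAGE_SIZE % BASE_PAGE_SIZE == 0);
	return HUGE_PAGE_SIZE / BASE_PAGE_SIZE;
}
```

Under these assumptions, 1 GiB of memory needs 262144 per-page records but only 512 per-huge-page records, a factor of 512.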
>> So, strong Nack to this. I guess you can use page->private or some entries in
>> struct page, you have many pages per accounting units. Please make an effort
>> to avoid using page_cgroup.
>>
>
> HugeTLB already use page->private of compound page head to track subpool
> pointer. So we won't be able to use page->private.
>
You can use pages other than head/tails.
For example, I think you have 512 pages per 2M huge page.
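A userspace sketch of that idea, assuming a huge page is modelled as an array of 512 struct page entries and that one sub-page's private field is free to hold the cgroup pointer. The slot index, the struct hugetlb_cgroup definition, and both helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_HUGE_PAGE 512
#define CGROUP_SLOT 2 /* hypothetical: any sub-page whose private field is unused */

/* Simplified stand-ins; the real struct page has many more fields. */
struct page {
	unsigned long private;
};

struct hugetlb_cgroup {
	int id;
};

/* Record the owning cgroup in a sub-page, leaving head->private alone. */
static void set_huge_page_cgroup(struct page *head, struct hugetlb_cgroup *cg)
{
	assert(CGROUP_SLOT > 0 && CGROUP_SLOT < PAGES_PER_HUGE_PAGE);
	head[CGROUP_SLOT].private = (unsigned long)cg;
}

/* Read the owning cgroup back from the same sub-page. */
static struct hugetlb_cgroup *get_huge_page_cgroup(const struct page *head)
{
	return (struct hugetlb_cgroup *)head[CGROUP_SLOT].private;
}
```

In this model the head page's private field stays available for the subpool pointer, and no per-page page_cgroup array is needed.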
Thanks,
-Kame