From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Sha Zhengju <handai.szj@gmail.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
kamezawa.hiroyu@jp.fujitsu.com, akpm@linux-foundation.org,
hughd@google.com, gthelen@google.com,
Sha Zhengju <handai.szj@taobao.com>
Subject: Re: [PATCH V2 0/3] memcg: simply lock of page stat accounting
Date: Fri, 17 May 2013 09:57:37 +0400
Message-ID: <5195C6D1.6040005@openvz.org>
In-Reply-To: <20130516132846.GE13848@dhcp22.suse.cz>
Michal Hocko wrote:
> On Thu 16-05-13 08:28:33, Konstantin Khlebnikov wrote:
>> Michal Hocko wrote:
>>> On Wed 15-05-13 16:35:08, Konstantin Khlebnikov wrote:
>>>> Sha Zhengju wrote:
>>>>> Hi,
>>>>>
>>>>> This is my second attempt to make the memcg page stat lock simpler; the
>>>>> first version: http://www.spinics.net/lists/linux-mm/msg50037.html.
>>>>>
>>>>> In this version I investigate the potential race conditions among
>>>>> page stat, move_account, charge, and uncharge, and try to prove that
>>>>> my proposed lock scheme is race safe. The first patch is the basis of
>>>>> the patchset, so if I've made some stupid mistake please do not
>>>>> hesitate to point it out.
>>>>
>>>> I have a provocative question. Who needs these numbers? I mean
>>>> per-cgroup nr_mapped and so on.
>>>
>>> Well, I guess it makes some sense to know how much page cache and anon
>>> memory is charged to the group. I am using that to monitor the per-group
>>> memory usage. I can imagine even better coverage - something
>>> /proc/meminfo like.
>>>
>>
>> I think page counters from lru-vectors can give enough information for that.
>
> Not for dirty and writeback data, which is the next step.
I think tracking dirty and writeback pages on a per-inode basis is much more useful.
If only one cgroup per inode is responsible for all dirtied pages, we can use this
hint during writeback to account disk operations and throttle tasks in that cgroup.
This approach makes it easy to implement an effective IO bandwidth controller in the VFS layer.
Actually, we did this in our commercial product: a feature called 'iolimits' works exactly
this way. Unlike blkcg, this disk bandwidth controller doesn't suffer from priority inversion
bugs related to the fs journal, and it works for non-disk filesystems like NFS and FUSE.
It is something like 'balance-dirty-pages' on steroids which can also handle read
operations and take IOPS counters into account.
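
The single-owner idea above can be illustrated with a toy userspace model. This is
only a sketch under stated assumptions, not the 'iolimits' implementation: the names
Cgroup, Inode, dirty_page and writeback are hypothetical, the first cgroup to dirty an
inode is assumed to become its owner, and throttling is modeled as a simple token
bucket counted in bytes.

```python
from dataclasses import dataclass, field
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class Cgroup:
    """Hypothetical per-cgroup IO budget modeled as a token bucket."""
    name: str
    rate_bps: float      # refill rate in bytes per second
    burst: float         # bucket capacity in bytes
    tokens: float = 0.0  # current balance; negative means debt
    last: float = 0.0    # time of the previous charge

    def charge(self, nbytes: int, now: float) -> float:
        """Charge nbytes at time `now`; return how long the owner should sleep."""
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        self.tokens -= nbytes
        # A negative balance is debt that is paid off by sleeping.
        return max(0.0, -self.tokens / self.rate_bps)


@dataclass
class Inode:
    """Hypothetical inode carrying a single owning cgroup and its dirty pages."""
    ino: int
    owner: Optional[Cgroup] = None
    dirty_pages: set = field(default_factory=set)


def dirty_page(inode: Inode, index: int, cg: Cgroup) -> None:
    """Mark a page dirty; the first cgroup to dirty the inode becomes its owner."""
    if inode.owner is None:
        inode.owner = cg
    inode.dirty_pages.add(index)


def writeback(inode: Inode, now: float) -> float:
    """Flush all dirty pages, charge the IO to the owner, return its delay."""
    nbytes = len(inode.dirty_pages) * PAGE_SIZE
    inode.dirty_pages.clear()
    if inode.owner is None or nbytes == 0:
        return 0.0
    return inode.owner.charge(nbytes, now)
```

Because the owner is looked up from the inode at writeback time, the charge does not
depend on which task or kernel thread actually submits the IO.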
>
>> If somebody needs more detailed information, there are enough ways to get it.
>> The number of mapped pages can be estimated by summing the rss counters from mm-structs.
>> Exact numbers can be obtained by examining /proc/pid/pagemap.
>
> How do you find out whether given pages were charged to the group of
> interest - e.g. shared data or a task that has moved from a different
> group without move_at_immigrate?
For example, we can export page ownership and charging state via a single file in proc,
something similar to /proc/kpageflags.
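
For reference, the existing interfaces mentioned in this thread can be read from
userspace roughly as follows. This is a minimal sketch, assuming the documented
64-bit entry layout of /proc/pid/pagemap and the documented bit numbers of
/proc/kpageflags; the function names are hypothetical. Note that neither file says
which cgroup a page is charged to, which is exactly the gap a new proc file would
have to fill, and that the kernel may hide PFNs from unprivileged readers.

```python
import os
import struct

ENTRY = struct.Struct("=Q")  # one native-endian 64-bit word per page

# Bit layout of a /proc/<pid>/pagemap entry.
PM_PRESENT = 1 << 63
PM_SWAPPED = 1 << 62
PM_FILE_OR_SHARED_ANON = 1 << 61
PM_PFN_MASK = (1 << 55) - 1  # bits 0-54, meaningful only if the page is present

# A few /proc/kpageflags bit numbers relevant to this discussion.
KPF_BITS = {"DIRTY": 4, "WRITEBACK": 8, "MMAP": 11, "ANON": 12}


def decode_pagemap_entry(entry):
    """Split one pagemap word into its documented fields."""
    present = bool(entry & PM_PRESENT)
    return {
        "present": present,
        "swapped": bool(entry & PM_SWAPPED),
        "file_or_shared_anon": bool(entry & PM_FILE_OR_SHARED_ANON),
        "pfn": entry & PM_PFN_MASK if present else None,
    }


def decode_kpageflags(flags):
    """Return the names of the KPF_BITS flags set in one kpageflags word."""
    return [name for name, bit in KPF_BITS.items() if flags & (1 << bit)]


def count_present_pages(pid="self"):
    """Count resident pages of one task by walking its mappings (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    present = 0
    with open(f"/proc/{pid}/maps") as maps, \
         open(f"/proc/{pid}/pagemap", "rb") as pagemap:
        for line in maps:
            start, end = (int(x, 16) for x in line.split()[0].split("-"))
            try:
                pagemap.seek(start // page_size * ENTRY.size)
                data = pagemap.read((end - start) // page_size * ENTRY.size)
            except OSError:
                continue  # some special mappings cannot be read
            # Drop a trailing partial entry left by a short read.
            data = data[: len(data) - len(data) % ENTRY.size]
            present += sum(1 for (e,) in ENTRY.iter_unpack(data) if e & PM_PRESENT)
    return present
```

Summing count_present_pages() over the tasks of a group counts shared pages once per
task, so it gives an upper bound rather than the number of pages charged to the group.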

BTW, in our kernel the memory controller tries to change a page's ownership at first mmap
and at each page activation; it is probably worth adding this to mainline memcg too.
>
>> I don't think that simulating the 'Mapped' line in /proc/mapfile is a good enough reason
>> for adding such weird stuff to the rmap code on the map/unmap paths.
>
> The accounting code is trying to be as unintrusive as possible.
> This patchset makes it more complicated without a good reason and that
> is why it has been Nacked by me.
I think we can remove it or replace it with something different but much less intrusive,
if nobody strictly requires exactly this approach to managing 'mapped' page counters.