From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: per-cpu statistics
Date: Thu, 28 Feb 2013 11:59:50 +0400
Message-ID: <512F0E76.2020707@parallels.com>
To: KAMEZAWA Hiroyuki, Michal Hocko, "linux-mm@kvack.org", Johannes Weiner, Tejun Heo, Cgroups, Mel Gorman

Hi guys,

Please enlighten me regarding some historic aspect of memcg before I go
changing something I shouldn't...

Regarding memcg stats, is there any reason for us to use the current
per-cpu implementation we have instead of a percpu_counter?

We are doing something like this:

	get_online_cpus();
	for_each_online_cpu(cpu)
		val += per_cpu(memcg->stat->count[idx], cpu);
#ifdef CONFIG_HOTPLUG_CPU
	spin_lock(&memcg->pcp_counter_lock);
	val += memcg->nocpu_base.count[idx];
	spin_unlock(&memcg->pcp_counter_lock);
#endif
	put_online_cpus();

It seems to me that we are just re-implementing whatever percpu_counters
already do, handling the complication ourselves.

It surely is an array, and this keeps the fields together. But does it
really matter? Did it come from some measurable result?

I wouldn't touch it if it weren't bothering me. But the reason I
ask is that I am resurrecting the patches to bypass the root cgroup
charges when it is the only group in the system. For that, I would like
to transfer charges from the global counters to our memcg equivalents.

Things like MM_ANONPAGES are not percpu, though, and when I add them to
the memcg percpu structures, I would have to somehow distribute them
around. When we uncharge, that can become negative.

percpu_counters already handle all that, and can cope well with
temporary negative charges in the percpu data, which are later folded
into the main base counter.

We are counting pages, so the fact that we're restricted to only half of
the 64-bit range in percpu counters doesn't seem to be that much of a
problem.
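The percpu_counter behaviour relied on here (per-cpu deltas that may go negative and are folded into a shared base once they grow past a batch size) can be modelled in a few lines of userspace C. This is a simplified, single-threaded sketch, not lib/percpu_counter.c itself: CPUs are simulated by array slots, there is no locking or hotplug handling, and the names NR_CPUS_SIM, BATCH, struct pcpu_counter_sim and the pcpu_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for this userspace model only. */
#define NR_CPUS_SIM 4
#define BATCH 32

/* A shared 64-bit base plus small signed per-cpu deltas, following the
 * s64 count / s32 per-cpu split of struct percpu_counter. */
struct pcpu_counter_sim {
	int64_t count;                 /* global base, fed by folded deltas */
	int32_t delta[NR_CPUS_SIM];    /* per-cpu deltas, may go negative */
};

/* Add 'amount' on behalf of 'cpu'. The delta is folded into the base
 * only once its magnitude reaches BATCH, so one CPU can hold a negative
 * delta while the charge it offsets sits on another CPU. */
static void pcpu_add(struct pcpu_counter_sim *c, int cpu, int32_t amount)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	int64_t d = (int64_t)c->delta[cpu] + amount;

	if (d >= BATCH || d <= -BATCH) {
		c->count += d;         /* fold into the shared base */
		c->delta[cpu] = 0;
	} else {
		c->delta[cpu] = (int32_t)d;
	}
}

/* Cheap, approximate read: the base only. It can be off by up to
 * NR_CPUS_SIM * (BATCH - 1). */
static int64_t pcpu_read_fast(const struct pcpu_counter_sim *c)
{
	return c->count;
}

/* Exact read: base plus every per-cpu delta. */
static int64_t pcpu_sum(const struct pcpu_counter_sim *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		sum += c->delta[cpu];
	return sum;
}
```

An uncharge on a different CPU from the charge just leaves a negative delta behind (+10 on CPU 0 and -10 on CPU 1 sum to 0). The cheap read returns only the base, while the exact sum still has to walk every CPU, much like the memcg loop quoted above.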

If this is just a historic leftover, I can replace them all with
percpu_counters. Any thoughts on that?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sha Zhengju
Subject: Re: per-cpu statistics
Date: Fri, 1 Mar 2013 21:48:44 +0800
References: <512F0E76.2020707@parallels.com>
In-Reply-To: <512F0E76.2020707@parallels.com>
To: Glauber Costa
Cc: KAMEZAWA Hiroyuki, Michal Hocko, "linux-mm@kvack.org", Johannes Weiner, Tejun Heo, Cgroups, Mel Gorman

Hi Glauber,

Forgive me, I'm replying not because I know the reason for the current
per-cpu implementation, but because I noticed you're mentioning something
I'm also interested in. Details below.

On Thu, Feb 28, 2013 at 3:59 PM, Glauber Costa wrote:
> Hi guys,
>
> Please enlighten me regarding some historic aspect of memcg before I go
> changing something I shouldn't...
>
> Regarding memcg stats, is there any reason for us to use the current
> per-cpu implementation we have instead of a percpu_counter?
>
> We are doing something like this:
>
> 	get_online_cpus();
> 	for_each_online_cpu(cpu)
> 		val += per_cpu(memcg->stat->count[idx], cpu);
> #ifdef CONFIG_HOTPLUG_CPU
> 	spin_lock(&memcg->pcp_counter_lock);
> 	val += memcg->nocpu_base.count[idx];
> 	spin_unlock(&memcg->pcp_counter_lock);
> #endif
> 	put_online_cpus();
>
> It seems to me that we are just re-implementing whatever percpu_counters
> already do, handling the complication ourselves.
>
> It surely is an array, and this keeps the fields together. But does it
> really matter? Did it come from some measurable result?
>
> I wouldn't touch it if it weren't bothering me. But the reason I
> ask is that I am resurrecting the patches to bypass the root cgroup
> charges when it is the only group in the system. For that, I would like
> to transfer charges from the global counters to our memcg equivalents.

I'm not sure I fully understand your point: the root memcg already doesn't
charge pages and only does some page stat accounting (CACHE/RSS/SWAP).
Now I'm also trying to do some optimization specific to the overhead of
root memcg stat accounting, and the first attempt is posted here:
https://lkml.org/lkml/2013/1/2/71 . But it only covered
FILE_MAPPED/DIRTY/WRITEBACK (I added accounting for the last two in that
patchset), and Michal Hocko accepted the approach (so did Kame) and
suggested I should handle all the stats the same way, including
CACHE/RSS. But I do not handle things related to the memcg LRU, where I
noticed you have done some work.

It's possible that we may take different ways to bypass root memcg
stat accounting. The next round of that part will be sent out in the
following few days (doing some tests now), and any comments and
collaboration are welcome. (Glad to Cc you, of course, if you're
also interested in it. :) )

Many thanks!

> Things like MM_ANONPAGES are not percpu, though, and when I add them to
> the memcg percpu structures, I would have to somehow distribute them
> around. When we uncharge, that can become negative.
>
> percpu_counters already handle all that, and can cope well with
> temporary negative charges in the percpu data, which are later folded
> into the main base counter.
>
> We are counting pages, so the fact that we're restricted to only half of
> the 64-bit range in percpu counters doesn't seem to be that much of a
> problem.
>
> If this is just a historic leftover, I can replace them all with
> percpu_counters. Any thoughts on that?
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Thanks,
Sha

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kamezawa Hiroyuki
Subject: Re: per-cpu statistics
Date: Mon, 04 Mar 2013 09:55:25 +0900
Message-ID: <5133F0FD.3040501@jp.fujitsu.com>
References: <512F0E76.2020707@parallels.com>
In-Reply-To: <512F0E76.2020707@parallels.com>
To: Glauber Costa
Cc: Michal Hocko, "linux-mm@kvack.org", Johannes Weiner, Tejun Heo, Cgroups, Mel Gorman

(2013/02/28 16:59), Glauber Costa wrote:
> Hi guys,
>
> Please enlighten me regarding some historic aspect of memcg before I go
> changing something I shouldn't...
>
> Regarding memcg stats, is there any reason for us to use the current
> per-cpu implementation we have instead of a percpu_counter?
>
> We are doing something like this:
>
> 	get_online_cpus();
> 	for_each_online_cpu(cpu)
> 		val += per_cpu(memcg->stat->count[idx], cpu);
> #ifdef CONFIG_HOTPLUG_CPU
> 	spin_lock(&memcg->pcp_counter_lock);
> 	val += memcg->nocpu_base.count[idx];
> 	spin_unlock(&memcg->pcp_counter_lock);
> #endif
> 	put_online_cpus();
>
> It seems to me that we are just re-implementing whatever percpu_counters
> already do, handling the complication ourselves.
>
> It surely is an array, and this keeps the fields together. But does it
> really matter? Did it come from some measurable result?
>
> I wouldn't touch it if it weren't bothering me. But the reason I
> ask is that I am resurrecting the patches to bypass the root cgroup
> charges when it is the only group in the system. For that, I would like
> to transfer charges from the global counters to our memcg equivalents.
>
> Things like MM_ANONPAGES are not percpu, though, and when I add them to
> the memcg percpu structures, I would have to somehow distribute them
> around. When we uncharge, that can become negative.
>
> percpu_counters already handle all that, and can cope well with
> temporary negative charges in the percpu data, which are later folded
> into the main base counter.
>
> We are counting pages, so the fact that we're restricted to only half of
> the 64-bit range in percpu counters doesn't seem to be that much of a
> problem.
>
> If this is just a historic leftover, I can replace them all with
> percpu_counters. Any thoughts on that?
>

One reason I didn't like percpu_counter *was* its memory layout.

==
struct percpu_counter {
	raw_spinlock_t lock;
	s64 count;
#ifdef CONFIG_HOTPLUG_CPU
	struct list_head list;	/* All percpu_counters are on a list */
#endif
	s32 __percpu *counters;
};
==

Assume we have the counters in an array; then we'll have

	lock
	count
	list
	pointer
	lock
	count
	list
	pointer
	....

A lock operation on one counter will invalidate the cache line that holds
the neighboring counters' pointers in the array. We tend to update several
counters at once.
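The cache-line argument above can be checked with plain offset arithmetic. The sketch below assumes 64-byte cache lines and an array that starts on a line boundary, and it uses simplified stand-in types (struct percpu_counter_sim, struct list_head_sim, struct memcg_stat_cpu_sim and NR_STATS_SIM are hypothetical, not kernel definitions) that mirror the field order of the struct quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (common on x86-64, not universal). */
#define CACHE_LINE 64

/* Simplified userspace stand-ins mirroring the field order of
 * struct percpu_counter with CONFIG_HOTPLUG_CPU; not kernel types. */
struct list_head_sim {
	struct list_head_sim *next, *prev;
};

struct percpu_counter_sim {
	uint32_t lock;                 /* models raw_spinlock_t (no debug) */
	int64_t count;                 /* shared base value */
	struct list_head_sim list;     /* all counters are on a list */
	int32_t *counters;             /* models the s32 __percpu pointer */
};

/* Cache line, counted from the start of a line-aligned array of
 * struct percpu_counter_sim, holding byte 'field_off' of element 'i'. */
static size_t pcc_line(size_t i, size_t field_off)
{
	return (i * sizeof(struct percpu_counter_sim) + field_off) / CACHE_LINE;
}

/* The memcg-style alternative: each CPU owns one block holding all of
 * its stat counters, so a CPU updating several stats only writes its
 * own block and takes no lock shared with other counters. */
#define NR_STATS_SIM 8                 /* hypothetical number of stats */

struct memcg_stat_cpu_sim {
	long count[NR_STATS_SIM];
};

/* Cache line, within one CPU's block, that holds stat 'idx'. */
static size_t stat_line(size_t idx)
{
	assert(idx < NR_STATS_SIM);
	return (offsetof(struct memcg_stat_cpu_sim, count) + idx * sizeof(long)) / CACHE_LINE;
}
```

Under those assumptions, and as long as the stand-in struct is smaller than one line, taking counter 1's lock writes the same cache line that holds counter 0's per-cpu pointer, which every update of counter 0 has to read. The per-cpu stat block, by contrast, keeps one CPU's stats together with no shared lock.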

If you measure performance on a large enough SMP machine and it looks
good, I think it's OK to go with lib/percpu_counter.c.

Thanks,
-Kame

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: per-cpu statistics
Date: Sun, 3 Mar 2013 17:01:42 -0800
Message-ID: <20130304010142.GE3678@htj.dyndns.org>
References: <512F0E76.2020707@parallels.com> <5133F0FD.3040501@jp.fujitsu.com>
In-Reply-To: <5133F0FD.3040501@jp.fujitsu.com>
To: Kamezawa Hiroyuki
Cc: Glauber Costa, Michal Hocko, "linux-mm@kvack.org", Johannes Weiner, Cgroups, Mel Gorman

Hello,

On Mon, Mar 04, 2013 at 09:55:25AM +0900, Kamezawa Hiroyuki wrote:
> One reason I didn't like percpu_counter *was* its memory layout.
>
> ==
> struct percpu_counter {
> 	raw_spinlock_t lock;
> 	s64 count;
> #ifdef CONFIG_HOTPLUG_CPU
> 	struct list_head list;	/* All percpu_counters are on a list */
> #endif
> 	s32 __percpu *counters;
> };
> ==
>
> Assume we have the counters in an array; then we'll have
>
> 	lock
> 	count
> 	list
> 	pointer
> 	lock
> 	count
> 	list
> 	pointer
> 	....
>
> A lock operation on one counter will invalidate the cache line that holds
> the neighboring counters' pointers in the array.
> We tend to update several counters at once.

I agree that percpu_counter leaves quite a bit to be desired. It would
be great if we could implement a generic percpu stats facility which
takes care of aggregating the values periodically, preferably with
provisions to limit how far the global counter may deviate.

Thanks.

--
tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: per-cpu statistics
Date: Mon, 4 Mar 2013 11:25:11 +0400
Message-ID: <51344C57.7030807@parallels.com>
References: <512F0E76.2020707@parallels.com>
To: Sha Zhengju
Cc: KAMEZAWA Hiroyuki, Michal Hocko, "linux-mm@kvack.org", Johannes Weiner, Tejun Heo, Cgroups, Mel Gorman

On 03/01/2013 05:48 PM, Sha Zhengju wrote:
> Hi Glauber,
>
> Forgive me, I'm replying not because I know the reason for the current
> per-cpu implementation, but because I noticed you're mentioning something
> I'm also interested in. Details below.
>
>
> I'm not sure I fully understand your point: the root memcg already doesn't
> charge pages and only does some page stat accounting (CACHE/RSS/SWAP).

Can you point me to the final commits for this in the tree? I am using
the latest mm git tree from mhocko, and it is not entirely clear to me
what you are talking about.

> Now I'm also trying to do some optimization specific to the overhead of
> root memcg stat accounting, and the first attempt is posted here:
> https://lkml.org/lkml/2013/1/2/71 . But it only covered
> FILE_MAPPED/DIRTY/WRITEBACK (I added accounting for the last two in that
> patchset), and Michal Hocko accepted the approach (so did Kame) and
> suggested I should handle all the stats the same way, including
> CACHE/RSS. But I do not handle things related to the memcg LRU, where I
> noticed you have done some work.
>

Yes, the LRU is a bit tricky, and it is what is keeping me from posting
the patchset I have. I haven't fully done it, but I am on my way.

> It's possible that we may take different ways to bypass root memcg
> stat accounting. The next round of that part will be sent out in the
> following few days (doing some tests now), and any comments and
> collaboration are welcome. (Glad to Cc you, of course, if you're
> also interested in it. :) )
>

I am interested, of course. As you know, I started to work on this a
while ago and had to interrupt it for a while. I resumed it last week,
but if you managed to merge something already, I'd be happy to rebase.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sha Zhengju
Subject: Re: per-cpu statistics
Date: Tue, 5 Mar 2013 15:17:30 +0800
References: <512F0E76.2020707@parallels.com> <51344C57.7030807@parallels.com>
In-Reply-To: <51344C57.7030807@parallels.com>
To: Glauber Costa
Cc: KAMEZAWA Hiroyuki, Michal Hocko, "linux-mm@kvack.org", Johannes Weiner, Tejun Heo, Cgroups, Mel Gorman

On Mon, Mar 4, 2013 at 3:25 PM, Glauber Costa wrote:
> On 03/01/2013 05:48 PM, Sha Zhengju wrote:
>> Hi Glauber,
>>
>> Forgive me, I'm replying not because I know the reason for the current
>> per-cpu implementation, but because I
>> noticed you're mentioning something
>> I'm also interested in. Details below.
>>
>>
>> I'm not sure I fully understand your point: the root memcg already doesn't
>> charge pages and only does some page stat accounting (CACHE/RSS/SWAP).
>
> Can you point me to the final commits for this in the tree? I am using
> the latest mm git tree from mhocko, and it is not entirely clear to me
> what you are talking about.

Sorry, maybe my "root memcg charge" wording is confusing. What I mean is
that the root memcg doesn't do the resource counter charge (see the
mem_cgroup_is_root() check in __mem_cgroup_try_charge()), but it still
needs to do other work in __mem_cgroup_commit_charge(): set
pc->mem_cgroup, SetPageCgroupUsed, and account memcg page statistics
such as CACHE/RSS. By the way, the original commit is 0c3e73e84f
("memcg: improve resource counter scalability"), but it has been
drastically modified since. :)

>
>> Now I'm also trying to do some optimization specific to the overhead of
>> root memcg stat accounting, and the first attempt is posted here:
>> https://lkml.org/lkml/2013/1/2/71 . But it only covered
>> FILE_MAPPED/DIRTY/WRITEBACK (I added accounting for the last two in that
>> patchset), and Michal Hocko accepted the approach (so did Kame) and
>> suggested I should handle all the stats the same way, including
>> CACHE/RSS. But I do not handle things related to the memcg LRU, where I
>> noticed you have done some work.
>>
> Yes, the LRU is a bit tricky, and it is what is keeping me from posting
> the patchset I have. I haven't fully done it, but I am on my way.
>
>
>> It's possible that we may take different ways to bypass root memcg
>> stat accounting. The next round of that part will be sent out in the
>> following few days (doing some tests now), and any comments and
>> collaboration are welcome. (Glad to Cc you, of course, if you're
>> also interested in it. :) )
>>
>
> I am interested, of course. As you know, I started to work on this a
> while ago and had to interrupt it for a while.
> I resumed it last week,
> but if you managed to merge something already, I'd be happy to rebase.
>

I do appreciate your support! Thanks!

Regards,
Sha
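Tejun's suggestion in this thread, a generic per-cpu stats facility that aggregates periodically and bounds how far the global value may drift, could look roughly like the following single-threaded userspace model. It is a hypothetical sketch, not an existing kernel API: struct pcpu_stats_sim, the stats_* helpers, and the NR_CPUS_SIM, NR_STATS_SIM and MAX_DRIFT constants are invented for illustration, and the timer or deferred work that would call stats_flush() is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters for this userspace model. */
#define NR_CPUS_SIM  4
#define NR_STATS_SIM 3
#define MAX_DRIFT    16  /* bound on any single unfolded per-cpu delta */

/* Global aggregated values plus per-cpu deltas. Each CPU's deltas for
 * all stats sit together in one row; the 2-D array only stands in for
 * real per-cpu memory. */
struct pcpu_stats_sim {
	int64_t global[NR_STATS_SIM];
	int32_t delta[NR_CPUS_SIM][NR_STATS_SIM];
};

/* Update stat 'idx' from 'cpu'. The delta stays local unless it would
 * exceed MAX_DRIFT, in which case it is folded into the global value
 * right away; this is what bounds the deviation. */
static void stats_add(struct pcpu_stats_sim *s, int cpu, int idx, int32_t v)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	assert(idx >= 0 && idx < NR_STATS_SIM);
	int64_t d = (int64_t)s->delta[cpu][idx] + v;

	if (d > MAX_DRIFT || d < -MAX_DRIFT) {
		s->global[idx] += d;
		d = 0;
	}
	s->delta[cpu][idx] = (int32_t)d;
}

/* Periodic aggregation: fold every per-cpu delta into the global
 * values. A real facility would run this from a timer or deferred
 * work; that scheduling is not modelled here. */
static void stats_flush(struct pcpu_stats_sim *s)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		for (int idx = 0; idx < NR_STATS_SIM; idx++) {
			s->global[idx] += s->delta[cpu][idx];
			s->delta[cpu][idx] = 0;
		}
	}
}

/* Readers see only the global value; between flushes it differs from
 * the exact total by at most NR_CPUS_SIM * MAX_DRIFT. */
static int64_t stats_read(const struct pcpu_stats_sim *s, int idx)
{
	assert(idx >= 0 && idx < NR_STATS_SIM);
	return s->global[idx];
}
```

Because every stored per-cpu delta stays within ±MAX_DRIFT, a reader of the global value is never off by more than NR_CPUS_SIM * MAX_DRIFT, and keeping one CPU's deltas for all stats together also speaks to the layout concern Kame raised.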