From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35])
    by kanga.kvack.org (Postfix) with ESMTP id ECBEA6B0012
    for ; Fri, 27 May 2011 12:27:31 -0400 (EDT)
Received: from wpaz9.hot.corp.google.com (wpaz9.hot.corp.google.com [172.24.198.73])
    by smtp-out.google.com with ESMTP id p4RGRTxR016896
    for ; Fri, 27 May 2011 09:27:29 -0700
Received: from qwk3 (qwk3.prod.google.com [10.241.195.131])
    by wpaz9.hot.corp.google.com with ESMTP id p4RGRQeQ004541
    (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT)
    for ; Fri, 27 May 2011 09:27:28 -0700
Received: by qwk3 with SMTP id 3so933554qwk.19
    for ; Fri, 27 May 2011 09:27:26 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20110527080417.GG3440@balbir.in.ibm.com>
References: <1306444069-5094-1-git-send-email-yinghan@google.com>
    <20110527080417.GG3440@balbir.in.ibm.com>
Date: Fri, 27 May 2011 09:27:25 -0700
Message-ID:
Subject: Re: [PATCH] memcg: add pgfault latency histograms
From: Ying Han
Content-Type: multipart/alternative; boundary=002354470aa8c1bebc04a444696d
Sender: owner-linux-mm@kvack.org
List-ID:
To: Balbir Singh
Cc: KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura, Tejun Heo,
    Pavel Emelyanov, KAMEZAWA Hiroyuki, Andrew Morton, Li Zefan,
    Mel Gorman, Christoph Lameter, Johannes Weiner, Rik van Riel,
    Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai,
    "linux-mm@kvack.org"

--002354470aa8c1bebc04a444696d
Content-Type: text/plain; charset=ISO-8859-1

On Fri, May 27, 2011 at 1:04 AM, Balbir Singh wrote:

> * Ying Han [2011-05-26 14:07:49]:
>
> > This adds histogram to capture pagefault latencies on per-memcg basis. I used
> > this patch on the memcg background reclaim test, and figured there could be more
> > usecases to monitor/debug application performance.
> >
> > The histogram is composed of 8 buckets in ns units. The last one is infinite (inf)
> > which is everything beyond the last one. To be more flexible, the buckets can
> > be reset and also each bucket is configurable at runtime.
> >
>
> inf is a bit confusing for page faults -- no? Why not call it "rest"
> or something like "> 38400".

ok, I can change that to "rest".

> BTW, why was 600 used as base?
>

Well, that is based on some of my experiments. I am doing anon page
allocation and most of the page faults fall into the bucket of 580 ns - 600 ns.
So I just left it as the default.

However, the buckets are configurable and users can change them based on
their workload and platform.

> > memory.pgfault_histogram: exports the histogram on per-memcg basis and also can
> > be reset by echoing "reset". Meantime, all the buckets are writable by echoing
> > the range into the API. see the example below.
> >
> > /proc/sys/vm/pgfault_histogram: the global sysctl tunable can be used to turn
> > on/off recording the histogram.
> >
>
> Why not make this per memcg?
>

That can be done.

> > Functional Test:
> > Create a memcg with 10g hard_limit, running dd & allocate 8g anon page.
> > Measure the anon page allocation latency.
> >
> > $ mkdir /dev/cgroup/memory/B
> > $ echo 10g >/dev/cgroup/memory/B/memory.limit_in_bytes
> > $ echo $$ >/dev/cgroup/memory/B/tasks
> > $ dd if=/dev/zero of=/export/hdc3/dd/tf0 bs=1024 count=20971520 &
> > $ allocate 8g anon pages
> >
> > $ echo 1 >/proc/sys/vm/pgfault_histogram
> >
> > $ cat /dev/cgroup/memory/B/memory.pgfault_histogram
> > pgfault latency histogram (ns):
> > < 600            2051273
> > < 1200           40859
> > < 2400           4004
> > < 4800           1605
> > < 9600           170
> > < 19200          82
> > < 38400          6
> > < inf            0
> >
> > $ echo reset >/dev/cgroup/memory/B/memory.pgfault_histogram
>
> Can't we use something like "-1" to mean reset?
>

sounds good to me.

Thank you for reviewing.

--Ying

>
> --
>        Three Cheers,
>        Balbir
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: email@kvack.org
>

--002354470aa8c1bebc04a444696d
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Fri, May 27, 2011 at 1:04 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
* Ying Han <yinghan@google.com> [2011-05-26 14:07:49]:

> This adds histogram to capture pagefault latencies on per-memcg basis. I used
> this patch on the memcg background reclaim test, and figured there could be more
> usecases to monitor/debug application performance.
>
> The histogram is composed of 8 buckets in ns units. The last one is infinite (inf)
> which is everything beyond the last one. To be more flexible, the buckets can
> be reset and also each bucket is configurable at runtime.
>

inf is a bit confusing for page faults -- no? Why not call it "rest"
or something like "> 38400".

ok, I can change that to "rest".

BTW, why was 600 used as base?

Well, that is based on some of my experiments. I am doing anon page allocation and most of the page faults fall into the bucket of 580 ns - 600 ns. So I just left it as the default.

However, the buckets are configurable and users can change them based on their workload and platform.
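
To make the bucket layout concrete, here is a minimal userspace sketch in Python, not the kernel code from the patch. It assumes each bucket is a strict upper bound (as the "<" in the example output suggests) and that the defaults double from 600 ns up to 38400 ns, with one catch-all bucket at the end. The names DEFAULT_BOUNDS_NS, bucket_index and record are made up for illustration.

```python
from bisect import bisect_right

# Default upper bounds in ns, as they appear in the example output of the
# patch description. The final catch-all bucket ("inf" / "rest") has no bound.
DEFAULT_BOUNDS_NS = [600 * 2**i for i in range(7)]  # 600 ... 38400


def bucket_index(latency_ns, bounds=DEFAULT_BOUNDS_NS):
    """Return the index of the first bucket whose bound exceeds latency_ns.

    Assumption: a sample equal to a bound belongs to the next bucket,
    because the buckets are printed as "< bound".
    An index equal to len(bounds) means the catch-all bucket.
    """
    return bisect_right(bounds, latency_ns)


def record(counts, latency_ns, bounds=DEFAULT_BOUNDS_NS):
    """Increment the counter of the bucket that latency_ns falls into."""
    counts[bucket_index(latency_ns, bounds)] += 1


# One counter per bounded bucket plus one for the catch-all bucket.
counts = [0] * (len(DEFAULT_BOUNDS_NS) + 1)
for sample_ns in (590, 600, 1199, 50000):  # made-up samples
    record(counts, sample_ns)
```

Changing the bucket ranges for a different workload or platform then amounts to passing a different sorted bounds list.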



> memory.pgfault_histogram: exports the histogram on per-memcg basis and also can
> be reset by echoing "reset". Meantime, all the buckets are writable by echoing
> the range into the API. see the example below.
>
> /proc/sys/vm/pgfault_histogram: the global sysctl tunable can be used to turn
> on/off recording the histogram.
>

Why not make this per memcg?

That can be done.

> Functional Test:
> Create a memcg with 10g hard_limit, running dd & allocate 8g anon page.
> Measure the anon page allocation latency.
>
> $ mkdir /dev/cgroup/memory/B
> $ echo 10g >/dev/cgroup/memory/B/memory.limit_in_bytes
> $ echo $$ >/dev/cgroup/memory/B/tasks
> $ dd if=/dev/zero of=/export/hdc3/dd/tf0 bs=1024 count=20971520 &
> $ allocate 8g anon pages
>
> $ echo 1 >/proc/sys/vm/pgfault_histogram
>
> $ cat /dev/cgroup/memory/B/memory.pgfault_histogram
> pgfault latency histogram (ns):
> < 600            2051273
> < 1200           40859
> < 2400           4004
> < 4800           1605
> < 9600           170
> < 19200          82
> < 38400          6
> < inf            0
>
> $ echo reset >/dev/cgroup/memory/B/memory.pgfault_histogram

Can't we use something like "-1" to mean reset?

sounds good to me.
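
For anyone who wants to post-process the histogram quoted above, here is a rough Python sketch that parses the output and prints the cumulative share of faults per bucket. It assumes the "< BOUND COUNT" line format shown in the example, and it treats any non-numeric bound ("inf" today, "rest" if it gets renamed) as the catch-all bucket. parse_histogram and cumulative_share are illustrative names, not part of the patch.

```python
def parse_histogram(text):
    """Parse '< BOUND COUNT' lines into (bound, count) pairs.

    The bound is an int in ns, or None for the catch-all bucket.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "<":
            continue  # skip the header line and anything unexpected
        bound = int(parts[1]) if parts[1].isdigit() else None
        rows.append((bound, int(parts[2])))
    return rows


def cumulative_share(rows):
    """Return (bound, fraction of all faults up to and including this bucket)."""
    total = sum(count for _, count in rows)
    shares, running = [], 0
    for bound, count in rows:
        running += count
        shares.append((bound, running / total if total else 0.0))
    return shares


# The example output from the functional test in the patch description.
SAMPLE = """\
pgfault latency histogram (ns):
< 600            2051273
< 1200           40859
< 2400           4004
< 4800           1605
< 9600           170
< 19200          82
< 38400          6
< inf            0
"""

rows = parse_histogram(SAMPLE)
shares = cumulative_share(rows)
for bound, share in shares:
    label = "rest" if bound is None else f"< {bound}"
    print(f"{label:>8}  {share:.4%}")
```

In a real run the text would be read from memory.pgfault_histogram of the memcg instead of the inline sample.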

Thank you for reviewing.

--Ying

--
       Three Cheers,
       Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

--002354470aa8c1bebc04a444696d--