From: Tao Ma <tm@tao.ma>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Liu Yuan" <namei.unix@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jaxboe@fusionio.com, akpm@linux-foundation.org,
fengguang.wu@intel.com, "Peter Zijlstra" <a.p.zijlstra@chello.nl>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Arnaldo Carvalho de Melo" <acme@redhat.com>,
"Tom Zanussi" <tzanussi@gmail.com>
Subject: Re: [RFC PATCH 4/5] mm: Add hit/miss accounting for Page Cache
Date: Thu, 03 Mar 2011 23:08:00 +0800 [thread overview]
Message-ID: <4D6FAED0.5010000@tao.ma> (raw)
In-Reply-To: <20110303093422.GC18252@elte.hu>
On 03/03/2011 05:34 PM, Ingo Molnar wrote:
> * Tao Ma<tm@tao.ma> wrote:
>
>> On 03/02/2011 04:45 PM, Ingo Molnar wrote:
>>> * Liu Yuan<namei.unix@gmail.com> wrote:
>>>
>>>> + if (likely(!retry_find)&& page&& PageUptodate(page))
>>>> + page_cache_acct_hit(inode->i_sb, READ);
>>>> + else
>>>> + page_cache_acct_missed(inode->i_sb, READ);
>>> Sigh.
>>>
>>> This would make such a nice tracepoint or sw perf event. It could be collected in a
>>> 'count' form, equivalent to the stats you are aiming for here, or it could even be
>>> traced, if someone is interested in such details.
>>>
>>> It could be mixed with other events, enriching multiple apps at once.
>>>
>>> But, instead of trying to improve those aspects of our existing instrumentation
>>> frameworks, mm/* is gradually growing its own special instrumentation hacks, missing
>>> the big picture and fragmenting the instrumentation space some more.
>> Thanks for the quick response. Actually our team(including Liu) here are planing
>> to add some debug info to the mm parts for analyzing the application behavior and
>> hope to find some way to improve our application's performance. We have searched
>> the trace points in mm, but it seems to us that the trace points isn't quite
>> welcomed there. Only vmscan and writeback have some limited trace points added.
>> That's the reason we first tried to add some debug info like this patch. You does
>> shed some light on our direction. Thanks.
> Yes, it's very much a 'critical mass' phenomenon: the moment there's enough
> tracepoints, above some magic limit, things happen quickly and everyone finds the
> stuff obviously useful.
>
> Before that limit it's all pretty painful.
yeah.
>> btw, what part do you think is needed to add some trace point? We
>> volunteer to add more if you like.
> Whatever part you find useful in your daily development work!
>
> Tracepoints are pretty flexible. The bit that is missing and which is very important
> for the MM is the collapse into 'summaries' and the avoidance of tracing overhead
> when only a summary is wanted. Please see Wu Fengguang's reply in this thread about
> the 'dump state' facility he and Steve added to recover large statistics.
We are looking into it now. Thanks for the hint.
> I suspect the hit/miss histogram you are building in this patch could be recovered
> via that facility initially?
>
> The next step would generalize that approach - it is non-trivial but powerful :-)
>
> The idea is to allow non-trivial histograms and summaries to be built out of simple
> events, via the filter engine.
>
> It would require an extension of tracing to really allow a filter expression to be
> defined over existing events, which would allow the maintenance of a persistent
> 'sum' variable - probably within the perf ring-buffer. We already have filter
> support, that would have to be extended with a notion of 'persistent variables'.
>
> So right now, if you define a tracepoint in that spot, we already support such
> filter expressions:
>
> 'bdev == sda1&& page_state == PageUptodate'
>
> You can inject such filter expressions into /debug/tracing/events/*/*/filter today,
> and you can use filters in perf record --filter '...' as well.
>
> To implement 'fast statistics', the filter engine would have to be extended to
> support (simple) statements like:
>
> if (bdev == sda1&& page_state == PageUptodate)'
> var0++;
>
> And:
>
> if (bdev == sda1&& page_state != PageUptodate)'
> var1++;
>
> Only a very minimal type of C syntax would be supported - not a full C parser.
>
> That way the 'var0' portion of the perf ring-buffer (which would not be part of the
> regular, overwritten ring-buffer) would act as a 'hits' variable that you could
> recover. The 'var1' portion would be the 'misses' counter.
>
> Individual trace events would only twiddle var0 and var1 - they would not inject a
> full-blown event into the ring-buffer, so statistics would be very fast.
>
> This method is very extensible and could be used for far more things than just MM
> statistics. In theory all of /proc statistics collection could be replaced and made
> optional that way, just by adding the right events to the right spots in the kernel.
> That is obviously a very long-term project.
It looks really fantastic for us. OK, we will try to figure out when and
how we can work on this issue. Great thanks.
Regards,
Tao
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-03-03 15:08 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <no>
2011-03-02 8:38 ` [RFC PATCH 1/5] x86/Kconfig: Add Page Cache Accounting entry Liu Yuan
2011-03-02 16:24 ` Randy Dunlap
2011-03-03 3:11 ` Liu Yuan
2011-03-02 8:38 ` [RFC PATCH 2/5] block: Add functions and data types for Page Cache Accounting Liu Yuan
2011-03-02 8:38 ` [RFC PATCH 3/5] block: Make Page Cache counters work with sysfs Liu Yuan
2011-03-02 8:38 ` [RFC PATCH 4/5] mm: Add hit/miss accounting for Page Cache Liu Yuan
2011-03-02 8:45 ` Ingo Molnar
2011-03-02 17:02 ` Dave Hansen
2011-03-02 18:49 ` Ingo Molnar
2011-03-03 0:33 ` Wu Fengguang
2011-03-03 2:01 ` KOSAKI Motohiro
2011-03-03 3:14 ` Tao Ma
2011-03-03 9:34 ` Ingo Molnar
2011-03-03 15:08 ` Tao Ma [this message]
2011-03-02 8:38 ` [RFC PATCH 5/5] mm: Add readpages accounting Liu Yuan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D6FAED0.5010000@tao.ma \
--to=tm@tao.ma \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=fengguang.wu@intel.com \
--cc=fweisbec@gmail.com \
--cc=jaxboe@fusionio.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@elte.hu \
--cc=namei.unix@gmail.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tzanussi@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).