From: Tao Ma <tm@tao.ma>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Liu Yuan" <namei.unix@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jaxboe@fusionio.com, akpm@linux-foundation.org,
fengguang.wu@intel.com, "Peter Zijlstra" <a.p.zijlstra@chello.nl>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Arnaldo Carvalho de Melo" <acme@redhat.com>,
"Tom Zanussi" <tzanussi@gmail.com>
Subject: Re: [RFC PATCH 4/5] mm: Add hit/miss accounting for Page Cache
Date: Thu, 03 Mar 2011 23:08:00 +0800 [thread overview]
Message-ID: <4D6FAED0.5010000@tao.ma> (raw)
In-Reply-To: <20110303093422.GC18252@elte.hu>
On 03/03/2011 05:34 PM, Ingo Molnar wrote:
> * Tao Ma<tm@tao.ma> wrote:
>
>> On 03/02/2011 04:45 PM, Ingo Molnar wrote:
>>> * Liu Yuan<namei.unix@gmail.com> wrote:
>>>
>>>> + if (likely(!retry_find)&& page&& PageUptodate(page))
>>>> + page_cache_acct_hit(inode->i_sb, READ);
>>>> + else
>>>> + page_cache_acct_missed(inode->i_sb, READ);
>>> Sigh.
>>>
>>> This would make such a nice tracepoint or sw perf event. It could be collected in a
>>> 'count' form, equivalent to the stats you are aiming for here, or it could even be
>>> traced, if someone is interested in such details.
>>>
>>> It could be mixed with other events, enriching multiple apps at once.
>>>
>>> But, instead of trying to improve those aspects of our existing instrumentation
>>> frameworks, mm/* is gradually growing its own special instrumentation hacks, missing
>>> the big picture and fragmenting the instrumentation space some more.
>> Thanks for the quick response. Actually our team(including Liu) here are planing
>> to add some debug info to the mm parts for analyzing the application behavior and
>> hope to find some way to improve our application's performance. We have searched
>> the trace points in mm, but it seems to us that the trace points isn't quite
>> welcomed there. Only vmscan and writeback have some limited trace points added.
>> That's the reason we first tried to add some debug info like this patch. You does
>> shed some light on our direction. Thanks.
> Yes, it's very much a 'critical mass' phenomenon: the moment there's enough
> tracepoints, above some magic limit, things happen quickly and everyone finds the
> stuff obviously useful.
>
> Before that limit it's all pretty painful.
yeah.
>> btw, what part do you think is needed to add some trace point? We
>> volunteer to add more if you like.
> Whatever part you find useful in your daily development work!
>
> Tracepoints are pretty flexible. The bit that is missing and which is very important
> for the MM is the collapse into 'summaries' and the avoidance of tracing overhead
> when only a summary is wanted. Please see Wu Fengguang's reply in this thread about
> the 'dump state' facility he and Steve added to recover large statistics.
We are looking into it now. Thanks for the hint.
> I suspect the hit/miss histogram you are building in this patch could be recovered
> via that facility initially?
>
> The next step would generalize that approach - it is non-trivial but powerful :-)
>
> The idea is to allow non-trivial histograms and summaries to be built out of simple
> events, via the filter engine.
>
> It would require an extension of tracing to really allow a filter expression to be
> defined over existing events, which would allow the maintenance of a persistent
> 'sum' variable - probably within the perf ring-buffer. We already have filter
> support, that would have to be extended with a notion of 'persistent variables'.
>
> So right now, if you define a tracepoint in that spot, we already support such
> filter expressions:
>
> 'bdev == sda1&& page_state == PageUptodate'
>
> You can inject such filter expressions into /debug/tracing/events/*/*/filter today,
> and you can use filters in perf record --filter '...' as well.
>
> To implement 'fast statistics', the filter engine would have to be extended to
> support (simple) statements like:
>
> if (bdev == sda1&& page_state == PageUptodate)'
> var0++;
>
> And:
>
> if (bdev == sda1&& page_state != PageUptodate)'
> var1++;
>
> Only a very minimal type of C syntax would be supported - not a full C parser.
>
> That way the 'var0' portion of the perf ring-buffer (which would not be part of the
> regular, overwritten ring-buffer) would act as a 'hits' variable that you could
> recover. The 'var1' portion would be the 'misses' counter.
>
> Individual trace events would only twiddle var0 and var1 - they would not inject a
> full-blown event into the ring-buffer, so statistics would be very fast.
>
> This method is very extensible and could be used for far more things than just MM
> statistics. In theory all of /proc statistics collection could be replaced and made
> optional that way, just by adding the right events to the right spots in the kernel.
> That is obviously a very long-term project.
It looks really fantastic for us. OK, we will try to figure out when and
how we can work on this issue. Great thanks.
Regards,
Tao
next prev parent reply other threads:[~2011-03-03 15:08 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <no>
2008-08-15 2:20 ` [PATCH 00/07] dyn_array/nr_irqs/sparse_irq support v10 - fix Yinghai Lu
2008-08-15 2:20 ` [PATCH 1/7] x86: some debug info for 32bit sparse_irq Yinghai Lu
2008-08-15 2:20 ` [PATCH 2/7] x86: remove union about dest for log/phy Yinghai Lu
2008-08-15 2:20 ` [PATCH 3/7] x86: make 32bit support per_cpu vector fix #1 Yinghai Lu
2008-08-15 2:20 ` [PATCH 4/7] x86_64: rename irq_desc/irq_desc_with_new - fix Yinghai Lu
2008-08-15 2:20 ` [PATCH 5/7] x86: make 32bit support per_cpu vector fix #2 Yinghai Lu
2008-08-15 2:20 ` [PATCH 6/7] x86: ordering functions in io_apic_32.c Yinghai Lu
2008-08-15 2:20 ` [PATCH 7/7] x86: ordering functions in io_apic_64.c Yinghai Lu
2008-08-15 8:21 ` [PATCH 3/7] x86: make 32bit support per_cpu vector fix #1 Ingo Molnar
2008-08-15 8:29 ` Yinghai Lu
2008-08-15 8:51 ` Ingo Molnar
2008-08-15 8:27 ` [PATCH 00/07] dyn_array/nr_irqs/sparse_irq support v10 - fix Ingo Molnar
2008-08-15 8:34 ` Yinghai Lu
2008-08-15 8:51 ` Ingo Molnar
2008-08-15 9:35 ` Ingo Molnar
2008-08-15 10:00 ` Peter Zijlstra
2008-08-15 10:19 ` Ingo Molnar
2008-08-15 10:28 ` Peter Zijlstra
2008-08-15 17:07 ` Yinghai Lu
2008-08-15 23:42 ` [PATCH 0/7] merge io_apic_xx.c Yinghai Lu
2008-08-15 23:42 ` [PATCH 1/7] x86: ordering functions in io_apic_32.c - fix Yinghai Lu
2008-08-15 23:42 ` [PATCH 2/7] x86: make headers files the smae in io_apic_xx.c Yinghai Lu
2008-08-15 23:42 ` [PATCH 3/7] x86: make 64 handle sis_apic_bug like the 32 bit Yinghai Lu
2008-08-15 23:42 ` [PATCH 4/7] x86: remve ioapic_force Yinghai Lu
2008-08-15 23:42 ` [PATCH 5/7] x86: make io_apic_64.c and io_apic_32.c the same Yinghai Lu
2008-08-15 23:42 ` [PATCH 6/7] rename io_apic_64.c to io_apic.c Yinghai Lu
2008-08-15 23:42 ` [PATCH 7/7] make 32 bit have io_apic resource in /proc/iomem Yinghai Lu
2008-08-16 8:02 ` [PATCH 6/7] rename io_apic_64.c to io_apic.c Ingo Molnar
2008-08-16 8:22 ` [PATCH] x86: io_apic.c, build fix Ingo Molnar
2008-08-16 8:26 ` Yinghai Lu
2008-08-18 4:12 ` [PATCH] x86: apic - unify lapic_resume - fix Yinghai Lu
2008-08-18 4:12 ` [PATCH 1/2] x86: make HAVE_SPARSE_IRQ support selectable Yinghai Lu
2008-08-18 4:12 ` [PATCH 2/2] irq: rename irq_desc() to to_irq_desc() Yinghai Lu
2008-08-18 7:37 ` Ingo Molnar
2008-08-18 18:14 ` Yinghai Lu
2008-08-18 7:25 ` [PATCH] x86: apic - unify lapic_resume - fix Ingo Molnar
2008-08-18 20:44 ` [PATCH] irq: rename irq_desc() to to_irq_desc() " Yinghai Lu
2008-08-18 20:44 ` [PATCH] irq: rename irq_desc() to to_irq_desc() - fix #2 Yinghai Lu
2008-08-18 20:44 ` [PATCH] irq: rename irq_desc() to to_irq_desc() - fix #3 Yinghai Lu
2008-08-19 0:11 ` Ingo Molnar
2008-08-19 0:38 ` Ingo Molnar
2008-08-19 0:48 ` Yinghai Lu
2008-08-19 1:16 ` Ingo Molnar
2010-04-22 13:16 ` [PATCH] OpenRD: Enable SD/UART selection for serial port 1 Tanmay Upadhyay
2011-03-02 8:38 ` [RFC PATCH 1/5] x86/Kconfig: Add Page Cache Accounting entry Liu Yuan
2011-03-02 16:24 ` Randy Dunlap
2011-03-02 8:38 ` [RFC PATCH 2/5] block: Add functions and data types for Page Cache Accounting Liu Yuan
2011-03-02 8:38 ` [RFC PATCH 3/5] block: Make Page Cache counters work with sysfs Liu Yuan
2011-03-02 8:38 ` [RFC PATCH 4/5] mm: Add hit/miss accounting for Page Cache Liu Yuan
2011-03-02 8:45 ` Ingo Molnar
2011-03-02 17:02 ` Dave Hansen
2011-03-02 18:49 ` Ingo Molnar
2011-03-03 0:33 ` Wu Fengguang
2011-03-03 2:01 ` KOSAKI Motohiro
2011-03-03 3:14 ` Tao Ma
2011-03-03 9:34 ` Ingo Molnar
2011-03-03 15:08 ` Tao Ma [this message]
2011-03-02 8:38 ` [RFC PATCH 5/5] mm: Add readpages accounting Liu Yuan
2012-07-25 5:20 ` [PATCH] fixed a macro coding style issue Baodong Chen
2012-07-25 5:27 ` Venu Byravarasu
2012-07-25 5:37 ` Dmitry Torokhov
2012-07-25 6:09 ` Baodong Chen
2012-07-25 6:15 ` Al Viro
2012-07-25 6:36 ` Dmitry Torokhov
2012-07-31 7:27 ` Dmitry Torokhov
2012-09-27 12:51 ` [PATCH 1/8] fs/namespace.c: introduce helper function path_unmounted() Yan Hong
2012-09-27 12:51 ` [PATCH 2/8] fs/namespace.c: remove unused macro MNT_WRITER_UNDERFLOW_LIMIT Yan Hong
2012-09-27 12:51 ` [PATCH 3/8] fs/namespace.c: trivial code clean Yan Hong
2012-09-27 12:51 ` [PATCH 4/8] fs/namespace.c: check permission early in sys_[u]mount Yan Hong
2012-09-27 12:51 ` [PATCH 5/8] fs/namei.c: introduce macro AT_FDINV Yan Hong
2012-09-27 12:51 ` [PATCH 6/8] fs/inode.c: call alloc_inode() in new_inode() directly Yan Hong
2012-09-27 12:51 ` [PATCH 7/8] fs/inode.c: remove outstanding spin lock prefetch Yan Hong
2012-09-27 12:51 ` [PATCH 8/8] vfs: misc comment clean Yan Hong
2013-01-07 18:11 ` [PATCH] Staging: android: fixed const coding style issue in binder.c Patrik Karlin
2013-01-07 23:01 ` Greg KH
2014-02-08 2:29 ` [PATCH v2] SUNRPC: Allow one callback request to be received from two sk_buff shaobingqing
2014-02-08 19:14 ` Sergei Shtylyov
2014-02-10 17:46 ` Trond Myklebust
2025-11-28 3:23 ` [PATCH v2] f2fs: optimize trace_f2fs_write_checkpoint with enums YH Lin
2025-11-28 3:50 ` Chao Yu
2025-12-02 18:10 ` [f2fs-dev] " patchwork-bot+f2fs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D6FAED0.5010000@tao.ma \
--to=tm@tao.ma \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=fengguang.wu@intel.com \
--cc=fweisbec@gmail.com \
--cc=jaxboe@fusionio.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@elte.hu \
--cc=namei.unix@gmail.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tzanussi@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox