public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Tao Ma <tm@tao.ma>
Cc: "Liu Yuan" <namei.unix@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	jaxboe@fusionio.com, akpm@linux-foundation.org,
	fengguang.wu@intel.com, "Peter Zijlstra" <a.p.zijlstra@chello.nl>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>,
	"Tom Zanussi" <tzanussi@gmail.com>
Subject: Re: [RFC PATCH 4/5] mm: Add hit/miss accounting for Page Cache
Date: Thu, 3 Mar 2011 10:34:22 +0100	[thread overview]
Message-ID: <20110303093422.GC18252@elte.hu> (raw)
In-Reply-To: <4D6F077B.3060400@tao.ma>


* Tao Ma <tm@tao.ma> wrote:

> On 03/02/2011 04:45 PM, Ingo Molnar wrote:
> >* Liu Yuan<namei.unix@gmail.com>  wrote:
> >
> >>+		if (likely(!retry_find)&&  page&&  PageUptodate(page))
> >>+			page_cache_acct_hit(inode->i_sb, READ);
> >>+		else
> >>+			page_cache_acct_missed(inode->i_sb, READ);
> >Sigh.
> >
> >This would make such a nice tracepoint or sw perf event. It could be collected in a
> >'count' form, equivalent to the stats you are aiming for here, or it could even be
> >traced, if someone is interested in such details.
> >
> >It could be mixed with other events, enriching multiple apps at once.
> >
> >But, instead of trying to improve those aspects of our existing instrumentation
> >frameworks, mm/* is gradually growing its own special instrumentation hacks, missing
> >the big picture and fragmenting the instrumentation space some more.

> Thanks for the quick response. Actually our team(including Liu) here are planing 
> to add some debug info to the mm parts for analyzing the application behavior and 
> hope to find some way to improve our application's performance. We have searched 
> the trace points in mm, but it seems to us that the trace points isn't quite 
> welcomed there. Only vmscan and writeback have some limited trace points added. 
> That's the reason we first tried to add some debug info like this patch. You does 
> shed some light on our direction. Thanks.

Yes, it's very much a 'critical mass' phenomenon: the moment there's enough 
tracepoints, above some magic limit, things happen quickly and everyone finds the 
stuff obviously useful.

Before that limit it's all pretty painful.

> btw, what part do you think is needed to add some trace point?  We
> volunteer to add more if you like.

Whatever part you find useful in your daily development work!

Tracepoints are pretty flexible. The bit that is missing and which is very important 
for the MM is the collapse into 'summaries' and the avoidance of tracing overhead 
when only a summary is wanted. Please see Wu Fengguang's reply in this thread about 
the 'dump state' facility he and Steve added to recover large statistics.

I suspect the hit/miss histogram you are building in this patch could be recovered 
via that facility initially?

The next step would generalize that approach - it is non-trivial but powerful :-)

The idea is to allow non-trivial histograms and summaries to be built out of simple 
events, via the filter engine.

It would require an extension of tracing to really allow a filter expression to be 
defined over existing events, which would allow the maintenance of a persistent 
'sum' variable - probably within the perf ring-buffer. We already have filter 
support, that would have to be extended with a notion of 'persistent variables'.

So right now, if you define a tracepoint in that spot, we already support such 
filter expressions:

     'bdev == sda1 && page_state == PageUptodate'

You can inject such filter expressions into /debug/tracing/events/*/*/filter today, 
and you can use filters in perf record --filter '...' as well.

To implement 'fast statistics', the filter engine would have to be extended to 
support (simple) statements like:

	if (bdev == sda1 && page_state == PageUptodate)'
		var0++;

And:

	if (bdev == sda1 && page_state != PageUptodate)'
		var1++;

Only a very minimal type of C syntax would be supported - not a full C parser.

That way the 'var0' portion of the perf ring-buffer (which would not be part of the 
regular, overwritten ring-buffer) would act as a 'hits' variable that you could 
recover. The 'var1' portion would be the 'misses' counter.

Individual trace events would only twiddle var0 and var1 - they would not inject a 
full-blown event into the ring-buffer, so statistics would be very fast.

This method is very extensible and could be used for far more things than just MM 
statistics. In theory all of /proc statistics collection could be replaced and made 
optional that way, just by adding the right events to the right spots in the kernel.  
That is obviously a very long-term project.

Thanks,

	Ingo

  reply	other threads:[~2011-03-03  9:34 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <no>
2008-08-15  2:20 ` [PATCH 00/07] dyn_array/nr_irqs/sparse_irq support v10 - fix Yinghai Lu
2008-08-15  2:20   ` [PATCH 1/7] x86: some debug info for 32bit sparse_irq Yinghai Lu
2008-08-15  2:20     ` [PATCH 2/7] x86: remove union about dest for log/phy Yinghai Lu
2008-08-15  2:20       ` [PATCH 3/7] x86: make 32bit support per_cpu vector fix #1 Yinghai Lu
2008-08-15  2:20         ` [PATCH 4/7] x86_64: rename irq_desc/irq_desc_with_new - fix Yinghai Lu
2008-08-15  2:20           ` [PATCH 5/7] x86: make 32bit support per_cpu vector fix #2 Yinghai Lu
2008-08-15  2:20             ` [PATCH 6/7] x86: ordering functions in io_apic_32.c Yinghai Lu
2008-08-15  2:20               ` [PATCH 7/7] x86: ordering functions in io_apic_64.c Yinghai Lu
2008-08-15  8:21         ` [PATCH 3/7] x86: make 32bit support per_cpu vector fix #1 Ingo Molnar
2008-08-15  8:29           ` Yinghai Lu
2008-08-15  8:51             ` Ingo Molnar
2008-08-15  8:27   ` [PATCH 00/07] dyn_array/nr_irqs/sparse_irq support v10 - fix Ingo Molnar
2008-08-15  8:34     ` Yinghai Lu
2008-08-15  8:51       ` Ingo Molnar
2008-08-15  9:35         ` Ingo Molnar
2008-08-15 10:00           ` Peter Zijlstra
2008-08-15 10:19             ` Ingo Molnar
2008-08-15 10:28               ` Peter Zijlstra
2008-08-15 17:07                 ` Yinghai Lu
2008-08-15 23:42 ` [PATCH 0/7] merge io_apic_xx.c Yinghai Lu
2008-08-15 23:42   ` [PATCH 1/7] x86: ordering functions in io_apic_32.c - fix Yinghai Lu
2008-08-15 23:42     ` [PATCH 2/7] x86: make headers files the smae in io_apic_xx.c Yinghai Lu
2008-08-15 23:42       ` [PATCH 3/7] x86: make 64 handle sis_apic_bug like the 32 bit Yinghai Lu
2008-08-15 23:42         ` [PATCH 4/7] x86: remve ioapic_force Yinghai Lu
2008-08-15 23:42           ` [PATCH 5/7] x86: make io_apic_64.c and io_apic_32.c the same Yinghai Lu
2008-08-15 23:42             ` [PATCH 6/7] rename io_apic_64.c to io_apic.c Yinghai Lu
2008-08-15 23:42               ` [PATCH 7/7] make 32 bit have io_apic resource in /proc/iomem Yinghai Lu
2008-08-16  8:02               ` [PATCH 6/7] rename io_apic_64.c to io_apic.c Ingo Molnar
2008-08-16  8:22                 ` [PATCH] x86: io_apic.c, build fix Ingo Molnar
2008-08-16  8:26                   ` Yinghai Lu
2008-08-18  4:12 ` [PATCH] x86: apic - unify lapic_resume - fix Yinghai Lu
2008-08-18  4:12   ` [PATCH 1/2] x86: make HAVE_SPARSE_IRQ support selectable Yinghai Lu
2008-08-18  4:12     ` [PATCH 2/2] irq: rename irq_desc() to to_irq_desc() Yinghai Lu
2008-08-18  7:37       ` Ingo Molnar
2008-08-18 18:14         ` Yinghai Lu
2008-08-18  7:25   ` [PATCH] x86: apic - unify lapic_resume - fix Ingo Molnar
2008-08-18 20:44 ` [PATCH] irq: rename irq_desc() to to_irq_desc() " Yinghai Lu
2008-08-18 20:44   ` [PATCH] irq: rename irq_desc() to to_irq_desc() - fix #2 Yinghai Lu
2008-08-18 20:44     ` [PATCH] irq: rename irq_desc() to to_irq_desc() - fix #3 Yinghai Lu
2008-08-19  0:11       ` Ingo Molnar
2008-08-19  0:38         ` Ingo Molnar
2008-08-19  0:48           ` Yinghai Lu
2008-08-19  1:16             ` Ingo Molnar
2010-04-22 13:16 ` [PATCH] OpenRD: Enable SD/UART selection for serial port 1 Tanmay Upadhyay
2011-03-02  8:38 ` [RFC PATCH 1/5] x86/Kconfig: Add Page Cache Accounting entry Liu Yuan
2011-03-02 16:24   ` Randy Dunlap
2011-03-02  8:38 ` [RFC PATCH 2/5] block: Add functions and data types for Page Cache Accounting Liu Yuan
2011-03-02  8:38 ` [RFC PATCH 3/5] block: Make Page Cache counters work with sysfs Liu Yuan
2011-03-02  8:38 ` [RFC PATCH 4/5] mm: Add hit/miss accounting for Page Cache Liu Yuan
2011-03-02  8:45   ` Ingo Molnar
2011-03-02 17:02     ` Dave Hansen
2011-03-02 18:49       ` Ingo Molnar
2011-03-03  0:33         ` Wu Fengguang
2011-03-03  2:01     ` KOSAKI Motohiro
2011-03-03  3:14     ` Tao Ma
2011-03-03  9:34       ` Ingo Molnar [this message]
2011-03-03 15:08         ` Tao Ma
2011-03-02  8:38 ` [RFC PATCH 5/5] mm: Add readpages accounting Liu Yuan
2012-07-25  5:20 ` [PATCH] fixed a macro coding style issue Baodong Chen
2012-07-25  5:27   ` Venu Byravarasu
2012-07-25  5:37   ` Dmitry Torokhov
2012-07-25  6:09     ` Baodong Chen
2012-07-25  6:15     ` Al Viro
2012-07-25  6:36       ` Dmitry Torokhov
2012-07-31  7:27         ` Dmitry Torokhov
2012-09-27 12:51 ` [PATCH 1/8] fs/namespace.c: introduce helper function path_unmounted() Yan Hong
2012-09-27 12:51   ` [PATCH 2/8] fs/namespace.c: remove unused macro MNT_WRITER_UNDERFLOW_LIMIT Yan Hong
2012-09-27 12:51   ` [PATCH 3/8] fs/namespace.c: trivial code clean Yan Hong
2012-09-27 12:51   ` [PATCH 4/8] fs/namespace.c: check permission early in sys_[u]mount Yan Hong
2012-09-27 12:51   ` [PATCH 5/8] fs/namei.c: introduce macro AT_FDINV Yan Hong
2012-09-27 12:51   ` [PATCH 6/8] fs/inode.c: call alloc_inode() in new_inode() directly Yan Hong
2012-09-27 12:51   ` [PATCH 7/8] fs/inode.c: remove outstanding spin lock prefetch Yan Hong
2012-09-27 12:51   ` [PATCH 8/8] vfs: misc comment clean Yan Hong
2013-01-07 18:11 ` [PATCH] Staging: android: fixed const coding style issue in binder.c Patrik Karlin
2013-01-07 23:01   ` Greg KH
2014-02-08  2:29 ` [PATCH v2] SUNRPC: Allow one callback request to be received from two sk_buff shaobingqing
2014-02-08 19:14   ` Sergei Shtylyov
2014-02-10 17:46   ` Trond Myklebust
2025-11-28  3:23 ` [PATCH v2] f2fs: optimize trace_f2fs_write_checkpoint with enums YH Lin
2025-11-28  3:50   ` Chao Yu
2025-12-02 18:10   ` [f2fs-dev] " patchwork-bot+f2fs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110303093422.GC18252@elte.hu \
    --to=mingo@elte.hu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=fweisbec@gmail.com \
    --cc=jaxboe@fusionio.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=namei.unix@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tm@tao.ma \
    --cc=tzanussi@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox