From: Frederic Weisbecker <fweisbec@gmail.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, Li Zefan <lizf@cn.fujitsu.com>,
	Tom Zanussi <tzanussi@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Andi Kleen <andi@firstfloor.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Larry Woodman <lwoodman@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Matt Mackall <mpm@selenic.com>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [rfc] object collection tracing (was: [PATCH 5/5] proc: export more page flags in /proc/kpageflags)
Date: Tue, 12 May 2009 15:01:12 +0200
Message-ID: <20090512130110.GA6255@nowhere>
In-Reply-To: <20090428133108.GA23560@localhost>

On Tue, Apr 28, 2009 at 09:31:08PM +0800, Wu Fengguang wrote:
> On Tue, Apr 28, 2009 at 08:17:51PM +0800, Ingo Molnar wrote:
> > 
> > * Wu Fengguang <fengguang.wu@intel.com> wrote:
> > 
> > > > The above 'get object state' interface (which allows passive 
> > > > sampling) - integrated into the tracing framework - would serve 
> > > > that goal, agreed?
> > > 
> > > Agreed. That could in theory be a good complement to dynamic 
> > > tracing.
> > > 
> > > Then what will be the canonical form for all the 'get object 
> > > state' interfaces - "object.attr=value", or whatever? [...]
> > 
> > Lemme outline what i'm thinking of.
> > 
> > I'd call the feature "object collection tracing", which would live 
> > in /debug/tracing, accessed via such files:
> > 
> >   /debug/tracing/objects/mm/pages/
> >   /debug/tracing/objects/mm/pages/format
> >   /debug/tracing/objects/mm/pages/filter
> >   /debug/tracing/objects/mm/pages/trace_pipe
> >   /debug/tracing/objects/mm/pages/stats
> >   /debug/tracing/objects/mm/pages/events/
> > 
> > here's the (proposed) semantics of those files:
> > 
> > 1) /debug/tracing/objects/mm/pages/
> > 
> > There's a subsystem / object basic directory structure to make it 
> > easy and intuitive to find our way around there.
> > 
> > 2) /debug/tracing/objects/mm/pages/format
> > 
> > the format file:
> > 
> >   /debug/tracing/objects/mm/pages/format
> > 
> > Would reuse the existing dynamic-tracepoint structured-logging 
> > descriptor format and code (this is upstream already):
> > 
> >  [root@phoenix sched_signal_send]# pwd
> >  /debug/tracing/events/sched/sched_signal_send
> > 
> >  [root@phoenix sched_signal_send]# cat format 
> >  name: sched_signal_send
> >  ID: 24
> >  format:
> > 	field:unsigned short common_type;		offset:0;	size:2;
> > 	field:unsigned char common_flags;		offset:2;	size:1;
> > 	field:unsigned char common_preempt_count;	offset:3;	size:1;
> > 	field:int common_pid;				offset:4;	size:4;
> > 	field:int common_tgid;				offset:8;	size:4;
> > 
> > 	field:int sig;					offset:12;	size:4;
> > 	field:char comm[TASK_COMM_LEN];			offset:16;	size:16;
> > 	field:pid_t pid;				offset:32;	size:4;
> > 
> >  print fmt: "sig: %d  task %s:%d", REC->sig, REC->comm, REC->pid
> > 
> > These format descriptors enumerate fields, types and sizes, in a 
> > structured way that user-space tools can parse easily. (The binary 
> > records that come from the trace_pipe file follow this format 
> > description.)
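
As a rough illustration of how a user-space tool might consume such a
descriptor, here is a minimal Python sketch. It is not an existing tool:
parse_format(), decode() and the Field type are hypothetical names, and the
little-endian default and the "unsigned prefix means unsigned" rule are
assumptions made for the example.

```python
import re
from typing import NamedTuple


class Field(NamedTuple):
    ctype: str      # C type as written in the descriptor, e.g. "unsigned short"
    name: str       # field name without any array suffix
    offset: int     # byte offset inside the record
    size: int       # size in bytes
    is_array: bool  # True for declarations such as "char comm[TASK_COMM_LEN]"


# Matches lines of the form "field:<decl>; offset:<n>; size:<n>;"
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*size:(?P<size>\d+);"
)


def parse_format(text):
    """Return the list of fields described in a 'format' file."""
    fields = []
    for line in text.splitlines():
        m = FIELD_RE.search(line)
        if not m:
            continue  # skip "name:", "ID:", "print fmt:" and blank lines
        decl = m.group("decl").strip()
        ctype, _, name = decl.rpartition(" ")
        is_array = "[" in name
        name = name.split("[", 1)[0]
        fields.append(
            Field(ctype, name, int(m.group("offset")), int(m.group("size")), is_array)
        )
    return fields


def decode(record, fields, byteorder="little"):
    """Decode one binary record into a dict (assumed layout, see above)."""
    out = {}
    for f in fields:
        raw = record[f.offset:f.offset + f.size]
        if f.ctype == "char" and f.is_array:
            # Fixed-size, NUL-padded string such as comm[]
            out[f.name] = raw.split(b"\0", 1)[0].decode("ascii", "replace")
        else:
            signed = not f.ctype.startswith("unsigned")
            out[f.name] = int.from_bytes(raw, byteorder, signed=signed)
    return out
```

Feeding the sched_signal_send descriptor above to parse_format() would give
one Field per "field:" line, which decode() can then apply to each record.
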
> > 
> > 3) /debug/tracing/objects/mm/pages/filter
> > 
> > This is the tracing filter that can be set based on the 'format' 
> > descriptor. So with the above (signal-send tracepoint) you can 
> > define such filter expressions:
> > 
> >   echo "(sig == 10 && comm == bash) || sig == 13" > filter
> > 
> > To restrict the 'scope' of the object collection along pretty much 
> > any key or combination of keys. (Or you can leave it as it is and 
> > dump all objects and do keying in user-space.)
> > 
> > [ Using in-kernel filtering is obviously faster than streaming it 
> >   out to user-space - but there might be details and types of 
> >   visualization you want to do in user-space - so we don't want to 
> >   restrict things here. ]
> > 
> > For the mm object collection tracepoint i could imagine such filter 
> > expressions:
> > 
> >   echo "type == shared && file == /sbin/init" > filter
> > 
> > To dump all shared pages that are mapped to /sbin/init.
> > 
> > 4) /debug/tracing/objects/mm/pages/trace_pipe
> > 
> > The 'trace_pipe' file can be used to dump all objects in the 
> > collection, which match the filter ('all objects' by default). The 
> > record format is described in 'format'.
> > 
> > trace_pipe would be a reuse of the existing trace_pipe code: it is a 
> > modern, poll()-able, read()-able, splice()-able pipe abstraction.
> > 
> > 5) /debug/tracing/objects/mm/pages/stats
> > 
> > The 'stats' file would be a reuse of the existing histogram code of 
> > the tracing code. We already make use of it for the branch tracers 
> > and for the workqueue tracer - it could be extended to be applicable 
> > to object collections as well.
> > 
> > The advantage there would be that there's no dumping at all - all 
> > the integration is done straight in the kernel. ( The 'filter' 
> > condition is listened to - increasing flexibility. The filter file 
> > could perhaps also act as a default histogram key. )
> > 
> > 6) /debug/tracing/objects/mm/pages/events/
> > 
> > The 'events' directory offers links back to existing dynamic 
> > tracepoints that are under /debug/tracing/events/. This would serve 
> > as an additional coherent force that keeps dynamic tracepoints 
> > collected by subsystem and by object type as well. (Tools could make 
> > use of this information as well - without being aware of actual 
> > object semantics.)
> > 
> > 
> > There would be a number of other object collections we could 
> > enumerate:
> > 
> >  tasks:
> > 
> >   /debug/tracing/objects/sched/tasks/
> > 
> >  active inodes known to the kernel:
> > 
> >   /debug/tracing/objects/fs/inodes/
> > 
> >  interrupts:
> > 
> >   /debug/tracing/objects/hw/irqs/
> > 
> > etc.
> > 
> > These would use the same 'object collection' framework. Once done we 
> > can use it for many other things too.
> > 
> > Note how organically integrated it all is with the tracing 
> > framework. You could start from an 'object view' to get an overview 
> > and then go towards a more dynamic view of specific object 
> > attributes (or specific objects), as you drill down on a specific 
> > problem you want to analyze.
> > 
> > How does this all sound to you?
> 
> Great! I saw much opportunity to adapt the not yet submitted
> /proc/filecache interface to the proposed framework.
> 
> Its basic form is:
> 
> #      ino       size   cached cached% refcnt state       age accessed  process         dev             file
> [snip]
>        320          1        4     100      1    D-     50443     1085 udevd           00:11(tmpfs)     /.udev/uevent_seqnum
>     460725        123      124     100     35    --     50444     6795 touch           08:02(sda2)      /lib/libpthread-2.9.so
>     460727         31       32     100     14    --     50444     2007 touch           08:02(sda2)      /lib/librt-2.9.so
>     458865         97       80      82      1    --     50444       49 mount           08:02(sda2)      /lib/libdevmapper.so.1.02.1
>     460090         15       16     100      1    --     50444       48 mount           08:02(sda2)      /lib/libuuid.so.1.2
>     458866         46       48     100      1    --     50444       47 mount           08:02(sda2)      /lib/libblkid.so.1.0
>     460732         43       44     100     69    --     50444     3581 rcS             08:02(sda2)      /lib/libnss_nis-2.9.so
>     460739         87       88     100     73    --     50444     3597 rcS             08:02(sda2)      /lib/libnsl-2.9.so
>     460726         31       32     100     69    --     50444     3581 rcS             08:02(sda2)      /lib/libnss_compat-2.9.so
>     458804        250      252     100     11    --     50445     8175 rcS             08:02(sda2)      /lib/libncurses.so.5.6
>     229540        780      752      96      3    --     50445     7594 init            08:02(sda2)      /bin/bash
>     460735         15       16     100     89    --     50445    17581 init            08:02(sda2)      /lib/libdl-2.9.so
>     460721       1344     1340      99    117    --     50445    48732 init            08:02(sda2)      /lib/libc-2.9.so
>     458801        107      104      97     24    --     50445     3586 init            08:02(sda2)      /lib/libselinux.so.1
>     671870         37       24      65      1    --     50446        1 swapper         08:02(sda2)      /sbin/init
>        175          1    24412     100      1    --     50446        0 swapper         00:01(rootfs)    /dev/root
> 
> The patch basically does a traversal through one or more of the inode
> lists to produce the output:
>         inode_in_use
>         inode_unused
>         sb->s_dirty
>         sb->s_io
>         sb->s_more_io
>         sb->s_inodes
> 
> The filtering feature is a necessity for this interface - or it will
> take considerable time to do a full listing. It supports the following
> filters:
>         { LS_OPT_DIRTY,         "dirty"         },
>         { LS_OPT_CLEAN,         "clean"         },
>         { LS_OPT_INUSE,         "inuse"         },
>         { LS_OPT_EMPTY,         "empty"         },
>         { LS_OPT_ALL,           "all"           },
>         { LS_OPT_DEV,           "dev=%s"        },
> 
> There are two possible challenges for the conversion:
> 
> - One trick it does is to select different lists to traverse on
>   different filter options. Will this be possible in the object
>   tracing framework?



Yeah, I guess.



> - The file name lookup (last field) is the performance killer. Is it
>   possible to skip the file name lookup when the filter fails on the
>   leading fields?


Object collection is built on top of trace events, where a filter
discards the whole entry when it does not match. I'm not sure we can
easily skip just one field.

But I guess we can do something about the performance...
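
One possible direction, sketched here in Python rather than kernel C and
only under the assumption that the filter references nothing but the cheap
leading fields: evaluate the filter first, and resolve the file name only
for entries that pass. InodeInfo, collect() and resolve_name are
hypothetical names for the illustration, not part of the tracing code or of
the /proc/filecache patch.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Tuple


class InodeInfo(NamedTuple):
    """Cheap per-inode fields that need no path lookup (hypothetical)."""
    ino: int
    dirty: bool
    dev: str


def collect(
    inodes: Iterable[InodeInfo],
    cheap_filter: Callable[[InodeInfo], bool],
    resolve_name: Callable[[InodeInfo], str],
) -> Iterator[Tuple[InodeInfo, str]]:
    """Yield (info, file name) pairs for the entries that match the filter.

    The expensive resolve_name() call is made only after cheap_filter()
    has accepted the entry, so rejected entries never pay for the lookup.
    """
    for info in inodes:
        if not cheap_filter(info):
            continue  # whole entry dropped before the name is resolved
        yield info, resolve_name(info)
```

This keeps the "filter rejects a whole entry" behaviour: the entry is still
dropped as a unit, the name lookup is just deferred until after the check.
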

Could you send us the (signed-off) patch you made that implements this?
I could try to adapt it to object collection.

Thanks,
Frederic.


> Will the object tracing interface allow such flexibilities?
> (Sorry I'm not yet familiar with the tracing framework.)
> 
> > Can you see any conceptual holes in the scheme, any use-case that 
> > /proc/kpageflags supports but the object collection approach does 
> > not?
> 
> kpageflags is simply a big (perhaps sparse) binary array.
> I'd still prefer to retain its current form - the kernel patches and
> user space tools are all ready made, and I see no benefits in
> converting to the tracing framework.
> 
> > Would you be interested in seeing something like this, if we tried 
> > to implement it in the tracing tree? The majority of the code 
> > already exists, we just need interest from the MM side and we have 
> > to hook it all up. (it is by no means trivial to do - but looks like
> > a very exciting feature.)
> 
> Definitely! /proc/filecache has another 'page view':
> 
>         # head /proc/filecache
>         # file /bin/bash
>         # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback X:readahead P:private O:owner b:buffer d:dirty w:writeback
>         # idx   len     state           refcnt
>         0       1       RAMU________    4
>         3       8       RAMU________    4
>         12      1       RAMU________    4
>         14      5       RAMU________    4
>         20      7       RAMU________    4
>         27      2       RAMU________    5
>         29      1       RAMU________    4
> 
> Which is also a good candidate. However I still need to investigate
> whether it offers considerable margins over the mincore() syscall.
> 
> Thanks and Regards,
> Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
