From: Wu Fengguang <fengguang.wu@intel.com>
To: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>, Chris Frost <frost@cs.ucla.edu>,
Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Frederic Weisbecker <fweisbec@gmail.com>,
Keiichi KII <k-keiichi@bx.jp.nec.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jason Baron <jbaron@redhat.com>,
Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"lwoodman@redhat.com" <lwoodman@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Tom Zanussi <tzanussi@gmail.com>,
"riel@redhat.com" <riel@redhat.com>,
Munehiro Ikeda <m-ikeda@ds.jp.nec.com>,
Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
Subject: Re: [RFC PATCH -tip 0/2 v3] pagecache tracepoints proposal
Date: Sun, 21 Feb 2010 10:28:31 +0800 [thread overview]
Message-ID: <20100221022831.GB6448@localhost> (raw)
In-Reply-To: <20100213132952.GG11364@balbir.in.ibm.com>
Hi Balbir,
> > tracing: pagecache object collections
> >
> > This dumps
> > - all cached files of a mounted fs (the inode-cache)
> > - all cached pages of a cached file (the page-cache)
> >
> > Usage and Sample output:
> >
> > # echo /dev > /debug/tracing/objects/mm/pages/walk-fs
> > # tail /debug/tracing/trace
> > zsh-2528 [000] 10429.172470: dump_inode: ino=889 size=0 cached=0 age=442 dirty=0 dev=0:18 file=/dev/console
> > zsh-2528 [000] 10429.172472: dump_inode: ino=888 size=0 cached=0 age=442 dirty=7 dev=0:18 file=/dev/null
> > zsh-2528 [000] 10429.172474: dump_inode: ino=887 size=40 cached=0 age=442 dirty=0 dev=0:18 file=/dev/shm
> > zsh-2528 [000] 10429.172477: dump_inode: ino=886 size=40 cached=0 age=442 dirty=0 dev=0:18 file=/dev/pts
> > zsh-2528 [000] 10429.172479: dump_inode: ino=885 size=11 cached=0 age=442 dirty=0 dev=0:18 file=/dev/core
> > zsh-2528 [000] 10429.172481: dump_inode: ino=884 size=15 cached=0 age=442 dirty=0 dev=0:18 file=/dev/stderr
> > zsh-2528 [000] 10429.172483: dump_inode: ino=883 size=15 cached=0 age=442 dirty=0 dev=0:18 file=/dev/stdout
> > zsh-2528 [000] 10429.172486: dump_inode: ino=882 size=15 cached=0 age=442 dirty=0 dev=0:18 file=/dev/stdin
> > zsh-2528 [000] 10429.172488: dump_inode: ino=881 size=13 cached=0 age=442 dirty=0 dev=0:18 file=/dev/fd
> > zsh-2528 [000] 10429.172491: dump_inode: ino=872 size=13360 cached=0 age=442 dirty=0 dev=0:18 file=/dev
> >
> > Here "age" is either age from inode create time, or from last dirty time.
> >
>
> It would be nice to see mapped/unmapped information as well.
As you noticed, we have mapcount for individual pages :)
> > +static int pages_similiar(struct page* page0, struct page* page)
> > +{
> > + if (page_count(page0) != page_count(page))
> > + return 0;
> > +
> > + if (page_mapcount(page0) != page_mapcount(page))
> > + return 0;
> > +
> > + if (page_flags(page0) != page_flags(page))
> > + return 0;
> > +
> > + return 1;
> > +}
> > +
>
> OK, so pages_similar() is used to identify a range of pages in the
> cache?
Right. Many files are accessed sequentially or clustered, so
pages_similar() can save lots of output lines :)
> > +#define BATCH_LINES 100
> > +static void dump_pagecache(struct address_space *mapping)
> > +{
> > + int i;
> > + int lines = 0;
> > + pgoff_t len = 0;
> > + struct pagevec pvec;
> > + struct page *page;
> > + struct page *page0 = NULL;
> > + unsigned long start = 0;
> > +
> > + for (;;) {
> > + pagevec_init(&pvec, 0);
> > + pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
> > + (void **)pvec.pages, start + len, PAGEVEC_SIZE);
>
> Is radix_tree_gang_lookup synchronized somewhere? Don't we need to
> call it under RCU or a lock (mapping) ?
No. This function is inherently non-atomic, and it seems that most in-kernel
users do not bother to take rcu_read_lock(). So lets leave it as is?
> > +static ssize_t
> > +trace_pagecache_write(struct file *filp, const char __user *ubuf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct file *file = NULL;
> > + char *name;
> > + int err = 0;
> > +
>
> Can't we use the trace_parser here?
Seems not necessary? It's merely one file name, which could contain spaces.
> > + if (count <= 1)
> > + return -EINVAL;
> > + if (count > PATH_MAX + 1)
> > + return -ENAMETOOLONG;
> > +
> > + name = kmalloc(count+1, GFP_KERNEL);
> > + if (!name)
> > + return -ENOMEM;
> > +
> > + if (copy_from_user(name, ubuf, count)) {
> > + err = -EFAULT;
> > + goto out;
> > + }
> > +
> > + /* strip the newline added by `echo` */
> > + if (name[count-1] != '\n')
> > + return -EINVAL;
>
> Doesn't sound correct, what happens if we use echo -n?
It's a bit sad. If we accept both "echo" and "echo -n" with some
smart logic to test for trailing '\n', then it will go wrong for a
'\n'-terminated file name.
Or shall we support only "echo -n"? I can do with either one.
> > --- linux-mm.orig/fs/inode.c 2010-02-08 23:19:12.000000000 +0800
> > +++ linux-mm/fs/inode.c 2010-02-08 23:19:22.000000000 +0800
> > @@ -149,7 +149,7 @@ struct inode *inode_init_always(struct s
> > inode->i_bdev = NULL;
> > inode->i_cdev = NULL;
> > inode->i_rdev = 0;
> > - inode->dirtied_when = 0;
> > + inode->dirtied_when = jiffies;
> >
>
> Hmmm... Is the inode really dirtied when initialized? I know the
> change is for tracing, but the code when read is confusing.
Huh. Not really dirtied (for that you need to check I_DIRTY), but
dirtied_when is only used in writeback code when I_DIRTY is set.
So I overload dirtied_when in the clean case to indicate the inode
load time. This is a useful trick for fastboot to collect cache
footprint shortly after boot, when most inodes are clean.
It does ask for a comment:
/*
* This records inode load time. It will be invalidated once inode is
* dirtied, or jiffies wraps around. Despite the pitfalls it still
* provides useful information for some use cases like fastboot.
*/
inode->dirtied_when = jiffies;
Thanks,
Fengguang
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-02-21 2:28 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-05 2:17 [RFC PATCH -tip 0/2 v3] pagecache tracepoints proposal Keiichi KII
2010-02-05 2:24 ` [RFC PATCH -tip 1/2 v3] tracepoints: add tracepoints for pagecache Keiichi KII
2010-02-05 2:25 ` [RFC PATCH -tip 2/2 v3] add scripts for pagecache analysis per process Keiichi KII
2010-02-05 7:28 ` [RFC PATCH -tip 0/2 v3] pagecache tracepoints proposal Ingo Molnar
2010-02-05 21:19 ` Keiichi KII
2010-02-08 15:54 ` Wu Fengguang
2010-02-09 16:21 ` Wu Fengguang
2010-02-13 13:29 ` Balbir Singh
2010-02-14 10:52 ` Balbir Singh
2010-02-21 2:28 ` Wu Fengguang [this message]
2010-02-16 3:22 ` KOSAKI Motohiro
2010-02-17 22:38 ` Keiichi KII
2010-02-18 5:34 ` KAMEZAWA Hiroyuki
2010-02-18 9:58 ` Balbir Singh
2010-02-23 14:04 ` Wu Fengguang
2010-02-21 3:09 ` Wu Fengguang
2010-02-08 13:04 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100221022831.GB6448@localhost \
--to=fengguang.wu@intel.com \
--cc=a-tsuji@bk.jp.nec.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=frost@cs.ucla.edu \
--cc=fweisbec@gmail.com \
--cc=jbaron@redhat.com \
--cc=k-keiichi@bx.jp.nec.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lwoodman@redhat.com \
--cc=m-ikeda@ds.jp.nec.com \
--cc=mingo@elte.hu \
--cc=mitake@dcl.info.waseda.ac.jp \
--cc=riel@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tzanussi@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).