From: Wu Fengguang <fengguang.wu@intel.com>
To: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
"andi@firstfloor.org" <andi@firstfloor.org>,
"adobriyan@gmail.com" <adobriyan@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 5/5] proc: export more page flags in /proc/kpageflags
Date: Wed, 29 Apr 2009 11:33:58 +0800
Message-ID: <20090429033358.GA10719@localhost>
In-Reply-To: <1240962910.938.1084.camel@calx>
On Wed, Apr 29, 2009 at 07:55:10AM +0800, Matt Mackall wrote:
> On Tue, 2009-04-28 at 16:42 -0700, Andrew Morton wrote:
> > On Tue, 28 Apr 2009 18:31:09 -0500
> > Matt Mackall <mpm@selenic.com> wrote:
> >
> > > On Tue, 2009-04-28 at 16:02 -0700, Andrew Morton wrote:
> > > > On Tue, 28 Apr 2009 17:46:34 -0500
> > > > Matt Mackall <mpm@selenic.com> wrote:
> > > >
> > > > > > > > +/* a helper function _not_ intended for more general uses */
> > > > > > > > +static inline int page_cap_writeback_dirty(struct page *page)
> > > > > > > > +{
> > > > > > > > +	struct address_space *mapping;
> > > > > > > > +
> > > > > > > > +	if (!PageSlab(page))
> > > > > > > > +		mapping = page_mapping(page);
> > > > > > > > +	else
> > > > > > > > +		mapping = NULL;
> > > > > > > > +
> > > > > > > > +	return mapping && mapping_cap_writeback_dirty(mapping);
> > > > > > > > +}
> > > > > >
> > > > > > If the page isn't locked then page->mapping can be concurrently removed
> > > > > > and freed. This actually happened to me in real-life testing several
> > > > > > years ago.
> > > > >
> > > > > We certainly don't want to be taking locks per page to build the flags
> > > > > data here. As we don't have any pretense of being atomic, it's ok if we
> > > > > can find a way to do the test that's inaccurate when a race occurs, so
> > > > > long as it doesn't dereference null.
> > > > >
> > > > > But if there's not an obvious way to do that, we should probably just
> > > > > drop this flag bit for this iteration.
> > > >
> > > > trylock_page() could be used here, perhaps.
> > > >
> > > > Then again, why _not_ just do lock_page()? After all, few pages are
> > > > ever locked. There will be latency if the caller stumbles across a
> > > > page which is under read I/O, but so be it?
> > >
> > > As I mentioned just a bit ago, it's really not an unreasonable use case
> > > to want to do this on every page in the system back to back. So per page
> > > overhead matters. And the odds of stalling on a locked page when
> > > visiting 1M pages while under load are probably not negligible.
> >
> > The chances of stalling on a locked page are pretty good, and the
> > duration of the stall might be long indeed. Perhaps a trylock is a
> > decent compromise - it depends on the value of this metric, and I've
> > forgotten what we're talking about ;)
> >
> > umm, seems that this flag is needed to enable PG_error, PG_dirty,
> > PG_uptodate and PG_writeback reporting. So simply removing this code
> > would put a huge hole in the patchset, no?
>
> We can report those bits anyway. But this patchset does something
> clever: it filters irrelevant (and possibly overloaded) bits in various
> contexts.
>
> > > Our lock primitives are pretty low overhead in the fast path, but every
> > > cycle counts. The new tests and branches this code already adds are a
> > > bit worrisome, but on balance probably worth it.
> >
> > That should be easy to quantify (hint).
>
> I'll let Fengguang address both these points.

A quick micro benchmark, 100 runs on another T7300@2GHz 2GB laptop:

                user    system     total
    no lock    0.270    22.850    23.607
    trylock    0.310    25.890    26.484
                        +13.3%    +12.2%
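
The percentages are relative increases of the trylock run over the no-lock run. A minimal sketch of that arithmetic in C, where the function name overhead_percent is made up for illustration:

```c
#include <assert.h>

/*
 * Relative overhead of `with` over `base`, in percent.
 * Hypothetical helper, only meant to show how the figures
 * in the table are derived.
 */
static double overhead_percent(double base, double with)
{
	assert(base > 0.0);	/* a zero baseline has no meaningful ratio */
	return (with - base) / base * 100.0;
}
```

With the numbers from the table, overhead_percent(22.850, 25.890) comes to about 13.3 for system time and overhead_percent(23.607, 26.484) to about 12.2 for total time.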

But anyway, the plan is to move filtering to user space and eliminate
the complex kernel logic.

The IO filtering will no longer be possible once filtering is done in
user space, but I didn't see the error/dirty/writeback bits on this
test system. So I guess it won't be a big loss.

The huge/gigantic page filtering is also not possible in user space.
So I'm inclined to add a KPF_HUGE flag to distinguish (hardware supported)
huge pages from normal (software) compound pages. Any objections?
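
To illustrate what user space could do with such a flag, here is a rough sketch that classifies one 64-bit word read from /proc/kpageflags. The bit positions for KPF_COMPOUND_HEAD, KPF_COMPOUND_TAIL and KPF_HUGE, and the names kpf_test, classify_page and enum page_kind, are assumptions for illustration and not part of this patch; the real values should be taken from the kernel's pagemap documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define KPF_COMPOUND_HEAD	15
#define KPF_COMPOUND_TAIL	16
#define KPF_HUGE		17

enum page_kind {
	PAGE_NORMAL,	/* not part of any compound page */
	PAGE_COMPOUND,	/* software compound page (head or tail) */
	PAGE_HUGE,	/* hardware supported huge/gigantic page */
};

/* Test a single bit in a kpageflags word. */
static int kpf_test(uint64_t flags, unsigned int bit)
{
	assert(bit < 64);
	return (int)((flags >> bit) & 1);
}

/*
 * Classify a page from its flags word alone. KPF_HUGE is checked
 * first because a huge page would also carry a compound head or
 * tail bit.
 */
static enum page_kind classify_page(uint64_t flags)
{
	if (kpf_test(flags, KPF_HUGE))
		return PAGE_HUGE;
	if (kpf_test(flags, KPF_COMPOUND_HEAD) ||
	    kpf_test(flags, KPF_COMPOUND_TAIL))
		return PAGE_COMPOUND;
	return PAGE_NORMAL;
}
```

Without a separate KPF_HUGE bit, the first branch would have nothing to test and both kinds of page would fall into PAGE_COMPOUND, which is the distinction user space cannot make on its own.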
Thanks,
Fengguang