From: Wu Fengguang <fengguang.wu@intel.com>
To: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
"andi@firstfloor.org" <andi@firstfloor.org>,
"adobriyan@gmail.com" <adobriyan@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 5/5] proc: export more page flags in /proc/kpageflags
Date: Wed, 29 Apr 2009 11:33:58 +0800 [thread overview]
Message-ID: <20090429033358.GA10719@localhost> (raw)
In-Reply-To: <1240962910.938.1084.camel@calx>
On Wed, Apr 29, 2009 at 07:55:10AM +0800, Matt Mackall wrote:
> On Tue, 2009-04-28 at 16:42 -0700, Andrew Morton wrote:
> > On Tue, 28 Apr 2009 18:31:09 -0500
> > Matt Mackall <mpm@selenic.com> wrote:
> >
> > > On Tue, 2009-04-28 at 16:02 -0700, Andrew Morton wrote:
> > > > On Tue, 28 Apr 2009 17:46:34 -0500
> > > > Matt Mackall <mpm@selenic.com> wrote:
> > > >
> > > > > > > +/* a helper function _not_ intended for more general uses */
> > > > > > > +static inline int page_cap_writeback_dirty(struct page *page)
> > > > > > > +{
> > > > > > > + struct address_space *mapping;
> > > > > > > +
> > > > > > > + if (!PageSlab(page))
> > > > > > > + mapping = page_mapping(page);
> > > > > > > + else
> > > > > > > + mapping = NULL;
> > > > > > > +
> > > > > > > + return mapping && mapping_cap_writeback_dirty(mapping);
> > > > > > > +}
> > > > > >
> > > > > > If the page isn't locked then page->mapping can be concurrently removed
> > > > > > and freed. This actually happened to me in real-life testing several
> > > > > > years ago.
> > > > >
> > > > > We certainly don't want to be taking locks per page to build the flags
> > > > > data here. As we don't have any pretense of being atomic, it's ok if we
> > > > > can find a way to do the test that's inaccurate when a race occurs, so
> > > > > long as it doesn't dereference null.
> > > > >
> > > > > But if there's not an obvious way to do that, we should probably just
> > > > > drop this flag bit for this iteration.
> > > >
> > > > trylock_page() could be used here, perhaps.
> > > >
> > > > Then again, why _not_ just do lock_page()? After all, few pages are
> > > > ever locked. There will be latency if the caller stumbles across a
> > > > page which is under read I/O, but so be it?
> > >
> > > As I mentioned just a bit ago, it's really not an unreasonable use case
> > > to want to do this on every page in the system back to back. So per page
> > > overhead matters. And the odds of stalling on a locked page when
> > > visiting 1M pages while under load are probably not negligible.
> >
> > The chances of stalling on a locked page are pretty good, and the
> > duration of the stall might be long indeed. Perhaps a trylock is a
> > decent compromise - it depends on the value of this metric, and I've
> > forgotten what we're talking about ;)
> >
> > umm, seems that this flag is needed to enable PG_error, PG_dirty,
> > PG_uptodate and PG_writeback reporting. So simply removing this code
> > would put a huge hole in the patchset, no?
>
> We can report those bits anyway. But this patchset does something
> clever: it filters irrelevant (and possibly overloaded) bits in various
> contexts.
>
> > > Our lock primitives are pretty low overhead in the fast path, but every
> > > cycle counts. The new tests and branches this code already adds are a
> > > bit worrisome, but on balance probably worth it.
> >
> > That should be easy to quantify (hint).
>
> I'll let Fengguang address both these points.
A quick micro bench: 100 runs on another T7300@2GHz 2GB laptop:
user system total
no lock 0.270 22.850 23.607
trylock 0.310 25.890 26.484
+13.3% +12.2%
But anyway, the plan is to move filtering to user space and eliminate
the complex kernel logics.
The IO filtering is no longer possible in user space, but I didn't see
the error/dirty/writeback bits on this testing system. So I guess it
won't be a big loss.
The huge/gigantic page filtering is also not possible in user space.
So I tend to add a KPF_HUGE flag to distinguish (hardware supported)
huge pages from normal (software) compound pages. Any objections?
Thanks,
Fengguang
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
"andi@firstfloor.org" <andi@firstfloor.org>,
"adobriyan@gmail.com" <adobriyan@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 5/5] proc: export more page flags in /proc/kpageflags
Date: Wed, 29 Apr 2009 11:33:58 +0800 [thread overview]
Message-ID: <20090429033358.GA10719@localhost> (raw)
In-Reply-To: <1240962910.938.1084.camel@calx>
On Wed, Apr 29, 2009 at 07:55:10AM +0800, Matt Mackall wrote:
> On Tue, 2009-04-28 at 16:42 -0700, Andrew Morton wrote:
> > On Tue, 28 Apr 2009 18:31:09 -0500
> > Matt Mackall <mpm@selenic.com> wrote:
> >
> > > On Tue, 2009-04-28 at 16:02 -0700, Andrew Morton wrote:
> > > > On Tue, 28 Apr 2009 17:46:34 -0500
> > > > Matt Mackall <mpm@selenic.com> wrote:
> > > >
> > > > > > > +/* a helper function _not_ intended for more general uses */
> > > > > > > +static inline int page_cap_writeback_dirty(struct page *page)
> > > > > > > +{
> > > > > > > + struct address_space *mapping;
> > > > > > > +
> > > > > > > + if (!PageSlab(page))
> > > > > > > + mapping = page_mapping(page);
> > > > > > > + else
> > > > > > > + mapping = NULL;
> > > > > > > +
> > > > > > > + return mapping && mapping_cap_writeback_dirty(mapping);
> > > > > > > +}
> > > > > >
> > > > > > If the page isn't locked then page->mapping can be concurrently removed
> > > > > > and freed. This actually happened to me in real-life testing several
> > > > > > years ago.
> > > > >
> > > > > We certainly don't want to be taking locks per page to build the flags
> > > > > data here. As we don't have any pretense of being atomic, it's ok if we
> > > > > can find a way to do the test that's inaccurate when a race occurs, so
> > > > > long as it doesn't dereference null.
> > > > >
> > > > > But if there's not an obvious way to do that, we should probably just
> > > > > drop this flag bit for this iteration.
> > > >
> > > > trylock_page() could be used here, perhaps.
> > > >
> > > > Then again, why _not_ just do lock_page()? After all, few pages are
> > > > ever locked. There will be latency if the caller stumbles across a
> > > > page which is under read I/O, but so be it?
> > >
> > > As I mentioned just a bit ago, it's really not an unreasonable use case
> > > to want to do this on every page in the system back to back. So per page
> > > overhead matters. And the odds of stalling on a locked page when
> > > visiting 1M pages while under load are probably not negligible.
> >
> > The chances of stalling on a locked page are pretty good, and the
> > duration of the stall might be long indeed. Perhaps a trylock is a
> > decent compromise - it depends on the value of this metric, and I've
> > forgotten what we're talking about ;)
> >
> > umm, seems that this flag is needed to enable PG_error, PG_dirty,
> > PG_uptodate and PG_writeback reporting. So simply removing this code
> > would put a huge hole in the patchset, no?
>
> We can report those bits anyway. But this patchset does something
> clever: it filters irrelevant (and possibly overloaded) bits in various
> contexts.
>
> > > Our lock primitives are pretty low overhead in the fast path, but every
> > > cycle counts. The new tests and branches this code already adds are a
> > > bit worrisome, but on balance probably worth it.
> >
> > That should be easy to quantify (hint).
>
> I'll let Fengguang address both these points.
A quick micro bench: 100 runs on another T7300@2GHz 2GB laptop:
user system total
no lock 0.270 22.850 23.607
trylock 0.310 25.890 26.484
+13.3% +12.2%
But anyway, the plan is to move filtering to user space and eliminate
the complex kernel logics.
The IO filtering is no longer possible in user space, but I didn't see
the error/dirty/writeback bits on this testing system. So I guess it
won't be a big loss.
The huge/gigantic page filtering is also not possible in user space.
So I tend to add a KPF_HUGE flag to distinguish (hardware supported)
huge pages from normal (software) compound pages. Any objections?
Thanks,
Fengguang
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-04-29 3:34 UTC|newest]
Thread overview: 137+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-28 1:09 [PATCH 0/5] proc: export more page flags in /proc/kpageflags (take 4) Wu Fengguang
2009-04-28 1:09 ` Wu Fengguang
2009-04-28 1:09 ` [PATCH 1/5] pagemap: document clarifications Wu Fengguang
2009-04-28 1:09 ` Wu Fengguang
2009-04-28 7:11 ` Tommi Rantala
2009-04-28 7:11 ` Tommi Rantala
2009-04-28 1:09 ` [PATCH 2/5] pagemap: documentation 9 more exported page flags Wu Fengguang
2009-04-28 1:09 ` Wu Fengguang
2009-04-28 1:09 ` [PATCH 3/5] mm: introduce PageHuge() for testing huge/gigantic pages Wu Fengguang
2009-04-28 1:09 ` Wu Fengguang
2009-04-28 1:09 ` [PATCH 4/5] proc: kpagecount/kpageflags code cleanup Wu Fengguang
2009-04-28 1:09 ` Wu Fengguang
2009-04-28 1:09 ` [PATCH 5/5] proc: export more page flags in /proc/kpageflags Wu Fengguang
2009-04-28 1:09 ` Wu Fengguang
2009-04-28 6:55 ` Ingo Molnar
2009-04-28 6:55 ` Ingo Molnar
2009-04-28 7:40 ` Andi Kleen
2009-04-28 7:40 ` Andi Kleen
2009-04-28 9:04 ` Pekka Enberg
2009-04-28 9:04 ` Pekka Enberg
2009-04-28 9:10 ` Andi Kleen
2009-04-28 9:10 ` Andi Kleen
2009-04-28 9:15 ` Pekka Enberg
2009-04-28 9:15 ` Pekka Enberg
2009-04-28 9:15 ` Ingo Molnar
2009-04-28 9:15 ` Ingo Molnar
2009-04-28 9:19 ` Pekka Enberg
2009-04-28 9:19 ` Pekka Enberg
2009-04-28 9:25 ` Pekka Enberg
2009-04-28 9:25 ` Pekka Enberg
2009-04-28 9:36 ` Wu Fengguang
2009-04-28 9:36 ` Wu Fengguang
2009-04-28 9:36 ` Ingo Molnar
2009-04-28 9:36 ` Ingo Molnar
2009-04-28 9:57 ` Pekka Enberg
2009-04-28 9:57 ` Pekka Enberg
2009-04-28 10:10 ` KOSAKI Motohiro
2009-04-28 10:10 ` KOSAKI Motohiro
2009-04-28 10:21 ` Pekka Enberg
2009-04-28 10:21 ` Pekka Enberg
2009-04-28 10:56 ` Ingo Molnar
2009-04-28 10:56 ` Ingo Molnar
2009-04-28 11:09 ` KOSAKI Motohiro
2009-04-28 11:09 ` KOSAKI Motohiro
2009-04-28 12:42 ` Ingo Molnar
2009-04-28 12:42 ` Ingo Molnar
2009-04-28 11:03 ` Ingo Molnar
2009-04-28 11:03 ` Ingo Molnar
2009-04-28 17:42 ` Matt Mackall
2009-04-28 17:42 ` Matt Mackall
2009-04-28 9:29 ` Ingo Molnar
2009-04-28 9:29 ` Ingo Molnar
2009-04-28 9:34 ` KOSAKI Motohiro
2009-04-28 9:34 ` KOSAKI Motohiro
2009-04-28 9:38 ` Ingo Molnar
2009-04-28 9:38 ` Ingo Molnar
2009-04-28 9:55 ` Wu Fengguang
2009-04-28 9:55 ` Wu Fengguang
2009-04-28 10:11 ` KOSAKI Motohiro
2009-04-28 10:11 ` KOSAKI Motohiro
2009-04-28 11:05 ` Ingo Molnar
2009-04-28 11:05 ` Ingo Molnar
2009-04-28 11:36 ` Wu Fengguang
2009-04-28 11:36 ` Wu Fengguang
2009-04-28 12:17 ` [rfc] object collection tracing (was: [PATCH 5/5] proc: export more page flags in /proc/kpageflags) Ingo Molnar
2009-04-28 12:17 ` Ingo Molnar
2009-04-28 13:31 ` Wu Fengguang
2009-04-28 13:31 ` Wu Fengguang
2009-05-12 13:01 ` Frederic Weisbecker
2009-05-12 13:01 ` Frederic Weisbecker
2009-05-17 13:36 ` Wu Fengguang
2009-05-17 13:55 ` Frederic Weisbecker
2009-05-17 13:55 ` Frederic Weisbecker
2009-05-17 14:12 ` Wu Fengguang
2009-05-17 14:12 ` Wu Fengguang
2009-05-18 11:44 ` KOSAKI Motohiro
2009-05-18 11:44 ` KOSAKI Motohiro
2009-05-18 11:47 ` Wu Fengguang
2009-05-18 11:47 ` Wu Fengguang
2009-04-28 10:18 ` [PATCH 5/5] proc: export more page flags in /proc/kpageflags Andi Kleen
2009-04-28 10:18 ` Andi Kleen
2009-04-28 8:33 ` Wu Fengguang
2009-04-28 8:33 ` Wu Fengguang
2009-04-28 9:24 ` Ingo Molnar
2009-04-28 9:24 ` Ingo Molnar
2009-04-28 18:11 ` Tony Luck
2009-04-28 18:11 ` Tony Luck
2009-04-28 18:34 ` Matt Mackall
2009-04-28 18:34 ` Matt Mackall
2009-04-28 20:47 ` Tony Luck
2009-04-28 20:47 ` Tony Luck
2009-04-28 20:54 ` Andi Kleen
2009-04-28 20:54 ` Andi Kleen
2009-04-28 20:59 ` Matt Mackall
2009-04-28 20:59 ` Matt Mackall
2009-04-28 21:17 ` Andrew Morton
2009-04-28 21:17 ` Andrew Morton
2009-04-28 21:49 ` Matt Mackall
2009-04-28 21:49 ` Matt Mackall
2009-04-29 0:02 ` Robin Holt
2009-04-29 0:02 ` Robin Holt
2009-04-28 17:49 ` Matt Mackall
2009-04-28 17:49 ` Matt Mackall
2009-04-29 8:05 ` Wu Fengguang
2009-04-29 8:05 ` Wu Fengguang
2009-04-29 19:13 ` Matt Mackall
2009-04-29 19:13 ` Matt Mackall
2009-04-30 1:00 ` Wu Fengguang
2009-04-30 1:00 ` Wu Fengguang
2009-04-28 21:32 ` Andrew Morton
2009-04-28 21:32 ` Andrew Morton
2009-04-28 22:46 ` Matt Mackall
2009-04-28 22:46 ` Matt Mackall
2009-04-28 23:02 ` Andrew Morton
2009-04-28 23:02 ` Andrew Morton
2009-04-28 23:31 ` Matt Mackall
2009-04-28 23:31 ` Matt Mackall
2009-04-28 23:42 ` Andrew Morton
2009-04-28 23:42 ` Andrew Morton
2009-04-28 23:55 ` Matt Mackall
2009-04-28 23:55 ` Matt Mackall
2009-04-29 3:33 ` Wu Fengguang [this message]
2009-04-29 3:33 ` Wu Fengguang
2009-04-29 2:38 ` Wu Fengguang
2009-04-29 2:38 ` Wu Fengguang
2009-04-29 2:55 ` Andrew Morton
2009-04-29 2:55 ` Andrew Morton
2009-04-29 3:48 ` Wu Fengguang
2009-04-29 3:48 ` Wu Fengguang
2009-04-29 5:09 ` Wu Fengguang
2009-04-29 5:09 ` Wu Fengguang
2009-04-29 4:41 ` Nathan Lynch
2009-04-29 4:41 ` Nathan Lynch
2009-04-29 4:41 ` Nathan Lynch
2009-04-29 4:50 ` Andrew Morton
2009-04-29 4:50 ` Andrew Morton
2009-04-29 4:50 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090429033358.GA10719@localhost \
--to=fengguang.wu@intel.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mpm@selenic.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.