From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked()
Date: Fri, 2 Oct 2009 18:54:59 +0800 [thread overview]
Message-ID: <20091002105459.GA14526@localhost> (raw)
In-Reply-To: <20091001020207.GL6327@wotan.suse.de>
On Thu, Oct 01, 2009 at 10:02:07AM +0800, Nick Piggin wrote:
> On Mon, Sep 28, 2009 at 04:44:01PM +0800, Wu Fengguang wrote:
> > On Mon, Sep 28, 2009 at 03:20:25AM +0800, Nick Piggin wrote:
> > > On Sun, Sep 27, 2009 at 06:47:39PM +0800, Wu Fengguang wrote:
> > > > >
> > > > > And standard deviation is 0.04%, much larger than the difference 0.008% ..
> > > >
> > > > Sorry that's not correct. I improved the accounting by treating
> > > > function0+function1 from two CPUs as an integral entity:
> > > >
> > > > total time add_to_page_cache_lru percent stddev
> > > > before 3880166848.722 9683329.610 0.250% 0.014%
> > > > after 3828516894.376 9778088.870 0.256% 0.012%
> > > > delta 0.006%
> > >
> > > I don't understand why you're doing this NFS workload to measure?
> >
> > Because it is the first convenient workload hit my mind, and avoids
> > real disk IO :)
>
> Using tmpfs or sparse files is probably a lot easier.
Good ideas. In fact I tried them in the very beginning.
The ratios are roughly at the same level (which is somehow unexpected):
total time add_to_page_cache_lru percent stddev
tmpfs 1579056274.576 2615476.234 0.1656354036338758%
sparse 1074931917.425 3001273 0.27920586888791538%
Workload is to copy 1G /dev/zero to /dev/shm/, or 1G sparse file
(ext2) to /dev/null.
echo 1 > /debug/tracing/function_profile_enabled
cp /dev/zero /dev/shm/
echo 0 > /debug/tracing/function_profile_enabled
dd if=/dev/zero of=/mnt/test bs=1k count=1 seek=1048575
echo 1 > /debug/tracing/function_profile_enabled
cp /mnt/test/sparse /dev/null
echo 0 > /debug/tracing/function_profile_enabled
> > > I see significant nfs, networking protocol and device overheads in
> > > your profiles, also you're hitting some locks or something which
> > > is causing massive context switching. So I don't think this is a
> > > good test.
> >
> > Yes there are overheads. However it is a real and common workload.
>
> Right, but so are lots of other workloads that don't hit
> add_to_page_cache heavily :)
>
>
> > > But anyway as Hugh points out, you need to compare with a
> > > *completely* fixed kernel, which includes auditing all users of page
> > > flags non-atomically (slab, notably, but possibly also other
> > > places).
> >
> > That's good point. We can do more benchmarks when more fixes are
> > available. However I suspect their design goal will be "fix them
> > without introducing noticeable overheads" :)
>
> s/noticeable//
>
> The problem with all the non-noticeable overheads that we're
> continually adding to the kernel is that we're adding them to
> the kernel. Non-noticeable part only makes it worse because
> you can't bisect them :)
Yes it makes sense.
> > > One other thing to keep in mind that I will mention is that I am
> > > going to push in a patch to the page allocator to allow callers
> > > to avoid the refcounting (atomic_dec_and_test) in page lifetime,
> > > which is especially important for SLUB and takes more cycles off
> > > the page allocator...
> > >
> > > I don't know exactly what you're going to do after that to get a
> > > stable reference to slab pages. I guess you can read the page
> > > flags and speculatively take some slab locks and recheck etc...
> >
> > For reliably we could skip page lock on zero refcounted pages.
> >
> > We may lose the PG_hwpoison bit on races with __SetPageSlub*, however
> > it should be an acceptable imperfection.
>
> I think if you're wiling to accept these problems, then it is
> completely reasonable to also accept similar races with kernel
> fastpaths to avoid extra overheads there.
Yes I do. Even better, for this perticular race, we managed to avoid
it completely without introducing overheads in fast path :)
Thanks,
Fengguang
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked()
Date: Fri, 2 Oct 2009 18:54:59 +0800 [thread overview]
Message-ID: <20091002105459.GA14526@localhost> (raw)
In-Reply-To: <20091001020207.GL6327@wotan.suse.de>
On Thu, Oct 01, 2009 at 10:02:07AM +0800, Nick Piggin wrote:
> On Mon, Sep 28, 2009 at 04:44:01PM +0800, Wu Fengguang wrote:
> > On Mon, Sep 28, 2009 at 03:20:25AM +0800, Nick Piggin wrote:
> > > On Sun, Sep 27, 2009 at 06:47:39PM +0800, Wu Fengguang wrote:
> > > > >
> > > > > And standard deviation is 0.04%, much larger than the difference 0.008% ..
> > > >
> > > > Sorry that's not correct. I improved the accounting by treating
> > > > function0+function1 from two CPUs as an integral entity:
> > > >
> > > > total time add_to_page_cache_lru percent stddev
> > > > before 3880166848.722 9683329.610 0.250% 0.014%
> > > > after 3828516894.376 9778088.870 0.256% 0.012%
> > > > delta 0.006%
> > >
> > > I don't understand why you're doing this NFS workload to measure?
> >
> > Because it is the first convenient workload hit my mind, and avoids
> > real disk IO :)
>
> Using tmpfs or sparse files is probably a lot easier.
Good ideas. In fact I tried them in the very beginning.
The ratios are roughly at the same level (which is somehow unexpected):
total time add_to_page_cache_lru percent stddev
tmpfs 1579056274.576 2615476.234 0.1656354036338758%
sparse 1074931917.425 3001273 0.27920586888791538%
Workload is to copy 1G /dev/zero to /dev/shm/, or 1G sparse file
(ext2) to /dev/null.
echo 1 > /debug/tracing/function_profile_enabled
cp /dev/zero /dev/shm/
echo 0 > /debug/tracing/function_profile_enabled
dd if=/dev/zero of=/mnt/test bs=1k count=1 seek=1048575
echo 1 > /debug/tracing/function_profile_enabled
cp /mnt/test/sparse /dev/null
echo 0 > /debug/tracing/function_profile_enabled
> > > I see significant nfs, networking protocol and device overheads in
> > > your profiles, also you're hitting some locks or something which
> > > is causing massive context switching. So I don't think this is a
> > > good test.
> >
> > Yes there are overheads. However it is a real and common workload.
>
> Right, but so are lots of other workloads that don't hit
> add_to_page_cache heavily :)
>
>
> > > But anyway as Hugh points out, you need to compare with a
> > > *completely* fixed kernel, which includes auditing all users of page
> > > flags non-atomically (slab, notably, but possibly also other
> > > places).
> >
> > That's good point. We can do more benchmarks when more fixes are
> > available. However I suspect their design goal will be "fix them
> > without introducing noticeable overheads" :)
>
> s/noticeable//
>
> The problem with all the non-noticeable overheads that we're
> continually adding to the kernel is that we're adding them to
> the kernel. Non-noticeable part only makes it worse because
> you can't bisect them :)
Yes it makes sense.
> > > One other thing to keep in mind that I will mention is that I am
> > > going to push in a patch to the page allocator to allow callers
> > > to avoid the refcounting (atomic_dec_and_test) in page lifetime,
> > > which is especially important for SLUB and takes more cycles off
> > > the page allocator...
> > >
> > > I don't know exactly what you're going to do after that to get a
> > > stable reference to slab pages. I guess you can read the page
> > > flags and speculatively take some slab locks and recheck etc...
> >
> > For reliably we could skip page lock on zero refcounted pages.
> >
> > We may lose the PG_hwpoison bit on races with __SetPageSlub*, however
> > it should be an acceptable imperfection.
>
> I think if you're wiling to accept these problems, then it is
> completely reasonable to also accept similar races with kernel
> fastpaths to avoid extra overheads there.
Yes I do. Even better, for this perticular race, we managed to avoid
it completely without introducing overheads in fast path :)
Thanks,
Fengguang
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-10-02 10:55 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-26 3:15 [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked() Wu Fengguang
2009-09-26 3:15 ` Wu Fengguang
2009-09-26 3:49 ` Andi Kleen
2009-09-26 3:49 ` Andi Kleen
2009-09-26 10:52 ` Wu Fengguang
2009-09-26 10:52 ` Wu Fengguang
2009-09-26 11:31 ` Wu Fengguang
2009-09-26 11:31 ` Wu Fengguang
2009-09-27 10:47 ` Wu Fengguang
2009-09-27 10:47 ` Wu Fengguang
2009-09-27 19:20 ` Nick Piggin
2009-09-27 19:20 ` Nick Piggin
2009-09-28 8:44 ` Wu Fengguang
2009-09-28 8:44 ` Wu Fengguang
2009-09-29 5:16 ` Wu Fengguang
2009-09-29 5:16 ` Wu Fengguang
2009-10-01 2:02 ` Nick Piggin
2009-10-01 2:02 ` Nick Piggin
2009-10-02 10:54 ` Wu Fengguang [this message]
2009-10-02 10:54 ` Wu Fengguang
2009-09-26 11:09 ` Hugh Dickins
2009-09-26 11:09 ` Hugh Dickins
2009-09-26 11:48 ` Wu Fengguang
2009-09-26 11:48 ` Wu Fengguang
2009-09-26 11:58 ` Hugh Dickins
2009-09-26 11:58 ` Hugh Dickins
2009-09-26 15:05 ` Andi Kleen
2009-09-26 15:05 ` Andi Kleen
2009-09-26 19:12 ` Nick Piggin
2009-09-26 19:12 ` Nick Piggin
2009-09-26 19:14 ` Nick Piggin
2009-09-26 19:14 ` Nick Piggin
2009-09-26 19:06 ` Nick Piggin
2009-09-26 19:06 ` Nick Piggin
2009-09-26 21:32 ` Andi Kleen
2009-09-26 21:32 ` Andi Kleen
2009-09-27 16:26 ` Hugh Dickins
2009-09-27 16:26 ` Hugh Dickins
2009-09-27 19:22 ` Nick Piggin
2009-09-27 19:22 ` Nick Piggin
2009-09-27 21:57 ` Hugh Dickins
2009-09-27 21:57 ` Hugh Dickins
2009-09-27 23:01 ` Nick Piggin
2009-09-27 23:01 ` Nick Piggin
2009-09-28 1:19 ` Andi Kleen
2009-09-28 1:19 ` Andi Kleen
2009-09-28 1:52 ` Wu Fengguang
2009-09-28 1:52 ` Wu Fengguang
2009-09-28 2:57 ` Nick Piggin
2009-09-28 2:57 ` Nick Piggin
2009-09-28 4:11 ` Andi Kleen
2009-09-28 4:11 ` Andi Kleen
2009-09-28 4:29 ` Nick Piggin
2009-09-28 4:29 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091002105459.GA14526@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=hugh.dickins@tiscali.co.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.