Re: [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked()
Date: Fri, 2 Oct 2009 18:54:59 +0800	[thread overview]
Message-ID: <20091002105459.GA14526@localhost> (raw)
In-Reply-To: <20091001020207.GL6327@wotan.suse.de>

On Thu, Oct 01, 2009 at 10:02:07AM +0800, Nick Piggin wrote:
> On Mon, Sep 28, 2009 at 04:44:01PM +0800, Wu Fengguang wrote:
> > On Mon, Sep 28, 2009 at 03:20:25AM +0800, Nick Piggin wrote:
> > > On Sun, Sep 27, 2009 at 06:47:39PM +0800, Wu Fengguang wrote:
> > > > > 
> > > > > And standard deviation is 0.04%, much larger than the difference 0.008% ..
> > > > 
> > > > Sorry that's not correct. I improved the accounting by treating
> > > > function0+function1 from two CPUs as an integral entity:
> > > > 
> > > >                  total time      add_to_page_cache_lru   percent  stddev
> > > >          before  3880166848.722  9683329.610             0.250%   0.014%
> > > >          after   3828516894.376  9778088.870             0.256%   0.012%
> > > >          delta                                           0.006%
> > > 
> > > I don't understand why you're doing this NFS workload to measure?
> > 
> > Because it is the first convenient workload hit my mind, and avoids
> > real disk IO :)
> 
> Using tmpfs or sparse files is probably a lot easier.

Good ideas. In fact I tried them in the very beginning.
The ratios are roughly at the same level (which is somehow unexpected):

        total time      add_to_page_cache_lru   percent  stddev
tmpfs   1579056274.576  2615476.234             0.1656354036338758%
sparse  1074931917.425  3001273                 0.27920586888791538%

Workload is to copy 1G /dev/zero to /dev/shm/, or 1G sparse file
(ext2) to /dev/null.

 echo 1 > /debug/tracing/function_profile_enabled
 cp /dev/zero /dev/shm/
 echo 0 > /debug/tracing/function_profile_enabled

 dd if=/dev/zero of=/mnt/test bs=1k count=1 seek=1048575
 echo 1 > /debug/tracing/function_profile_enabled
 cp /mnt/test/sparse /dev/null
 echo 0 > /debug/tracing/function_profile_enabled

> > > I see significant nfs, networking protocol and device overheads in
> > > your profiles, also you're hitting some locks or something which
> > > is causing massive context switching. So I don't think this is a
> > > good test.
> > 
> > Yes there are overheads. However it is a real and common workload.
> 
> Right, but so are lots of other workloads that don't hit
> add_to_page_cache heavily :)
>  
> 
> > > But anyway as Hugh points out, you need to compare with a
> > > *completely* fixed kernel, which includes auditing all users of page
> > > flags non-atomically (slab, notably, but possibly also other
> > > places).
> > 
> > That's good point. We can do more benchmarks when more fixes are
> > available. However I suspect their design goal will be "fix them
> > without introducing noticeable overheads" :)
> 
> s/noticeable//
> 
> The problem with all the non-noticeable overheads that we're
> continually adding to the kernel is that we're adding them to
> the kernel. Non-noticeable part only makes it worse because
> you can't bisect them :)

Yes it makes sense.

> > > One other thing to keep in mind that I will mention is that I am
> > > going to push in a patch to the page allocator to allow callers
> > > to avoid the refcounting (atomic_dec_and_test) in page lifetime,
> > > which is especially important for SLUB and takes more cycles off
> > > the page allocator...
> > >
> > > I don't know exactly what you're going to do after that to get a
> > > stable reference to slab pages. I guess you can read the page
> > > flags and speculatively take some slab locks and recheck etc...
> > 
> > For reliably we could skip page lock on zero refcounted pages.
> > 
> > We may lose the PG_hwpoison bit on races with __SetPageSlub*, however
> > it should be an acceptable imperfection.
> 
> I think if you're wiling to accept these problems, then it is
> completely reasonable to also accept similar races with kernel
> fastpaths to avoid extra overheads there.

Yes I do. Even better, for this perticular race, we managed to avoid
it completely without introducing overheads in fast path :)

Thanks,
Fengguang

WARNING: multiple messages have this Message-ID (diff)

From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked()
Date: Fri, 2 Oct 2009 18:54:59 +0800	[thread overview]
Message-ID: <20091002105459.GA14526@localhost> (raw)
In-Reply-To: <20091001020207.GL6327@wotan.suse.de>

On Thu, Oct 01, 2009 at 10:02:07AM +0800, Nick Piggin wrote:
> On Mon, Sep 28, 2009 at 04:44:01PM +0800, Wu Fengguang wrote:
> > On Mon, Sep 28, 2009 at 03:20:25AM +0800, Nick Piggin wrote:
> > > On Sun, Sep 27, 2009 at 06:47:39PM +0800, Wu Fengguang wrote:
> > > > > 
> > > > > And standard deviation is 0.04%, much larger than the difference 0.008% ..
> > > > 
> > > > Sorry that's not correct. I improved the accounting by treating
> > > > function0+function1 from two CPUs as an integral entity:
> > > > 
> > > >                  total time      add_to_page_cache_lru   percent  stddev
> > > >          before  3880166848.722  9683329.610             0.250%   0.014%
> > > >          after   3828516894.376  9778088.870             0.256%   0.012%
> > > >          delta                                           0.006%
> > > 
> > > I don't understand why you're doing this NFS workload to measure?
> > 
> > Because it is the first convenient workload hit my mind, and avoids
> > real disk IO :)
> 
> Using tmpfs or sparse files is probably a lot easier.

Good ideas. In fact I tried them in the very beginning.
The ratios are roughly at the same level (which is somehow unexpected):

        total time      add_to_page_cache_lru   percent  stddev
tmpfs   1579056274.576  2615476.234             0.1656354036338758%
sparse  1074931917.425  3001273                 0.27920586888791538%

Workload is to copy 1G /dev/zero to /dev/shm/, or 1G sparse file
(ext2) to /dev/null.

 echo 1 > /debug/tracing/function_profile_enabled
 cp /dev/zero /dev/shm/
 echo 0 > /debug/tracing/function_profile_enabled

 dd if=/dev/zero of=/mnt/test bs=1k count=1 seek=1048575
 echo 1 > /debug/tracing/function_profile_enabled
 cp /mnt/test/sparse /dev/null
 echo 0 > /debug/tracing/function_profile_enabled

> > > I see significant nfs, networking protocol and device overheads in
> > > your profiles, also you're hitting some locks or something which
> > > is causing massive context switching. So I don't think this is a
> > > good test.
> > 
> > Yes there are overheads. However it is a real and common workload.
> 
> Right, but so are lots of other workloads that don't hit
> add_to_page_cache heavily :)
>  
> 
> > > But anyway as Hugh points out, you need to compare with a
> > > *completely* fixed kernel, which includes auditing all users of page
> > > flags non-atomically (slab, notably, but possibly also other
> > > places).
> > 
> > That's good point. We can do more benchmarks when more fixes are
> > available. However I suspect their design goal will be "fix them
> > without introducing noticeable overheads" :)
> 
> s/noticeable//
> 
> The problem with all the non-noticeable overheads that we're
> continually adding to the kernel is that we're adding them to
> the kernel. Non-noticeable part only makes it worse because
> you can't bisect them :)

Yes it makes sense.

> > > One other thing to keep in mind that I will mention is that I am
> > > going to push in a patch to the page allocator to allow callers
> > > to avoid the refcounting (atomic_dec_and_test) in page lifetime,
> > > which is especially important for SLUB and takes more cycles off
> > > the page allocator...
> > >
> > > I don't know exactly what you're going to do after that to get a
> > > stable reference to slab pages. I guess you can read the page
> > > flags and speculatively take some slab locks and recheck etc...
> > 
> > For reliably we could skip page lock on zero refcounted pages.
> > 
> > We may lose the PG_hwpoison bit on races with __SetPageSlub*, however
> > it should be an acceptable imperfection.
> 
> I think if you're wiling to accept these problems, then it is
> completely reasonable to also accept similar races with kernel
> fastpaths to avoid extra overheads there.

Yes I do. Even better, for this perticular race, we managed to avoid
it completely without introducing overheads in fast path :)

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-10-02 10:55 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-26  3:15 [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked() Wu Fengguang
2009-09-26  3:15 ` Wu Fengguang
2009-09-26  3:49 ` Andi Kleen
2009-09-26  3:49   ` Andi Kleen
2009-09-26 10:52   ` Wu Fengguang
2009-09-26 10:52     ` Wu Fengguang
2009-09-26 11:31     ` Wu Fengguang
2009-09-26 11:31       ` Wu Fengguang
2009-09-27 10:47       ` Wu Fengguang
2009-09-27 10:47         ` Wu Fengguang
2009-09-27 19:20         ` Nick Piggin
2009-09-27 19:20           ` Nick Piggin
2009-09-28  8:44           ` Wu Fengguang
2009-09-28  8:44             ` Wu Fengguang
2009-09-29  5:16             ` Wu Fengguang
2009-09-29  5:16               ` Wu Fengguang
2009-10-01  2:02             ` Nick Piggin
2009-10-01  2:02               ` Nick Piggin
2009-10-02 10:54               ` Wu Fengguang [this message]
2009-10-02 10:54                 ` Wu Fengguang
2009-09-26 11:09 ` Hugh Dickins
2009-09-26 11:09   ` Hugh Dickins
2009-09-26 11:48   ` Wu Fengguang
2009-09-26 11:48     ` Wu Fengguang
2009-09-26 11:58     ` Hugh Dickins
2009-09-26 11:58       ` Hugh Dickins
2009-09-26 15:05     ` Andi Kleen
2009-09-26 15:05       ` Andi Kleen
2009-09-26 19:12       ` Nick Piggin
2009-09-26 19:12         ` Nick Piggin
2009-09-26 19:14     ` Nick Piggin
2009-09-26 19:14       ` Nick Piggin
2009-09-26 19:06   ` Nick Piggin
2009-09-26 19:06     ` Nick Piggin
2009-09-26 21:32     ` Andi Kleen
2009-09-26 21:32       ` Andi Kleen
2009-09-27 16:26       ` Hugh Dickins
2009-09-27 16:26         ` Hugh Dickins
2009-09-27 19:22         ` Nick Piggin
2009-09-27 19:22           ` Nick Piggin
2009-09-27 21:57           ` Hugh Dickins
2009-09-27 21:57             ` Hugh Dickins
2009-09-27 23:01             ` Nick Piggin
2009-09-27 23:01               ` Nick Piggin
2009-09-28  1:19               ` Andi Kleen
2009-09-28  1:19                 ` Andi Kleen
2009-09-28  1:52                 ` Wu Fengguang
2009-09-28  1:52                   ` Wu Fengguang
2009-09-28  2:57                 ` Nick Piggin
2009-09-28  2:57                   ` Nick Piggin
2009-09-28  4:11                   ` Andi Kleen
2009-09-28  4:11                     ` Andi Kleen
2009-09-28  4:29                     ` Nick Piggin
2009-09-28  4:29                       ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091002105459.GA14526@localhost \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.