From: Jack Steiner <steiner@sgi.com>
To: Jan Beulich <JBeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>, Borislav Petkov <bp@amd64.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Nick Piggin <npiggin@kernel.dk>,
	"x86@kernel.org" <x86@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	tee@sgi.com, Nikanth Karthikesan <knikanth@suse.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
Date: Fri, 25 Mar 2011 08:12:12 -0500
Message-ID: <20110325131212.GA15751@sgi.com>
In-Reply-To: <4D8C772202000078000384E1@vpn.id2.novell.com>

On Fri, Mar 25, 2011 at 10:06:10AM +0000, Jan Beulich wrote:
> >>> On 24.03.11 at 18:19, Ingo Molnar <mingo@elte.hu> wrote:
> > * Jan Beulich <JBeulich@novell.com> wrote:
> >> Are you certain? Iirc the lock prefix implies minimally a read-for-
> >> ownership (if CPUs are really smart enough to optimize away the
> >> write - I wonder whether that would be correct at all when it
> >> comes to locked operations), which means a cacheline can still be
> >> bouncing heavily.
> > 
> > Yeah. On what workload was this?
> > 
> > Generally you use test_and_set_bit() if you expect it to be 'owned' by 
> > whoever calls it, and released by someone else.
> > 
> > It would be really useful to run perf top on an affected box and see which 
> > kernel function causes this. It might be better to add a test_bit() to the 
> > affected codepath - instead of bloating all test_and_set_bit() users.
> 
> Indeed, I agree with you and Linus in this aspect.
> 
> > Note that the patch can also cause overhead: the test_bit() can miss the 
> > cache, it will bring in the cacheline shared, and the subsequent test_and_set() 
> > call will then dirty the cacheline - so the CPU might miss again and has to wait 
> > for other CPUs to first flush this cacheline.
> > 
> > So we really need more details here.
> 
> The problem was observed with __lock_page() (in a variant not
> upstream for reasons not known to me), and prefixing e.g.
> trylock_page() with an extra PageLocked() check yielded the
> below quoted improvements.
> 
> Jack - were there any similar measurements done on upstream
> code?

Not yet but it is high on my list to test. I suspect a similar problem exists.
I'll post the results as soon as I have them.

> 
> Jan
> 
> 
> **** Quoting Jack Steiner <steiner@sgi.com> ****
> 
> The following tests were run on UVSW :
> 	768p Westmere
> 	 128 nodes
> 
> 
> Boot times - greater than 2X reduction in boot time:
> 	2286s PTF #8
> 	1899s PTF #8
> 	 975s new algorithm
> 	 962s new algorithm
> 
> Boot messages referring to udev timeouts - eliminated:
> 	(After the udevadm settle timeout, the events queue contains):
> 
> 	7174 PTF #8
> 	9435 PTF #8
> 	   0 new algorithm
> 	   0 new algorithm
> 
> AIM7 results - no difference at low numbers of tasks. Improvements at high counts:
> 	Jobs/Min at 2000 users
> 		 5100 PTF #8
> 		17750 new algorithm
> 
> 	Wallclock seconds to run test at 2000 users
> 		2250s PTF #8
> 	 	 650s new algorithm
> 
> 	CPU Seconds at 2000 users
> 		1300000 PTF #8
> 		  14000 new algorithm
> 
> 
> Test of large parallel app faulting for text.
> 
> 	Text resident in page cache (10000 pages):
> 		REAL	USER		SYS
> 		22.830s	23m5.567s	 85m59.042s	PTF #8 run1
> 		26.267s	34m3.536s	104m20.035s	PTF #8 run2
> 		10.890s	19m27.305s	 39m50.949s	new algorithm run1
> 		10.860s	20m42.698s	 40m48.889s	new algorithm run2
> 
> 	Text on Disk (1000 pages)
> 		REAL	USER		SYS
> 		31.658s	9m25.379s	71m11.967s	PTF #8
> 		24.348s	6m15.323s	45m27.578s	new algorithm
> 
> _________________________________________________________________________________
> The following tests were run on UV48:
> 	    4 racks
> 	  256 sockets
> 	2452p westmere
> 
> Boot time:
> 	4562 sec PTF#8
> 	1965 sec new
> 
> MPI "helloworld" with 1024 ranks
> 	35 sec PTF #8
> 	22 sec new
> 
> 
> Test of large parallel app faulting for text.
> 	Text resident in page cache (10000 pages):
> 		REAL	USER		SYS
> 		46.394s	141m19s		366m53s		PTF #8
> 		38.986s	137m36s		264m52s		PTF #8
> 		 7.987s	 34m50s		 42m36s		new algorithm
> 		10.550s	 43m31s		 59m45s		new algorithm
> 
> 
> AIM7 Results (this is the original AIM7 - not the recent opensource version)
> 	------------------------------
> 	Jobs/Min
> 	 TASKS      PTF #8         new
> 	     1       487.8       486.6
> 	    10      4405.8      4940.6
> 	   100     18570.5     18198.9
> 	  1000     17262.3     17167.1
> 	  2000      4879.3     18163.9
> 	  4000        **       18846.2
> 	------------------------------
> 	Real Seconds
> 	 TASKS      PTF #8         new
> 	     1        11.9        12.0
> 	    10        13.2        11.8
> 	   100        31.3        32.0
> 	  1000       337.2       339.0
> 	  2000      2385.6       640.8
> 	  4000        **        1235.3
> 	------------------------------
> 	CPU Seconds
> 	 TASKS      PTF #8         new
> 	     1         1.6         1.6
> 	    10        11.5        12.9
> 	   100       132.2       137.2
> 	  1000      4486.5      6586.3
> 	  2000   1758419.7     27845.7
> 	  4000        **       65619.5
> 
>            ** Timed out
> 
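
Editorial sketch of the idea under discussion: a plain read of the lock bit
(the PageLocked()/test_bit() check) placed in front of the locked
test_and_set_bit(), so that a caller who sees the bit already set never issues
the locked read-modify-write. This is a user-space illustration only, not the
patch from this thread and not the SGI variant that was measured. Every demo_*
name is hypothetical, and C11 atomics stand in for the kernel's bitops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Plain read of one bit: no locked instruction, the cacheline can stay shared. */
static bool demo_test_bit(unsigned nr, atomic_ulong *word)
{
    return (atomic_load_explicit(word, memory_order_relaxed) >> nr) & 1UL;
}

/* Atomic read-modify-write with acquire ordering; returns the old bit value. */
static bool demo_test_and_set_bit_lock(unsigned nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_or_explicit(word, mask, memory_order_acquire);
    return (old & mask) != 0;
}

/* Try to take the bit lock; returns true if this caller now owns it. */
static bool demo_trylock(unsigned nr, atomic_ulong *word)
{
    /* Cheap check first: if the bit is already set, skip the atomic RMW. */
    if (demo_test_bit(nr, word))
        return false;
    /* The bit looked clear, but another thread may still win the race here. */
    return !demo_test_and_set_bit_lock(nr, word);
}

/* Release the bit lock with release ordering. */
static void demo_unlock(unsigned nr, atomic_ulong *word)
{
    atomic_fetch_and_explicit(word, ~(1UL << nr), memory_order_release);
}

/* Single-threaded walk through the lock/contend/unlock sequence. */
static void demo_usage(void)
{
    atomic_ulong flags = 0;
    bool first = demo_trylock(0, &flags);   /* bit clear: lock is taken */
    bool second = demo_trylock(0, &flags);  /* bit set: fails on the plain read */
    demo_unlock(0, &flags);
    bool third = demo_trylock(0, &flags);   /* bit clear again: lock is taken */

    assert(first && !second && third);
}
```

As noted earlier in the thread, the extra read is not free: when the bit is
clear, the line may first be fetched shared and then have to be upgraded for
the atomic operation, so the check only pays off where the bit is usually
found set under contention.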
