public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
@ 2011-03-24  4:56 Nikanth Karthikesan
  2011-03-24  8:52 ` Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Nikanth Karthikesan @ 2011-03-24  4:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nick Piggin, Thomas Gleixner, H. Peter Anvin, x86, Andrew Morton,
	Jan Beulich, Jack Steiner, linux-kernel

On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
signal can stall other CPUs. And as the number of cores increase this penalty
scales proportionately. So it is best to try and avoid atomic instructions
wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if it
finds the bit set already.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>

---

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 903683b..26a42ff 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -203,19 +203,6 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
 }
 
 /**
- * test_and_set_bit_lock - Set a bit and return its old value for lock
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This is the same as test_and_set_bit on x86.
- */
-static __always_inline int
-test_and_set_bit_lock(int nr, volatile unsigned long *addr)
-{
-	return test_and_set_bit(nr, addr);
-}
-
-/**
  * __test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
@@ -339,6 +326,25 @@ static int test_bit(int nr, const volatile unsigned long *addr);
 	 : variable_test_bit((nr), (addr)))
 
 /**
+ * test_and_set_bit_lock - Set a bit and return its old value for lock
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This is the same as test_and_set_bit on x86. But atomic operation is
+ * avoided, if the bit was already set.
+ */
+static __always_inline int
+test_and_set_bit_lock(int nr, volatile unsigned long *addr)
+{
+#ifdef CONFIG_SMP
+	barrier();
+	if (test_bit(nr, addr))
+		return 1;
+#endif
+	return test_and_set_bit(nr, addr);
+}
+
+/**
  * __ffs - find first set bit in word
  * @word: The word to search
  *

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24  4:56 [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible Nikanth Karthikesan
@ 2011-03-24  8:52 ` Jan Beulich
  2011-03-24  8:56 ` Ingo Molnar
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2011-03-24  8:52 UTC (permalink / raw)
  To: Nikanth Karthikesan
  Cc: Nick Piggin, x86, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	Jack Steiner, linux-kernel, H. Peter Anvin

>>> On 24.03.11 at 05:56, Nikanth Karthikesan <knikanth@suse.de> wrote:
> On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> signal can stall other CPUs. And as the number of cores increase this 
> penalty
> scales proportionately. So it is best to try and avoid atomic instructions
> wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if it
> finds the bit set already.

Why don't you do this in test_and_set_bit() instead?

Jan

> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
> 
> ---
> 
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 903683b..26a42ff 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -203,19 +203,6 @@ static inline int test_and_set_bit(int nr, volatile 
> unsigned long *addr)
>  }
>  
>  /**
> - * test_and_set_bit_lock - Set a bit and return its old value for lock
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This is the same as test_and_set_bit on x86.
> - */
> -static __always_inline int
> -test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> -{
> -	return test_and_set_bit(nr, addr);
> -}
> -
> -/**
>   * __test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
> @@ -339,6 +326,25 @@ static int test_bit(int nr, const volatile unsigned long 
> *addr);
>  	 : variable_test_bit((nr), (addr)))
>  
>  /**
> + * test_and_set_bit_lock - Set a bit and return its old value for lock
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This is the same as test_and_set_bit on x86. But atomic operation is
> + * avoided, if the bit was already set.
> + */
> +static __always_inline int
> +test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> +{
> +#ifdef CONFIG_SMP
> +	barrier();
> +	if (test_bit(nr, addr))
> +		return 1;
> +#endif
> +	return test_and_set_bit(nr, addr);
> +}
> +
> +/**
>   * __ffs - find first set bit in word
>   * @word: The word to search
>   *




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24  4:56 [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible Nikanth Karthikesan
  2011-03-24  8:52 ` Jan Beulich
@ 2011-03-24  8:56 ` Ingo Molnar
  2011-03-24 14:52   ` Borislav Petkov
  2011-03-24 17:01 ` Linus Torvalds
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Ingo Molnar @ 2011-03-24  8:56 UTC (permalink / raw)
  To: Nikanth Karthikesan
  Cc: Ingo Molnar, Nick Piggin, Thomas Gleixner, H. Peter Anvin, x86,
	Andrew Morton, Jan Beulich, Jack Steiner, linux-kernel,
	Linus Torvalds, Borislav Petkov, Peter Zijlstra


* Nikanth Karthikesan <knikanth@suse.de> wrote:

> On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> signal can stall other CPUs. And as the number of cores increase this penalty
> scales proportionately. So it is best to try and avoid atomic instructions
> wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if it
> finds the bit set already.
> 
> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
> 
> ---
> 
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 903683b..26a42ff 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -203,19 +203,6 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_and_set_bit_lock - Set a bit and return its old value for lock
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This is the same as test_and_set_bit on x86.
> - */
> -static __always_inline int
> -test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> -{
> -	return test_and_set_bit(nr, addr);
> -}
> -
> -/**
>   * __test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
> @@ -339,6 +326,25 @@ static int test_bit(int nr, const volatile unsigned long *addr);
>  	 : variable_test_bit((nr), (addr)))
>  
>  /**
> + * test_and_set_bit_lock - Set a bit and return its old value for lock
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This is the same as test_and_set_bit on x86. But atomic operation is
> + * avoided, if the bit was already set.
> + */
> +static __always_inline int
> +test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> +{
> +#ifdef CONFIG_SMP
> +	barrier();
> +	if (test_bit(nr, addr))
> +		return 1;
> +#endif
> +	return test_and_set_bit(nr, addr);
> +}

On modern x86 CPUs there's no "#LOCK signal" anymore - it's replaced by a 
M[O]ESI cache coherency bus. I'd expect modern x86 CPUs to be pretty fast when 
the cacheline is local and the bit is set already.

So you really need to back up your patch with actual hard numbers. Putting this 
code into user-space and using pthreads to loop on the same global variable and 
testing the before/after effect would be sufficient i think. You can use 'perf 
stat --repeat 10' kind of measurements to see whether there's any improvement 
larger than the noise of the measurement.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24  8:56 ` Ingo Molnar
@ 2011-03-24 14:52   ` Borislav Petkov
  2011-03-24 16:48     ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Borislav Petkov @ 2011-03-24 14:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nikanth Karthikesan, Ingo Molnar, Nick Piggin, Thomas Gleixner,
	H. Peter Anvin, x86@kernel.org, Andrew Morton, Jan Beulich,
	Jack Steiner, linux-kernel@vger.kernel.org, Linus Torvalds,
	Peter Zijlstra

On Thu, Mar 24, 2011 at 04:56:47AM -0400, Ingo Molnar wrote:
> 
> * Nikanth Karthikesan <knikanth@suse.de> wrote:
> 
> > On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> > signal can stall other CPUs. And as the number of cores increase this penalty
> > scales proportionately. So it is best to try and avoid atomic instructions
> > wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if it
> > finds the bit set already.
> > 
> > Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>

[..]

> > + * test_and_set_bit_lock - Set a bit and return its old value for lock
> > + * @nr: Bit to set
> > + * @addr: Address to count from
> > + *
> > + * This is the same as test_and_set_bit on x86. But atomic operation is
> > + * avoided, if the bit was already set.
> > + */
> > +static __always_inline int
> > +test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> > +{
> > +#ifdef CONFIG_SMP
> > +	barrier();
> > +	if (test_bit(nr, addr))
> > +		return 1;
> > +#endif
> > +	return test_and_set_bit(nr, addr);
> > +}
>
> On modern x86 CPUs there's no "#LOCK signal" anymore - it's replaced
> by a M[O]ESI cache coherency bus. I'd expect modern x86 CPUs to be
> pretty fast when the cacheline is local and the bit is set already.

Correct. However, LOCK still could have some overhead associated with it
and avoiding it by using only an non-atomic op (test_bit()) could bring
some miniscule speedup...

> So you really need to back up your patch with actual hard numbers.
> Putting this code into user-space and using pthreads to loop on
> the same global variable and testing the before/after effect would
> be sufficient i think. You can use 'perf stat --repeat 10' kind of
> measurements to see whether there's any improvement larger than the
> noise of the measurement.

and Ingo's question is right on the money - is this speedup noticeable
or does it simply disappear in the noise?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 14:52   ` Borislav Petkov
@ 2011-03-24 16:48     ` Jan Beulich
  2011-03-24 17:19       ` Ingo Molnar
  2011-03-24 17:30       ` Jack Steiner
  0 siblings, 2 replies; 49+ messages in thread
From: Jan Beulich @ 2011-03-24 16:48 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar
  Cc: Peter Zijlstra, Nick Piggin, x86@kernel.org, Thomas Gleixner,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Jack Steiner, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

>>> On 24.03.11 at 15:52, Borislav Petkov <bp@amd64.org> wrote:

(haven't seen Ingo's original reply, so responding here)

> On Thu, Mar 24, 2011 at 04:56:47AM -0400, Ingo Molnar wrote:
>> 
>> * Nikanth Karthikesan <knikanth@suse.de> wrote:
>> 
>> > On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
>> > signal can stall other CPUs. And as the number of cores increase this 
> penalty
>> > scales proportionately. So it is best to try and avoid atomic instructions
>> > wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if 
> it
>> > finds the bit set already.
>> > 
>> > Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
> 
> [..]
> 
>> > + * test_and_set_bit_lock - Set a bit and return its old value for lock
>> > + * @nr: Bit to set
>> > + * @addr: Address to count from
>> > + *
>> > + * This is the same as test_and_set_bit on x86. But atomic operation is
>> > + * avoided, if the bit was already set.
>> > + */
>> > +static __always_inline int
>> > +test_and_set_bit_lock(int nr, volatile unsigned long *addr)
>> > +{
>> > +#ifdef CONFIG_SMP
>> > +	barrier();
>> > +	if (test_bit(nr, addr))
>> > +		return 1;
>> > +#endif
>> > +	return test_and_set_bit(nr, addr);
>> > +}
>>
>> On modern x86 CPUs there's no "#LOCK signal" anymore - it's replaced
>> by a M[O]ESI cache coherency bus. I'd expect modern x86 CPUs to be
>> pretty fast when the cacheline is local and the bit is set already.

Are you certain? Iirc the lock prefix implies minimally a read-for-
ownership (if CPUs are really smart enough to optimize away the
write - I wonder whether that would be correct at all when it
comes to locked operations), which means a cacheline can still be
bouncing heavily.

>> So you really need to back up your patch with actual hard numbers.
>> Putting this code into user-space and using pthreads to loop on
>> the same global variable and testing the before/after effect would
>> be sufficient i think. You can use 'perf stat --repeat 10' kind of
>> measurements to see whether there's any improvement larger than the
>> noise of the measurement.
> 
> and Ingo's question is right on the money - is this speedup noticeable
> or does it simply disappear in the noise?

This cacheline bouncing was actually observed and measured
on SGI UV systems, but I'm not certain we're permitted to publish
that data. I'm copying the two SGI guys who had reported that
issue (and the special case fix, which Nikanth simply generalized)
to us, for them to decide.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24  4:56 [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible Nikanth Karthikesan
  2011-03-24  8:52 ` Jan Beulich
  2011-03-24  8:56 ` Ingo Molnar
@ 2011-03-24 17:01 ` Linus Torvalds
  2011-03-24 17:13 ` Jack Steiner
  2011-03-24 18:38 ` Andi Kleen
  4 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2011-03-24 17:01 UTC (permalink / raw)
  To: Nikanth Karthikesan
  Cc: Ingo Molnar, Nick Piggin, Thomas Gleixner, H. Peter Anvin, x86,
	Andrew Morton, Jan Beulich, Jack Steiner, linux-kernel

On Wed, Mar 23, 2011 at 9:56 PM, Nikanth Karthikesan <knikanth@suse.de> wrote:
> On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> signal can stall other CPUs. And as the number of cores increase this penalty
> scales proportionately. So it is best to try and avoid atomic instructions
> wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if it
> finds the bit set already.

This is potentially _very_ wrong. It means that test_and_set_bit() is
no longer a serializing instruction in the failure case, and I wonder
what effect that will have on the thousands of users.

It also means that test_and_set_bit() on an uncached entry now starts
out with a read-for-ownership cache operation, which can be quite a
bit slower than the exclusive ownership thing for the hopefully common
case where it succeeds.

So no, I really think this is seriously wrong. It basically makes it
impossible for the user of the bitop function to do a good job if it
wants to.

WHICH test_and_set_bit() are you having performance issues with?
Because I think the right approach is to do this optimization on a
case-by-case basis in the code that actually does the operation, not
in the low-level routine.

                       Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24  4:56 [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible Nikanth Karthikesan
                   ` (2 preceding siblings ...)
  2011-03-24 17:01 ` Linus Torvalds
@ 2011-03-24 17:13 ` Jack Steiner
  2011-03-24 18:38 ` Andi Kleen
  4 siblings, 0 replies; 49+ messages in thread
From: Jack Steiner @ 2011-03-24 17:13 UTC (permalink / raw)
  To: Nikanth Karthikesan
  Cc: Ingo Molnar, Nick Piggin, Thomas Gleixner, H. Peter Anvin, x86,
	Andrew Morton, Jan Beulich, linux-kernel

On Thu, Mar 24, 2011 at 10:26:01AM +0530, Nikanth Karthikesan wrote:
> On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> signal can stall other CPUs. And as the number of cores increase this penalty
> scales proportionately. So it is best to try and avoid atomic instructions
> wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if it
> finds the bit set already.
> 
> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
> 



Don't we also have an issue related to the coherency protocols. 

If the cacheline is referenced by a test-and-set instruction and
the line does not currently reside in the local caches, it is fetched
for exclusive access using a single off-socket request.

If the code first reads the CL, then does a test-and-set, the line
may be first read in shared mode. Then a second off-socket request must be issued to
obtain exclusive access. This can be a serious issue on large systems.
If the bit is typically already set, the new code is a big win but is this
the case? I suspect not.


Do we need a new variant of test-and-set? The new variant would
first test the bit, then set it if not already set. This would be used only
in places where the bit is likely already set.  The downside is that it is
frequently difficult to predict whether the bit is already set.


> ---
> 
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 903683b..26a42ff 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -203,19 +203,6 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
>  }
>  
>  /**
> - * test_and_set_bit_lock - Set a bit and return its old value for lock
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This is the same as test_and_set_bit on x86.
> - */
> -static __always_inline int
> -test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> -{
> -	return test_and_set_bit(nr, addr);
> -}
> -
> -/**
>   * __test_and_set_bit - Set a bit and return its old value
>   * @nr: Bit to set
>   * @addr: Address to count from
> @@ -339,6 +326,25 @@ static int test_bit(int nr, const volatile unsigned long *addr);
>  	 : variable_test_bit((nr), (addr)))
>  
>  /**
> + * test_and_set_bit_lock - Set a bit and return its old value for lock
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This is the same as test_and_set_bit on x86. But atomic operation is
> + * avoided, if the bit was already set.
> + */
> +static __always_inline int
> +test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> +{
> +#ifdef CONFIG_SMP
> +	barrier();
> +	if (test_bit(nr, addr))
> +		return 1;
> +#endif
> +	return test_and_set_bit(nr, addr);
> +}
> +
> +/**
>   * __ffs - find first set bit in word
>   * @word: The word to search
>   *

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 16:48     ` Jan Beulich
@ 2011-03-24 17:19       ` Ingo Molnar
  2011-03-25 10:06         ` Jan Beulich
  2011-03-24 17:30       ` Jack Steiner
  1 sibling, 1 reply; 49+ messages in thread
From: Ingo Molnar @ 2011-03-24 17:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Linus Torvalds, Ingo Molnar,
	Jack Steiner, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin,
	Arnaldo Carvalho de Melo


* Jan Beulich <JBeulich@novell.com> wrote:

> >>> On 24.03.11 at 15:52, Borislav Petkov <bp@amd64.org> wrote:
> 
> (haven't seen Ingo's original reply, so responding here)
> 
> > On Thu, Mar 24, 2011 at 04:56:47AM -0400, Ingo Molnar wrote:
> >> 
> >> * Nikanth Karthikesan <knikanth@suse.de> wrote:
> >> 
> >> > On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> >> > signal can stall other CPUs. And as the number of cores increase this 
> > penalty
> >> > scales proportionately. So it is best to try and avoid atomic instructions
> >> > wherever possible. test_and_set_bit_lock() can avoid using LOCK_PREFIX if 
> > it
> >> > finds the bit set already.
> >> > 
> >> > Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
> > 
> > [..]
> > 
> >> > + * test_and_set_bit_lock - Set a bit and return its old value for lock
> >> > + * @nr: Bit to set
> >> > + * @addr: Address to count from
> >> > + *
> >> > + * This is the same as test_and_set_bit on x86. But atomic operation is
> >> > + * avoided, if the bit was already set.
> >> > + */
> >> > +static __always_inline int
> >> > +test_and_set_bit_lock(int nr, volatile unsigned long *addr)
> >> > +{
> >> > +#ifdef CONFIG_SMP
> >> > +	barrier();
> >> > +	if (test_bit(nr, addr))
> >> > +		return 1;
> >> > +#endif
> >> > +	return test_and_set_bit(nr, addr);
> >> > +}
> >>
> >> On modern x86 CPUs there's no "#LOCK signal" anymore - it's replaced
> >> by a M[O]ESI cache coherency bus. I'd expect modern x86 CPUs to be
> >> pretty fast when the cacheline is local and the bit is set already.
> 
> Are you certain? Iirc the lock prefix implies minimally a read-for-
> ownership (if CPUs are really smart enough to optimize away the
> write - I wonder whether that would be correct at all when it
> comes to locked operations), which means a cacheline can still be
> bouncing heavily.

Yeah. On what workload was this?

Generally you use test_and_set_bit() if you expect it to be 'owned' by whoever 
calls it, and released by someone else.

It would be really useful to run perf top on an affected box and see which 
kernel function causes this. It might be better to add a test_bit() to the 
affected codepath - instead of bloating all test_and_set_bit() users.

Note that the patch can also cause overhead: the test_bit() can miss the cache, 
it will bring in the cacheline shared, and the subsequent test_and_set() call 
will then dirty the cacheline - so the CPU might miss again and has to wait for 
other CPUs to first flush this cacheline.

So we really need more details here.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 16:48     ` Jan Beulich
  2011-03-24 17:19       ` Ingo Molnar
@ 2011-03-24 17:30       ` Jack Steiner
  2011-03-24 20:00         ` Ingo Molnar
  1 sibling, 1 reply; 49+ messages in thread
From: Jack Steiner @ 2011-03-24 17:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Borislav Petkov, Ingo Molnar, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Linus Torvalds,
	Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

> 
> This cacheline bouncing was actually observed and measured
> on SGI UV systems, but I'm not certain we're permitted to publish
> that data. I'm copying the two SGI guys who had reported that
> issue (and the special case fix, which Nikanth simply generalized)
> to us, for them to decide.

We frequently run into the cacheline bouncing issues. I don't have
the data handy that you refer to, but feel free to publish it.

--- jack



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24  4:56 [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible Nikanth Karthikesan
                   ` (3 preceding siblings ...)
  2011-03-24 17:13 ` Jack Steiner
@ 2011-03-24 18:38 ` Andi Kleen
  4 siblings, 0 replies; 49+ messages in thread
From: Andi Kleen @ 2011-03-24 18:38 UTC (permalink / raw)
  To: Nikanth Karthikesan
  Cc: Ingo Molnar, Nick Piggin, Thomas Gleixner, H. Peter Anvin, x86,
	Andrew Morton, Jan Beulich, Jack Steiner, linux-kernel

Nikanth Karthikesan <knikanth@suse.de> writes:

> On x86_64 SMP with lots of CPU atomic instructions which assert the LOCK #
> signal can stall other CPUs. And as the number of cores increase this penalty

This description is very wrong. No modern CPU still has a LOCK # signal
or does global stalls for LOCK.

Do you actually have any data this is a problem and how much
difference the patch makes?

Also there's the missing barrier now of course.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 17:30       ` Jack Steiner
@ 2011-03-24 20:00         ` Ingo Molnar
  2011-03-24 20:40           ` Andi Kleen
  2011-03-24 20:48           ` Eric Dumazet
  0 siblings, 2 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-24 20:00 UTC (permalink / raw)
  To: Jack Steiner
  Cc: Jan Beulich, Borislav Petkov, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Linus Torvalds,
	Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin


* Jack Steiner <steiner@sgi.com> wrote:

> > 
> > This cacheline bouncing was actually observed and measured
> > on SGI UV systems, but I'm not certain we're permitted to publish
> > that data. I'm copying the two SGI guys who had reported that
> > issue (and the special case fix, which Nikanth simply generalized)
> > to us, for them to decide.
> 
> We frequently run into the cacheline bouncing issues. I don't have
> the data handy that you refer to, but feel free to publish it.

One good way to see cache bounces is to run a misses/accesses ratio profile:

   perf top -e cache-misses -e cache-references --count-filter 10

Note the two events: this runs a 'weighted' profile, you'll see (LLC) 
cache-misses of a function relative to cache-references it does, a 
misses/references ratio in essence.

The --count-filter filters out rare entries. (so that rare functions 
accidentally producing a large ratio do not clutter the output)

For example during a scheduler-intense workload you'll get something like:

   PerfTop:   32652 irqs/sec  kernel:71.2%  exact:  0.0% [cache-misses/cache-references],  (all, 16 CPUs)
-------------------------------------------------------------------------------------------------------

   weight    samples  pcnt function                     DSO
   ______    _______ _____ ____________________________ ____________________

      1.9        606  3.2% irqtime_account_process_tick [kernel.kallsyms]   
      1.6        854  4.4% update_vsyscall              [kernel.kallsyms]   
      1.5        446  2.3% atomic_cmpxchg               [kernel.kallsyms]   
      1.5        758  3.9% tick_do_update_jiffies64     [kernel.kallsyms]   
      1.4        149  0.8% arch_local_irq_save          [kernel.kallsyms]   
      1.3       1524  7.9% do_timer                     [kernel.kallsyms]   
      1.2        215  1.1% clear_page_c                 [kernel.kallsyms]   
      1.2        128  0.7% dso__find_symbol             /home/mingo/bin/perf
      1.0        281  1.5% calc_global_load             [kernel.kallsyms]   
      0.9        560  2.9% profile_tick                 [kernel.kallsyms]   
      0.7        246  1.3% _raw_spin_lock               [kernel.kallsyms]   
      0.6       2523 13.1% current_kernel_time          [kernel.kallsyms]   

This output is very different from a plain cycles (or even cache-misses) 
measured profile and is very good at identifying 'bouncy' cache-miss sources. 

Another good 'view' is store-references against store-misses:

   PerfTop:   29530 irqs/sec  kernel:99.5%  exact:  0.0% [L1-dcache-store-misses/L1-dcache-stores],  (all, 16 CPUs)
-------------------------------------------------------------------------------------------------------

   weight    samples  pcnt function                 DSO
   ______    _______ _____ ________________________ __________________________________

   1271.3       3814  3.2% apic_timer_interrupt     [kernel.kallsyms]                 
    844.0        844  0.7% read_tsc                 [kernel.kallsyms]                 
    615.0        615  0.5% timekeeping_get_ns       [kernel.kallsyms]                 
    520.0        520  0.4% intel_pmu_disable_all    [kernel.kallsyms]                 
    390.0        390  0.3% tick_dev_program_event   [kernel.kallsyms]                 
    308.3       1850  1.5% update_vsyscall          [kernel.kallsyms]                 
    251.7        755  0.6% hrtimer_interrupt        [kernel.kallsyms]                 
    246.0        246  0.2% find_busiest_group       [kernel.kallsyms]                 
    222.7        668  0.6% native_apic_mem_write    [kernel.kallsyms]                 
    149.0        298  0.2% apic_write               [kernel.kallsyms]                 
    137.0        274  0.2% irq_enter                [kernel.kallsyms]                 
    105.0        105  0.1% arch_local_irq_save      [kernel.kallsyms]                 
    101.0        101  0.1% tick_program_event       [kernel.kallsyms]                 
     95.5        191  0.2% ack_APIC_irq             [kernel.kallsyms]           

You might want to experiment around with the events to see which one expresses 
things best for you on the system in question.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 20:00         ` Ingo Molnar
@ 2011-03-24 20:40           ` Andi Kleen
  2011-03-24 20:50             ` Ingo Molnar
  2011-03-24 20:48           ` Eric Dumazet
  1 sibling, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2011-03-24 20:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jack Steiner, Jan Beulich, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

Ingo Molnar <mingo@elte.hu> writes:
>
> One good way to see cache bounces is to run a misses/accesses ratio profile:
>
>    perf top -e cache-misses -e cache-references --count-filter 10
>
> Note the two events: this runs a 'weighted' profile, you'll see (LLC) 
> cache-misses of a function relative to cache-references it does, a 
> misses/references ratio in essence.

If anyone does this on a Nehalem please only use 2.6.39-rc*+ which
includes the offcore patches. Anything before that will get you complete
bogus numbers.

BTW with Lin-Ming's memory latency profiling code you can do this much
more directly.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 20:00         ` Ingo Molnar
  2011-03-24 20:40           ` Andi Kleen
@ 2011-03-24 20:48           ` Eric Dumazet
  2011-03-24 20:54             ` Ingo Molnar
  1 sibling, 1 reply; 49+ messages in thread
From: Eric Dumazet @ 2011-03-24 20:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jack Steiner, Jan Beulich, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

Le jeudi 24 mars 2011 à 21:00 +0100, Ingo Molnar a écrit :

> One good way to see cache bounces is to run a misses/accesses ratio profile:
> 
>    perf top -e cache-misses -e cache-references --count-filter 10
> 

Oh well , something must be broken...

"perf top" is working here, but if I use any "-e ...." argument, it
fails :

# perf top -e cache-misses

  Error: sys_perf_event_open() syscall returned with 2 (No such file or directory).
  /bin/dmesg may provide additional information.

  Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

# grep PERF_EVENTS .config
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_HAVE_PERF_EVENTS_NMI=y

Nothing in dmesg...

uname -a
Linux ed001 2.6.38-08165-g2712750-dirty #517 SMP Thu Mar 24 21:31:00 CET 2011 i686 i686 i386 GNU/Linux




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 20:40           ` Andi Kleen
@ 2011-03-24 20:50             ` Ingo Molnar
  2011-03-24 21:37               ` Andi Kleen
  0 siblings, 1 reply; 49+ messages in thread
From: Ingo Molnar @ 2011-03-24 20:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jack Steiner, Jan Beulich, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin


* Andi Kleen <andi@firstfloor.org> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> >
> > One good way to see cache bounces is to run a misses/accesses ratio profile:
> >
> >    perf top -e cache-misses -e cache-references --count-filter 10
> >
> > Note the two events: this runs a 'weighted' profile, you'll see (LLC) 
> > cache-misses of a function relative to cache-references it does, a 
> > misses/references ratio in essence.
> 
> If anyone does this on a Nehalem please only use 2.6.39-rc*+ which
> includes the offcore patches. Anything before that will get you complete
> bogus numbers.

In that case the second example i quoted can be used:

  perf top -e L1-dcache-store-misses -e L1-dcache-stores --count-filter 10

Or:

  perf top -e L1-dcache-load-misses -e L1-dcache-loads --count-filter 10

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 20:48           ` Eric Dumazet
@ 2011-03-24 20:54             ` Ingo Molnar
  2011-03-24 21:02               ` Eric Dumazet
  0 siblings, 1 reply; 49+ messages in thread
From: Ingo Molnar @ 2011-03-24 20:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jack Steiner, Jan Beulich, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le jeudi 24 mars 2011 à 21:00 +0100, Ingo Molnar a écrit :
> 
> > One good way to see cache bounces is to run a misses/accesses ratio profile:
> > 
> >    perf top -e cache-misses -e cache-references --count-filter 10
> > 
> 
> Oh well , something must be broken...
> 
> "perf top" is working here, but if I use any "-e ...." argument, it
> fails :
> 
> # perf top -e cache-misses
> 
>   Error: sys_perf_event_open() syscall returned with 2 (No such file or directory).
>   /bin/dmesg may provide additional information.
> 
>   Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> 
> # grep PERF_EVENTS .config
> CONFIG_HAVE_PERF_EVENTS=y
> CONFIG_PERF_EVENTS=y
> CONFIG_HAVE_PERF_EVENTS_NMI=y
> 
> Nothing in dmesg...
> 
> uname -a
> Linux ed001 2.6.38-08165-g2712750-dirty #517 SMP Thu Mar 24 21:31:00 CET 2011 i686 i686 i386 GNU/Linux

Does it work better if you run it as root?

To easiest way to debug as user is to do:

  echo -1 > /proc/sys/kernel/perf_event_paranoid 

and to put this into /etc/sysctl.conf:

  kernel.perf_event_paranoid = -1

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 20:54             ` Ingo Molnar
@ 2011-03-24 21:02               ` Eric Dumazet
  2011-03-24 21:42                 ` Andi Kleen
  0 siblings, 1 reply; 49+ messages in thread
From: Eric Dumazet @ 2011-03-24 21:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jack Steiner, Jan Beulich, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

Le jeudi 24 mars 2011 à 21:54 +0100, Ingo Molnar a écrit :

> Does it work better if you run it as root?
> 

I confess I always am root on my dev machines :(

> To easiest way to debug as user is to do:
> 
>   echo -1 > /proc/sys/kernel/perf_event_paranoid 
> 
> and to put this into /etc/sysctl.conf:
> 
>   kernel.perf_event_paranoid = -1
> 


Hmm... on one machine (HP BL460c G1 ) I do have :

[    0.015998] Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
[    0.015998] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)


and the other one (HB BL460c G6)

[    0.122760] Performance Events: PEBS fmt1+, Nehalem events, Broken BIOS detected, using software events only.
[    0.123047] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)


Hmm.. I have latest HP BIOS.

Version: I15
Release Date: 10/25/2010

and

Version: I24
Release Date: 09/02/2010

I am pretty sure I used "-e ..." some time ago with same BIOS.




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 20:50             ` Ingo Molnar
@ 2011-03-24 21:37               ` Andi Kleen
  0 siblings, 0 replies; 49+ messages in thread
From: Andi Kleen @ 2011-03-24 21:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Jack Steiner, Jan Beulich, Borislav Petkov,
	Peter Zijlstra, Nick Piggin, x86@kernel.org, Thomas Gleixner,
	Andrew Morton, Linus Torvalds, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

> In that case the second example i quoted can be used:
> 
>   perf top -e L1-dcache-store-misses -e L1-dcache-stores --count-filter 10

That will include L2 and L3 stores -- you won't get any indication 
on misses to other sockets.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 21:02               ` Eric Dumazet
@ 2011-03-24 21:42                 ` Andi Kleen
  2011-03-24 23:26                   ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2011-03-24 21:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Jack Steiner, Jan Beulich, Borislav Petkov,
	Peter Zijlstra, Nick Piggin, x86@kernel.org, Thomas Gleixner,
	Andrew Morton, Linus Torvalds, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> [    0.015998] Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
> [    0.015998] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)

There should be a BIOS setting to disable that (something with p-states
likely).

> I am pretty sure I used "-e ..." some time ago with same BIOS.

The "if anyone else uses the PMU throw in your toys and sulk" check was
only recently added.

Or fix perf just ignore counters that are already used. The others
should work fine.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 21:42                 ` Andi Kleen
@ 2011-03-24 23:26                   ` Linus Torvalds
  2011-03-24 23:56                     ` Andi Kleen
  2011-03-25  9:35                     ` Ingo Molnar
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2011-03-24 23:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, Ingo Molnar, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

On Thu, Mar 24, 2011 at 2:42 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> The "if anyone else uses the PMU throw in your toys and sulk" check was
> only recently added.

,, and I'd like to point out that we should just say "screw the
f*cking BIOS, it's doing things wrong". And then just take over the
PMU events, and make sure that they aren't routed to SCI. Instead of
the current "ok, roll over and die when the BIOS does something
idiotic".

People continuously claim that the BIOS really needs it, and I have
never EVER seen any good explanation of why that particular sh*t
argument would b true. It seems to be purely about politics, where
some idiotic vendor (namely HP) has convinced Intel that they really
need it. To the point where some engineers seem to have bought into
the whole thing and actually believe that fairy tale ("firmware can do
better" - hah! They must be feeding people some bad drugs at the
cafeteria)

                     Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 23:26                   ` Linus Torvalds
@ 2011-03-24 23:56                     ` Andi Kleen
  2011-03-25  5:47                       ` Eric Dumazet
  2011-03-25  9:22                       ` Ingo Molnar
  2011-03-25  9:35                     ` Ingo Molnar
  1 sibling, 2 replies; 49+ messages in thread
From: Andi Kleen @ 2011-03-24 23:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Eric Dumazet, Ingo Molnar, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

> never EVER seen any good explanation of why that particular sh*t
> argument would b true. It seems to be purely about politics, where
> some idiotic vendor (namely HP) has convinced Intel that they really
> need it. To the point where some engineers seem to have bought into
> the whole thing and actually believe that fairy tale ("firmware can do
> better" - hah! They must be feeding people some bad drugs at the
> cafeteria)

For the record I don't think it's a good idea for the BIOS to do 
this (and I'm not aware of any engineer who does),  
but I think Linux should do better than just disabling PMU use when 
this happens.

However I suspect taking over SCI would cause endless problems
and is very likely not a good idea.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 23:56                     ` Andi Kleen
@ 2011-03-25  5:47                       ` Eric Dumazet
  2011-03-25  9:32                         ` Ingo Molnar
  2011-03-25  9:38                         ` Eric Dumazet
  2011-03-25  9:22                       ` Ingo Molnar
  1 sibling, 2 replies; 49+ messages in thread
From: Eric Dumazet @ 2011-03-25  5:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Ingo Molnar, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

Le vendredi 25 mars 2011 à 00:56 +0100, Andi Kleen a écrit :
> > never EVER seen any good explanation of why that particular sh*t
> > argument would b true. It seems to be purely about politics, where
> > some idiotic vendor (namely HP) has convinced Intel that they really
> > need it. To the point where some engineers seem to have bought into
> > the whole thing and actually believe that fairy tale ("firmware can do
> > better" - hah! They must be feeding people some bad drugs at the
> > cafeteria)
> 
> For the record I don't think it's a good idea for the BIOS to do 
> this (and I'm not aware of any engineer who does),  
> but I think Linux should do better than just disabling PMU use when 
> this happens.
> 
> However I suspect taking over SCI would cause endless problems
> and is very likely not a good idea.

I tried many different changes in BIOS and all failed (the machine is
damn slow at boot, this takes age).

I am stuck :(

Thanks



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 23:56                     ` Andi Kleen
  2011-03-25  5:47                       ` Eric Dumazet
@ 2011-03-25  9:22                       ` Ingo Molnar
  2011-03-25 10:21                         ` Peter Zijlstra
  1 sibling, 1 reply; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25  9:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Eric Dumazet, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Andi Kleen <andi@firstfloor.org> wrote:

> > never EVER seen any good explanation of why that particular sh*t
> > argument would b true. It seems to be purely about politics, where
> > some idiotic vendor (namely HP) has convinced Intel that they really
> > need it. To the point where some engineers seem to have bought into
> > the whole thing and actually believe that fairy tale ("firmware can do
> > better" - hah! They must be feeding people some bad drugs at the
> > cafeteria)
> 
> For the record I don't think it's a good idea for the BIOS to do 
> this (and I'm not aware of any engineer who does),  

There's really just two sane options:

 - complain about the BIOS corrupting CPU state and refusing to use the PMU
 - complain about the BIOS corrupting CPU state and using the PMU against the BIOS

We went for the first one but i'll be more than glad to implement Linus's much 
more aggressive second option.

Btw., for the record, the thing you have been advocating in the past was a 
third option: for the kernel to step aside quietly and to let the BIOS corrupt 
a counter or two. You even sent us some sort of BIOS specification about how to 
implement that. That's pretty much the worst solution imaginable.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  5:47                       ` Eric Dumazet
@ 2011-03-25  9:32                         ` Ingo Molnar
  2011-03-25  9:44                           ` Eric Dumazet
                                             ` (2 more replies)
  2011-03-25  9:38                         ` Eric Dumazet
  1 sibling, 3 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25  9:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Linus Torvalds, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le vendredi 25 mars 2011 à 00:56 +0100, Andi Kleen a écrit :
> > > never EVER seen any good explanation of why that particular sh*t
> > > argument would b true. It seems to be purely about politics, where
> > > some idiotic vendor (namely HP) has convinced Intel that they really
> > > need it. To the point where some engineers seem to have bought into
> > > the whole thing and actually believe that fairy tale ("firmware can do
> > > better" - hah! They must be feeding people some bad drugs at the
> > > cafeteria)
> > 
> > For the record I don't think it's a good idea for the BIOS to do 
> > this (and I'm not aware of any engineer who does),  
> > but I think Linux should do better than just disabling PMU use when 
> > this happens.
> > 
> > However I suspect taking over SCI would cause endless problems
> > and is very likely not a good idea.
> 
> I tried many different changes in BIOS and all failed (the machine is
> damn slow at boot, this takes age).
> 
> I am stuck :(

Could you please try the patch below?

Thanks,

	Ingo

------------------->
>From 14df27334ac47a5cec67fb2238d14499346acc38 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 25 Mar 2011 10:24:23 +0100
Subject: [PATCH] perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue

Eric Dumazet reported that hardware PMU events do not work on his
system, due to the BIOS corrupting PMU state:

    Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
    [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)

Linus suggested that we continue in the face of such BIOS-induced CPU
state corruption:

   http://lkml.org/lkml/2011/3/24/608

Such BIOSes will have to be fixed - developers rely on a working and fully
capable PMU and BIOS interfering with CPU state is simply not acceptable.

So this patch changes perf to continue when it detects such BIOS
interaction, some hardware events may be unreliable due to the BIOS writing
and re-writing them - there's not much the kernel can do about that.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ec46eea..eb00677 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -500,12 +500,17 @@ static bool check_hw_exists(void)
 	return true;
 
 bios_fail:
-	printk(KERN_CONT "Broken BIOS detected, using software events only.\n");
+	/*
+	 * We still allow the PMU driver to operate:
+	 */
+	printk(KERN_CONT "Broken BIOS detected, complain to your hardware vendor.\n");
 	printk(KERN_ERR FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n", reg, val);
-	return false;
+
+	return true;
 
 msr_fail:
 	printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
+
 	return false;
 }
 

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 23:26                   ` Linus Torvalds
  2011-03-24 23:56                     ` Andi Kleen
@ 2011-03-25  9:35                     ` Ingo Molnar
  1 sibling, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25  9:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Eric Dumazet, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> ,, and I'd like to point out that we should just say "screw the
> f*cking BIOS, it's doing things wrong". And then just take over the
> PMU events, and make sure that they aren't routed to SCI. Instead of
> the current "ok, roll over and die when the BIOS does something
> idiotic".
> 
> People continuously claim that the BIOS really needs it, and I have
> never EVER seen any good explanation of why that particular sh*t
> argument would b true. It seems to be purely about politics, where
> some idiotic vendor (namely HP) has convinced Intel that they really
> need it. To the point where some engineers seem to have bought into
> the whole thing and actually believe that fairy tale ("firmware can do
> better" - hah! They must be feeding people some bad drugs at the
> cafeteria)

Ok, fully agreed, and i've changed the code to "detect the BIOS breakage,
warn about it but otherwise ignore the BIOS".

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  5:47                       ` Eric Dumazet
  2011-03-25  9:32                         ` Ingo Molnar
@ 2011-03-25  9:38                         ` Eric Dumazet
  2011-03-25 20:29                           ` Peter Zijlstra
  1 sibling, 1 reply; 49+ messages in thread
From: Eric Dumazet @ 2011-03-25  9:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Ingo Molnar, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

Le vendredi 25 mars 2011 à 06:47 +0100, Eric Dumazet a écrit :

> I tried many different changes in BIOS and all failed (the machine is
> damn slow at boot, this takes age).

I even tried on a more recent hardware (ProLiant BL460c G7)

BIOS :	Version: I27  Release Date: 09/30/2010

[    0.021849] CPU0: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz stepping 02
[    0.123904] Performance Events: PEBS fmt1+, Westmere events, Broken BIOS detected, using software events only.
[    0.124125] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)

I guess I'll have to patch my kernel, I doubt HP will make any change in this area.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:32                         ` Ingo Molnar
@ 2011-03-25  9:44                           ` Eric Dumazet
  2011-03-25  9:59                             ` Ingo Molnar
  2011-03-25 16:16                           ` Robert Richter
  2011-03-25 17:22                           ` Andi Kleen
  2 siblings, 1 reply; 49+ messages in thread
From: Eric Dumazet @ 2011-03-25  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Linus Torvalds, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

Le vendredi 25 mars 2011 à 10:32 +0100, Ingo Molnar a écrit :
> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le vendredi 25 mars 2011 à 00:56 +0100, Andi Kleen a écrit :
> > > > never EVER seen any good explanation of why that particular sh*t
> > > > argument would b true. It seems to be purely about politics, where
> > > > some idiotic vendor (namely HP) has convinced Intel that they really
> > > > need it. To the point where some engineers seem to have bought into
> > > > the whole thing and actually believe that fairy tale ("firmware can do
> > > > better" - hah! They must be feeding people some bad drugs at the
> > > > cafeteria)
> > > 
> > > For the record I don't think it's a good idea for the BIOS to do 
> > > this (and I'm not aware of any engineer who does),  
> > > but I think Linux should do better than just disabling PMU use when 
> > > this happens.
> > > 
> > > However I suspect taking over SCI would cause endless problems
> > > and is very likely not a good idea.
> > 
> > I tried many different changes in BIOS and all failed (the machine is
> > damn slow at boot, this takes age).
> > 
> > I am stuck :(
> 
> Could you please try the patch below?

This obviously works, but you probably need to make a full pass to make
sure we dont have a MSR failure -this should return false in this case.






^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:44                           ` Eric Dumazet
@ 2011-03-25  9:59                             ` Ingo Molnar
  2011-03-25 10:50                               ` Borislav Petkov
  2011-03-25 11:10                               ` Peter Zijlstra
  0 siblings, 2 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25  9:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Linus Torvalds, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le vendredi 25 mars 2011 à 10:32 +0100, Ingo Molnar a écrit :
> > * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > Le vendredi 25 mars 2011 à 00:56 +0100, Andi Kleen a écrit :
> > > > > never EVER seen any good explanation of why that particular sh*t
> > > > > argument would b true. It seems to be purely about politics, where
> > > > > some idiotic vendor (namely HP) has convinced Intel that they really
> > > > > need it. To the point where some engineers seem to have bought into
> > > > > the whole thing and actually believe that fairy tale ("firmware can do
> > > > > better" - hah! They must be feeding people some bad drugs at the
> > > > > cafeteria)
> > > > 
> > > > For the record I don't think it's a good idea for the BIOS to do 
> > > > this (and I'm not aware of any engineer who does),  
> > > > but I think Linux should do better than just disabling PMU use when 
> > > > this happens.
> > > > 
> > > > However I suspect taking over SCI would cause endless problems
> > > > and is very likely not a good idea.
> > > 
> > > I tried many different changes in BIOS and all failed (the machine is
> > > damn slow at boot, this takes age).
> > > 
> > > I am stuck :(
> > 
> > Could you please try the patch below?
> 
> This obviously works, [...]

Congrats to your now much-improved perf experience! :-)

> [...] but you probably need to make a full pass to make sure we dont have a 
> MSR failure -this should return false in this case.

Wanted to keep this patch simple - we are not really hitting MSR failure cases 
in practice, and by getting the message the user is at least warned that 
*something* is amiss.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-24 17:19       ` Ingo Molnar
@ 2011-03-25 10:06         ` Jan Beulich
  2011-03-25 11:10           ` Ingo Molnar
                             ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Jan Beulich @ 2011-03-25 10:06 UTC (permalink / raw)
  To: Ingo Molnar, Jack Steiner
  Cc: Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Linus Torvalds,
	Arnaldo Carvalho de Melo, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

>>> On 24.03.11 at 18:19, Ingo Molnar <mingo@elte.hu> wrote:
> * Jan Beulich <JBeulich@novell.com> wrote:
>> Are you certain? Iirc the lock prefix implies minimally a read-for-
>> ownership (if CPUs are really smart enough to optimize away the
>> write - I wonder whether that would be correct at all when it
>> comes to locked operations), which means a cacheline can still be
>> bouncing heavily.
> 
> Yeah. On what workload was this?
> 
> Generally you use test_and_set_bit() if you expect it to be 'owned' by 
> whoever calls it, and released by someone else.
> 
> It would be really useful to run perf top on an affected box and see which 
> kernel function causes this. It might be better to add a test_bit() to the 
> affected codepath - instead of bloating all test_and_set_bit() users.

Indeed, I agree with you and Linus in this aspect.

> Note that the patch can also cause overhead: the test_bit() can miss the 
> cache, it will bring in the cacheline shared, and the subsequent test_and_set() 
> call will then dirty the cacheline - so the CPU might miss again and has to wait 
> for other CPUs to first flush this cacheline.
> 
> So we really need more details here.

The problem was observed with __lock_page() (in a variant not
upstream for reasons not known to me), and prefixing e.g.
trylock_page() with an extra PageLocked() check yielded the
below quoted improvements.

Jack - were there any similar measurements done on upstream
code?

Jan


**** Quoting Jack Steiner <steiner@sgi.com> ****

The following tests were run on UVSW :
	768p Westmere
	 128 nodes


Boot times - greater than 2X reduction in boot time:
	2286s PTF #8
	1899s PTF #8
	 975s new algorithm
	 962s new algorithm

Boot messages referring to udev timeouts - eliminated:
	(After the udevadm settle timeout, the events queue contains):

	7174 PTF #8
	9435 PTF #8
	   0 new algorithm
	   0 new algorithm

AIM7 results - no difference at low numbers of tasks. Improvements at high counts:
	Jobs/Min at 2000 users
		 5100 PTF #8
		17750 new algorithm

	Wallclock seconds to run test at 2000 users
		2250s PTF #8
	 	 650s new algorithm

	CPU Seconds at 2000 users
		1300000 PTF #8
		  14000 new algorithm


Test of large parallel app faulting for text.

	Text resident in page cache (10000 pages):
		REAL	USER		SYS
		22.830s	23m5.567s	 85m59.042s	PTF #8 run1
		26.267s	34m3.536s	104m20.035s	PTF #8 run2
		10.890s	19m27.305s	 39m50.949s	new algorithm run1
		10.860s	20m42.698s	 40m48.889s	new algorithm run2

	Text on Disk (1000 pages)
		REAL	USER		SYS
		31.658s	9m25.379s	71m11.967s	PTF #8
		24.348s	6m15.323s	45m27.578s	new algorithm

_________________________________________________________________________________
The following tests were run on UV48:
	    4 racks
	  256 sockets
	2452p westmere

Boot time:
	4562 sec PTF#8
	1965 sec new

MPI "helloworld" with 1024 ranks
	35 sec PTF #8
	22 sec new


Test of large parallel app faulting for text.
	Text resident in page cache (10000 pages):
		REAL	USER		SYS
		46.394s	141m19s		366m53s		PTF #8
		38.986s	137m36		264m52s		PTF #8
		 7.987s	 34m50s		 42m36s		new algorithm
		10.550s	 43m31s		 59m45s		new algorithm


AIM7 Results (this is the original AIM7 - not the recent opensource version)
	------------------------------
	Jobs/Min
	 TASKS      PTF #8         new
	     1       487.8       486.6
	    10      4405.8      4940.6
	   100     18570.5     18198.9
	  1000     17262.3     17167.1
	  2000      4879.3     18163.9
	  4000        **       18846.2
	------------------------------
	Real Seconds
	 TASKS      PTF #8         new
	     1        11.9        12.0
	    10        13.2        11.8
	   100        31.3        32.0
	  1000       337.2       339.0
	  2000      2385.6       640.8
	  4000        **        1235.3
	------------------------------
	CPU Seconds
	 TASKS      PTF #8         new
	     1         1.6         1.6
	    10        11.5        12.9
	   100       132.2       137.2
	  1000      4486.5      6586.3
	  2000   1758419.7     27845.7
	  4000        **       65619.5

           ** Timed out



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:22                       ` Ingo Molnar
@ 2011-03-25 10:21                         ` Peter Zijlstra
  2011-03-25 16:08                           ` Robert Richter
  2011-03-25 17:15                           ` Andi Kleen
  0 siblings, 2 replies; 49+ messages in thread
From: Peter Zijlstra @ 2011-03-25 10:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Linus Torvalds, Eric Dumazet, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

On Fri, 2011-03-25 at 10:22 +0100, Ingo Molnar wrote:
> * Andi Kleen <andi@firstfloor.org> wrote:
> 
> > > never EVER seen any good explanation of why that particular sh*t
> > > argument would b true. It seems to be purely about politics, where
> > > some idiotic vendor (namely HP) has convinced Intel that they really
> > > need it. To the point where some engineers seem to have bought into
> > > the whole thing and actually believe that fairy tale ("firmware can do
> > > better" - hah! They must be feeding people some bad drugs at the
> > > cafeteria)
> > 
> > For the record I don't think it's a good idea for the BIOS to do 
> > this (and I'm not aware of any engineer who does),  
> 
> There's really just two sane options:
> 
>  - complain about the BIOS corrupting CPU state and refusing to use the PMU
>  - complain about the BIOS corrupting CPU state and using the PMU against the BIOS
> 
> We went for the first one but i'll be more than glad to implement Linus's much 
> more aggressive second option.
> 
> Btw., for the record, the thing you have been advocating in the past was a 
> third option: for the kernel to step aside quietly and to let the BIOS corrupt 
> a counter or two. You even sent us some sort of BIOS specification about how to 
> implement that. That's pretty much the worst solution imaginable.

Also seriously complicated by the kexec case where the previous kernel
didn't clean up PMU state. There is simply no sane way to detect if its
actually used and by whoem.

The whole PMU 'sharing' concept championed by Andi is utter crap.

As for simply using it despite the BIOS corrupting it, that might not
always work, the BIOS might simply over-write your state because it
one-sidedly declares to own the MSRs (observed behaviour).

Its all a big clusterfuck and really the best way (IMO) is what we have
now to put pressure on and force the BIOS vendors to play nice.

I assume both HP and DELL will be seriously unhappy with the kernel
spewing FIRMWARE BUG messages on boot on their boxen, the question is,
will they be unhappy enough to fix it..

Now Ingo's patch keeps the warning and lets you take the PMU back and
live with whatever consequences that brings (incorrect counts etc), that
might also work but puts less pressure on the vendors because things
appear to work.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:59                             ` Ingo Molnar
@ 2011-03-25 10:50                               ` Borislav Petkov
  2011-03-25 11:10                               ` Peter Zijlstra
  1 sibling, 0 replies; 49+ messages in thread
From: Borislav Petkov @ 2011-03-25 10:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Andi Kleen, Linus Torvalds, Jack Steiner,
	Jan Beulich, Borislav Petkov, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	tee@sgi.com, Nikanth Karthikesan, linux-kernel@vger.kernel.org,
	H. Peter Anvin, Robert Richter

On Fri, Mar 25, 2011 at 05:59:31AM -0400, Ingo Molnar wrote:
> > [...] but you probably need to make a full pass to make sure we dont have a 
> > MSR failure -this should return false in this case.
>
> Wanted to keep this patch simple - we are not really hitting MSR
> failure cases in practice, and by getting the message the user is at
> least warned that *something* is amiss.

AFAIR, Robert was telling me couple of months ago about some braindead
BIOSes doing power management by using at least one perf counter. A
radical approach like that might interfere with that BIOS since it will
try to reprogram the counter continuously and f*ck up OS PMU use.

Anyway, adding Robert for clarification.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 10:06         ` Jan Beulich
@ 2011-03-25 11:10           ` Ingo Molnar
  2011-03-25 12:04             ` Nikanth Karthikesan
  2011-03-25 13:12           ` Jack Steiner
  2011-03-25 16:29           ` Linus Torvalds
  2 siblings, 1 reply; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25 11:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jack Steiner, Borislav Petkov, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Linus Torvalds,
	Arnaldo Carvalho de Melo, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin


* Jan Beulich <JBeulich@novell.com> wrote:

> >>> On 24.03.11 at 18:19, Ingo Molnar <mingo@elte.hu> wrote:
> > * Jan Beulich <JBeulich@novell.com> wrote:
> >> Are you certain? Iirc the lock prefix implies minimally a read-for-
> >> ownership (if CPUs are really smart enough to optimize away the
> >> write - I wonder whether that would be correct at all when it
> >> comes to locked operations), which means a cacheline can still be
> >> bouncing heavily.
> > 
> > Yeah. On what workload was this?
> > 
> > Generally you use test_and_set_bit() if you expect it to be 'owned' by 
> > whoever calls it, and released by someone else.
> > 
> > It would be really useful to run perf top on an affected box and see which 
> > kernel function causes this. It might be better to add a test_bit() to the 
> > affected codepath - instead of bloating all test_and_set_bit() users.
> 
> Indeed, I agree with you and Linus in this aspect.
> 
> > Note that the patch can also cause overhead: the test_bit() can miss the 
> > cache, it will bring in the cacheline shared, and the subsequent test_and_set() 
> > call will then dirty the cacheline - so the CPU might miss again and has to wait 
> > for other CPUs to first flush this cacheline.
> > 
> > So we really need more details here.
> 
> The problem was observed with __lock_page() (in a variant not
> upstream for reasons not known to me), and prefixing e.g.
> trylock_page() with an extra PageLocked() check yielded the
> below quoted improvements.

The page lock flag is indeed one of those (rather rare) exceptions to typical 
object locking patterns. So in that particular case adding the PageLocked() 
test to trylock_page() would be the right approach to improving performance.

In the common case this change actively hurts for various reasons:

 - can turn a cache miss into two cache misses
 - adds an often unnecessary branch instruction
 - adds often unnecessary bloat
 - leaks a barrier

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:59                             ` Ingo Molnar
  2011-03-25 10:50                               ` Borislav Petkov
@ 2011-03-25 11:10                               ` Peter Zijlstra
  2011-03-25 11:11                                 ` Ingo Molnar
  1 sibling, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2011-03-25 11:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Andi Kleen, Linus Torvalds, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

On Fri, 2011-03-25 at 10:59 +0100, Ingo Molnar wrote:
> 
> Wanted to keep this patch simple - we are not really hitting MSR failure cases 
> in practice, 

Everything qemu/kvm hits those.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 11:10                               ` Peter Zijlstra
@ 2011-03-25 11:11                                 ` Ingo Molnar
  0 siblings, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric Dumazet, Andi Kleen, Linus Torvalds, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Fri, 2011-03-25 at 10:59 +0100, Ingo Molnar wrote:
> > 
> > Wanted to keep this patch simple - we are not really hitting MSR failure cases 
> > in practice, 
> 
> Everything qemu/kvm hits those.

and other virtualizers as well. I meant on real hardware with a real BIOS.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 11:10           ` Ingo Molnar
@ 2011-03-25 12:04             ` Nikanth Karthikesan
  0 siblings, 0 replies; 49+ messages in thread
From: Nikanth Karthikesan @ 2011-03-25 12:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jack Steiner, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Arnaldo Carvalho de Melo, Ingo Molnar, tee,
	linux-kernel@vger.kernel.org, H. Peter Anvin

On Friday, March 25, 2011 04:40:13 pm Ingo Molnar wrote:
> * Jan Beulich <JBeulich@novell.com> wrote:
> > >>> On 24.03.11 at 18:19, Ingo Molnar <mingo@elte.hu> wrote:
> > > * Jan Beulich <JBeulich@novell.com> wrote:
> > >> Are you certain? Iirc the lock prefix implies minimally a read-for-
> > >> ownership (if CPUs are really smart enough to optimize away the
> > >> write - I wonder whether that would be correct at all when it
> > >> comes to locked operations), which means a cacheline can still be
> > >> bouncing heavily.
> > > 
> > > Yeah. On what workload was this?
> > > 
> > > Generally you use test_and_set_bit() if you expect it to be 'owned' by
> > > whoever calls it, and released by someone else.
> > > 
> > > It would be really useful to run perf top on an affected box and see
> > > which kernel function causes this. It might be better to add a
> > > test_bit() to the affected codepath - instead of bloating all
> > > test_and_set_bit() users.
> > 
> > Indeed, I agree with you and Linus in this aspect.
> > 
> > > Note that the patch can also cause overhead: the test_bit() can miss
> > > the cache, it will bring in the cacheline shared, and the subsequent
> > > test_and_set() call will then dirty the cacheline - so the CPU might
> > > miss again and has to wait for other CPUs to first flush this
> > > cacheline.
> > > 
> > > So we really need more details here.
> > 
> > The problem was observed with __lock_page() (in a variant not
> > upstream for reasons not known to me), and prefixing e.g.
> > trylock_page() with an extra PageLocked() check yielded the
> > below quoted improvements.
> 
> The page lock flag is indeed one of those (rather rare) exceptions to
> typical object locking patterns. So in that particular case adding the
> PageLocked() test to trylock_page() would be the right approach to
> improving performance.
> 
> In the common case this change actively hurts for various reasons:
> 
>  - can turn a cache miss into two cache misses
>  - adds an often unnecessary branch instruction
>  - adds often unnecessary bloat
>  - leaks a barrier
> 

Yes, I think I am observing these ill-effects when testing the code copied to 
user-space.

Thanks
Nikanth

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 10:06         ` Jan Beulich
  2011-03-25 11:10           ` Ingo Molnar
@ 2011-03-25 13:12           ` Jack Steiner
  2011-03-25 16:29           ` Linus Torvalds
  2 siblings, 0 replies; 49+ messages in thread
From: Jack Steiner @ 2011-03-25 13:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Linus Torvalds,
	Arnaldo Carvalho de Melo, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

On Fri, Mar 25, 2011 at 10:06:10AM +0000, Jan Beulich wrote:
> >>> On 24.03.11 at 18:19, Ingo Molnar <mingo@elte.hu> wrote:
> > * Jan Beulich <JBeulich@novell.com> wrote:
> >> Are you certain? Iirc the lock prefix implies minimally a read-for-
> >> ownership (if CPUs are really smart enough to optimize away the
> >> write - I wonder whether that would be correct at all when it
> >> comes to locked operations), which means a cacheline can still be
> >> bouncing heavily.
> > 
> > Yeah. On what workload was this?
> > 
> > Generally you use test_and_set_bit() if you expect it to be 'owned' by 
> > whoever calls it, and released by someone else.
> > 
> > It would be really useful to run perf top on an affected box and see which 
> > kernel function causes this. It might be better to add a test_bit() to the 
> > affected codepath - instead of bloating all test_and_set_bit() users.
> 
> Indeed, I agree with you and Linus in this aspect.
> 
> > Note that the patch can also cause overhead: the test_bit() can miss the 
> > cache, it will bring in the cacheline shared, and the subsequent test_and_set() 
> > call will then dirty the cacheline - so the CPU might miss again and has to wait 
> > for other CPUs to first flush this cacheline.
> > 
> > So we really need more details here.
> 
> The problem was observed with __lock_page() (in a variant not
> upstream for reasons not known to me), and prefixing e.g.
> trylock_page() with an extra PageLocked() check yielded the
> below quoted improvements.
> 
> Jack - were there any similar measurements done on upstream
> code?

Not yet but it is high on my list to test. I suspect a similar problem exists.
I'll post the results as soon as I have them.

> 
> Jan
> 
> 
> **** Quoting Jack Steiner <steiner@sgi.com> ****
> 
> The following tests were run on UVSW :
> 	768p Westmere
> 	 128 nodes
> 
> 
> Boot times - greater than 2X reduction in boot time:
> 	2286s PTF #8
> 	1899s PTF #8
> 	 975s new algorithm
> 	 962s new algorithm
> 
> Boot messages referring to udev timeouts - eliminated:
> 	(After the udevadm settle timeout, the events queue contains):
> 
> 	7174 PTF #8
> 	9435 PTF #8
> 	   0 new algorithm
> 	   0 new algorithm
> 
> AIM7 results - no difference at low numbers of tasks. Improvements at high counts:
> 	Jobs/Min at 2000 users
> 		 5100 PTF #8
> 		17750 new algorithm
> 
> 	Wallclock seconds to run test at 2000 users
> 		2250s PTF #8
> 	 	 650s new algorithm
> 
> 	CPU Seconds at 2000 users
> 		1300000 PTF #8
> 		  14000 new algorithm
> 
> 
> Test of large parallel app faulting for text.
> 
> 	Text resident in page cache (10000 pages):
> 		REAL	USER		SYS
> 		22.830s	23m5.567s	 85m59.042s	PTF #8 run1
> 		26.267s	34m3.536s	104m20.035s	PTF #8 run2
> 		10.890s	19m27.305s	 39m50.949s	new algorithm run1
> 		10.860s	20m42.698s	 40m48.889s	new algorithm run2
> 
> 	Text on Disk (1000 pages)
> 		REAL	USER		SYS
> 		31.658s	9m25.379s	71m11.967s	PTF #8
> 		24.348s	6m15.323s	45m27.578s	new algorithm
> 
> _________________________________________________________________________________
> The following tests were run on UV48:
> 	    4 racks
> 	  256 sockets
> 	2452p westmere
> 
> Boot time:
> 	4562 sec PTF#8
> 	1965 sec new
> 
> MPI "helloworld" with 1024 ranks
> 	35 sec PTF #8
> 	22 sec new
> 
> 
> Test of large parallel app faulting for text.
> 	Text resident in page cache (10000 pages):
> 		REAL	USER		SYS
> 		46.394s	141m19s		366m53s		PTF #8
> 		38.986s	137m36		264m52s		PTF #8
> 		 7.987s	 34m50s		 42m36s		new algorithm
> 		10.550s	 43m31s		 59m45s		new algorithm
> 
> 
> AIM7 Results (this is the original AIM7 - not the recent opensource version)
> 	------------------------------
> 	Jobs/Min
> 	 TASKS      PTF #8         new
> 	     1       487.8       486.6
> 	    10      4405.8      4940.6
> 	   100     18570.5     18198.9
> 	  1000     17262.3     17167.1
> 	  2000      4879.3     18163.9
> 	  4000        **       18846.2
> 	------------------------------
> 	Real Seconds
> 	 TASKS      PTF #8         new
> 	     1        11.9        12.0
> 	    10        13.2        11.8
> 	   100        31.3        32.0
> 	  1000       337.2       339.0
> 	  2000      2385.6       640.8
> 	  4000        **        1235.3
> 	------------------------------
> 	CPU Seconds
> 	 TASKS      PTF #8         new
> 	     1         1.6         1.6
> 	    10        11.5        12.9
> 	   100       132.2       137.2
> 	  1000      4486.5      6586.3
> 	  2000   1758419.7     27845.7
> 	  4000        **       65619.5
> 
>            ** Timed out
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 10:21                         ` Peter Zijlstra
@ 2011-03-25 16:08                           ` Robert Richter
  2011-03-25 19:31                             ` Ingo Molnar
  2011-03-25 17:15                           ` Andi Kleen
  1 sibling, 1 reply; 49+ messages in thread
From: Robert Richter @ 2011-03-25 16:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Linus Torvalds, Eric Dumazet,
	Jack Steiner, Jan Beulich, Borislav Petkov, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	tee@sgi.com, Nikanth Karthikesan, linux-kernel@vger.kernel.org,
	H. Peter Anvin

On 25.03.11 06:21:16, Peter Zijlstra wrote:
> On Fri, 2011-03-25 at 10:22 +0100, Ingo Molnar wrote:
> > * Andi Kleen <andi@firstfloor.org> wrote:

> > > For the record I don't think it's a good idea for the BIOS to do 
> > > this (and I'm not aware of any engineer who does),  
> > 
> > There's really just two sane options:
> > 
> >  - complain about the BIOS corrupting CPU state and refusing to use the PMU
> >  - complain about the BIOS corrupting CPU state and using the PMU against the BIOS
> > 
> > We went for the first one but i'll be more than glad to implement Linus's much 
> > more aggressive second option.
> > 
> > Btw., for the record, the thing you have been advocating in the past was a 
> > third option: for the kernel to step aside quietly and to let the BIOS corrupt 
> > a counter or two. You even sent us some sort of BIOS specification about how to 
> > implement that. That's pretty much the worst solution imaginable.

Option 2 wont work, I have seen BIOSes that block access to the
counter registers, then there is no way for the OS to take over
control.

So, if you want to use perf anyway on such systems, you will have to
implement option 3 to mark the counter as "reserved" ...


> Also seriously complicated by the kexec case where the previous kernel
> didn't clean up PMU state. There is simply no sane way to detect if its
> actually used and by whoem.
> 
> The whole PMU 'sharing' concept championed by Andi is utter crap.

... but this seems not to be an option.

> 
> As for simply using it despite the BIOS corrupting it, that might not
> always work, the BIOS might simply over-write your state because it
> one-sidedly declares to own the MSRs (observed behaviour).
> 
> Its all a big clusterfuck and really the best way (IMO) is what we have
> now to put pressure on and force the BIOS vendors to play nice.
> 
> I assume both HP and DELL will be seriously unhappy with the kernel
> spewing FIRMWARE BUG messages on boot on their boxen, the question is,
> will they be unhappy enough to fix it..

So, we better stick then with option 1. My experience is that new
system's bioses try not to claim perfctrs (affected systems I have
seen are about 2-3 years old), but I am not really sure here.

> Now Ingo's patch keeps the warning and lets you take the PMU back and
> live with whatever consequences that brings (incorrect counts etc), that
> might also work but puts less pressure on the vendors because things
> appear to work.

And yes, using the counter anyway may corrupt counter values.

-Robert


-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:32                         ` Ingo Molnar
  2011-03-25  9:44                           ` Eric Dumazet
@ 2011-03-25 16:16                           ` Robert Richter
  2011-03-25 17:22                           ` Andi Kleen
  2 siblings, 0 replies; 49+ messages in thread
From: Robert Richter @ 2011-03-25 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Andi Kleen, Linus Torvalds, Jack Steiner,
	Jan Beulich, Borislav Petkov, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	tee@sgi.com, Nikanth Karthikesan, linux-kernel@vger.kernel.org,
	H. Peter Anvin

On 25.03.11 05:32:28, Ingo Molnar wrote:
> 
> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le vendredi 25 mars 2011 à 00:56 +0100, Andi Kleen a écrit :
> > > > never EVER seen any good explanation of why that particular sh*t
> > > > argument would b true. It seems to be purely about politics, where
> > > > some idiotic vendor (namely HP) has convinced Intel that they really
> > > > need it. To the point where some engineers seem to have bought into
> > > > the whole thing and actually believe that fairy tale ("firmware can do
> > > > better" - hah! They must be feeding people some bad drugs at the
> > > > cafeteria)
> > > 
> > > For the record I don't think it's a good idea for the BIOS to do 
> > > this (and I'm not aware of any engineer who does),  
> > > but I think Linux should do better than just disabling PMU use when 
> > > this happens.
> > > 
> > > However I suspect taking over SCI would cause endless problems
> > > and is very likely not a good idea.
> > 
> > I tried many different changes in BIOS and all failed (the machine is
> > damn slow at boot, this takes age).
> > 
> > I am stuck :(
> 
> Could you please try the patch below?
> 
> Thanks,
> 
> 	Ingo
> 
> ------------------->
> From 14df27334ac47a5cec67fb2238d14499346acc38 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 25 Mar 2011 10:24:23 +0100
> Subject: [PATCH] perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
> 
> Eric Dumazet reported that hardware PMU events do not work on his
> system, due to the BIOS corrupting PMU state:
> 
>     Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
>     [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)
> 
> Linus suggested that we continue in the face of such BIOS-induced CPU
> state corruption:
> 
>    http://lkml.org/lkml/2011/3/24/608
> 
> Such BIOSes will have to be fixed - developers rely on a working and fully
> capable PMU and BIOS interfering with CPU state is simply not acceptable.
> 
> So this patch changes perf to continue when it detects such BIOS
> interaction, some hardware events may be unreliable due to the BIOS writing
> and re-writing them - there's not much the kernel can do about that.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/cpu/perf_event.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index ec46eea..eb00677 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -500,12 +500,17 @@ static bool check_hw_exists(void)
>  	return true;
>  
>  bios_fail:
> -	printk(KERN_CONT "Broken BIOS detected, using software events only.\n");
> +	/*
> +	 * We still allow the PMU driver to operate:
> +	 */
> +	printk(KERN_CONT "Broken BIOS detected, complain to your hardware vendor.\n");
>  	printk(KERN_ERR FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n", reg, val);
> -	return false;
> +
> +	return true;

Ingo, you jump out the loop here. This skips checks on other registers
and msr access. And if we want to continue anyway, checking msr access
becomes more important as the bios may block it.

-Robert

>  
>  msr_fail:
>  	printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
> +
>  	return false;
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 10:06         ` Jan Beulich
  2011-03-25 11:10           ` Ingo Molnar
  2011-03-25 13:12           ` Jack Steiner
@ 2011-03-25 16:29           ` Linus Torvalds
  2011-03-25 16:47             ` Jan Beulich
  2011-03-25 16:49             ` Jack Steiner
  2 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2011-03-25 16:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ingo Molnar, Jack Steiner, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Arnaldo Carvalho de Melo, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

On Fri, Mar 25, 2011 at 3:06 AM, Jan Beulich <JBeulich@novell.com> wrote:
>
> The problem was observed with __lock_page() (in a variant not
> upstream for reasons not known to me), and prefixing e.g.
> trylock_page() with an extra PageLocked() check yielded the
> below quoted improvements.

Ok. __lock_page() _definitely_ should do the test_bit() thing first,
because it's normally called from lock_page() that has already tested
the bit.

But it already seems to do that, so I'm wondering what your variant is.

I'm also a bit surprised that lock_page() is that hot (unless your
_lock_page() variant is simply too broken and ends up spinning?).
Maybe we have some path that takes the page lock unnecessarily? What's
the load?

                Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 16:29           ` Linus Torvalds
@ 2011-03-25 16:47             ` Jan Beulich
  2011-03-25 16:49             ` Jack Steiner
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2011-03-25 16:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Borislav Petkov, Peter Zijlstra, Ingo Molnar, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Arnaldo Carvalho de Melo, Ingo Molnar, Jack Steiner, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

>>> On 25.03.11 at 17:29, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Mar 25, 2011 at 3:06 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>
>> The problem was observed with __lock_page() (in a variant not
>> upstream for reasons not known to me), and prefixing e.g.
>> trylock_page() with an extra PageLocked() check yielded the
>> below quoted improvements.
> 
> Ok. __lock_page() _definitely_ should do the test_bit() thing first,
> because it's normally called from lock_page() that has already tested
> the bit.
> 
> But it already seems to do that, so I'm wondering what your variant is.

lock_page() does, but here it's really about __lock_page().

We've got a patch from Nick Piggin in the tree that aims at
speeding up unlock_page, which turns __lock_page() (on
2.6.32) into

void __lock_page(struct page *page)
{
	wait_queue_head_t *wq = page_waitqueue(page);
	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

	do {
		prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
		SetPageWaiters(page);
		if (likely(PageLocked(page)))
			sync_page(page);
	} while (!trylock_page(page));
	finish_wait(wq, &wait.wait);
}

(No, I do not know why the patch isn't upstream, but Nick is on Cc,
so maybe he can tell.)

> I'm also a bit surprised that lock_page() is that hot (unless your
> _lock_page() variant is simply too broken and ends up spinning?).
> Maybe we have some path that takes the page lock unnecessarily? What's
> the load?

So yes, there is spinning involved. A first improvement from Jack
was to make the SetPageWaiters() (introduced by that patch)
conditional upon the bit not already being set.

But that wasn't yielding the desired improvement, hence they
added a PageLocked() || ... in front of the trylock_page(), which
in turn raised the question why trylock_page() wouldn't do this
generally.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 16:29           ` Linus Torvalds
  2011-03-25 16:47             ` Jan Beulich
@ 2011-03-25 16:49             ` Jack Steiner
  1 sibling, 0 replies; 49+ messages in thread
From: Jack Steiner @ 2011-03-25 16:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Beulich, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Nick Piggin, x86@kernel.org, Thomas Gleixner, Andrew Morton,
	Arnaldo Carvalho de Melo, Ingo Molnar, tee, Nikanth Karthikesan,
	linux-kernel@vger.kernel.org, H. Peter Anvin

On Fri, Mar 25, 2011 at 09:29:34AM -0700, Linus Torvalds wrote:
> On Fri, Mar 25, 2011 at 3:06 AM, Jan Beulich <JBeulich@novell.com> wrote:
> >
> > The problem was observed with __lock_page() (in a variant not
> > upstream for reasons not known to me), and prefixing e.g.
> > trylock_page() with an extra PageLocked() check yielded the
> > below quoted improvements.
> 
> Ok. __lock_page() _definitely_ should do the test_bit() thing first,
> because it's normally called from lock_page() that has already tested
> the bit.
> 
> But it already seems to do that, so I'm wondering what your variant is.
> 
> I'm also a bit surprised that lock_page() is that hot (unless your
> _lock_page() variant is simply too broken and ends up spinning?).
> Maybe we have some path that takes the page lock unnecessarily? What's
> the load?

We see the problem primarily on launching very large MPI applications.
The master process rapidly forks a large number (1 per cpu) of processes.
Each faults in a large number of text pages.

The text pages are resident in the page cache. No IO is involved but
the page lock quickly becomes a very hot contended cacheline.

Note also that this is observed in a 2.6.32 distro kernel that has a
different implementation of __lock_page. I think a similar problem
exists in the upstream kernel but have not had a chance to investigate.

We also see a similar problem during boot when a large number of udevd
processes are created.


--- jack


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 10:21                         ` Peter Zijlstra
  2011-03-25 16:08                           ` Robert Richter
@ 2011-03-25 17:15                           ` Andi Kleen
  2011-03-25 19:21                             ` Ingo Molnar
  1 sibling, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2011-03-25 17:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Linus Torvalds, Eric Dumazet,
	Jack Steiner, Jan Beulich, Borislav Petkov, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

> Also seriously complicated by the kexec case where the previous kernel
> didn't clean up PMU state. There is simply no sane way to detect if its

That's a good point, but we can easily stop the PMU before kexec.

> actually used and by whoem.

You check if the counter is enabled. If it's already enabled it's
used by someone else.
 
> The whole PMU 'sharing' concept championed by Andi is utter crap.

Why? It's the same thing as having some less counters. You need
to already support that for architectural perfmon with variable
counters anyways or for sharing with oprofile.

> As for simply using it despite the BIOS corrupting it, that might not
> always work, the BIOS might simply over-write your state because it
> one-sidedly declares to own the MSRs (observed behaviour).

Yes, that doesn't work. If someone else is active you have to step back.

> Its all a big clusterfuck and really the best way (IMO) is what we have
> now to put pressure on and force the BIOS vendors to play nice.

It just results in users like Eric being screwed.  I doubt that any
BIOS writer ever heard about it. Congratulations for a great plan.

-Andi


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:32                         ` Ingo Molnar
  2011-03-25  9:44                           ` Eric Dumazet
  2011-03-25 16:16                           ` Robert Richter
@ 2011-03-25 17:22                           ` Andi Kleen
  2011-03-25 19:26                             ` Ingo Molnar
  2 siblings, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2011-03-25 17:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Andi Kleen, Linus Torvalds, Jack Steiner,
	Jan Beulich, Borislav Petkov, Peter Zijlstra, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

> Could you please try the patch below?

This is very likely not a good idea.  I heard some systems can hang
if you overwrite their PMUs this way and you're unlucky enough :-(

-Andi

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 17:15                           ` Andi Kleen
@ 2011-03-25 19:21                             ` Ingo Molnar
  0 siblings, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25 19:21 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Linus Torvalds, Eric Dumazet, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Andi Kleen <andi@firstfloor.org> wrote:

> > Also seriously complicated by the kexec case where the previous kernel
> > didn't clean up PMU state. There is simply no sane way to detect if its
> 
> That's a good point, but we can easily stop the PMU before kexec.

Wrong - there's lots of existing Linux versions out there that will kexec with 
PMU state active. We cannot change them retroactively.

> > actually used and by whoem.
> 
> You check if the counter is enabled. If it's already enabled it's used by 
> someone else.

Wrong - or it can be enabled if it was left enabled accidentally. We treat PMU 
state like we treat other CPU state.

> > The whole PMU 'sharing' concept championed by Andi is utter crap.
> 
> Why? It's the same thing as having some less counters.

Wrong again - 25% or 50% of the events stolen with no user choice is a pretty 
big deal.

Consider the example in this thread: cachemiss profiling done via perf, which 
needs two events. If one of the generic counters is 'stolen' by a BIOS and 
Linux accepts this silently then it means the loss of real functionality.

In other words, '25% of a specific hardware functionality stolen by the BIOS' 
(or more) is absolutely not acceptable.

> [...] You need to already support that for architectural perfmon with 
> variable counters anyways or for sharing with oprofile.

Wrong, that's different - multiplexing the PMU is obviously done within the OS. 
It's evidently user configurable - users can use oprofile or perf - and the 
user can still fully utilise the PMU to the extent he wishes to - it's his 
hardware.

It is not possible for the kernel to stop the BIOS from using the PMU though, 
so it's not 'sharing' no matter what 'protocol' is used, it's 'stealing'.

> > As for simply using it despite the BIOS corrupting it, that might not
> > always work, the BIOS might simply over-write your state because it
> > one-sidedly declares to own the MSRs (observed behaviour).
> 
> Yes, that doesn't work. If someone else is active you have to step back.
> 
> > Its all a big clusterfuck and really the best way (IMO) is what we have
> > now to put pressure on and force the BIOS vendors to play nice.
> 
> It just results in users like Eric being screwed.  I doubt that any
> BIOS writer ever heard about it. Congratulations for a great plan.

I'd encourage you to think through this topic instead of making derisive 
comments about others ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 17:22                           ` Andi Kleen
@ 2011-03-25 19:26                             ` Ingo Molnar
  0 siblings, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25 19:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, Linus Torvalds, Jack Steiner, Jan Beulich,
	Borislav Petkov, Peter Zijlstra, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Andi Kleen <andi@firstfloor.org> wrote:

> > Could you please try the patch below?
> 
> This is very likely not a good idea.  I heard some systems can hang
> if you overwrite their PMUs this way and you're unlucky enough :-(

Wrong - in that case all kernels from 2.6.32 to 2.6.38 would already be locking 
up. This patch simply restores behavior to essentially what it was before 
v2.6.38 - but also adds the warning.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 16:08                           ` Robert Richter
@ 2011-03-25 19:31                             ` Ingo Molnar
  0 siblings, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-25 19:31 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Andi Kleen, Linus Torvalds, Eric Dumazet,
	Jack Steiner, Jan Beulich, Borislav Petkov, Nick Piggin,
	x86@kernel.org, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	tee@sgi.com, Nikanth Karthikesan, linux-kernel@vger.kernel.org,
	H. Peter Anvin


* Robert Richter <robert.richter@amd.com> wrote:

> > Its all a big clusterfuck and really the best way (IMO) is what we have
> > now to put pressure on and force the BIOS vendors to play nice.
> > 
> > I assume both HP and DELL will be seriously unhappy with the kernel
> > spewing FIRMWARE BUG messages on boot on their boxen, the question is,
> > will they be unhappy enough to fix it..
> 
> So, we better stick then with option 1. My experience is that new
> system's bioses try not to claim perfctrs (affected systems I have
> seen are about 2-3 years old), but I am not really sure here.

That's good news - BIOSen unilaterally stealing PMU real estate is a really 
utterly crazy concept.

For a limited physical resource like the PMU the correct approach to add 
PMU-using features is to add an OS driver that implements the feature via the 
regular PMU access functions. We already have such features so it's very much 
possible. That way it all becomes controllable and configurable to the user.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25  9:38                         ` Eric Dumazet
@ 2011-03-25 20:29                           ` Peter Zijlstra
  2011-03-26  8:15                             ` Eric Dumazet
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2011-03-25 20:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Linus Torvalds, Ingo Molnar, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

On Fri, 2011-03-25 at 10:38 +0100, Eric Dumazet wrote:
> 
> BIOS :  Version: I27  Release Date: 09/30/2010
> 
> [    0.021849] CPU0: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz stepping 02
> [    0.123904] Performance Events: PEBS fmt1+, Westmere events, Broken BIOS detected, using software events only.
> [    0.124125] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330) 

Do the below instructions yield a working system?:

> > On the BL460c G6/G7, one could perform the following steps:
> >
> > To configure BIOS low-latency options using RBSU
> > 1. Press F9 during POST to enter RBSU.
> > 2. Press CTRL-A to open the menu.
> > 3. Select Service Options.
> > 4. Disable either or both of the following options:
> > o Processor Power and Utilization Monitoring
> > o Memory Pre-Failure Notification
> > 
> > By doing the above the BIOS should release the performance counter, and the
> > kernel's "perf" should not encounter a conflict. 
> > 
> > The document that you are looking for is here: (please see page 8 and later):
> > http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01804533/c01804533.pdf




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-25 20:29                           ` Peter Zijlstra
@ 2011-03-26  8:15                             ` Eric Dumazet
  2011-03-26  9:44                               ` Peter Zijlstra
  2011-03-26  9:57                               ` Ingo Molnar
  0 siblings, 2 replies; 49+ messages in thread
From: Eric Dumazet @ 2011-03-26  8:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Linus Torvalds, Ingo Molnar, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

Le vendredi 25 mars 2011 à 21:29 +0100, Peter Zijlstra a écrit :
> On Fri, 2011-03-25 at 10:38 +0100, Eric Dumazet wrote:
> > 
> > BIOS :  Version: I27  Release Date: 09/30/2010
> > 
> > [    0.021849] CPU0: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz stepping 02
> > [    0.123904] Performance Events: PEBS fmt1+, Westmere events, Broken BIOS detected, using software events only.
> > [    0.124125] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330) 
> 
> Do the below instructions yield a working system?:
> 
> > > On the BL460c G6/G7, one could perform the following steps:
> > >
> > > To configure BIOS low-latency options using RBSU
> > > 1. Press F9 during POST to enter RBSU.
> > > 2. Press CTRL-A to open the menu.
> > > 3. Select Service Options.
> > > 4. Disable either or both of the following options:
> > > o Processor Power and Utilization Monitoring
> > > o Memory Pre-Failure Notification
> > > 
> > > By doing the above the BIOS should release the performance counter, and the
> > > kernel's "perf" should not encounter a conflict. 
> > > 
> > > The document that you are looking for is here: (please see page 8 and later):
> > > http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01804533/c01804533.pdf
> 
> 
> 

Thanks Peter !

I was not aware of the hidden "CTRL-A" feature on RBSU, so I never met
these configuration settings ("Service Options" added in main menu)

I already was in "Power regulator mode : Static High mode"
QPI Link power management : Disabled

I have latest released I24 BIOS
DMI: HP ProLiant BL460c G6, BIOS I24 01/29/2011

1) I disabled "Power and Utilization Monitoring" _and_ "Memory
Pre-Failure Notification"

Still I got the 'MSR 38d is 330' error

2) I then disabled Intel Turbo boost

Same error

3) Minimum Processor idle Power state
  From C1E-State --> No C-states

Same error

Tricky isnt it ?




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-26  8:15                             ` Eric Dumazet
@ 2011-03-26  9:44                               ` Peter Zijlstra
  2011-03-26  9:57                               ` Ingo Molnar
  1 sibling, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2011-03-26  9:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Linus Torvalds, Ingo Molnar, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin

On Sat, 2011-03-26 at 09:15 +0100, Eric Dumazet wrote:
> Le vendredi 25 mars 2011 à 21:29 +0100, Peter Zijlstra a écrit :
> > On Fri, 2011-03-25 at 10:38 +0100, Eric Dumazet wrote:
> > > 
> > > BIOS :  Version: I27  Release Date: 09/30/2010
> > > 
> > > [    0.021849] CPU0: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz stepping 02
> > > [    0.123904] Performance Events: PEBS fmt1+, Westmere events, Broken BIOS detected, using software events only.
> > > [    0.124125] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330) 
> > 
> > Do the below instructions yield a working system?:
> > 
> > > > On the BL460c G6/G7, one could perform the following steps:
> > > >
> > > > To configure BIOS low-latency options using RBSU
> > > > 1. Press F9 during POST to enter RBSU.
> > > > 2. Press CTRL-A to open the menu.
> > > > 3. Select Service Options.
> > > > 4. Disable either or both of the following options:
> > > > o Processor Power and Utilization Monitoring
> > > > o Memory Pre-Failure Notification
> > > > 
> > > > By doing the above the BIOS should release the performance counter, and the
> > > > kernel's "perf" should not encounter a conflict. 
> > > > 
> > > > The document that you are looking for is here: (please see page 8 and later):
> > > > http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01804533/c01804533.pdf
> > 
> > 
> > 
> 
> Thanks Peter !
> 
> I was not aware of the hidden "CTRL-A" feature on RBSU, so I never met
> these configuration settings ("Service Options" added in main menu)
> 
> I already was in "Power regulator mode : Static High mode"
> QPI Link power management : Disabled
> 
> I have latest released I24 BIOS
> DMI: HP ProLiant BL460c G6, BIOS I24 01/29/2011
> 
> 1) I disabled "Power and Utilization Monitoring" _and_ "Memory
> Pre-Failure Notification"
> 
> Still I got the 'MSR 38d is 330' error
> 
> 2) I then disabled Intel Turbo boost
> 
> Same error
> 
> 3) Minimum Processor idle Power state
>   From C1E-State --> No C-states
> 
> Same error
> 
> Tricky isnt it ?

Definitely, OK thanks for testing, I'll contact HP again.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible
  2011-03-26  8:15                             ` Eric Dumazet
  2011-03-26  9:44                               ` Peter Zijlstra
@ 2011-03-26  9:57                               ` Ingo Molnar
  1 sibling, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-03-26  9:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Andi Kleen, Linus Torvalds, Jack Steiner,
	Jan Beulich, Borislav Petkov, Nick Piggin, x86@kernel.org,
	Thomas Gleixner, Andrew Morton, Ingo Molnar, tee,
	Nikanth Karthikesan, linux-kernel@vger.kernel.org, H. Peter Anvin


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > > > By doing the above the BIOS should release the performance counter, and 
> > > > the kernel's "perf" should not encounter a conflict.
> > > > 
> > > > The document that you are looking for is here: (please see page 8 and 
> > > > later): 
> > > > http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01804533/c01804533.pdf
> 
> Thanks Peter !
> 
> I was not aware of the hidden "CTRL-A" feature on RBSU, so I never met these 
> configuration settings ("Service Options" added in main menu)
> 
> I already was in "Power regulator mode : Static High mode" QPI Link power 
> management : Disabled
> 
> I have latest released I24 BIOS
> DMI: HP ProLiant BL460c G6, BIOS I24 01/29/2011
> 
> 1) I disabled "Power and Utilization Monitoring" _and_ "Memory
> Pre-Failure Notification"
> 
> Still I got the 'MSR 38d is 330' error
> 
> 2) I then disabled Intel Turbo boost
> 
> Same error
> 
> 3) Minimum Processor idle Power state
>   From C1E-State --> No C-states
> 
> Same error
> 
> Tricky isnt it ?

This kind of user-configurability reminds me of Mr. Prosser from the local 
council who wants to knock down Arthur Dent’s house with a bulldozer to make 
way for a new bypass:

 "But the plans were on display ..."

 "On display? I eventually had to go down to the cellar to find them."

 "That's the display department."

 "With a flashlight."

 "Ah, well the lights had probably gone."

 "So had the stairs."

 "But look, you found the notice didn't you?"

 "Yes," said Arthur, "yes I did. It was on display in the bottom of a locked 
  filing cabinet stuck in a disused lavatory with a sign on the door saying 
  'Beware of the Leopard'."

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2011-03-26  9:57 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-24  4:56 [PATCH RFC] x86: avoid atomic operation in test_and_set_bit_lock if possible Nikanth Karthikesan
2011-03-24  8:52 ` Jan Beulich
2011-03-24  8:56 ` Ingo Molnar
2011-03-24 14:52   ` Borislav Petkov
2011-03-24 16:48     ` Jan Beulich
2011-03-24 17:19       ` Ingo Molnar
2011-03-25 10:06         ` Jan Beulich
2011-03-25 11:10           ` Ingo Molnar
2011-03-25 12:04             ` Nikanth Karthikesan
2011-03-25 13:12           ` Jack Steiner
2011-03-25 16:29           ` Linus Torvalds
2011-03-25 16:47             ` Jan Beulich
2011-03-25 16:49             ` Jack Steiner
2011-03-24 17:30       ` Jack Steiner
2011-03-24 20:00         ` Ingo Molnar
2011-03-24 20:40           ` Andi Kleen
2011-03-24 20:50             ` Ingo Molnar
2011-03-24 21:37               ` Andi Kleen
2011-03-24 20:48           ` Eric Dumazet
2011-03-24 20:54             ` Ingo Molnar
2011-03-24 21:02               ` Eric Dumazet
2011-03-24 21:42                 ` Andi Kleen
2011-03-24 23:26                   ` Linus Torvalds
2011-03-24 23:56                     ` Andi Kleen
2011-03-25  5:47                       ` Eric Dumazet
2011-03-25  9:32                         ` Ingo Molnar
2011-03-25  9:44                           ` Eric Dumazet
2011-03-25  9:59                             ` Ingo Molnar
2011-03-25 10:50                               ` Borislav Petkov
2011-03-25 11:10                               ` Peter Zijlstra
2011-03-25 11:11                                 ` Ingo Molnar
2011-03-25 16:16                           ` Robert Richter
2011-03-25 17:22                           ` Andi Kleen
2011-03-25 19:26                             ` Ingo Molnar
2011-03-25  9:38                         ` Eric Dumazet
2011-03-25 20:29                           ` Peter Zijlstra
2011-03-26  8:15                             ` Eric Dumazet
2011-03-26  9:44                               ` Peter Zijlstra
2011-03-26  9:57                               ` Ingo Molnar
2011-03-25  9:22                       ` Ingo Molnar
2011-03-25 10:21                         ` Peter Zijlstra
2011-03-25 16:08                           ` Robert Richter
2011-03-25 19:31                             ` Ingo Molnar
2011-03-25 17:15                           ` Andi Kleen
2011-03-25 19:21                             ` Ingo Molnar
2011-03-25  9:35                     ` Ingo Molnar
2011-03-24 17:01 ` Linus Torvalds
2011-03-24 17:13 ` Jack Steiner
2011-03-24 18:38 ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox