[BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic

public inbox for linux-kernel@vger.kernel.org

* [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
@ 2009-05-22 16:39 Michael S. Zick
  2009-05-22 18:23 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 16:39 UTC (permalink / raw)
  To: linux-kernel

Found in the bit-rot for 32-bit, x86, Uni-processor builds:

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index f6aa18e..3c790ef 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -35,7 +35,7 @@
                "661:\n\tlock; "

 #else /* ! CONFIG_SMP */
-#define LOCK_PREFIX ""
+#define LOCK_PREFIX "\n\tlock; "
 #endif

 /* This must be included *after* the definition of LOCK_PREFIX */

Submitted: M. S. Zick

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 16:39 [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
@ 2009-05-22 18:23 ` Andi Kleen
  2009-05-22 18:36 ` Ingo Molnar
       [not found] ` <200905221343.30638.lkml@morethan.org>
  2 siblings, 0 replies; 90+ messages in thread
From: Andi Kleen @ 2009-05-22 18:23 UTC (permalink / raw)
  To: lkml; +Cc: linux-kernel

"Michael S. Zick" <lkml@morethan.org> writes:

> Found in the bit-rot for 32-bit, x86, Uni-processor builds:

Actually, a uniprocessor build should not use the lock prefix
because it doesn't need it; the only exceptions are some special
ops used in para-virtualization, which are special-cased.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 16:39 [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
  2009-05-22 18:23 ` Andi Kleen
@ 2009-05-22 18:36 ` Ingo Molnar
  2009-05-22 18:59   ` H. Peter Anvin
  2009-05-22 19:17   ` Michael S. Zick
       [not found] ` <200905221343.30638.lkml@morethan.org>
  2 siblings, 2 replies; 90+ messages in thread
From: Ingo Molnar @ 2009-05-22 18:36 UTC (permalink / raw)
  To: Michael S. Zick, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel


* Michael S. Zick <lkml@morethan.org> wrote:

> Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> 
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index f6aa18e..3c790ef 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -35,7 +35,7 @@
>                 "661:\n\tlock; "
> 
>  #else /* ! CONFIG_SMP */
> -#define LOCK_PREFIX ""
> +#define LOCK_PREFIX "\n\tlock; "
>  #endif

What is your motivation for this change? At first sight this makes 
the UP kernel a bit larger and a bit slower. Are you fixing some 
real regression/bug here?

	Ingo

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 18:36 ` Ingo Molnar
@ 2009-05-22 18:59   ` H. Peter Anvin
  2009-05-22 19:20     ` Michael S. Zick
  2009-05-22 22:21     ` Michael S. Zick
  2009-05-22 19:17   ` Michael S. Zick
  1 sibling, 2 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-22 18:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Michael S. Zick, Thomas Gleixner, linux-kernel

Ingo Molnar wrote:
> * Michael S. Zick <lkml@morethan.org> wrote:
> 
>> Found in the bit-rot for 32-bit, x86, Uni-processor builds:
>>
>> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
>> index f6aa18e..3c790ef 100644
>> --- a/arch/x86/include/asm/alternative.h
>> +++ b/arch/x86/include/asm/alternative.h
>> @@ -35,7 +35,7 @@
>>                 "661:\n\tlock; "
>>
>>  #else /* ! CONFIG_SMP */
>> -#define LOCK_PREFIX ""
>> +#define LOCK_PREFIX "\n\tlock; "
>>  #endif
> 
> What is your motivation for this change? At first sight this makes 
> the UP kernel a bit larger and a bit slower. Are you fixing some 
> real regression/bug here?
> 

That looks very odd indeed.  The whole point of the LOCK_PREFIX macro is
to squelch it on UP (locks that should not be squelched on UP should not
be annotated LOCK_PREFIX.)
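
A rough illustration of the point above, as a simplified user-space sketch rather than the kernel header itself. The helper names are hypothetical, and the real SMP definition does more than emit "lock; " (the "661:" label in the diff context hints at that), which is left out here. The sketch only shows that a site annotated with LOCK_PREFIX loses the prefix on a UP build, while a site that must always lock spells the prefix out by hand:

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for the kernel's LOCK_PREFIX.  Assumption: the
 * extra bookkeeping the real SMP definition performs is omitted.
 */
#ifdef CONFIG_SMP
#define LOCK_PREFIX "lock; "
#else
#define LOCK_PREFIX ""		/* UP build: the prefix is squelched */
#endif

/* Hypothetical helper: asm template for an op annotated with LOCK_PREFIX. */
static const char *atomic_inc_template(void)
{
	return LOCK_PREFIX "incl %0";
}

/*
 * Hypothetical helper: an op that must stay locked even on UP writes the
 * prefix explicitly instead of using LOCK_PREFIX.
 */
static const char *always_locked_inc_template(void)
{
	return "lock; incl %0";
}

/* Returns nonzero if the template starts with a lock prefix. */
static int has_lock_prefix(const char *tmpl)
{
	assert(tmpl != NULL);
	return strncmp(tmpl, "lock; ", 6) == 0;
}
```

Built without CONFIG_SMP defined, has_lock_prefix(atomic_inc_template()) is 0 and has_lock_prefix(always_locked_inc_template()) is nonzero. The posted patch would turn the first case into a locked instruction everywhere at once.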

	-hpa


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 18:59   ` H. Peter Anvin
@ 2009-05-22 19:20     ` Michael S. Zick
  2009-05-22 22:21     ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 19:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> > * Michael S. Zick <lkml@morethan.org> wrote:
> > 
> >> Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> >> index f6aa18e..3c790ef 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -35,7 +35,7 @@
> >>                 "661:\n\tlock; "
> >>
> >>  #else /* ! CONFIG_SMP */
> >> -#define LOCK_PREFIX ""
> >> +#define LOCK_PREFIX "\n\tlock; "
> >>  #endif
> > 
> > What is your motivation for this change? At first sight this makes 
> > the UP kernel a bit larger and a bit slower. Are you fixing some 
> > real regression/bug here?
> > 
> 
> That looks very odd indeed.  The whole point of the LOCK_PREFIX macro is
> to squelch it on UP (locks that should not be squelched on UP should not
> be annotated LOCK_PREFIX.)
> 

OK, will inspect for that possibility.
We may just have a mis-use of LOCK_PREFIX.

Mike
> 	-hpa
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 18:59   ` H. Peter Anvin
  2009-05-22 19:20     ` Michael S. Zick
@ 2009-05-22 22:21     ` Michael S. Zick
  2009-05-22 23:30       ` H. Peter Anvin
  2009-05-28 12:48       ` Pavel Machek
  1 sibling, 2 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 22:21 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> > * Michael S. Zick <lkml@morethan.org> wrote:
> > 
> >> Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> >> index f6aa18e..3c790ef 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -35,7 +35,7 @@
> >>                 "661:\n\tlock; "
> >>
> >>  #else /* ! CONFIG_SMP */
> >> -#define LOCK_PREFIX ""
> >> +#define LOCK_PREFIX "\n\tlock; "
> >>  #endif
> > 
> > What is your motivation for this change? At first sight this makes 
> > the UP kernel a bit larger and a bit slower. Are you fixing some 
> > real regression/bug here?
> > 
> 
> That looks very odd indeed.  The whole point of the LOCK_PREFIX macro is
> to squelch it on UP (locks that should not be squelched on UP should not
> be annotated LOCK_PREFIX.)
> 

I can only act as a messenger to report the behavior I observe;
But let me see if I can't do a better job of that limited role.

hpa makes the best point of all in the responses here...

What I see (erratic operation, erratic lock-ups of the machine, 
and the previously posted lockdep dump) -

This may well be misplaced usage of the LOCK_PREFIX macro;
I have already agreed to keep my eyes open for this more 
specific problem.

A secondary possibility, hinted at in the context of other replies;
The usage of the LOCK_PREFIX may not apply equally to all processors
for which this code gets included.
It is possible that I am building for one of the exceptions.
That tells us nothing, since the CPU technical details are under NDA.
All that can be done in this case is report behavior differences from
the closest publicly described processor (Pentium-M).

For that purpose, I suggest that a single processor box, with other 
hardware that makes memory access independent of the processor's 
control using a processor older than P-4 is a potential test bed.
"Other hardware that makes memory access..." I previously termed:
"buss master DMA" - which is overly specific.  It misleads people
into thinking I am seeing hardware control issues rather than
non-exclusive memory access.

My earlier comments about taking an interrupt between the memory read
and the memory write operations are from a different manual than the
one posted.  A manual that only applies to processors older than 
the ones supported by the Linux kernel.  
Sorry, my bad, grabbed the wrong book, posted the correct link (SH).

Until one or more specific usages of the LOCK_PREFIX macro can be 
demonstrated to be incorrect (at least for some of the processors 
using this code) - -

Then making the posted change is a single point change that gives a 
pair of builds (one with, one without) to compare the behavior of on 
the test bed.

It is *not* the preferred change for a general release kernel, the
preferred change would be one that makes a specific rather than
general correction.  
Perhaps only for some functions, perhaps only for some of the 
processors that currently select this code.

The observation that executing an unnecessary 'lock' opcode in some
cases slows down the machine is not felt by myself to be significant 
to duplicating my observations.  Note: I have been wrong before.

This is as informative as I can make the message.

PS: *not* a single machine failure, tested on five machines, owned
by four different people, two brands, with different use histories.

Mike
> 	-hpa
> 
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 22:21     ` Michael S. Zick
@ 2009-05-22 23:30       ` H. Peter Anvin
  2009-05-23  0:45         ` Michael S. Zick
  2009-05-28 12:48       ` Pavel Machek
  1 sibling, 1 reply; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-22 23:30 UTC (permalink / raw)
  To: lkml; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

If there is a driver which relies on locked operations to be atomic with
respect to the I/O subsystem, it needs to use true locks, not LOCK_PREFIX.

An interrupt cannot interrupt between two parts of a lockable
instruction even if it isn't locked (there are non-atomic instructions
in the x86 architecture, but they can never be locked.)
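
To make the distinction concrete, here is a minimal sketch of an unlocked read-modify-write. It assumes GCC or Clang inline-asm syntax on x86; the helper name is hypothetical and a plain C fallback is included only so the sketch compiles elsewhere. The increment is a single instruction, so on a uniprocessor an interrupt is taken either before or after it, never in the middle. Nothing here makes the access exclusive with respect to other bus agents such as a DMA-capable device:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increment *v with one unlocked instruction. */
static inline void up_inc(int *v)
{
	assert(v != NULL);
#if defined(__i386__) || defined(__x86_64__)
	/* One instruction, no lock prefix: atomic against local interrupts only. */
	__asm__ __volatile__("incl %0" : "+m" (*v));
#else
	(*v)++;	/* non-x86 fallback so the sketch still builds */
#endif
}
```

If a driver needed the increment to be exclusive against the I/O subsystem as well, this form would not be enough, which is the case the first paragraph above is about.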

The other thing that you might be seeing is that a locked operation may
be slow enough to keep an otherwise-present race condition from being
triggered.

> That tells us nothing, since the CPU technical details are under NDA.

Have you considered that you might be running into a CPU bug or design
error?  There was the out-of-order store bug on the Winchip that needed
workarounds (CONFIG_X86_OOSTORE) that I don't think were ever well
tested and might very well have bitrotted?

> All that can be done in this case is report behavior differences from
> the closest publicly described processor (Pentium-M).
> 
> For that purpose, I suggest that a single processor box, with other 
> hardware that makes memory access independent of the processor's 
> control using a processor older than P-4 is a potential test bed.
> "Other hardware that makes memory access..." I previously termed:
> "buss master DMA" - which is overly specific.  It misleads people
> into thinking I am seeing hardware control issues rather than
> non-exclusive memory access.
> 
> My earlier comments about taking an interrupt between the memory read
> and the memory write operations are from a different manual than the
> one posted.  A manual that only applies to processors older than 
> the ones supported by the Linux kernel.  
> Sorry, my bad, grabbed the wrong book, posted the correct link (SH).
> 
> Until one or more specific usages of the LOCK_PREFIX macro can be 
> demonstrated to be incorrect (at least for some of the processors 
> using this code) - -
> 
> Then making the posted change is a single point change that gives a 
> pair of builds (one with, one without) to compare the behavior of on 
> the test bed.
> 
> It is *not* the preferred change for a general release kernel, the
> preferred change would be one that makes a specific rather than
> general correction.  
> Perhaps only for some functions, perhaps only for some of the 
> processors that currently select this code.
> 
> The observation that executing an unnecessary 'lock' opcode in some
> cases slows down the machine is not felt by myself to be significant 
> to duplicating my observations.  Note: I have been wrong before.

What makes you draw that conclusion, in particular?  A lock prefix
typically slows down the following instruction dramatically, on some
processors by many hundreds of cycles.

> This is as informative as I can make the message.
> 
> PS: *not* a single machine failure, tested on five machines, owned
> by four different people, two brands, with different use histories.

What do they have in common?

	-hpa

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 23:30       ` H. Peter Anvin
@ 2009-05-23  0:45         ` Michael S. Zick
  2009-05-23  0:51           ` H. Peter Anvin
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-23  0:45 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> If there is a driver which relies on locked operations to be atomic with
> respect to the I/O subsystem, it needs to use true locks, not LOCK_PREFIX.
> 
> An interrupt cannot interrupt between two parts of a lockable
> instruction even if it isn't locked (there are non-atomic instructions
> in the x86 architecture, but they can never be locked.)
> 
> The other thing that you might be seeing is that a locked operation may
> be slow enough to keep an otherwise-present race condition from being
> triggered.
> 
> > That tells us nothing, since the CPU technical details are under NDA.
> 
> Have you considered that you might be running into a CPU bug or design
> error?  There was the out-of-order store bug on the Winchip that needed
> workarounds (CONFIG_X86_OOSTORE) that I don't think were ever well
> tested and might very well have bitrotted?
> 
> > All that can be done in this case is report behavior differences from
> > the closest publicly described processor (Pentium-M).
> > 
> > For that purpose, I suggest that a single processor box, with other 
> > hardware that makes memory access independent of the processor's 
> > control using a processor older than P-4 is a potential test bed.
> > "Other hardware that makes memory access..." I previously termed:
> > "buss master DMA" - which is overly specific.  It misleads people
> > into thinking I am seeing hardware control issues rather than
> > non-exclusive memory access.
> > 
> > My earlier comments about taking an interrupt between the memory read
> > and the memory write operations are from a different manual than the
> > one posted.  A manual that only applies to processors older than 
> > the ones supported by the Linux kernel.  
> > Sorry, my bad, grabbed the wrong book, posted the correct link (SH).
> > 
> > Until one or more specific usages of the LOCK_PREFIX macro can be 
> > demonstrated to be incorrect (at least for some of the processors 
> > using this code) - -
> > 
> > Then making the posted change is a single point change that gives a 
> > pair of builds (one with, one without) to compare the behavior of on 
> > the test bed.
> > 
> > It is *not* the preferred change for a general release kernel, the
> > preferred change would be one that makes a specific rather than
> > general correction.  
> > Perhaps only for some functions, perhaps only for some of the 
> > processors that currently select this code.
> > 
> > The observation that executing an unnecessary 'lock' opcode in some
> > cases slows down the machine is not felt by myself to be significant 
> > to duplicating my observations.  Note: I have been wrong before.
> 
> What makes you draw that conclusion, in particular?  A lock prefix
> typically slows down the following instruction dramatically, on some
> processors by many hundreds of cycles.
> 
> > This is as informative as I can make the message.
> > 
> > PS: *not* a single machine failure, tested on five machines, owned
> > by four different people, two brands, with different use histories.
> 
> What do they have in common?
>

Same integrated motherboard.
There is very little information to be gained from staring at a glowing
power on light, that only glows back.  ;)
The lockdep dump posted is the best source of information.

Other observations -

Here is something which these machines do, which may not be happening
with your choice of test machines:

ACPI: Core revision 20090320
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found apic 0 pin 0) ...
....... works.

Note: This is on a Uni-processor build.
I have not yet examined the code that generates that set of messages.
Might be a broken work-around?

With the LOCK_PREFIX == ""

Test conditions (same as the lockdep dump) -
VLC playing streaming audio over the wired net connection (8139too) -
from 4 to 8 ssh remote terminal sessions, each running "top" set
to use different display intervals (different in 0.1 second steps) -
Fixed cpu speed at half the rated clock (for the purpose of testing).
Now just hang back and listen for 10 minutes to 4 hours -

When the machine stops running -
You will still hear bursts of sound - -
I am *guessing* that this means the chip set and bus clocks are running,
also that DMA is running - with the result that the HD audio driver
is just replaying the same buffer offset.
There is a PCI-to-PCIe bridge in the chip set and the HD audio hardware
(also on chip) is the only thing detected on the PCIe bus.

The "hold down power button to stop" still works -
I presume that means at least that internal timer is still running.

Repeat the above, *with* LOCK_PREFIX == "\n\tlock; "
When the machine stops - with only minutes rather than hours of uptime -
The machine is silent - I presume this means that DMA is not running.
The "hold down power button to stop" still works -
So clocks are not totally off.

= = = =

Either "lock-up" situation acts as if:
*) cpu is halted with interrupts off; or
*) cpu is in a tight loop with interrupts off
The primary difference is that the DMA has been stopped in the second case.
Presuming my two guesses on that subject above are correct.

Mike

> 	-hpa
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23  0:45         ` Michael S. Zick
@ 2009-05-23  0:51           ` H. Peter Anvin
  2009-05-23 10:44             ` Michael S. Zick
                               ` (3 more replies)
  0 siblings, 4 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-23  0:51 UTC (permalink / raw)
  To: lkml; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

Michael S. Zick wrote:
> Same integrated motherboard.

Which means same CPU, same BIOS, same motherboard (none of which you're
telling us.)

cpuinfo and dmidecode would be informative.

	-hpa


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23  0:51           ` H. Peter Anvin
@ 2009-05-23 10:44             ` Michael S. Zick
  2009-05-23 11:18               ` Michael S. Zick
                                 ` (2 more replies)
  2009-05-23 15:52             ` Michael S. Zick
                               ` (2 subsequent siblings)
  3 siblings, 3 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-23 10:44 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > Same integrated motherboard.
> 
> Which means same CPU, same BIOS, same motherboard (none of which you're
> telling us.)
> 

The only objective information is posted here:
http://lkml.org/lkml/2009/5/20/342
Everything else related to this problem is subjective.

> cpuinfo and dmidecode would be informative.
> 

Must have hit "reply" rather than "reply all" at some
critical point along the way.

Browse this directory:
http://hp-umpc.com/ce1200v/
You're looking for the:
http://hp-umpc.com/ce1200v/sylvania-g-data.tar.gz
The Everex Cloudbook only varies by some strings in the 
dmidecode output.

For logs of speculation and efforts at re-arranging the
deck chairs on the Titanic:
http://forum.netbookuser.com/viewforum.php?id=8

I chose to start with the ce1200v because:
*) It needs the most help;
*) One of the two tech manuals on the cx700 has been
published since the drivers were touched.
see: http://linux.via.com.tw/support/downloadFiles.action
select cx700/vx700 in right-hand box, click the manual.
*) The HP-2133 MiniNote uses the cn896 chipset, which
has not yet been released from NDA.

Note:
I do not have the C7-M technical reference, it is still
under NDA.
*But* if a developer on this list has a copy of the
manual *and* owns one of these three brands of machine -
they would have fixed their own machine a year ago.
So I am not holding my breath until a person is located
that has both the manual and the machine.  ;)

Mike

> 	-hpa
> 
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23 10:44             ` Michael S. Zick
@ 2009-05-23 11:18               ` Michael S. Zick
  2009-05-24  7:04               ` Harald Welte
  2009-05-27 22:13               ` Roland Dreier
  2 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-23 11:18 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Sat May 23 2009, Michael S. Zick wrote:
> On Fri May 22 2009, H. Peter Anvin wrote:
> > Michael S. Zick wrote:
> > > Same integrated motherboard.
> > 
> > Which means same CPU, same BIOS, same motherboard (none of which you're
> > telling us.)
> > 
> 
> The only objective information is posted here:
> http://lkml.org/lkml/2009/5/20/342
> Everything else related to this problem is subjective.
> 
> > cpuinfo and dmidecode would be informative.
> > 
> 
> Must have hit "reply" rather than "reply all" at some
> critical point along the way.
> 
> Browse this directory:
> http://hp-umpc.com/ce1200v/
> You're looking for the:
> http://hp-umpc.com/ce1200v/sylvania-g-data.tar.gz
> The Everex Cloudbook only varies by some strings in the 
> dmidecode output.
> 
> For logs of speculation and efforts at re-arranging the
> deck chairs on the Titanic:
> http://forum.netbookuser.com/viewforum.php?id=8
> 
> I chose to start with the ce1200v because:
> *) It needs the most help;
> *) One of the two tech manuals on the cx700 has been
> published since the drivers were touched.
> see: http://linux.via.com.tw/support/downloadFiles.action
> select cx700/vx700 in right-hand box, click the manual.
> *) The HP-2133 MiniNote uses the cn896 chipset, which
> has not yet been released from NDA.
> 
> Note:
> I do not have the C7-M technical reference, it is still
> under NDA.
> *But* if a developer on this list has a copy of the
> manual *and* owns one of these three brands of machine -
> they would have fixed their own machine a year ago.
> So I am not holding my breath until a person is located
> that has both the manual and the machine.  ;)
> 

As to getting a person with the manuals on-hand and everyone
else together, may I point out that the MUC room is still on-line:

Pick your favorite Jabber MUC client;
JID: cloudbook-group@conference.jabber.cb-chat.com

Which translates to:
Room: cloudbook-group
URL: conference.jabber.cb-chat.com
Password: <none> leave blank in your client, it is a public room.

Note: This is a low-volume server and has "on-demand" rooms enabled;
you want a room for the "hot topic" of the moment;
just "join chat" or "join group" (whatever your client calls it) for
the new room name - the server will create it for you.

If creating rooms, I strongly suggest Gajim - it has the easiest to
use control and administrative features.

Note2: Not using any video, so we will not see your smiling face. ;)

Mike
> Mike
> 
> 
> > 	-hpa
> > 
> > 
> > 
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23 10:44             ` Michael S. Zick
  2009-05-23 11:18               ` Michael S. Zick
@ 2009-05-24  7:04               ` Harald Welte
  2009-05-24 12:48                 ` Michael S. Zick
  2009-05-24 15:43                 ` Michael S. Zick
  2009-05-27 22:13               ` Roland Dreier
  2 siblings, 2 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-24  7:04 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Sat, May 23, 2009 at 05:44:52AM -0500, Michael S. Zick wrote:

> *) The HP-2133 MiniNote uses the cn896 chipset, which
> has not yet been released from NDA.

I can see towards getting that changed, but I doubt this helps us with the
current problem.

> I do not have the C7-M technical reference, it is still
> under NDA.

I obviously have access to that documentation (which is also on its way
to become public, but needs more time) - but believe me, there is nothing
in that documentation that would help you to debug this problem :(

> *But* if a developer on this list has a copy of the
> manual *and* owns one of these three brands of machine -
> they would have fixed their own machine a year ago.

I actually own a 2133 mininote, but I rarely used it for anything but to test
openchrome on it.  What do you suggest me to try?

I also have some other systems with a C7-M, so I can certainly verify
certain code on a number of them, if a good testcase exists.

-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24  7:04               ` Harald Welte
@ 2009-05-24 12:48                 ` Michael S. Zick
  2009-05-24 15:43                 ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 12:48 UTC (permalink / raw)
  To: Harald Welte; +Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Sun May 24 2009, Harald Welte wrote:
> On Sat, May 23, 2009 at 05:44:52AM -0500, Michael S. Zick wrote:
> 
> > *) The HP-2133 MiniNote uses the cn896 chipset, which
> > has not yet been released from NDA.
> 
> I can see towards getting that changed, but I doubt this helps us with the
> current problem.
> 
> > I do not have the C7-M technical reference, it is still
> > under NDA.
> 
> I obviously have access to that documentation (which is also on its way
> to become public, but needs more time) - but believe me, there is nothing
> in that documentation that would help you to debug this problem :(
> 
> > *But* if a developer on this list has a copy of the
> > manual *and* owns one of these three brands of machine -
> > they would have fixed their own machine a year ago.
> 
> I actually own a 2133 mininote, but I rarely used it for anything but to test
> openchrome on it.  What do you suggest me to try?
>

The HP-2133 (C7-M/CN896) did not fail yesterday.
Find a C7-M/CX700 machine.

You might hook the rss feed at:
http://forum.netbookuser.com/viewforum.php?id=8
where my rants/raves/speculations are logged and
the other people helping me test make their comments.

In particular:

The original instructions (including download url):
http://forum.netbookuser.com/viewtopic.php?pid=6702#p6702

Updated installation instructions:
http://forum.netbookuser.com/viewtopic.php?id=907 

Also ignore anything you read in LKML that I have been doing
this in secret - those authors just never got the memo. ;)

> I also have some other systems with a C7-M, so I can certainly verify
> certain code on a number of them, if a good testcase exists.
> 

Still working towards a specific test case - the only thing at this
point is the sledgehammer of putting the "lock" back in, everywhere.

Mike


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24  7:04               ` Harald Welte
  2009-05-24 12:48                 ` Michael S. Zick
@ 2009-05-24 15:43                 ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 15:43 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, Harald Welte wrote:
> On Sat, May 23, 2009 at 05:44:52AM -0500, Michael S. Zick wrote:
> 
> > *) The HP-2133 MiniNote uses the cn896 chipset, which
> > has not yet been released from NDA.
> 
> I can see towards getting that changed, but I doubt this helps us with the
> current problem.
> 
> > I do not have the C7-M technical reference, it is still
> > under NDA.
> 
> I obviously have access to that documentation (which is also on its way
> to become public, but needs more time) - but believe me, there is nothing
> in that documentation that would help you to debug this problem :(
> 
> > *But* if a developer on this list has a copy of the
> > manual *and* owns one of these three brands of machine -
> > they would have fixed their own machine a year ago.
> 
> I actually own a 2133 mininote, but I rarely used it for anything but to test
> openchrome on it.  What do you suggest me to try?
>

The {,lk} pair of yesterday - now built against tag 2.6.30-rc7 is
posted as -09144{,lk}

Details here:
http://forum.netbookuser.com/viewtopic.php?pid=6976#p6976

Try them on a C7-M/CX700 (or newer NetBook system chipset)
(I don't normally test on the HP-2133 (C7-M/CN896) since I am
not (yet) dealing with the Broadcom firmware and SSB driver.)

Mike
> I also have some other systems with a C7-M, so I can certainly verify
> certain code on a number of them, if a good testcase exists.
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23 10:44             ` Michael S. Zick
  2009-05-23 11:18               ` Michael S. Zick
  2009-05-24  7:04               ` Harald Welte
@ 2009-05-27 22:13               ` Roland Dreier
  2009-05-27 22:33                 ` Michael S. Zick
  2 siblings, 1 reply; 90+ messages in thread
From: Roland Dreier @ 2009-05-27 22:13 UTC (permalink / raw)
  To: peterz; +Cc: lkml, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

 > The only objective information is posted here:
 > http://lkml.org/lkml/2009/5/20/342

Not sure if you've looked at this, but it's a lockdep trace that looks
to be a valid lockdep report due to non-annotated code (I don't *think*
it's a bug).  To summarize, there is the code path in
kernel/irq/spurious.c that does:

    poll_spurious_irq_timer ->
      poll_spurious_irqs()     [from timer, with hard IRQs on] ->
      poll_all_shared_irqs()   [if we think an IRQ got stuck] ->
      try_one_irq() ->
      spin_lock(&desc->lock)   [as above -- hard IRQs on]

while kernel/irq/chip.c has:

    handle_level_irq()         [called with hard IRQs off] ->
      spin_lock(&desc->lock)   [as above -- hard IRQs off]

and lockdep can't tell that the interrupt corresponding to desc has been
disabled if we ever actually reach try_one_irq(), so there's no risk of
the interrupt coming in and deadlocking while the try_one_irq() code
holds desc->lock.

Unfortunately I don't know how to annotate this...
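
A toy model of the rule that appears to trigger this report, as a sketch
only: it is not the real lockdep code, and the struct and function names
are made up for illustration.  It assumes the handle_level_irq() path
counts as "in hard IRQ context" and the poll_spurious_irqs() path counts
as "hard IRQs enabled", per the summary above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-lock-class usage history (hypothetical; NOT the kernel's lockdep). */
struct lock_class_state {
	bool taken_in_hardirq;       /* acquired while running a hard IRQ handler */
	bool taken_with_hardirqs_on; /* acquired while hard IRQs were enabled */
};

/*
 * Record one acquisition of the lock class.  Returns true once the class
 * has been seen both inside a hard IRQ handler and with hard IRQs enabled:
 * in general an interrupt could then arrive while the lock is held and
 * spin on it forever.
 */
static bool record_acquire(struct lock_class_state *cls,
			   bool in_hardirq, bool hardirqs_enabled)
{
	assert(cls != 0);
	if (in_hardirq)
		cls->taken_in_hardirq = true;
	if (hardirqs_enabled)
		cls->taken_with_hardirqs_on = true;
	return cls->taken_in_hardirq && cls->taken_with_hardirqs_on;
}
```

The model has no way to express "this particular interrupt is disabled
while we hold the lock", which is the piece of knowledge the report is
missing.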

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-27 22:13               ` Roland Dreier
@ 2009-05-27 22:33                 ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 22:33 UTC (permalink / raw)
  To: Roland Dreier
  Cc: peterz, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	linux-kernel

On Wed May 27 2009, Roland Dreier wrote:
> 
>  > The only objective information is posted here:
>  > http://lkml.org/lkml/2009/5/20/342
> 
> Not sure if you've looked at this, but it's a lockdep trace that looks
> to be a valid lockdep report due to non-annotated code (I don't *think*
> it's a bug).  To summarize, there is the code path in
> kernel/irq/spurious.c that does:
>

I haven't looked at it - beyond my skill level.

Still trying to deal with a machine where the only symptom is a deadlock.
So I post these for someone else's eyes until I figure out the deadlock.

Mike
 
>     poll_spurious_irq_timer ->
>       poll_spurious_irqs()     [from timer, with hard IRQs on] ->
>       poll_all_shared_irqs()   [if we think an IRQ got stuck] ->
>       try_one_irq() ->
>       spin_lock(&desc->lock)   [as above -- hard IRQs on]
> 
> while kernel/irq/chip.c has:
> 
>     handle_level_irq()         [called with hard IRQs off] ->
>       spin_lock(&desc->lock)   [as above -- hard IRQs off]
> 
> and lockdep can't tell that the interrupt corresponding to desc has been
> disabled if we ever actually reach try_one_irq(), so there's no risk of
> the interrupt coming in and deadlocking while the try_one_irq() code
> holds desc->lock.
> 
> Unfortunately I don't know how to annotate this...
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23  0:51           ` H. Peter Anvin
  2009-05-23 10:44             ` Michael S. Zick
@ 2009-05-23 15:52             ` Michael S. Zick
  2009-05-23 18:04             ` Michael S. Zick
  2009-05-23 20:51             ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
  3 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-23 15:52 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > Same integrated motherboard.
> 
> Which means same CPU, same BIOS, same motherboard (none of which you're
> telling us.)
> 
> cpuinfo and dmidecode would be informative.
>

Build: 
linux-2.6.30-rc6-ce1200v-09143_2.6.30-rc6-ce1200v-09143-22_i386.deb
The -09143lk later today.

Now also testing on the HP-2133 (C7-M/CN896) in addition
to the Everex Cloudbook/Sylvania gBook (C7-M/CX700).

Additional details:
http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968

Download location:
http://hp-umpc.com/ce1200v/

HP-2133 data capture and the Sylvania/Everex data capture:
hp-2133-data_cap.tgz
sylvania-g-data.tar.gz

Summary:
On the ce1200v - first test 46 minutes uptime.
On the hp-2133 - ?? still running - no results yet. 
The -09143lk (yyddd) build not yet tested.

Mike

> 	-hpa
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23  0:51           ` H. Peter Anvin
  2009-05-23 10:44             ` Michael S. Zick
  2009-05-23 15:52             ` Michael S. Zick
@ 2009-05-23 18:04             ` Michael S. Zick
  2009-05-23 23:44               ` H. Peter Anvin
  2009-05-23 20:51             ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
  3 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-23 18:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > Same integrated motherboard.
> 
> Which means same CPU, same BIOS, same motherboard (none of which you're
> telling us.)
> 
> cpuinfo and dmidecode would be informative.
>

The -09143lk files are posted.

Download location:
http://hp-umpc.com/ce1200v

Details so far today:
http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968

Summary:
HP-2133 (C7-M/CN896) - 09143 - No results - Still up 3 1/2 hours.
Cloudbook (C7-M/CX700) - 09143 - 45 minutes.
Cloudbook (C7-M/CX700) - 09143lk - No results - Still up 1 1/2 hours.

OK - time to look for the missing "memory" in the clobber lists.  ;)

Mike
 
> 	-hpa
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23 18:04             ` Michael S. Zick
@ 2009-05-23 23:44               ` H. Peter Anvin
  2009-05-24  6:49                 ` Harald Welte
                                   ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-23 23:44 UTC (permalink / raw)
  To: Harald Welte; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Hi Harald,

It looks like there might be a problem with the C7-M ... Michael reports
that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
necessary for a uniprocessor.
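
For reference, a minimal userspace sketch of what the macro controls.
SKETCH_SMP and sketch_atomic_inc() are hypothetical names, the real SMP
definition also records each lock site for runtime patching (omitted
here), and the non-x86 branch is only a fallback so the sketch builds
elsewhere.  It assumes GCC or Clang.

```c
#include <assert.h>

/* Hypothetical stand-in for the CONFIG_SMP switch in alternative.h. */
#ifdef SKETCH_SMP
#define LOCK_PREFIX "lock; "
#else
#define LOCK_PREFIX ""	/* uniprocessor: plain read-modify-write */
#endif

/* Increment *counter with one instruction, locked only if LOCK_PREFIX says so. */
static inline void sketch_atomic_inc(int *counter)
{
	assert(counter != 0);
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__(LOCK_PREFIX "incl %0"
			     : "+m" (*counter)
			     :
			     : "memory");
#else
	/* Fallback for other architectures; not part of the x86 discussion. */
	__atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
#endif
}
```

On one CPU a single "incl" to memory cannot be interrupted halfway by
another instruction stream, which is why the empty prefix is normally
considered sufficient.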

I'm wondering if we have to revive the OOSTORE hack, or some other
workaround.  It is of course hard for me to track this down since (a) I
don't have access to the CPU documentation, and (b) I work for Intel
now, which limits the amount of time I can realistically spend on this.

	-hpa

[Cc: Alan, who I believed developed the OOSTORE hack back when.]


Michael S. Zick wrote:
> On Fri May 22 2009, H. Peter Anvin wrote:
>> Michael S. Zick wrote:
>>> Same integrated motherboard.
>> Which means same CPU, same BIOS, same motherboard (none of which you're
>> telling us.)
>>
>> cpuinfo and dmidecode would be informative.
>>
> 
> The -09143lk files are posted.
> 
> Download location:
> http://hp-umpc.com/ce1200v
> 
> Details so far today:
> http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968
> 
> Summary:
> HP-2133 (C7-M/CN896) - 09143 - No results - Still up 3 1/2 hours.
> Cloudbook (C7-M/CX700) - 09143 - 45 minutes.
> Cloudbook (C7-M/CX700) - 09143lk - No results - Still up 1 1/2 hours.
> 
> OK - time to look for the missing "memory" in the clobber lists.  ;)
> 

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23 23:44               ` H. Peter Anvin
@ 2009-05-24  6:49                 ` Harald Welte
  2009-05-24 12:38                   ` Michael S. Zick
                                     ` (2 more replies)
  2009-05-24 12:27                 ` Michael S. Zick
  2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
  2 siblings, 3 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-24  6:49 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Dear hpa, and others,

On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:
> It looks like there might be a problem with the C7-M ... Michael reports
> that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> necessary for a uniprocessor.
> 

I will try my best to help, though I have to admit I'm far from being
an x86 expert, and particularly not with regard to low-level bits such as atomic
operations.

So please give me some time to research some background about that,
and read up all the details on the currently reported/described problem.

Once I understand it in full detail, I can talk to the right people inside
CentaurLabs (VIA's CPU division).  

If somebody (optionally) can phrase a precise technical question that I can
directly forward to somebody with low-level x86 knowledge but no Linux background,
it would definitely help speed up the process.

> I'm wondering if we have to revive the OOSTORE hack, or some other
> workaround.  It is of course hard for me to track this down since (a) I
> don't have access to the CPU documentation, 

As far as I know, there really is no such documentation.. all documentation
that I've ever seen internally is electrical data sheets and high-level feature
set descriptions, CPUID, MSR and padlock.  There are no actual x86 instruction
set documents... Centaur is < 100 people, they don't have the resources to work
on documents along the lines of what Intel has...

> and (b) I work for Intel now, which limits the amount of time I can
> realistically spend on this.

Sure, thanks for letting me know.
-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24  6:49                 ` Harald Welte
@ 2009-05-24 12:38                   ` Michael S. Zick
  2009-05-24 17:31                     ` Harald Welte
  2009-05-27 12:18                   ` Re:[VIA Support] was: " Michael S. Zick
  2009-05-30 15:48                   ` Michael S. Zick
  2 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 12:38 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, Harald Welte wrote:
> Dear hpa, and others,
> 
> On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:
> > It looks like there might be a problem with the C7-M ... Michael reports
> > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> > necessary for a uniprocessor.
> > 
> 
> I will try my best to help, though I have to admit I'm far from being
> a x86 expert, and particularly not with regard to low-level bits such as atomic
> operations.
> 
> So please give me some time to research some background about that,
> and read up all the details on the currently reported/described problem.
> 
> Once I understand it in full detail, I can talk to the right people inside
> CentaurLabs (VIA's CPU division).  
> 
> If somebody (optionally) can phrase a precise technical question that I can
> directly forward to somebody with low-level x86 knowledge but no Linux background,
> it would definitely help speeding up the process.
> 
> > I'm wondering if we have to revive the OOSTORE hack, or some other
> > workaround.  It is of course hard for me to track this down since (a) I
> > don't have access to the CPU documentation, 
> 
> As far as I know, there really is no such documentation.. all documentation
> that I've ever seen internally is electrical data sheets and high-level feature
> set descriptions, CPUID, MSR and padlock.  There are no actual x86 instruction
> set documents... Centaur is < 100 people, they don't have the resources to work
> on documents along the lines of what Intel has...
>

My background is in the electronic hardware end of things - -
Is there someone I can contact for the existing documents -
Even under NDA would be fine.

For instance, the layout of the CPUID results - they don't
currently seem to match what the marketing people claim is
inside of the chips.  There are some "VIA specific" fields.

Also, those funny looking electrical data sheets with the wiggly
lines will mean something to me in terms of when to use the
"lock" prefix.  All you have to do is grow up with such things. ;)
 
Could you also dig around for a tech manual on CN896 similar to
the one (of two) CX700 manuals that are publicly posted?
Even under NDA is fine.

> > and (b) I work for Intel now, which limits the amount of time I can
> > realistically spend on this.
> 

I might be able to get you a machine, but if you are scanned
at the front door for VIA or AMD hardware. . .  ;)

Mike
> Sure, thanks for letting me know.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 12:38                   ` Michael S. Zick
@ 2009-05-24 17:31                     ` Harald Welte
  0 siblings, 0 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-24 17:31 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Hi Michael,

On Sun, May 24, 2009 at 07:38:44AM -0500, Michael S. Zick wrote:
> > As far as I know, there really is no such documentation.. all documentation
> > that I've ever seen internally is electrical data sheets and high-level feature
> > set descriptions, CPUID, MSR and padlock.  There are no actual x86 instruction
> > set documents... Centaur is < 100 people, they don't have the resources to work
> > on documents along the lines of what Intel has...
> 
> My background is in the electronic hardware end of things - -
> Is there someone I can contact for the existing documents -
> Even under NDA would be fine.

I have inquired right now.  The regular NDA process I would assume is probably
quite slow.  The CPU documentation is already on its track for becoming public
at some point (but very slooooow track), so I'll see what I can do and contact
you in private mail.

> For instance, the layout of the CPUID results - they don't
> currently seem to match what the marketing people claim is
> inside of the chips.  There are some "VIA specific" fields.

There are two versions of the C7-M, an 'A' model (90nm SOI) and a much more
recent 'D' model (90nm conventional process).  Their CPUID values are 6-a and
6-d, respectively.  The CPU ID string of the former contains Esther, that of
the latter contains C7-M - but in fact any BIOS could override the CPU
ID string (not cpuid!) with whatever they want using a backdoor in some MSR.
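
To make the "6-a" / "6-d" notation concrete, here is a small sketch that
decodes family, model and stepping from the EAX value returned by CPUID
leaf 1.  The struct and function names are hypothetical, and the
extended-field handling follows the rule Intel documents; whether VIA
parts ever set the extended fields is an assumption not checked here.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode the CPUID leaf 1 EAX signature (Intel-documented layout). */
static struct cpu_sig decode_cpuid_eax(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int ext_model  = (eax >> 16) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;

	s.stepping = eax & 0xf;
	s.model    = (eax >> 4) & 0xf;
	s.family   = (eax >> 8) & 0xf;

	/* Extended model applies to families 6 and 15, extended family to 15. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ext_model << 4;
	if (s.family == 0xf)
		s.family += ext_family;
	return s;
}
```

With this layout, a signature of 0x6A0 decodes as family 6, model 0xA
(the 'A' part) and 0x6D0 as family 6, model 0xD (the 'D' part); the
stepping digits in these two values are placeholders.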

> Could you also dig around for a tech manual on CN896 similar to
> the one (of two) CX700 manuals that are publicly posted?

I've asked about that.  The programming guides for chipsets are generally on
the 'open track', whereas the electrical data sheets with pinouts and timing
values are under NDA.

The CN896 was just already an "old" component when that new open-track policy
was introduced, and typically VIA is trying to focus on docs and drivers for 
new products, rather than old ones.  But I have asked if we can release the
CN896 programming manual public.

> Even under NDA is fine.

Well, I prefer to make sure that we have the necessary information open.
NDAs are fine and well for the limited number of customers you have, but
making NDAs with various individual programmers really is too painful,
there should be other ways...

-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re:[VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24  6:49                 ` Harald Welte
  2009-05-24 12:38                   ` Michael S. Zick
@ 2009-05-27 12:18                   ` Michael S. Zick
  2009-05-27 12:22                     ` [VIA " Michael S. Zick
                                       ` (2 more replies)
  2009-05-30 15:48                   ` Michael S. Zick
  2 siblings, 3 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 12:18 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, Harald Welte wrote:
> 
> Once I understand it in full detail, I can talk to the right people inside
> CentaurLabs (VIA's CPU division).  
> 
> If somebody (optionally) can phrase a precise technical question that I can
> directly forward to somebody with low-level x86 knowledge but no Linux background,
> it would definitely help speeding up the process.
> 

What is the PCI Cache Line size in the CX700?  In the CN896?

Ref:
arch/x86/pci/common.c

As in:
        /*
         * Assume PCI cacheline size of 32 bytes for all x86s except K7/K8
         * and P4. It's also good for 386/486s (which actually have 16)
         * as quite a few PCI devices do not support smaller values.
         */

        pci_cache_line_size = 32 >> 2;
        if (c->x86 >= 6 && c->x86_vendor == X86_VENDOR_AMD)
                pci_cache_line_size = 64 >> 2;  /* K7 & K8 */
        else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
                pci_cache_line_size = 128 >> 2; /* P4 */

A problem with cache coherency, alignment, or consistency would explain
the problems I am seeing - and the differences in the test cases.
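
For the record, a userspace sketch of what that selection yields.  The
enum and function name are hypothetical; the thresholds are copied from
the snippet above, and the result is in 32-bit dwords, which is what the
">> 2" produces.

```c
#include <assert.h>

/* Hypothetical vendor tags standing in for the kernel's X86_VENDOR_* values. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_CENTAUR };

/* Mirror of the quoted logic: default 32 bytes, 64 for K7/K8, 128 for P4. */
static unsigned int pci_cache_line_dwords(enum cpu_vendor vendor,
					  unsigned int family)
{
	unsigned int bytes = 32;

	if (family >= 6 && vendor == VENDOR_AMD)
		bytes = 64;	/* K7 & K8 */
	else if (family > 6 && vendor == VENDOR_INTEL)
		bytes = 128;	/* P4 */

	assert(bytes % 4 == 0);
	return bytes >> 2;	/* the register counts dwords, not bytes */
}
```

A family 6 part from any vendor other than AMD, which would include the
C7-M, falls through to the 32-byte default, i.e. a value of 8.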

Mike

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-27 12:18                   ` Re:[VIA Support] was: " Michael S. Zick
@ 2009-05-27 12:22                     ` Michael S. Zick
  2009-05-27 12:47                     ` Harald Welte
  2009-05-29 12:06                     ` Michael S. Zick
  2 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 12:22 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Wed May 27 2009, Michael S. Zick wrote:
> On Sun May 24 2009, Harald Welte wrote:
> > 
> > Once I understand it in full detail, I can talk to the right people inside
> > CentaurLabs (VIA's CPU division).  
> > 
> > If somebody (optionally) can phrase a precise technical question that I can
> > directly forward to somebody with low-level x86 knowledge but no Linux background,
> > it would definitely help speeding up the process.
> > 
> 
> What is the PCI Cache Line size in the CX700?  In the CN896?
> 
> Ref:
> arch/x86/pci/common.c
> 
> As in:
>         /*
>          * Assume PCI cacheline size of 32 bytes for all x86s except K7/K8
>          * and P4. It's also good for 386/486s (which actually have 16)
>          * as quite a few PCI devices do not support smaller values.
>          */
> 
>         pci_cache_line_size = 32 >> 2;
>         if (c->x86 >= 6 && c->x86_vendor == X86_VENDOR_AMD)
>                 pci_cache_line_size = 64 >> 2;  /* K7 & K8 */
>         else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
>                 pci_cache_line_size = 128 >> 2; /* P4 */
> 
> A problem with cache coherency, alignment, or consistency would explain
> the problems I am seeing - and the differences in the test cases.
> 

Related speculations: 
http://forum.netbookuser.com/viewtopic.php?pid=6987#p6987

Mike


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-27 12:18                   ` Re:[VIA Support] was: " Michael S. Zick
  2009-05-27 12:22                     ` [VIA " Michael S. Zick
@ 2009-05-27 12:47                     ` Harald Welte
  2009-05-27 13:00                       ` Michael S. Zick
  2009-05-29 12:06                     ` Michael S. Zick
  2 siblings, 1 reply; 90+ messages in thread
From: Harald Welte @ 2009-05-27 12:47 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Wed, May 27, 2009 at 07:18:08AM -0500, Michael S. Zick wrote:
> On Sun May 24 2009, Harald Welte wrote:
> > 
> > Once I understand it in full detail, I can talk to the right people inside
> > CentaurLabs (VIA's CPU division).  
> > 
> > If somebody (optionally) can phrase a precise technical question that I can
> > directly forward to somebody with low-level x86 knowledge but no Linux background,
> > it would definitely help speeding up the process.
> > 
> 
> What is the PCI Cache Line size in the CX700?  In the CN896?

The chipset documentation doesn't say anything about that, I'd have to inquire
inside VIA.  I doubt any difference between CX700/CN896.

Also, setting the PCI config space register to a too small cache line size
(such as 32) on a system that supports more (say 64) doesn't really cause any
problems, but just reduces performance - as far as I know.

Setting it too big will cause trouble.  But since 32 is the default and
it is increased only on AMD and Intel CPUs, I see no issue here either.

-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-27 12:47                     ` Harald Welte
@ 2009-05-27 13:00                       ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 13:00 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Wed May 27 2009, Harald Welte wrote:
> On Wed, May 27, 2009 at 07:18:08AM -0500, Michael S. Zick wrote:
> > On Sun May 24 2009, Harald Welte wrote:
> > > 
> > > Once I understand it in full detail, I can talk to the right people inside
> > > CentaurLabs (VIA's CPU division).  
> > > 
> > > If somebody (optionally) can phrase a precise technical question that I can
> > > directly forward to somebody with low-level x86 knowledge but no Linux background,
> > > it would definitely help speeding up the process.
> > > 
> > 
> > What is the PCI Cache Line size in the CX700?  In the CN896?
> 
> The chipset documentation doesn't say anything about that, I'd have to inquire
> inside VIA.  I doubt any difference between CX700/CN896.
> 
> Also, setting the PCI config space register to a too small cache line size
> (such as 32) on a system that supports more (say 64) doesn't really cause any
> problems, but just reduces performance - as far as I know.
> 
> Setting it too big will cause trouble.  But since 32 is the default and
> only on AMD and Intel CPU's it is increased, I see no issue here either.
> 

Since the system chip sets were designed for use with the processor -
I am going to poke it up to the processor cache line size - just for fun.

If our assumptions are correct (I do agree with your statements myself) -
then all that will happen is we halve the number of cache line flushes. ;)
If not, perhaps we get another test case data point to consider.

Thanks for the quick reply;
Mike


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-27 12:18                   ` Re:[VIA Support] was: " Michael S. Zick
  2009-05-27 12:22                     ` [VIA " Michael S. Zick
  2009-05-27 12:47                     ` Harald Welte
@ 2009-05-29 12:06                     ` Michael S. Zick
  2 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-29 12:06 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Wed May 27 2009, Michael S. Zick wrote:
> On Sun May 24 2009, Harald Welte wrote:
> > 
> > Once I understand it in full detail, I can talk to the right people inside
> > CentaurLabs (VIA's CPU division).  
> > 

The trial build of yesterday's repository head is now posted, details at:
http://forum.netbookuser.com/viewtopic.php?pid=7002#p7002

I don't consider the C7-M/CX700 whack-a-bug project finished. . .
But I haven't broken my C7-M/CN896 yet either.  ;)

Mike


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24  6:49                 ` Harald Welte
  2009-05-24 12:38                   ` Michael S. Zick
  2009-05-27 12:18                   ` Re:[VIA Support] was: " Michael S. Zick
@ 2009-05-30 15:48                   ` Michael S. Zick
  2 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-30 15:48 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, Harald Welte wrote:
> Dear hpa, and others,
> 
> On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:
> > It looks like there might be a problem with the C7-M ... Michael reports
> > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> > necessary for a uniprocessor.
> > 
> 
> I will try my best to help, though I have to admit I'm far from being
> a x86 expert, and particularly not with regard to low-level bits such as atomic
> operations.
> 
> So please give me some time to research some background about that,
> and read up all the details on the currently reported/described problem.
> 
> Once I understand it in full detail, I can talk to the right people inside
> CentaurLabs (VIA's CPU division).  
> 
> If somebody (optionally) can phrase a precise technical question that I can
> directly forward to somebody with low-level x86 knowledge but no Linux background,
> it would definitely help speeding up the process.
> 

Does the C7-M instruction set define the 'pause' instruction (0xf3,0x90)?
It has been *defined* since the P-4, but is backward compatible with earlier
ia32 processors even though it falls into the "don't use rep before
non-string instructions" category.
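
For whoever forwards the question, this is roughly how the instruction
gets used, as a userspace sketch assuming GCC or Clang.  The function
names are hypothetical; "rep; nop" is the spelling that assembles to the
0xf3,0x90 bytes, and the non-x86 branch is only a compiler barrier so
the sketch builds elsewhere.

```c
#include <assert.h>

/* Spin-wait hint: "rep; nop" encodes as F3 90, i.e. PAUSE. */
static inline void cpu_relax_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("rep; nop" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");	/* compiler barrier only */
#endif
}

/*
 * Busy-wait until *flag becomes nonzero or max_spins iterations pass.
 * Returns the number of iterations actually spent waiting.
 */
static unsigned int spin_until_set(const volatile int *flag,
				   unsigned int max_spins)
{
	unsigned int spins = 0;

	assert(flag != 0);
	while (!*flag && spins < max_spins) {
		cpu_relax_hint();
		spins++;
	}
	return spins;
}
```

If the C7-M decodes F3 90 as anything other than a plain NOP or a pause
hint, every loop of this shape in the kernel would be affected.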

Mike


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23 23:44               ` H. Peter Anvin
  2009-05-24  6:49                 ` Harald Welte
@ 2009-05-24 12:27                 ` Michael S. Zick
  2009-05-24 17:22                   ` Harald Welte
  2009-05-24 18:00                   ` H. Peter Anvin
  2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
  2 siblings, 2 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 12:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sat May 23 2009, H. Peter Anvin wrote:
> Hi Harald,
> 
> It looks like there might be a problem with the C7-M ... Michael reports
> that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> necessary for a uniprocessor.
> 
> I'm wondering if we have to revive the OOSTORE hack, or some other
> workaround.  It is of course hard for me to track this down since (a) I
> don't have access to the CPU documentation, and (b) I work for Intel
> now, which limits the amount of time I can realistically spend on this.
>

@hpa - I still like your suggestion that it is only one (or a few)
uses of atomic ops that is incorrect and in general atomic ops
should compile away on uni-processor.

Let me translate the findings (see further in the included post) -
The C7-M/CN896 (no tech manual released for CN896 yet) and
the C7-M/CX700 (tech manual released since drivers written)

*) I never tested -09143lk on the C7-M/CN896 because -09143 did
not fail all day (a record for 2.6.30 at the moment).

*) The difference on the C7-M/CX700 between the -09143 and -09143lk
I consider significant.

***) But, keep in mind, just because the system chip set is different,
there are other unknowns - -
We can *not* say at the moment that both machines were using the same 
execution paths - even though the binaries were identical.

Also, there were probably different external modules loaded in the
two runs - not many, mostly things are built-in.

The truly significant point on the C7-M/CX700 running -09143lk was that
when the ehci-hcd driver got hung in its failure loop, generating a 
flood of messages - it did not take down or lock the kernel.

I consider this "forward progress" - it should be possible to build-in
the lock-dep checkers and get something in the message buffer -
rather than just have the machine halt.  It's hard to debug a halted
machine with only a glowing power-on light for feed-back.  ;)

Mike

> 	-hpa
> 
> [Cc: Alan, who I believed developed the OOSTORE hack back when.]
> 
> 
> Michael S. Zick wrote:
> > On Fri May 22 2009, H. Peter Anvin wrote:
> >> Michael S. Zick wrote:
> >>> Same integrated motherboard.
> >> Which means same CPU, same BIOS, same motherboard (none of which you're
> >> telling us.)
> >>
> >> cpuinfo and dmidecode would be informative.
> >>
> > 
> > The -09143lk files are posted.
> > 
> > Download location:
> > http://hp-umpc.com/ce1200v
> > 
> > Details so far today:
> > http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968
> > 
> > Summary:
> > HP-2133 (C7-M/CN896) - 09143 - No results - Still up 3 1/2 hours.
> > Cloudbook (C7-M/CX700) - 09143 - 45 minutes.
> > Cloudbook (C7-M/CX700) - 09143lk - No results - Still up 1 1/2 hours.
> > 
> > OK - time to look for the missing "memory" in the clobber lists.  ;)
> > 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 12:27                 ` Michael S. Zick
@ 2009-05-24 17:22                   ` Harald Welte
  2009-05-24 18:00                   ` H. Peter Anvin
  1 sibling, 0 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-24 17:22 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun, May 24, 2009 at 07:27:27AM -0500, Michael S. Zick wrote:

> *) The difference on the C7-M/CX700 between the -09143 and -09143lk
> I consider significant.

I agree.

> ***) But, keep in mind, just because the system chip set is different,
> there are other unknowns - -
> We can *not* say at the moment that both machines were using the same 
> execution paths - even though the binaries were identical.

yes, of course.

> Also, there were probably different external modules loaded in the
> two runs - not many, mostly things are built-in.
> 
> The truly significant point on the C7-M/CX700 running -09143lk was that
> when the ehci-hcd driver got hung in its failure loop, generating a 
> flood of messages - it did not take down or lock the kernel.
> 
> I consider this "forward progress" - it should be possible to build-in
> the lock-dep checkers and get something in the message buffer -
> rather than just have the machine halt.  It's hard to debug a halted
> machine with only a glowing power-on light for feed-back.  ;)

well, if you're not working with notebooks but actual regular mainboard
devices, then you should have a serial console and possibly still have 
magic sysrq or at least some other interesting information on the console.

I personally don't have access to a CX700 based board at the moment, and due to
my travel schedule I won't get that before June 6th.  I do have access to C7-M
boards with VX800 and VX855, but they don't use the VIA Rhine Ethernet chip, so
if you are triggering the bug with that driver, it is unlikely to occur there.

Meanwhile, I will inquire what the CPU guys think should happen with regard
to the LOCK prefix.  If their view of the world of what they expect from the
hardware is already different from our assumptions, we can save ourselves
time-consuming testing...

Regards,
-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 12:27                 ` Michael S. Zick
  2009-05-24 17:22                   ` Harald Welte
@ 2009-05-24 18:00                   ` H. Peter Anvin
  2009-05-24 18:32                     ` Michael S. Zick
  2009-05-28 20:30                     ` Pavel Machek
  1 sibling, 2 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-24 18:00 UTC (permalink / raw)
  To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Michael S. Zick wrote:
> 
> @hpa - I still like your suggestion that it is only one (or a few)
> uses of atomic ops that is incorrect and in general atomic ops
> should compile away on uni-processor.
> 

Actually, the more I think about it the more I suspect there is a race
condition either in the chip set or in any VIA-specific drivers (if
there are any.)  Putting LOCKs in random places will slow the CPU down
significantly, so it might resolve the race condition without actually
solving the problem.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 18:00                   ` H. Peter Anvin
@ 2009-05-24 18:32                     ` Michael S. Zick
  2009-05-24 18:46                       ` H. Peter Anvin
                                         ` (2 more replies)
  2009-05-28 20:30                     ` Pavel Machek
  1 sibling, 3 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 18:32 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > 
> > @hpa - I still like your suggestion that it is only one (or a few)
> > uses of atomic ops that is incorrect and in general atomic ops
> > should compile away on uni-processor.
> > 
> 
> Actually, the more I think about it the more I suspect there is a race
> condition either in the chip set or in any VIA-specific drivers (if
> there are any.)  Putting LOCKs in random places will slow the CPU down
> significantly, so it might resolve the race condition without actually
> solving the problem.
>

They are mostly out of the -09143 and -09144 builds -
No cpufreq (I.E: no e_powersaver).
The padlock-* drivers are modules which must be manually loaded. 

The i2c-viapro driver (in spite of its comments) does not work
on CX700 (written before manual was released) - it is reading
the serial number rather than the second data port. ;)
(No access to the chipset temperature/voltage data on SMBus).

The via-fb driver just "doesn't work" - Haven't looked at it yet.

There is a VIA-specific driver for the VIA USB controller, but it
isn't in the x86 part of the tree - Haven't looked at it yet.

There isn't a driver for the hardware watchdog on CX700 - 
There isn't a driver for the machine error reporting -

= = = =

Although there may be timing requirement differences between the
CX700 and CN896 - I think a human error (typo) in the "clobber"
lines of the asm is more likely - Have not yet audited that,
but it is high on my list.

Note: I seem to recall that newer gcc optimizers presume
that the flags register is preserved across asm - 
It didn't use to do that - but there is now a "cc" to deal with
that - Have not yet audited for that, but it is high on my list.

Busy, busy, busy - -
The -09144lk on C7-M/CX700 now up for 3 3/4 hours close to a new
record - but ehci-hcd has not yet gone into a re-try loop.

Mike
> 	-hpa
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 18:32                     ` Michael S. Zick
@ 2009-05-24 18:46                       ` H. Peter Anvin
  2009-05-24 19:09                         ` Michael S. Zick
  2009-05-25 19:03                         ` Michael S. Zick
  2009-05-25  1:31                       ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
  2009-05-25 16:05                       ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
  2 siblings, 2 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-24 18:46 UTC (permalink / raw)
  To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Michael S. Zick wrote:
> 
> Note: I seem to recall that newer gcc optimizers presume
> that the flags register is preserved across asm - 
> It didn't use to do that - but there is now a "cc" to deal with
> that - Have not yet audited for that, but it is high on my list.
> 

I am pretty sure that's false... if it was true we'd have failures all
over the kernel.
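
A minimal sketch of what an explicit clobber list looks like, assuming
GCC-style extended asm.  The name counter_inc_sketch is hypothetical and
not taken from the kernel.  To my understanding GCC on x86 already treats
the flags as clobbered by every asm statement, so the "cc" entry below is
documentation rather than a requirement; "memory" is the entry that stops
the compiler from caching values across the asm.

```c
/* Hypothetical helper, not kernel code: increment *counter with a
 * locked read-modify-write on x86, or a compiler builtin elsewhere. */
static inline void counter_inc_sketch(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("lock; incl %0"
			     : "+m" (*counter)	/* read and written in memory */
			     :			/* no separate inputs */
			     : "memory", "cc");	/* explicit clobber list */
#else
	__atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
#endif
}
```

Dropping "cc" from the list should not change the generated code on x86
if the assumption above holds; dropping "memory" can.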

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 18:46                       ` H. Peter Anvin
@ 2009-05-24 19:09                         ` Michael S. Zick
  2009-05-25 19:03                         ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 19:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > 
> > Note: I seem to recall that newer gcc optimizers presume
> > that the flags register is preserved across asm - 
> > It didn't use to do that - but there is now a "cc" to deal with
> > that - Have not yet audited for that, but it is high on my list.
> > 
> 
> I am pretty sure that's false... if it was true we'd have failures all
> over the kernel.
> 

Not an issue at the moment, will cover that when I audit my own code.

Mike
> 	-hpa
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 18:46                       ` H. Peter Anvin
  2009-05-24 19:09                         ` Michael S. Zick
@ 2009-05-25 19:03                         ` Michael S. Zick
  2009-05-25 19:18                           ` Michael S. Zick
  1 sibling, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 19:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > 
> > Note: I seem to recall that newer gcc optimizers presume
> > that the flags register is preserved across asm - 
> > It didn't use to do that - but there is now a "cc" to deal with
> > that - Have not yet audited for that, but it is high on my list.
> > 
> 
> I am pretty sure that's false... if it was true we'd have failures all
> over the kernel.
> 

No information on the above (yet) - but you gotta love this one: ;)

Programmer authors code specifying that the subtraction be done
prior to the addition to avoid over-flow conditions;

GCC's optimizer, in its great wisdom, codes in the overflow case:
( the case of finding the characters used/free in a ring buffer )

extern int diff_umask(int mask, int *cnt1, int *cnt2) 
{ return (((mask - *cnt1) + *cnt2) & mask); }

/**
 * gcc -O2 -S -fomit-frame-pointer difftest.c
 *
        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    12(%esp), %eax
        movl    4(%esp), %ecx
        movl    (%eax), %edx
        leal    (%ecx,%edx), %eax
        movl    8(%esp), %edx
        subl    (%edx), %eax
        andl    %ecx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
        .section        .note.GNU-stack,"",@progbits
*/

Note: That is not the compiler version I am building my kernels with.

Don't blame me, I didn't write the compiler. ;)

Mike
> 	-hpa
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 19:03                         ` Michael S. Zick
@ 2009-05-25 19:18                           ` Michael S. Zick
  2009-05-25 19:46                             ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 19:18 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

The compiler I am using (Gentoo 4.1.2) gets it correct:

        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    4(%esp), %eax
        movl    8(%esp), %edx
        movl    %eax, %ecx
        subl    (%edx), %ecx
        movl    %ecx, %edx
        movl    12(%esp), %ecx
        addl    (%ecx), %edx
        andl    %edx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (GNU) 4.1.2 (Gentoo 4.1.2 p1.1)"
        .section        .note.GNU-stack,"",@progbits

Mike



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 19:18                           ` Michael S. Zick
@ 2009-05-25 19:46                             ` Michael S. Zick
  2009-05-25 21:10                               ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 19:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Gentoo's current 4.3 gets it wrong also:

        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    12(%esp), %eax
        movl    4(%esp), %ecx
        movl    (%eax), %edx
        leal    (%ecx,%edx), %eax
        movl    8(%esp), %edx
        subl    (%edx), %eax
        andl    %ecx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (Gentoo 4.3.2-r3 p1.6, pie-10.1.5) 4.3.2"
        .section        .note.GNU-stack,"",@progbits

= = = =

Might be time to put compiler version checking back into the
build system and/or re-test the sources that do have version
checking in them (hint: the boss's code).

Mike



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 19:46                             ` Michael S. Zick
@ 2009-05-25 21:10                               ` Michael S. Zick
  2009-05-25 21:17                                 ` H. Peter Anvin
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 21:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Mon May 25 2009, Michael S. Zick wrote:

In actual application, this *should not* make a difference.

Mike



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 21:10                               ` Michael S. Zick
@ 2009-05-25 21:17                                 ` H. Peter Anvin
  2009-05-25 23:03                                   ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-25 21:17 UTC (permalink / raw)
  To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Michael S. Zick wrote:
> On Mon May 25 2009, Michael S. Zick wrote:
> 
> In actual application, this *should not* make a difference.
> 

No kidding.  This is a valid transformation for integers, since it is
all done with 2's-complement arithmetic.

Floating-point numbers is a whole other game.
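
A small sketch of why the reordering is harmless, assuming the counters
are treated as unsigned so that wraparound is defined by the C standard.
The function names are made up for illustration; the first mirrors the
order written in diff_umask(), the second mirrors the order the lea/sub
sequence computes.

```c
/* Subtract first, then add: the order written in the C source. */
static unsigned int diff_sub_first(unsigned int mask, unsigned int cnt1,
				   unsigned int cnt2)
{
	return ((mask - cnt1) + cnt2) & mask;
}

/* Add first, then subtract: the order of the lea/sub sequence. */
static unsigned int diff_add_first(unsigned int mask, unsigned int cnt1,
				   unsigned int cnt2)
{
	return ((mask + cnt2) - cnt1) & mask;
}
```

Both functions work modulo 2^N, where addition and subtraction commute,
so they agree for every input, including ones where an intermediate
value wraps.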

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 21:17                                 ` H. Peter Anvin
@ 2009-05-25 23:03                                   ` Michael S. Zick
  2009-05-25 23:35                                     ` Michael S. Zick
  2009-05-26  0:05                                     ` H. Peter Anvin
  0 siblings, 2 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 23:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Mon May 25 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > On Mon May 25 2009, Michael S. Zick wrote:
> > 
> > In actual application, this *should not* make a difference.
> > 
> 
> No kidding.  This is a valid transformation for integers, since it is
> all done with 2's-complement arithmetic.
> 

Load Effective Address does two's complement arithmetic?
I'll take your word for it.

For example:

#include <stdio.h>

extern int diff_umask(int mask, int *cnt1, int *cnt2)
{ return (((mask - *cnt1) + *cnt2) & mask); }

int main(void) {
 int msk  = 0x7fffffff; /* max positive */
 int idx1 = 0x7ffffffd; /* max positive - 2 */
 int idx2 = 0x7fffffff; /* max positive */

 int rst;

 /* (msk - idx1) + idx2 is 2 + INT_MAX, which overflows a signed int */
 rst = diff_umask(msk, &idx1, &idx2);
 printf("\n\t%d\n", rst);  /* " 1 " - correct */
 return 0;
}

But that is because when it is compiled as a
single source file, gcc is hardcoding the lea
adjustment when it is not an external file:
(compare to the above listings)
Like I wrote - I don't use 31-bit ring buffers, so I don't care.

objdump -d testdiff:
- - - snip - - -
080483b0 <diff_umask>:
 80483b0:       8b 44 24 0c             mov    0xc(%esp),%eax
 80483b4:       8b 4c 24 04             mov    0x4(%esp),%ecx
 80483b8:       8b 10                   mov    (%eax),%edx
 80483ba:       8d 04 11                lea    (%ecx,%edx,1),%eax
 80483bd:       8b 54 24 08             mov    0x8(%esp),%edx
 80483c1:       2b 02                   sub    (%edx),%eax
 80483c3:       21 c8                   and    %ecx,%eax
 80483c5:       c3                      ret
- - - snip - - -

Mike

> Floating-point numbers is a whole other game.
> 
> 	-hpa
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 23:03                                   ` Michael S. Zick
@ 2009-05-25 23:35                                     ` Michael S. Zick
  2009-05-26  0:05                                     ` H. Peter Anvin
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 23:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Mon May 25 2009, Michael S. Zick wrote:

PS: gcc-4.1.2 compiles the function the same way
within the main file or as a stand-alone file.
It also maintains the programmer-specified
order of operations without trying to hardcode
corrections to LEA.
I'll stick with 4.1.2 myself.  YMMV

Mike



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-25 23:03                                   ` Michael S. Zick
  2009-05-25 23:35                                     ` Michael S. Zick
@ 2009-05-26  0:05                                     ` H. Peter Anvin
  2009-05-26 12:37                                       ` Michael S. Zick
  1 sibling, 1 reply; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-26  0:05 UTC (permalink / raw)
  To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Michael S. Zick wrote:
> 
> Load Effective Address does two's complement arithmetic?
> I'll take your word for it.
> 

LEA, and all other address calculations use 2's-complement arithmetic:

	leal -1(%ebx),%eax
	leal 0xffffffff(%ebx),%eax

... is the same instruction.

However, gcc has been known to optimize out range checks when operating
on signed integers; it is allowed to do this by the C standard, but it
can give surprising results if the user expected wraparound.
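
A hedged sketch of the kind of range check meant here; both function
names are invented for the example.  The first form depends on signed
wraparound, which is undefined behavior in C, so an optimizer may assume
it never happens and reduce the test to a constant.  The second form
compares against the limit before doing any arithmetic.

```c
#include <limits.h>

/* Relies on a + 1 wrapping to a negative value.  Signed overflow is
 * undefined, so the compiler is allowed to treat this as always 0. */
static int next_would_wrap_unsafe(int a)
{
	return a + 1 < a;
}

/* Well-defined: no arithmetic is done on a value that could overflow. */
static int next_would_wrap_safe(int a)
{
	return a == INT_MAX;
}
```

Building with -fwrapv is another way to get wraparound semantics from
gcc, at the cost of giving up the optimizations that depend on the
no-overflow assumption.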

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-26  0:05                                     ` H. Peter Anvin
@ 2009-05-26 12:37                                       ` Michael S. Zick
  2009-05-26 17:13                                         ` H. Peter Anvin
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-26 12:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Mon May 25 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > 
> > Load Effective Address does two's complement arithmetic?
> > I'll take your word for it.
> > 
> 
> LEA, and all other address calculations use 2's-complement arithmetic:
> 
> 	leal -1(%ebx),%eax
> 	leal 0xffffffff(%ebx),%eax
> 
> ... is the same instruction.
> 
> However, gcc has been known to optimize out range checks when operating
> on signed integers; it is allowed to do this by the C standard, but it
> can give surprising results if the user expected wraparound.
>

Well, it isn't a range check - - but this illustrates where my (false)
concern came from: 

Given this input file:
extern int diff_umask(int mask, int *cnt1, int *cnt2)
{ return (((mask - *cnt1) + *cnt2) & mask); }

Doing:
gcc -O2 -S -fomit-frame-pointer difftest.c

Yields (as difftest.s):
        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    12(%esp), %eax
        movl    4(%esp), %ecx
        movl    (%eax), %edx
        leal    (%ecx,%edx), %eax
        movl    8(%esp), %edx
        subl    (%edx), %eax
        andl    %ecx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
        .section        .note.GNU-stack,"",@progbits

Now follow that up with the command:
gcc -O2 -c -fomit-frame-pointer difftest.s

Then examine the result with objdump:
objdump -d difftest.o

In relevant part, yields:
difftest.o:     file format elf32-i386

Disassembly of section .text:

00000000 <diff_umask>:
   0:   8b 44 24 0c             mov    0xc(%esp),%eax
   4:   8b 4c 24 04             mov    0x4(%esp),%ecx
   8:   8b 10                   mov    (%eax),%edx
   a:   8d 04 11                lea    (%ecx,%edx,1),%eax
   d:   8b 54 24 08             mov    0x8(%esp),%edx
  11:   2b 02                   sub    (%edx),%eax
  13:   21 c8                   and    %ecx,%eax
  15:   c3                      ret

= = = =

Checking the byte string 0x8d, 0x04, 0x11 against the Intel
documentation shows that the disassembly output of objdump
is incorrect - that bit string does not have an offset field.
That is the byte encoding for the gcc assembly input.

What's a person to do when the tool-chain lies?

Mike
> 	-hpa
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-26 12:37                                       ` Michael S. Zick
@ 2009-05-26 17:13                                         ` H. Peter Anvin
  0 siblings, 0 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-26 17:13 UTC (permalink / raw)
  To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Michael S. Zick wrote:
> 
> Disassembly of section .text:
> 
> 00000000 <diff_umask>:
>    0:   8b 44 24 0c             mov    0xc(%esp),%eax
>    4:   8b 4c 24 04             mov    0x4(%esp),%ecx
>    8:   8b 10                   mov    (%eax),%edx
>    a:   8d 04 11                lea    (%ecx,%edx,1),%eax
>    d:   8b 54 24 08             mov    0x8(%esp),%edx
>   11:   2b 02                   sub    (%edx),%eax
>   13:   21 c8                   and    %ecx,%eax
>   15:   c3                      ret
> 
> = = = =
> 
> Checking the byte string 0x8d, 0x04, 0x11 against the Intel
> documentation shows that the disassembly output of objdump
> is incorrect - that bit string does not have an offset field.
> That is the byte encoding for the gcc assembly input.
> 
> What's a person to do when the tool-chain lies?
> 

The ,1 isn't an offset field... it's a scale factor.
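
To make the encoding concrete, here is a small decoder sketch for the two
bytes that follow the 0x8d opcode, assuming 32-bit addressing.  The
struct and function names are made up for this example.  Register
numbers follow the usual x86 encoding (0 = eax, 1 = ecx, 2 = edx).

```c
/* Fields of the ModRM and SIB bytes in 32-bit addressing. */
struct lea_fields {
	unsigned mod, reg, rm;		/* ModRM: 2 + 3 + 3 bits */
	unsigned scale, index, base;	/* SIB:   2 + 3 + 3 bits */
};

static struct lea_fields decode_modrm_sib(unsigned char modrm,
					  unsigned char sib)
{
	struct lea_fields f;

	f.mod   = modrm >> 6;		/* 0: no displacement bytes */
	f.reg   = (modrm >> 3) & 7;	/* destination register */
	f.rm    = modrm & 7;		/* 4: a SIB byte follows */
	f.scale = 1u << (sib >> 6);	/* 1, 2, 4 or 8 */
	f.index = (sib >> 3) & 7;
	f.base  = sib & 7;
	return f;
}
```

For the bytes 0x04, 0x11 this gives mod 0, reg 0 (eax), rm 4, scale 1,
index 2 (edx) and base 1 (ecx), which is lea (%ecx,%edx,1),%eax with no
displacement.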

	-hpa


^ permalink raw reply	[flat|nested] 90+ messages in thread

* i2c-viapro / via-fb drivers on VIA CX700
  2009-05-24 18:32                     ` Michael S. Zick
  2009-05-24 18:46                       ` H. Peter Anvin
@ 2009-05-25  1:31                       ` Harald Welte
  2009-05-25 12:54                         ` Michael S. Zick
  2009-05-27 13:36                         ` Michael S. Zick
  2009-05-25 16:05                       ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
  2 siblings, 2 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-25  1:31 UTC (permalink / raw)
  To: Michael S. Zick; +Cc: linux-kernel

On Sun, May 24, 2009 at 01:32:37PM -0500, Michael S. Zick wrote:

> The i2c-viapro driver (in spite of its comments) does not work
> on CX700 (written before manual was released) - it is reading
> the serial number rather than the second data port. ;)
> (No access to the chipset temperature/voltage data on SMBus).

This is surprising.  I just manually verified the driver against the
cx700 programming manual, and it seems to do the right thing.  Lacking
access to a cx700 board right now, I cannot perform an actual test.

Where exactly is the bug about the wrong register that you mentioned?
I'd rather fix that ASAP.

> The via-fb driver just "doesn't work" - Haven't looked at it yet.

good to know.  Seems like I need to get access to a CX700 based board.

> There isn't a driver for the hardware watchdog on CX700 - 

JFYI: I wrote one but it doesn't work on the vx800/vx855, and VIA is currently
trying to figure out why.

> There isn't a driver for the machine error reporting -

I think the CPU just claims to report it but in reality doesn't... this was
made to make some proprietary software happy, AFAIR.

-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: i2c-viapro / via-fb drivers on VIA CX700
  2009-05-25  1:31                       ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
@ 2009-05-25 12:54                         ` Michael S. Zick
  2009-05-27 13:36                         ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 12:54 UTC (permalink / raw)
  To: Harald Welte; +Cc: linux-kernel

On Sun May 24 2009, Harald Welte wrote:
> On Sun, May 24, 2009 at 01:32:37PM -0500, Michael S. Zick wrote:
> 
> > The i2c-viapro driver (in spite of its comments) does not work
> > on CX700 (written before manual was released) - it is reading
> > the serial number rather than the second data port. ;)
> > (No access to the chipset temperature/voltage data on SMBus).
>

None of my comments have been verified sufficiently to be considered
for reporting as a bug.  Not yet.  
I need to carefully check my initial findings.

Also, both the code base and my configuration have changed greatly
since I made those first notes.

None of those things on my "to check" list are critical to solving
a "hard lock-up" problem.
Now that I get pairs of builds with vastly different behavior - -
I can start narrowing in on the prime cause -
My plan -
*) Make local changes to printk.c that will behave differently
under heavy message floods (I have the re-try loop in ehci-hcd
available to generate the floods with).
*) Build, again, with the lockdep reporting -
This will do one of two things -
**) Make true reports - which can be found and fixed
**) Make false reports due to whatever is being worked-around
with changing the LOCK_PREFIX macro.
Even in the second case, we will get locations in the source
to eye-ball/test for the effect of the LOCK_PREFIX macro.

> This is surprising.  I just manually verified the driver against the
> cx700 programming manual, and it seems to do the right thing.  Lacking
> access to a cx700 board right now, I cannot perform an actual test.
> 
> Where exactly is the bug about the wrong register that you mentioned?
> I'd rather fix that ASAP.
> 

Let me re-check my notes on that against the -rc7 build, I really,
really would like to have SMBus access to those thermal monitors.
Also, the Everex CloudBook (not the Sylvania gBook) controls the
power to the Wifi card as an SMBus device.
The Wifi card works much better with power applied.  ;)

> > The via-fb driver just "doesn't work" - Haven't looked at it yet.
> 
> good to know.  Seems like I need to get access to a CX700 based board.
>  
> > There isn't a driver for the hardware watchdog on CX700 - 
> 
> JFYI: I wrote one but it doesn't work on the vx800/vx855, and VIA is currently
> trying to figure out why.
> 

Not a big issue at this point, as I read the manual, our choices are
to either pull down the chipset "power off" or "reset" lines with it.
Since those choices are probably grown into the silicon...
Now if we had the choice to generate a crash dump...

If you can send me a link to your preliminary code, I will check it
against the CX700 - - I do have a machine that will let it trigger. ;)

> > There isn't a driver for the machine error reporting -
> 
> I think the CPU just claims to report it but in reality doesn't... this was
> made to make some proprietary software happy, AFAIR.
> 

I think that a working SMBus driver would give enough access to
the chip set for practical purposes - these machines don't support
ECC ram - the BIOS (your demo board BIOS if we believe dmidump)
should be handling other machine check exceptions.

Thanks very much for your time.
Mike

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: i2c-viapro / via-fb drivers on VIA CX700
  2009-05-25  1:31                       ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
  2009-05-25 12:54                         ` Michael S. Zick
@ 2009-05-27 13:36                         ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 13:36 UTC (permalink / raw)
  To: Harald Welte; +Cc: linux-kernel

On Sun May 24 2009, Harald Welte wrote:
> On Sun, May 24, 2009 at 01:32:37PM -0500, Michael S. Zick wrote:
> 
> > The i2c-viapro driver (in spite of its comments) does not work
> > on CX700 (written before manual was released) - it is reading
> > the serial number rather than the second data port. ;)
> > (No access to the chipset temperature/voltage data on SMBus).
> 
> This is surprising.  I just manually verified the driver against the
> cx700 programming manual, and it seems to do the right thing.  Lacking
> access to a cx700 board right now, I cannot perform an actual test.
> 
> Where exactly is the bug about the wrong register that you mentioned?
> I'd rather fix that ASAP.
> 

I'll send you some reference material on how Everex is using the SMBus
on their ce1200v (the original Cloudbook) off-list.

Mike

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 18:32                     ` Michael S. Zick
  2009-05-24 18:46                       ` H. Peter Anvin
  2009-05-25  1:31                       ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
@ 2009-05-25 16:05                       ` Michael S. Zick
  2 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 16:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Sun May 24 2009, Michael S. Zick wrote:
> On Sun May 24 2009, H. Peter Anvin wrote:
> > Michael S. Zick wrote:
> > > 
> > > @hpa - I still like your suggestion that it is only one (or a few)
> > > uses of atomic ops that is incorrect and in general atomic ops
> > > should compile away on uni-processor.
> > > 
> > 
> > Actually, the more I think about it the more I suspect there is a race
> > condition either in the chip set or in any VIA-specific drivers (if
> > there are any.)  Putting LOCKs in random places will slow the CPU down
> > significantly, so it might resolve the race condition without actually
> > solving the problem.
> >
> 
> They are mostly out of the -09143 and -09144 builds -
> No cpufreq (I.E: no e_powersaver).
> The padlock-* drivers are modules which must be manually loaded. 
> 
> The i2c-viapro driver (in spite of its comments) does not work
> on CX700 (written before manual was released) - it is reading
> the serial number rather than the second data port. ;)
> (No access to the chipset temperature/voltage data on SMBus).
> 
> The via-fb driver just "doesn't work" - Haven't looked at it yet.
> 
> There is a VIA-specific driver for the VIA USB controller, but it
> isn't in the x86 part of the tree - Haven't looked at it yet.
> 
> There isn't a driver for the hardware watchdog on CX700 - 
> There isn't a driver for the machine error reporting -
> 
> = = = =
> 
> Although there may be timing requirement differences on the
> CX700 and CN896 - I think more likely a human error (typo)
> in the "clobber" lines of the asm - Have not yet audited that,
> but it is high on my list.
> 
> Note: I seem to recall that newer gcc optimizers presume
> that the flags register is preserved across asm -
> It didn't use to do that - but there is now a "cc" to deal with
> that - Have not yet audited for that, but it is high on my list.
> 
> Busy, busy, busy - -
> The -09144lk on C7-M/CX700 now up for 3 3/4 hours close to a new
> record - but ehci-hcd has not yet gone into a re-try loop.
>

The -09145{,lk}-db pair is posted now.
Same code-base/config as the -09144{,lk} pair with the addition 
of lockdep checking.
Details:
http://forum.netbookuser.com/viewtopic.php?pid=6980#p6980

Mike
> Mike
> > 	-hpa
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-24 18:00                   ` H. Peter Anvin
  2009-05-24 18:32                     ` Michael S. Zick
@ 2009-05-28 20:30                     ` Pavel Machek
  2009-05-28 20:54                       ` Michael S. Zick
  1 sibling, 1 reply; 90+ messages in thread
From: Pavel Machek @ 2009-05-28 20:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: lkml, Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Hi!

> > @hpa - I still like your suggestion that it is only one (or a few)
> > uses of atomic ops that is incorrect and in general atomic ops
> > should compile away on uni-processor.
> > 
> 
> Actually, the more I think about it the more I suspect there is a race
> condition either in the chip set or in any VIA-specific drivers (if
> there are any.)  Putting LOCKs in random places will slow the CPU down
> significantly, so it might resolve the race condition without actually
> solving the problem.

Which you can verify; replace lock with something slow (pushad,
popad)? And see what happens.

(And if it never ever triggers on hp2133, you have strong clue that it
may not be cpu-related, but bios-related or chipset related or something).
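
One way to read that suggestion, sketched under assumptions: keep the UP
build, but swap the empty LOCK_PREFIX for a slow, non-locking instruction
pair. If the hang goes away with that as well, timing rather than bus locking
is the likelier factor. SKETCH_SLOW_PREFIX and sketch_slow_inc() are
hypothetical names, not kernel code. pushal/popal only assemble in 32-bit
mode, so other targets get no delay in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__)
/* Slow but non-locking: push and pop all eight general registers. */
#define SKETCH_SLOW_PREFIX "pushal; popal; "
#else
/* pushal/popal are not available outside 32-bit x86; no delay here. */
#define SKETCH_SLOW_PREFIX ""
#endif

/* Increment *v with the slow prefix in front instead of "lock;". */
static inline void sketch_slow_inc(int32_t *v)
{
    assert(v != NULL);
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(SKETCH_SLOW_PREFIX "incl %0"
                         : "+m" (*v)
                         :
                         : "cc");
#else
    *v += 1;    /* portable fallback, no delay */
#endif
}
```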

Some time ago I was trying to debug mysterious hangs on some
via/fic machines. 

We never figured out what was wrong, but we discovered many other bios
bugs, and those were not being fixed; so debugging was
hard/impossible. Unfortunately I no longer have access to that hw.

hp2133 did _not_ have that problem.

Try forcing maximum throttling, then move mouse for like five
seconds. If kbc dies, you have same buggy bios, and probably are
debugging same problem....
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 20:30                     ` Pavel Machek
@ 2009-05-28 20:54                       ` Michael S. Zick
  2009-05-28 23:15                         ` [Futex RFC] was " Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-28 20:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Harald Welte, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Thu May 28 2009, Pavel Machek wrote:
> Hi!
> 
> > > @hpa - I still like your suggestion that it is only one (or a few)
> > > uses of atomic ops that is incorrect and in general atomic ops
> > > should compile away on uni-processor.
> > > 
> > 
> > Actually, the more I think about it the more I suspect there is a race
> > condition either in the chip set or in any VIA-specific drivers (if
> > there are any.)  Putting LOCKs in random places will slow the CPU down
> > significantly, so it might resolve the race condition without actually
> > solving the problem.
> 
> Which you can verify; replace lock with something slow (pushad,
> popad)? And see what happens.
> 
> (And if it never ever triggers on hp2133, you have strong clue that it
> may not be cpu-related, but bios-related or chipset related or something).
> 
> Some time ago I was trying to debug mysterious hangs on some
> via/fic machines. 
> 
> We never figured out what was wrong, but we discovered many other bios
> bugs, and those were not being fixed; so debugging was
> hard/impossible. Unfortunately I no longer have access to that hw.
> 

Then I am not losing my mind here - *it is* a difficult problem.  ;)

> hp2133 did _not_ have that problem.
>

Today's build has been playing me music for over 8 hours on the
HP-2133 (C7M-CN896) but can't get past a couple of hours on the
(fic) Everex Cloudbook (C7M-CX700).

Also, the distro on the Cloudbook is using pulse-audio - the
distro on the HP is not.  So I am reviewing the recent bug
fixes to kernel/futex for something over-looked.  ;)
May be a wild goose chase, but I think pulse-audio uses futexes.

Thanks for the other hints.

Mike 
> Try forcing maximum throttling, then move mouse for like five
> seconds. If kbc dies, you have same buggy bios, and probably are
> debugging same problem....
> 								Pavel



^ permalink raw reply	[flat|nested] 90+ messages in thread

* [Futex RFC] was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 20:54                       ` Michael S. Zick
@ 2009-05-28 23:15                         ` Michael S. Zick
  2009-05-29  2:00                           ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-28 23:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Harald Welte, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Thu May 28 2009, Michael S. Zick wrote:
> On Thu May 28 2009, Pavel Machek wrote:
> > Hi!
> > 
> > > > @hpa - I still like your suggestion that it is only one (or a few)
> > > > uses of atomic ops that is incorrect and in general atomic ops
> > > > should compile away on uni-processor.
> > > > 
> > > 
> > > Actually, the more I think about it the more I suspect there is a race
> > > condition either in the chip set or in any VIA-specific drivers (if
> > > there are any.)  Putting LOCKs in random places will slow the CPU down
> > > significantly, so it might resolve the race condition without actually
> > > solving the problem.
> > 
> > Which you can verify; replace lock with something slow (pushad,
> > popad)? And see what happens.
> > 
> > (And if it never ever triggers on hp2133, you have strong clue that it
> > may not be cpu-related, but bios-related or chipset related or something).
> > 
> > Some time ago I was trying to debug mysterious hangs on some
> > via/fic machines. 
> > 
> > We never figured out what was wrong, but we discovered many other bios
> > bugs, and those were not being fixed; so debugging was
> > hard/impossible. Unfortunately I no longer have access to that hw.
> > 
> 
> Then I am not losing my mind here - *it is* a difficult problem.  ;)
> 
> > hp2133 did _not_ have that problem.
> >
> 
> Today's build has been playing me music for over 8 hours on the
> HP-2133 (C7M-CN896) but can't get past a couple of hours on the
> (fic) Everex Cloudbook (C7M-CX700).
> 
> Also, the distro on the Cloudbook is using pulse-audio - the
> distro on the HP is not.  So I am reviewing the recent bug
> fixes to kernel/futex for something over-looked.  ;)
> May be a wild goose chase, but I think pulse-audio uses futexes.
> 

Please, somebody apply an experienced eye-ball to this;
It does seem to make a difference, but tests have not run
for very long yet.

diff --git a/arch/x86/include/asm/futex.h b/arch/x86/include/asm/futex.h
index 1f11ce4..da3c801 100644
--- a/arch/x86/include/asm/futex.h
+++ b/arch/x86/include/asm/futex.h
@@ -19,7 +19,8 @@
                     "\t.previous\n"                            \
                     _ASM_EXTABLE(1b, 3b)                       \
                     : "=r" (oldval), "=r" (ret), "+m" (*uaddr) \
-                    : "i" (-EFAULT), "0" (oparg), "1" (0))
+                    : "i" (-EFAULT), "0" (oparg), "1" (0)      \
+                    : "memory")

 #define __futex_atomic_op2(insn, ret, oldval, uaddr, oparg)    \
        asm volatile("1:\tmovl  %2, %0\n"                       \
@@ -35,7 +36,8 @@
                     _ASM_EXTABLE(2b, 4b)                       \
                     : "=&a" (oldval), "=&r" (ret),             \
                       "+m" (*uaddr), "=&r" (tem)               \
-                    : "r" (oparg), "i" (-EFAULT), "1" (0))
+                    : "r" (oparg), "i" (-EFAULT), "1" (0)      \
+                    : "memory")
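
For readers weighing the patch: a "memory" clobber tells gcc that the asm may
read or write memory beyond its listed operands, so the compiler must not keep
memory values cached in registers across the asm or move memory accesses
around it. A minimal sketch of that effect in isolation follows; the names
sketch_barrier and sketch_publish are hypothetical and are not part of
futex.h.

```c
#include <assert.h>
#include <stddef.h>

/* Empty asm with a "memory" clobber: a compiler-only barrier, no CPU fence. */
static inline void sketch_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/*
 * Store the payload, then the flag.  The barrier keeps the compiler from
 * reordering or merging the two stores across it.
 */
static inline int sketch_publish(int *data, int *flag, int value)
{
    assert(data != NULL && flag != NULL);
    *data = value;
    sketch_barrier();
    *flag = 1;
    return *data;   /* must be re-read from memory after the barrier */
}
```

This only shows what the annotation asks of the compiler. It does not
establish that the missing clobber is the cause of the lock-ups.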


Mike
> Thanks for the other hints.
> 
> Mike 
> > Try forcing maximum throttling, then move mouse for like five
> > seconds. If kbc dies, you have same buggy bios, and probably are
> > debugging same problem....
> > 								Pavel
> 
> 



^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [Futex RFC] was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 23:15                         ` [Futex RFC] was " Michael S. Zick
@ 2009-05-29  2:00                           ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-29  2:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Harald Welte, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Thu May 28 2009, Michael S. Zick wrote:
> 
> Please, somebody apply an experienced eye-ball to this;
> It does seem to make a difference, but tests have not run
> for very long yet.
> 
> diff --git a/arch/x86/include/asm/futex.h b/arch/x86/include/asm/futex.h
> index 1f11ce4..da3c801 100644
> --- a/arch/x86/include/asm/futex.h
> +++ b/arch/x86/include/asm/futex.h
> @@ -19,7 +19,8 @@
>                      "\t.previous\n"                            \
>                      _ASM_EXTABLE(1b, 3b)                       \
>                      : "=r" (oldval), "=r" (ret), "+m" (*uaddr) \
> -                    : "i" (-EFAULT), "0" (oparg), "1" (0))
> +                    : "i" (-EFAULT), "0" (oparg), "1" (0)      \
> +                    : "memory")
> 
>  #define __futex_atomic_op2(insn, ret, oldval, uaddr, oparg)    \
>         asm volatile("1:\tmovl  %2, %0\n"                       \
> @@ -35,7 +36,8 @@
>                      _ASM_EXTABLE(2b, 4b)                       \
>                      : "=&a" (oldval), "=&r" (ret),             \
>                        "+m" (*uaddr), "=&r" (tem)               \
> -                    : "r" (oparg), "i" (-EFAULT), "1" (0))
> +                    : "r" (oparg), "i" (-EFAULT), "1" (0)      \
> +                    : "memory")
> 
> 
> Mike

Without the above annotations: C7-M/CX700 uptime while running 
pulse-audio: 1 1/2 hrs.

With the above annotations: C7-M/CX700 uptime, same test setup,
maximum unknown, test terminated after 3 hours.

On the C7-M/CN896 - maximum unknown, test terminated after 12 hrs.

Sample build to be available tomorrow.

Mike

^ permalink raw reply	[flat|nested] 90+ messages in thread

* LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
  2009-05-23 23:44               ` H. Peter Anvin
  2009-05-24  6:49                 ` Harald Welte
  2009-05-24 12:27                 ` Michael S. Zick
@ 2009-05-27 17:01                 ` Harald Welte
  2009-05-27 17:10                   ` Michael S. Zick
                                     ` (3 more replies)
  2 siblings, 4 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-27 17:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Hi hpa and others,

On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:

> It looks like there might be a problem with the C7-M ... Michael reports
> that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> necessary for a uniprocessor.

It seems they are necessary.

Here are some statements from the CPU logic guys at VIA/Centaur:

* A read-modify-write sequence cannot be interrupted.
* All X86 instructions except rep-strings are atomic wrt interrupts.
* The lock prefix has uses on a UP processor: It keeps DMA devices from
  interfering with a read-modify-write sequence

Furthermore, they have done some experimentation in the past, making the
CPU simply ignore the LOCK prefix on uni-processor (running a certain popular
proprietary operating system): It doesn't work, presumably because of the
above-mentioned DMA-related conflict.

Also, the engineers believe that it is only a matter of time until a different
CPU/chipset combination exposes the same bug.  Since the in-order
single-retire C7-M is more vulnerable than out-of-order, multiple-retire CPU's,
they are not surprised that the issue shows first on the C7-M.

The recommendation from the CPU engineers, unsurprisingly, thus is to put the
LOCK prefixes back where they were.
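
For context, the macro under discussion expands to nothing on UP builds and
to a lock prefix on SMP builds. A simplified sketch of the two resulting
instruction forms follows. The real definition in alternative.h also records
each prefix location in a separate section, which is omitted here, and the
sketch_* names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What an atomic increment looks like with LOCK_PREFIX defined as "". */
static inline void sketch_inc_up(int32_t *v)
{
    assert(v != NULL);
#if defined(__i386__) || defined(__x86_64__)
    /* Single read-modify-write instruction, no bus lock. */
    __asm__ __volatile__("incl %0" : "+m" (*v) : : "cc");
#else
    *v += 1;    /* non-x86 fallback so the sketch still builds */
#endif
}

/* The same operation with LOCK_PREFIX defined as "lock; ". */
static inline void sketch_inc_locked(int32_t *v)
{
    assert(v != NULL);
#if defined(__i386__) || defined(__x86_64__)
    /* The lock prefix makes the read-modify-write atomic on the bus. */
    __asm__ __volatile__("lock; incl %0" : "+m" (*v) : : "cc");
#else
    __atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
#endif
}
```

Both forms are a single instruction and so cannot be split by an interrupt.
Per the statements above, only the second also excludes other bus agents for
the duration of the read-modify-write.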

Hope this helps you.

Now if I understand the issues correctly, it would mean that there is some
driver code that modifies a certain chunk of memory, while DMA of some
peripheral is also accessing that memory.  I suppose it would not have to be
the same actual address, but probably being within the same cache line is
already sufficient.

Now the question is: Is this a valid operation of a driver?  Should the driver
do such things, or is such a driver broken?  When would that occur?  I'm trying
to come up with a case, but typically you e.g. allocate some DMA buffer and
then don't touch it until the hardware has processed it.

Regards,
-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
  2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
@ 2009-05-27 17:10                   ` Michael S. Zick
  2009-05-27 17:19                   ` Thomas Gleixner
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 17:10 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Wed May 27 2009, Harald Welte wrote:
> Hi hpa and others,
> 
> On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:
>  
> > It looks like there might be a problem with the C7-M ... Michael reports
> > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> > necessary for a uniprocessor.
> 
> It seems they are necessary.
> 
> Here are some statements from the CPU logic guys at VIA/Centaur:
> 
> * A read-modify-write sequence cannot be interrupted.
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
>   interfering with a read-modify-write sequence
> 
> Furthermore, they have done some experimentation in the past, making the
> CPU simply ignore the LOCK prefix on uni-processor (running a certain popular
> proprietary operating system): It doesn't work, presumably because of the
> above-mentioned DMA-related conflict.
> 
> Also, the engineers believe that it is only a matter of time until different
> CPU/chipset combination would expose the same bug.  Since the in-order
> single-retire C7-M is more vulnerable than out-of-order, multiple-retire CPU's,
> they are not surprised that the issue shows first on the C7-M.
> 
> The recommendation from the CPU engineers, unsurprisingly, thus is to put the
> LOCK prefixes back where they were.
> 
> Hope this helps you.
> 
> Now if I understand the issues correctly, it would mean that there is some
> driver code that modifies a certain chunk of memory, while DMA of some
> peripheral is also accessing that memory.  I suppose it would not have to be
> the same actual address, but probably being within the same cache line is
> already sufficient.
>

I am also testing with the pci cache line size hard-coded to be the same size
as the processor cache line size (a WAFG for now) - -

It is too soon (only 1 1/2 hours) to be a significant finding - -
but if this was set to twice the physical line length, it would be only
flushing every other line - which I think would show up *real* fast.  ;)
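
Some arithmetic behind this, as a sketch: the PCI Cache Line Size
configuration register counts 32-bit words, not bytes, so a 64-byte line is
programmed as 16. The 64-byte figure is an assumption for illustration, not a
measured value for the C7-M, and the helper names are hypothetical. The
second helper only models the reasoning above (a stride of twice the line
length visits half the lines); it says nothing about what the chipset
actually does.

```c
#include <assert.h>

/* The PCI Cache Line Size register is expressed in 32-bit (4-byte) units. */
static inline unsigned sketch_pci_cls_units(unsigned line_bytes)
{
    assert(line_bytes % 4 == 0);
    return line_bytes / 4;
}

/* Count the line starts visited when walking buf_bytes in steps of stride. */
static inline unsigned sketch_lines_touched(unsigned buf_bytes, unsigned stride)
{
    unsigned n = 0;

    assert(stride != 0);
    for (unsigned off = 0; off < buf_bytes; off += stride)
        n++;
    return n;
}
```

With a 256-byte buffer, a 64-byte stride visits all four lines, while a
128-byte stride visits only two of them.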

I am noticing some "dropped buffers and/or dropped packets" in my streaming
music - - but that is not conclusive of anything other than hd-audio may
be using the wrong cache stride also.  ;)

Mike 
> Now the question is: Is this a valid operation of a driver?  Should the driver
> do such things, or is such a driver broken?  When would that occur?  I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.
> 
> Regards,



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
  2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
  2009-05-27 17:10                   ` Michael S. Zick
@ 2009-05-27 17:19                   ` Thomas Gleixner
  2009-05-27 17:25                     ` Michael S. Zick
  2009-05-27 18:08                   ` LOCK prefix on uni processor has its use Andi Kleen
  2009-05-28  2:56                   ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) H. Peter Anvin
  3 siblings, 1 reply; 90+ messages in thread
From: Thomas Gleixner @ 2009-05-27 17:19 UTC (permalink / raw)
  To: Harald Welte; +Cc: H. Peter Anvin, lkml, Ingo Molnar, linux-kernel, Alan Cox

On Wed, 27 May 2009, Harald Welte wrote:
> Here are some statements from the CPU logic guys at VIA/Centaur:
> 
> * A read-modify-write sequence cannot be interrupted.
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
>   interfering with a read-modify-write sequence
...

> Now if I understand the issues correctly, it would mean that there is some
> driver code that modifies a certain chunk of memory, while DMA of some
> peripheral is also accessing that memory.  I suppose it would not have to be
> the same actual address, but probably being within the same cache line is
> already sufficient.
> 
> Now the question is: Is this a valid operation of a driver?  Should the driver
> do such things, or is such a driver broken?  When would that occur?  I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.

Right, that would be more than stupid, but even then it would not
explain any breakage of the kernel. Such a driver would not be
functional anyway if it relies on some read/write modify operations in
an active DMA buffer. That would also explode on any other system as
you have no control whether the access to that memory happens before
or after the DMA operation.

Can you please ask them to clarify that DMA issue further ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
  2009-05-27 17:19                   ` Thomas Gleixner
@ 2009-05-27 17:25                     ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 17:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, linux-kernel, Alan Cox

On Wed May 27 2009, Thomas Gleixner wrote:
> On Wed, 27 May 2009, Harald Welte wrote:
> > Here are some statements from the CPU logic guys at VIA/Centaur:
> > 
> > * A read-modify-write sequence cannot be interrupted.
> > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> >   interfering with a read-modify-write sequence
> ...
> 
> > Now if I understand the issues correctly, it would mean that there is some
> > driver code that modifies a certain chunk of memory, while DMA of some
> > peripheral is also accessing that memory.  I suppose it would not have to be
> > the same actual address, but probably being within the same cache line is
> > already sufficient.
> > 
> > Now the question is: Is this a valid operation of a driver?  Should the driver
> > do such things, or is such a driver broken?  When would that occur?  I'm trying
> > to come up with a case, but typically you e.g. allocate some DMA buffer and
> > then don't touch it until the hardware has processed it.
> 
> Right, that would be more than stupid, but even then it would not
> explain any breakage of the kernel. Such a driver would not be
> functional anyway if it relies on some read/write modify operations in
> an active DMA buffer. That would also explode on any other system as
> you have no control whether the access to that memory happens before
> or after the DMA operation.
> 

IFF your DMA buffer is cache-line aligned and doesn't have an immediately
adjacent spin-lock (or some such thing) sharing the cache-line.
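
To make the layout concern concrete, here is a sketch with two hypothetical
structures. A 64-byte cache line is assumed, lock_word stands in for a
spin-lock, and the offsets are only meaningful if the structure itself starts
on a line boundary. None of these names come from a real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u   /* assumed line size, not a measured value */

/* Lock word and DMA buffer packed together: they share the first line. */
struct sketch_shared {
    int lock_word;
    uint8_t dma_buf[128];
};

/* DMA buffer pushed onto its own line by an explicit alignment. */
struct sketch_separated {
    int lock_word;
    _Alignas(64) uint8_t dma_buf[128];
};

/* Compile-time check that the aligned buffer starts on a line boundary. */
static_assert(offsetof(struct sketch_separated, dma_buf) % SKETCH_CACHE_LINE == 0,
              "dma_buf must start on a cache-line boundary");

/* True if two byte offsets fall within the same cache line. */
static inline int sketch_same_line(size_t off_a, size_t off_b)
{
    return off_a / SKETCH_CACHE_LINE == off_b / SKETCH_CACHE_LINE;
}
```

In the first layout a CPU read-modify-write on lock_word and a device write
into dma_buf touch the same line. In the second they do not.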

Mike
> Can you please ask them to clarify that DMA issue further ?
> 
> Thanks,
> 
> 	tglx
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
  2009-05-27 17:10                   ` Michael S. Zick
  2009-05-27 17:19                   ` Thomas Gleixner
@ 2009-05-27 18:08                   ` Andi Kleen
  2009-05-27 18:22                     ` Michael S. Zick
  2009-06-02 12:48                     ` Harald Welte
  2009-05-28  2:56                   ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) H. Peter Anvin
  3 siblings, 2 replies; 90+ messages in thread
From: Andi Kleen @ 2009-05-27 18:08 UTC (permalink / raw)
  To: Harald Welte
  Cc: H. Peter Anvin, lkml, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

Harald Welte <HaraldWelte@viatech.com> writes:
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
>   interfering with a read-modify-write sequence

In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way
on a UP kernel.

We discussed exactly this in the earlier subthread :)

> Now the question is: Is this a valid operation of a driver?  Should the driver
> do such things, or is such a driver broken? 

The driver is broken because if it relies on this it will not work on a UP kernel.
Also it's not portable and in general a bad idea.

> When would that occur?  I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.

Is it known which driver has this problem?

-Andi (who finds hpa's "timing theory" to be more believable anyways)

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-05-27 18:08                   ` LOCK prefix on uni processor has its use Andi Kleen
@ 2009-05-27 18:22                     ` Michael S. Zick
  2009-05-27 18:33                       ` Michael S. Zick
  2009-05-27 18:38                       ` Andi Kleen
  2009-06-02 12:48                     ` Harald Welte
  1 sibling, 2 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 18:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Wed May 27 2009, Andi Kleen wrote:
> Harald Welte <HaraldWelte@viatech.com> writes:
> > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> >   interfering with a read-modify-write sequence
> 
> In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way
> on a UP kernel.
> 
> We discussed exactly this in the earlier subthread :)
> 
> > Now the question is: Is this a valid operation of a driver?  Should the driver
> > do such things, or is such a driver broken? 
> 
> The driver is broken because if it relies on this it will not work on a UP kernel.
> Also it's not portable and in general a bad idea.
> 
> > When would that occur?  I'm trying
> > to come up with a case, but typically you e.g. allocate some DMA buffer and
> > then don't touch it until the hardware has processed it.
> 
> Is it known which driver has this problem?
> 
> -Andi (who finds hpa's "timing theory" to be more believable anyways)
> 

I still have not come up with a solid, testable theory to explain the
order-of-magnitude difference in up-time before the kernel locks up
with/without 'lock'.

But we are definitely pecking around the edges of the problem.  ;)

Today's lockdep build has just passed its previous record by hard-coding
the pci cache line size to be the same as the cpu's cache line size. (a WAFG).
Until we hear back from the VIA-CPU people, I just guessed that since the
chip set was designed for use with the processor...
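
For reference, the PCI Cache Line Size register (configuration offset 0x0c)
counts 32-bit dwords, not bytes. A small sketch of the conversion follows;
the helper name is hypothetical and the 64-byte figure used below is only an
assumed CPU line size:

```c
#include <stdint.h>

/* Offset of the Cache Line Size register in PCI configuration space. */
#define PCI_CACHE_LINE_SIZE_REG 0x0c

/*
 * Hypothetical helper: convert a CPU cache line size in bytes into the
 * value the PCI register expects, which is expressed in 32-bit dwords.
 */
static uint8_t pci_cls_from_cpu_line_bytes(unsigned int cpu_line_bytes)
{
    return (uint8_t)(cpu_line_bytes / 4);
}
```

With an assumed 64-byte CPU line, the register value would be 16.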

Mike

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-05-27 18:22                     ` Michael S. Zick
@ 2009-05-27 18:33                       ` Michael S. Zick
  2009-05-27 18:55                         ` Michael S. Zick
  2009-05-27 18:38                       ` Andi Kleen
  1 sibling, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 18:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Wed May 27 2009, Michael S. Zick wrote:
> On Wed May 27 2009, Andi Kleen wrote:
> > Harald Welte <HaraldWelte@viatech.com> writes:
> > > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> > >   interfering with a read-modify-write sequence
> > 
> > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way
> > on a UP kernel.
> > 
> > We discussed exactly this in the earlier subthread :)
> > 
> > > Now the question is: Is this a valid operation of a driver?  Should the driver
> > > do such things, or is such a driver broken? 
> > 
> > The driver is broken because if it relies on this it will not work on a UP kernel.
> > Also it's not portable and in general a bad idea.
> > 
> > > When would that occur?  I'm trying
> > > to come up with a case, but typically you e.g. allocate some DMA buffer and
> > > then don't touch it until the hardware has processed it.
> > 
> > Is it known which driver has this problem?
> > 
> > -Andi (who finds hpa's "timing theory" to be more believable anyways)
> > 
> 
> I still have not come up with a solid, testable, theory to explain the
> order of magnitude in up-time before the kernel locks with/with-out 'lock'.
> 
> But we are definitely pecking around the edges of the problem.  ;)
> 
> Today's lockdep build has just passed its previous record by hard-coding
> the pci cache line size to be the same as the cpu's cache line size. (a WAFG).
> Until we hear back from the VIA-CPU people, I just guessed that since the
> chip set was designed for use with the processor...
> 

Ah, so - some information - - - 
(caused by un-plug/re-plug usb mouse while ehci-hcd was caught in its failure
reporting loop)

ehci_hcd 0000:00:10.4: port 6 resume error -19
hub 1-0:1.0: hub_port_status failed (err = -32)
hub 1-0:1.0: connect-debounce failed, port 6 disabled
hub 1-0:1.0: over-current change on port 1
ehci_hcd 0000:00:10.4: HC died; cleaning up
irq 23: nobody cared (try booting with the "irqpoll" option)
Pid: 2277, comm: syslogd Not tainted 2.6.30-rc7-ce1200v-09147lk-db #29
Call Trace:
 [<c015de14>] ? __report_bad_irq+0x24/0x90
 [<c015dfc5>] ? note_interrupt+0x145/0x180
 [<c015e39f>] ? handle_fasteoi_irq+0xaf/0xe0
 [<c0104eb7>] ? handle_irq+0x17/0x20
 [<c0104daa>] ? do_IRQ+0x3a/0xa0
 [<c0145a8b>] ? trace_hardirqs_on_caller+0x6b/0x170
 [<c01034ae>] ? common_interrupt+0x2e/0x34
 [<c0126082>] ? __do_softirq+0x42/0x110
 [<c0141294>] ? tick_program_event+0x14/0x20
 [<c01384cc>] ? hrtimer_interrupt+0xcc/0x1d0
 [<c0126195>] ? do_softirq+0x45/0x50
 [<c01264aa>] ? irq_exit+0x6a/0x80
 [<c0111109>] ? smp_apic_timer_interrupt+0x49/0x80
 [<c0103517>] ? apic_timer_interrupt+0x2f/0x34
 [<c014799e>] ? lock_acquire+0x8e/0xa0
 [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0
 [<c05307bd>] ? _spin_lock+0x3d/0x70
 [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0
 [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0
 [<c018b598>] ? kmem_cache_alloc+0x98/0x100
 [<c0143c06>] ? lockdep_init_map+0x46/0x130
 [<c01f4f80>] ? journal_start+0xa0/0x100
 [<c0163780>] ? grab_cache_page_write_begin+0x30/0xc0
 [<c0138fb4>] ? up_read+0x14/0x30
 [<c01da348>] ? ext3_write_begin+0x98/0x200
 [<c0163a48>] ? generic_file_buffered_write+0x108/0x300
 [<c0125943>] ? current_fs_time+0x13/0x20
 [<c016526a>] ? __generic_file_aio_write_nolock+0x24a/0x550
 [<c052f520>] ? __mutex_lock_common+0x2f0/0x3f0
 [<c01655b9>] ? generic_file_aio_write+0x49/0xd0
 [<c01655ce>] ? generic_file_aio_write+0x5e/0xd0
 [<c0146359>] ? validate_chain+0xe9/0x1000
 [<c01d8680>] ? ext3_file_write+0x30/0xc0
 [<c01d8650>] ? ext3_file_write+0x0/0xc0
 [<c018e47f>] ? do_sync_readv_writev+0xbf/0x100
 [<c0144dae>] ? lock_release_holdtime+0x6e/0xf0
 [<c0135230>] ? autoremove_wake_function+0x0/0x50
 [<c017606f>] ? might_fault+0x4f/0xa0
 [<c0225c3c>] ? security_file_permission+0xc/0x10
 [<c018e746>] ? rw_verify_area+0x66/0xd0
 [<c018e30e>] ? rw_copy_check_uvector+0x7e/0x100
 [<c018f30a>] ? do_readv_writev+0xaa/0x190
 [<c01d8650>] ? ext3_file_write+0x0/0xc0
 [<c018f42c>] ? vfs_writev+0x3c/0x50
 [<c018f527>] ? sys_writev+0x47/0x80
 [<c0102e08>] ? sysenter_do_call+0x12/0x36
handlers:
[<c035d480>] (usb_hcd_irq+0x0/0x90)
Disabling IRQ #23
hub 1-0:1.0: hub_port_status failed (err = -19)
hub 1-0:1.0: connect-debounce failed, port 1 disabled
hub 1-0:1.0: cannot disable port 1 (err = -19)
hub 1-0:1.0: hub_port_status failed (err = -19)
hub 1-0:1.0: hub_port_status failed (err = -19)
hub 1-0:1.0: hub_port_status failed (err = -19)
hub 1-0:1.0: hub_port_status failed (err = -19)
hub 1-0:1.0: hub_port_status failed (err = -19)
usb 1-5: USB disconnect, address 3
ehci_hcd 0000:00:10.4: force halt; handhake dc724014 00004000 00004000 -> -19

=================================
[ INFO: inconsistent lock state ]
2.6.30-rc7-ce1200v-09147lk-db #29
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
hd-audio0/51 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (&irq_desc_lock_class){?.-...}, at: [<c015dc81>] try_one_irq+0x21/0x130
{IN-HARDIRQ-W} state was registered at:
  [<ffffffff>] 0xffffffff
irq event stamp: 95629480
hardirqs last  enabled at (95629480): [<c05310a0>] _spin_unlock_irq+0x20/0x40
hardirqs last disabled at (95629479): [<c053086d>] _spin_lock_irq+0xd/0x70
softirqs last  enabled at (95626922): [<c0126195>] do_softirq+0x45/0x50
softirqs last disabled at (95629475): [<c0126195>] do_softirq+0x45/0x50

other info that might help us debug this:
3 locks held by hd-audio0/51:
 #0:  ((bus->workq_name)){+.+...}, at: [<c0131912>] worker_thread+0x192/0x2d0
 #1:  (&chip->irq_pending_work){+.+...}, at: [<c0131912>] worker_thread+0x192/0x2d0
 #2:  (kernel/irq/spurious.c:21){+.-...}, at: [<c012a4d0>] run_timer_softirq+0xe0/0x1f0

stack backtrace:
Pid: 51, comm: hd-audio0 Not tainted 2.6.30-rc7-ce1200v-09147lk-db #29
Call Trace:
 [<c0145254>] ? print_usage_bug+0x174/0x1c0
 [<c014583b>] ? mark_lock+0x59b/0x5d0
 [<c01463b8>] ? validate_chain+0x148/0x1000
 [<c0144fc0>] ? check_usage_backwards+0x0/0x90
 [<c01474a7>] ? __lock_acquire+0x237/0x6a0
 [<c014798b>] ? lock_acquire+0x7b/0xa0
 [<c015dc81>] ? try_one_irq+0x21/0x130
 [<c05307bd>] ? _spin_lock+0x3d/0x70
 [<c015dc81>] ? try_one_irq+0x21/0x130
 [<c015dc81>] ? try_one_irq+0x21/0x130
 [<c015ddd3>] ? poll_spurious_irqs+0x43/0x60
 [<c012a55b>] ? run_timer_softirq+0x16b/0x1f0
 [<c012a4d0>] ? run_timer_softirq+0xe0/0x1f0
 [<c015dd90>] ? poll_spurious_irqs+0x0/0x60
 [<c01260a8>] ? __do_softirq+0x68/0x110
 [<c0141294>] ? tick_program_event+0x14/0x20
 [<c01384cc>] ? hrtimer_interrupt+0xcc/0x1d0
 [<c0126195>] ? do_softirq+0x45/0x50
 [<c01264aa>] ? irq_exit+0x6a/0x80
 [<c0111109>] ? smp_apic_timer_interrupt+0x49/0x80
 [<c0103517>] ? apic_timer_interrupt+0x2f/0x34
 [<c05310a6>] ? _spin_unlock_irq+0x26/0x40
 [<c0419422>] ? azx_irq_pending_work+0x92/0x120
 [<c0131912>] ? worker_thread+0x192/0x2d0
 [<c0419390>] ? azx_irq_pending_work+0x0/0x120
 [<c0131975>] ? worker_thread+0x1f5/0x2d0
 [<c0131912>] ? worker_thread+0x192/0x2d0
 [<c0135230>] ? autoremove_wake_function+0x0/0x50
 [<c0131780>] ? worker_thread+0x0/0x2d0
 [<c0134ee7>] ? kthread+0x47/0x80
 [<c0134ea0>] ? kthread+0x0/0x80
 [<c0103627>] ? kernel_thread_helper+0x7/0x10

Enjoy


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-05-27 18:33                       ` Michael S. Zick
@ 2009-05-27 18:55                         ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 18:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Wed May 27 2009, Michael S. Zick wrote:
> On Wed May 27 2009, Michael S. Zick wrote:
> > On Wed May 27 2009, Andi Kleen wrote:
> > > Harald Welte <HaraldWelte@viatech.com> writes:
> > > > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > > > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> > > >   interfering with a read-modify-write sequence
> > > 
> > > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way
> > > on a UP kernel.
> > > 

Note that there are spin_{lock,unlock} symbols in the stack trace.
This build has the 'lock' in LOCK_PREFIX.

Mike
> > > We discussed exactly this in the earlier subthread :)
> > > 
> > > > Now the question is: Is this a valid operation of a driver?  Should the driver
> > > > do such things, or is such a driver broken? 
> > > 
> > > The driver is broken because if it relies on this it will not work on a UP kernel.
> > > Also it's not portable and in general a bad idea.
> > > 
> > > > When would that occur?  I'm trying
> > > > to come up with a case, but typically you e.g. allocate some DMA buffer and
> > > > then don't touch it until the hardware has processed it.
> > > 
> > > Is it known which driver has this problem?
> > > 
> > > -Andi (who finds hpa's "timing theory" to be more believable anyways)
> > > 
> > 
> > I still have not come up with a solid, testable, theory to explain the
> > order of magnitude in up-time before the kernel locks with/with-out 'lock'.
> > 
> > But we are definitely pecking around the edges of the problem.  ;)
> > 
> > Today's lockdep build has just passed its previous record by hard-coding
> > the pci cache line size to be the same as the cpu's cache line size. (a WAFG).
> > Until we hear back from the VIA-CPU people, I just guessed that since the
> > chip set was designed for use with the processor...
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-05-27 18:22                     ` Michael S. Zick
  2009-05-27 18:33                       ` Michael S. Zick
@ 2009-05-27 18:38                       ` Andi Kleen
  1 sibling, 0 replies; 90+ messages in thread
From: Andi Kleen @ 2009-05-27 18:38 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: Andi Kleen, Harald Welte, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner, linux-kernel, Alan Cox

> I still have not come up with a solid, testable, theory to explain the
> order of magnitude in up-time before the kernel locks with/with-out 'lock'.

What I would do is to try to track down in which file it happens.
Compile individual subdirectories of the kernel with LOCK prefix,
then down to files.
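
One possible way to sketch this bisection, assuming a hypothetical
FORCE_UP_LOCK define that is switched on only for the objects under test
(for example through a per-object compiler flag in the build system). The
macro and function names are illustrative, not existing kernel symbols:

```c
/*
 * Hypothetical override: FORCE_UP_LOCK re-enables the prefix for one
 * directory or file while the rest of the UP build keeps it empty.
 */
#if defined(CONFIG_SMP) || defined(FORCE_UP_LOCK)
#define LOCK_PREFIX "lock; "
#else
#define LOCK_PREFIX ""
#endif

/* Add i to *v; the prefix is present only where the defines above ask for it. */
static inline void my_atomic_add(int i, int *v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(LOCK_PREFIX "addl %1, %0" : "+m"(*v) : "ir"(i));
#else
    *v += i;    /* non-x86 fallback so the sketch still builds */
#endif
}

/* Returns 1 if this translation unit was built with a non-empty prefix. */
static inline int lock_prefix_enabled(void)
{
    return sizeof(LOCK_PREFIX) > 1;
}
```

Narrowing the set of objects built with the define, and re-testing each
time, is the halving step of the search.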

Also always double check the results.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-05-27 18:08                   ` LOCK prefix on uni processor has its use Andi Kleen
  2009-05-27 18:22                     ` Michael S. Zick
@ 2009-06-02 12:48                     ` Harald Welte
  2009-06-02 13:03                       ` Andi Kleen
  1 sibling, 1 reply; 90+ messages in thread
From: Harald Welte @ 2009-06-02 12:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: H. Peter Anvin, lkml, Ingo Molnar, Thomas Gleixner, linux-kernel,
	Alan Cox

On Wed, May 27, 2009 at 08:08:27PM +0200, Andi Kleen wrote:
> Harald Welte <HaraldWelte@viatech.com> writes:
> > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> >   interfering with a read-modify-write sequence
> 
> In theory yes, but not in Linux -- normal drivers simply don't use LOCK in
> any way on a UP kernel.

well, they might have inadvertently used LOCK as part of regular spinlocks,
until LOCK_PREFIX was removed, right?

> > Now the question is: Is this a valid operation of a driver?  Should the driver
> > do such things, or is such a driver broken? 
> 
> The driver is broken because if it relies on this it will not work on a UP kernel.
> Also it's not portable and in general a bad idea.

I agree.  I was not referring to any real/known driver.  I was just trying to
figure out what kind of problem the VIA/Centaur CPU guys tried to describe when
indicating that the LOCK prefix should be used on UP to avoid DMA interfering
with read-modify-write CPU instructions.

-- 
- Harald Welte <HaraldWelte@viatech.com>	    http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-06-02 12:48                     ` Harald Welte
@ 2009-06-02 13:03                       ` Andi Kleen
  2009-06-02 13:26                         ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Andi Kleen @ 2009-06-02 13:03 UTC (permalink / raw)
  To: Harald Welte
  Cc: Andi Kleen, H. Peter Anvin, lkml, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Tue, Jun 02, 2009 at 02:48:54PM +0200, Harald Welte wrote:
> On Wed, May 27, 2009 at 08:08:27PM +0200, Andi Kleen wrote:
> > Harald Welte <HaraldWelte@viatech.com> writes:
> > > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> > >   interfering with a read-modify-write sequence
> > 
> > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in
> > any way on a UP kernel.
> 
> well, they might have inadvertently used LOCK as part of regular spinlocks,
> until LOCK_PREFIX was removed, right?

LOCK_PREFIX was always defined away on UP kernels. That dates back
to the initial Linux 2.0 SMP implementation. 

On newer SMP kernels they also patch away the lock prefix even
if they are running UP, so if you only have a single core you'll
never get lock.

So I think it's pretty unlikely any driver relied on this.

There are some special bit functions that always have LOCK, but these
are only used by the Xen drivers afaik (that is needed when a UP
kernel talks to a SMP hypervisor over shared memory)
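
A rough sketch of the distinction, with hypothetical names (my_set_bit and
my_sync_set_bit are illustrations written for this example, not the kernel's
own definitions). The first variant loses its prefix on a UP build, the
second always keeps it:

```c
/* On a UP build CONFIG_SMP is unset, so the prefix expands to nothing. */
#ifdef CONFIG_SMP
#define LOCK_PREFIX "lock; "
#else
#define LOCK_PREFIX ""
#endif

/* Ordinary bit op: locked only when LOCK_PREFIX is non-empty. */
static inline void my_set_bit(unsigned int nr, unsigned int *addr)
{
    unsigned int mask = 1u << nr;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(LOCK_PREFIX "orl %1, %0"
                         : "+m"(*addr) : "r"(mask) : "memory");
#else
    *addr |= mask;  /* non-x86 fallback so the sketch still builds */
#endif
}

/* "Always locked" variant, as needed when the other side may be SMP. */
static inline void my_sync_set_bit(unsigned int nr, unsigned int *addr)
{
    unsigned int mask = 1u << nr;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; orl %1, %0"
                         : "+m"(*addr) : "r"(mask) : "memory");
#else
    *addr |= mask;
#endif
}
```

Both produce the same result on a single CPU; they differ only in whether
the bus/cache-line lock is asserted during the read-modify-write.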

> I agree.  I was not referring to any real/known driver.  I was just trying to
> figure out what kind of problem the VIA/Centaur CPU guys tried to describe when
> indicating that the LOCK prefix should be used on UP to avoid DMA interfering
> with read-modify-write CPU instructions.

It locks the cache line. That's a valid case in the x86 architecture,
it's just that the Linux driver model doesn't use it.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-06-02 13:03                       ` Andi Kleen
@ 2009-06-02 13:26                         ` Michael S. Zick
  2009-06-02 13:42                           ` Andi Kleen
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-06-02 13:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Tue June 2 2009, Andi Kleen wrote:
> On Tue, Jun 02, 2009 at 02:48:54PM +0200, Harald Welte wrote:
> > On Wed, May 27, 2009 at 08:08:27PM +0200, Andi Kleen wrote:
> > > Harald Welte <HaraldWelte@viatech.com> writes:
> > > > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > > > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> > > >   interfering with a read-modify-write sequence
> > > 
> > > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in
> > > any way on a UP kernel.
> > 
> > well, they might have inadvertently used LOCK as part of regular spinlocks,
> > until LOCK_PREFIX was removed, right?
> 
> LOCK_PREFIX was always defined away on UP kernels. That dates back
> to the initial Linux 2.0 SMP implementation. 
> 
> On newer SMP kernels they also patch away the lock prefix even
> if they are running UP, so if you only have a single core you'll
> never get lock.
> 

After another week of chasing this - -
My favorite theory is still: "human coding error" - somewhere.

The LOCK_PREFIX is used or not used or mis-used by something.

My second favorite theory (related to the "some sort of timing
problem" suggestion):

Another difference is FSB speed on the two machines -
The "trouble free" case is twice as fast as the "problem" case.

Such a thing should be totally transparent to the kernel, but...
we do have humans writing the code.  ;)

> So I think it's pretty unlikely any driver relied on this.
> 

The kernel assumes I/O coherency, but perhaps something is
breaking that assumption.  Not by intent, but by oversight.

I posed a couple of questions to H.W. off list to pass on to
the silicon grower's department.  Will see what they recommend.

At the moment, I am stuck with brute-force code reading.
Nothing very elegant going on here.

Mike

> There are some special bit functions that always have LOCK, but these
> are only used by the Xen drivers afaik (that is needed when a UP
> kernel talks to a SMP hypervisor over shared memory)
> 
> > I agree.  I was not referring to any real/known driver.  I was just trying to
> > figure out what kind of problem the VIA/Centaur CPU guys tried to describe when
> > indicating that the LOCK prefix should be used on UP to avoid DMA interfering
> > with read-modify-write CPU instructions.
> 
> It locks the cache line. That's a valid case in the x86 architecture,
> it's just that the Linux driver model doesn't use it.
> 
> -Andi
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-06-02 13:26                         ` Michael S. Zick
@ 2009-06-02 13:42                           ` Andi Kleen
  2009-06-03 11:46                             ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Andi Kleen @ 2009-06-02 13:42 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: Andi Kleen, Harald Welte, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner, linux-kernel, Alan Cox

> After another week of chasing this - -

Did you use the "compile part of the kernel with LOCK and others without"
technique I described earlier?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use
  2009-06-02 13:42                           ` Andi Kleen
@ 2009-06-03 11:46                             ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-06-03 11:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner,
	linux-kernel, Alan Cox

On Tue June 2 2009, Andi Kleen wrote:
> > After another week of chasing this - -
> 
> Did you use the "compile part of the kernel with LOCK and others without"
> technique I described earlier?
> 

That would only help if it were a single-point failure.

Although there are some assembly language things that can
be done to help in finding what to examine, like:

#define LOCK_PREFIX     "\n### Lock pre-fix removed:\n\t"

Or whatever might help your favorite text search program.

Which yields asm expansion in your *.s file (gcc -S) as:

#APP
# 33 "test_bytelock.c" 1

1:      xchgb %ah, %al
        test %al,%al
        jz 3f

### Lock pre-fix removed:
        incb splock+1
2:      xchgw %ax, %ax
        cmpb $1, splock
        je 2b

### Lock pre-fix removed:
        decb splock+1
        jmp 1b
3:
# 0 "" 2
#NO_APP

Note: For readers not familiar with (g)as:
#APP -> Assembler Pre-Process (gcc generated)
<ragged whitespace and comments allowed>
#NO_APP -> No Assembler Pre-Process (gcc generated)
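
A minimal C sketch of the same marker trick, assuming an x86 target where
gas treats a line starting with '#' as a comment. The counter and function
names are hypothetical; only the marker string comes from the text above:

```c
/* Marker string from the message above; stands in for the empty UP prefix. */
#define LOCK_PREFIX "\n### Lock pre-fix removed:\n\t"

/* Hypothetical counter used only for this illustration. */
static int splock_users;

/* Increment the counter; the marker shows up as a comment line in the .s file. */
static inline void users_inc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(LOCK_PREFIX "incl %0" : "+m"(splock_users));
#else
    splock_users++;     /* non-x86 fallback so the sketch still builds */
#endif
}
```

Compiling a file like this with gcc -S and searching the output for the
marker text lists every site where a lock prefix would otherwise have been.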

If ambitious, you can add a comment to each asm-macro
to note the line and source filename of where it is
defined. (the line number and name gcc put there is
where it was expanded, not where it was defined).

Not really too ambitious - there are only 140 files
of interest (with asm-macros) in a x86, uni-processor build.

Mike

> -Andi

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
  2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
                                     ` (2 preceding siblings ...)
  2009-05-27 18:08                   ` LOCK prefix on uni processor has its use Andi Kleen
@ 2009-05-28  2:56                   ` H. Peter Anvin
  3 siblings, 0 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-28  2:56 UTC (permalink / raw)
  To: Harald Welte; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Harald Welte wrote:
> * A read-modify-write sequence cannot be interrupted.
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
>   interfering with a read-modify-write sequence

Correct.

> Now the question is: Is this a valid operation of a driver?  Should the driver
> do such things, or is such a driver broken?  When would that occur?  I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.

The Linux driver model does not permit this as a *lot* of hardware
doesn't support this correctly, and even on x86 there are lots of
chipset bugs in this regard.  It is of course possible to write x86-only
drivers that would do this anyway, but those should not use LOCK_PREFIX
instructions.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-23  0:51           ` H. Peter Anvin
                               ` (2 preceding siblings ...)
  2009-05-23 18:04             ` Michael S. Zick
@ 2009-05-23 20:51             ` Michael S. Zick
  3 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-23 20:51 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > Same integrated motherboard.
> 
> Which means same CPU, same BIOS, same motherboard (none of which you're
> telling us.)
> 

HP-2133 (C7-M/CN896) - 09143 - No results - Still up - 6 hours.
A "personal best" for 2.6.30 on VIA hardware.

Cloudbook (C7-M/CX700) - 09143 - 45 minutes.
Cloudbook (C7-M/CX700) - 09143lk - Partial results - Still up - 4 hours.

Sometime recently, the ehci (USB-2.0) driver went into its failure loop
but the kernel lived, and the music plays on (less the external mouse).
I think I will put it out of its misery now and take some time off myself.

Mike
> cpuinfo and dmidecode would be informative.
> 
> 	-hpa
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 22:21     ` Michael S. Zick
  2009-05-22 23:30       ` H. Peter Anvin
@ 2009-05-28 12:48       ` Pavel Machek
  2009-05-28 13:29         ` Michael S. Zick
  1 sibling, 1 reply; 90+ messages in thread
From: Pavel Machek @ 2009-05-28 12:48 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

Hi!

> The observation that executing an unnecessary 'lock' opcode in some
> cases slows down the machine is not felt by myself to be significant 
> to duplicating my observations.  Note: I have been wrong before.
> 
> This is as informative as I can make the message.
> 
> PS: *not* a single machine failure, tested on five machines, owned
> by four different people, two brands, with different use histories.

I have seen some problems on via c7m based machines, where some 'smart
bios person' implemented EC access in AML (normally, it is accessed
from ec.c driver). Maybe you have a similarly bad BIOS?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 12:48       ` Pavel Machek
@ 2009-05-28 13:29         ` Michael S. Zick
  2009-05-28 20:50           ` Pavel Machek
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-28 13:29 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu May 28 2009, Pavel Machek wrote:
> Hi!
> 
> > The observation that executing an unnecessary 'lock' opcode in some
> > cases slows down the machine is not felt by myself to be significant 
> > to duplicating my observations.  Note: I have been wrong before.
> > 
> > This is as informative as I can make the message.
> > 
> > PS: *not* a single machine failure, tested on five machines, owned
> > by four different people, two brands, with different use histories.
> 
> I have seen some problems on via c7m based machines, where some 'smart
> bios person' implemented EC access in AML (normally, it is accessed
> from ec.c driver). Maybe you have a similarly bad BIOS?
> 

How to tell or distinguish?
Did your looking at the dmidecode output show you that?

Mike

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 13:29         ` Michael S. Zick
@ 2009-05-28 20:50           ` Pavel Machek
  2009-05-28 20:58             ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Pavel Machek @ 2009-05-28 20:50 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu 2009-05-28 08:29:13, Michael S. Zick wrote:
> On Thu May 28 2009, Pavel Machek wrote:
> > Hi!
> > 
> > > The observation that executing an unnecessary 'lock' opcode in some
> > > cases slows down the machine is not felt by myself to be significant 
> > > to duplicating my observations.  Note: I have been wrong before.
> > > 
> > > This is as informative as I can make the message.
> > > 
> > > PS: *not* a single machine failure, tested on five machines, owned
> > > by four different people, two brands, with different use histories.
> > 
> > I have seen some problems on via c7m based machines, where some 'smart
> > bios person' implemented EC access in AML (normally, it is accessed
> > from ec.c driver). Maybe you have a similarly bad BIOS?
> > 
> 
> How to tell or distinguish?
> Did your looking at the dmidecode output show you that?

Disassemble DSDT, and if you see strange code duplicating kernel's
ec.c driver, you have similar problem... 
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 20:50           ` Pavel Machek
@ 2009-05-28 20:58             ` Michael S. Zick
  2009-05-28 21:16               ` Pavel Machek
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-28 20:58 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu May 28 2009, Pavel Machek wrote:
> On Thu 2009-05-28 08:29:13, Michael S. Zick wrote:
> > On Thu May 28 2009, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > The observation that executing an unnecessary 'lock' opcode in some
> > > > cases slows down the machine is not felt by myself to be significant 
> > > > to duplicating my observations.  Note: I have been wrong before.
> > > > 
> > > > This is as informative as I can make the message.
> > > > 
> > > > PS: *not* a single machine failure, tested on five machines, owned
> > > > by four different people, two brands, with different use histories.
> > > 
> > > I have seen some problems on via c7m based machines, where some 'smart
> > > bios person' implemented EC access in AML (normally, it is accessed
> > > from ec.c driver). Maybe you have a similarly bad BIOS?
> > > 
> > 
> > How to tell or distinguish?
> > Did your looking at the dmidecode output show you that?
> 
> Disassemble DSDT, and if you see strange code duplicating kernel's
> ec.c driver, you have similar problem... 

Someone did that but wasn't looking for "strange code" - just fixing
some entry size errors.
You can find the replacement DSDT here:
http://forum.netbookuser.com/viewtopic.php?pid=6512#p6512

(Which I am not using, since it is mostly cosmetic.)

Mike
> 									Pavel



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 20:58             ` Michael S. Zick
@ 2009-05-28 21:16               ` Pavel Machek
  2009-05-28 21:21                 ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Pavel Machek @ 2009-05-28 21:16 UTC (permalink / raw)
  To: Michael S. Zick
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

Hi!

> > > > I have seen some problems on via c7m based machines, where some 'smart
> > > > bios person' implemented EC access in AML (normally, it is accessed
> > > > from ec.c driver). Maybe you have a similarly bad BIOS?
> > > 
> > > How to tell or distinguish?
> > > Did your looking at the dmidecode output show you that?
> > 
> > Disassemble DSDT, and if you see strange code duplicating kernel's
> > ec.c driver, you have similar problem... 
> 
> Someone did that but wasn't looking for "strange code" - just fixing
> some entry size errors.
> You can find the replacement DSDT here:
> http://forum.netbookuser.com/viewtopic.php?pid=6512#p6512
> 
> (Which I am not using, since it is mostly cosmetic.)

Ok, it does not seem to have braindead EC implementation. The DSDT
does not look familiar, so it may be different issue. (Or it is same
issue and we were not able to debug it due to all the BIOS problems.)

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-28 21:16               ` Pavel Machek
@ 2009-05-28 21:21                 ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-28 21:21 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu May 28 2009, Pavel Machek wrote:
> Hi!
> 
> > > > > I have seen some problems on via c7m based machines, where some 'smart
> > > > > bios person' implemented EC access in AML (normally, it is accessed
> > > > > from ec.c driver). Maybe you have a similarly bad BIOS?
> > > > 
> > > > How to tell or distinguish?
> > > > Did your looking at the dmidecode output show you that?
> > > 
> > > Disassemble DSDT, and if you see strange code duplicating kernel's
> > > ec.c driver, you have similar problem... 
> > 
> > Someone did that but wasn't looking for "strange code" - just fixing
> > some entry size errors.
> > You can find the replacement DSDT here:
> > http://forum.netbookuser.com/viewtopic.php?pid=6512#p6512
> > 
> > (Which I am not using, since it is mostly cosmetic.)
> 
> Ok, it does not seem to have braindead EC implementation. The DSDT
> does not look familiar, so it may be different issue. (Or it is same
> issue and we were not able to debug it due to all the BIOS problems.)
> 

Thanks for taking a look, 
it would have meant nothing to me.

Mike
> 									Pavel



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 18:36 ` Ingo Molnar
  2009-05-22 18:59   ` H. Peter Anvin
@ 2009-05-22 19:17   ` Michael S. Zick
  1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 19:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Fri May 22 2009, you wrote:
> 
> * Michael S. Zick <lkml@morethan.org> wrote:
> 
> > Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> > 
> > diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> > index f6aa18e..3c790ef 100644
> > --- a/arch/x86/include/asm/alternative.h
> > +++ b/arch/x86/include/asm/alternative.h
> > @@ -35,7 +35,7 @@
> >                 "661:\n\tlock; "
> > 
> >  #else /* ! CONFIG_SMP */
> > -#define LOCK_PREFIX ""
> > +#define LOCK_PREFIX "\n\tlock; "
> >  #endif
> 
> What is your motivation for this change? At first sight this makes 
> the UP kernel a bit larger and a bit slower. Are you fixing some 
> real regression/bug here?
>

Yes - but not easy to test for unless you have hardware that can
generate an interrupt flood for long enough period of time to
catch the atomic ops in between the read bus cycle and the write
bus cycle - a very small window.

As luck (good? bad? ugly?) would have it, I have a SDHC card and
machine organization that will trigger a flood from the ehci_hcd driver.
A poor man's test setup.

Even with that bit of luck, it takes from minutes to hours to hit the window.
The single lockdep dump I posted was the result of nearly a month's testing.
It is a _small_ window.  ;)

Mike
> 	Ingo
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

[parent not found: <200905221343.30638.lkml@morethan.org>]

[parent not found: <20090522192329.GF846@one.firstfloor.org>]

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
       [not found]   ` <20090522192329.GF846@one.firstfloor.org>
@ 2009-05-22 19:53     ` Michael S. Zick
  2009-05-22 20:05       ` Samuel Thibault
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 19:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Fri May 22 2009, Andi Kleen wrote:
> On Fri, May 22, 2009 at 01:43:27PM -0500, Michael S. Zick wrote:
> > On Fri May 22 2009, you wrote:
> > > "Michael S. Zick" <lkml@morethan.org> writes:
> > > 
> > > > Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> > > 
> > > Actually uni processor should not use the lock prefix 
> > > because it doesn't need it; the only exception are some special
> > > ops used in para-virtualization which are special cased.
> > > 
> > 
> > Unless you have interrupts enabled, then you have two contexts.
> 
> Interrupts on the local CPU don't interrupt instructions, only
> in between.
>

Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
Manual page: 3-590  PDF page: 638
Summary: Processors prior to P-4 can take an interrupt between
the read cycle and the write cycle. Which is why opcode 0xF0 exists.

Mike
 
> -Andi



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 19:53     ` Michael S. Zick
@ 2009-05-22 20:05       ` Samuel Thibault
  2009-05-22 20:32         ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Samuel Thibault @ 2009-05-22 20:05 UTC (permalink / raw)
  To: Michael S. Zick; +Cc: Andi Kleen, linux-kernel

Michael S. Zick, le Fri 22 May 2009 14:53:39 -0500, a écrit :
> Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> Manual page: 3-590  PDF page: 638
> Summary: Processors prior to P-4 can take an interrupt between
> the read cycle and the write cycle. Which is why opcode 0xF0 exists.

Where do you see page 638/639 talking about interrupts?  It talks about
multi-processor machines.

Samuel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:05       ` Samuel Thibault
@ 2009-05-22 20:32         ` Michael S. Zick
  2009-05-22 20:42           ` Andi Kleen
                             ` (3 more replies)
  0 siblings, 4 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 20:32 UTC (permalink / raw)
  To: Samuel Thibault; +Cc: Andi Kleen, linux-kernel

On Fri May 22 2009, Samuel Thibault wrote:
> Michael S. Zick, le Fri 22 May 2009 14:53:39 -0500, a écrit :
> > Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> > Manual page: 3-590  PDF page: 638
> > Summary: Processors prior to P-4 can take an interrupt between
> > the read cycle and the write cycle. Which is why opcode 0xF0 exists.
> 
> Where do you see page 638/639 talking about interrupts?  It talks about
> multi-processor machines.
> 

No - it talks about "exclusive memory access" - You got bus master DMA
in your test machine? You also have an older than P-4 single processor?

Look people, I just reported what I found from testing - 
Please don't shoot the messenger.

If it: "Does not make a difference" then it "Should not make a difference"
but it does, try it yourself.  It's safe (if LOCK_PREFIX is in the proper
places) - the machine will ignore the opcode if it is recent enough to not
need it - just trust the cpu's micro-code.

Mike
> Samuel
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:32         ` Michael S. Zick
@ 2009-05-22 20:42           ` Andi Kleen
  2009-05-22 20:57             ` Michael S. Zick
  2009-05-22 20:43           ` Samuel Thibault
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 90+ messages in thread
From: Andi Kleen @ 2009-05-22 20:42 UTC (permalink / raw)
  To: Michael S. Zick; +Cc: Samuel Thibault, Andi Kleen, linux-kernel

> If it: "Does not make a difference" then it "Should not make a difference"
> but it does, try it yourself.  It's safe (if LOCK_PREFIX is in the proper
> places) - the machine will ignore the opcode if it is recent enough to not
> need it - just trust the cpu's micro-code.

It doesn't ignore it, in fact it's extremely slow on some older systems
where all atomic operations are very costly.
That is why LOCK is avoided as much as possible.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:42           ` Andi Kleen
@ 2009-05-22 20:57             ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 20:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Samuel Thibault, linux-kernel

On Fri May 22 2009, Andi Kleen wrote:
> > If it: "Does not make a difference" then it "Should not make a difference"
> > but it does, try it yourself.  It's safe (if LOCK_PREFIX is in the proper
> > places) - the machine will ignore the opcode if it is recent enough to not
> > need it - just trust the cpu's micro-code.
> 
> It doesn't ignore it, in fact it's extremely slow on some older systems
> where all atomic operations are very costly.
> That is why LOCK is avoided as much as possible.
> 

I'm only the messenger.

Mike
> -Andi



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:32         ` Michael S. Zick
  2009-05-22 20:42           ` Andi Kleen
@ 2009-05-22 20:43           ` Samuel Thibault
  2009-05-22 21:59             ` Andi Kleen
  2009-05-22 20:45           ` Roland Dreier
  2009-05-24 18:59           ` Robert Hancock
  3 siblings, 1 reply; 90+ messages in thread
From: Samuel Thibault @ 2009-05-22 20:43 UTC (permalink / raw)
  To: Michael S. Zick; +Cc: Andi Kleen, linux-kernel

Michael S. Zick, le Fri 22 May 2009 15:32:41 -0500, a écrit :
> On Fri May 22 2009, Samuel Thibault wrote:
> > Michael S. Zick, le Fri 22 May 2009 14:53:39 -0500, a écrit :
> > > Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> > > Manual page: 3-590  PDF page: 638
> > > Summary: Processors prior to P-4 can take an interrupt between
> > > the read cycle and the write cycle. Which is why opcode 0xF0 exists.
> > 
> > Where do you see page 638/639 talking about interrupts?  It talks about
> > multi-processor machines.
> 
> No - it talks about "exclusive memory access"

Right, that's still not interrupts.

> - You got bus master DMA in your test machine?

That's not related to the LOCK_PREFIX concern, which is about the
processor only, not interaction with other devices.

> Look people, I just reported what I found from testing - 

What did you test, precisely?

Samuel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:43           ` Samuel Thibault
@ 2009-05-22 21:59             ` Andi Kleen
  2009-05-22 22:00               ` Samuel Thibault
  0 siblings, 1 reply; 90+ messages in thread
From: Andi Kleen @ 2009-05-22 21:59 UTC (permalink / raw)
  To: Samuel Thibault, Michael S. Zick, Andi Kleen, linux-kernel

> > - You got bus master DMA in your test machine?
> 
> That's not related to the LOCK_PREFIX concern, which is about the
> processor only, not interaction with other devices.

Actually it's related to other devices; but only very few (most MMIO
doesn't support atomic cycles and is uncached anyways). But there's no driver 
for real hardware in Linux that relies on it to my knowledge.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 21:59             ` Andi Kleen
@ 2009-05-22 22:00               ` Samuel Thibault
  2009-05-22 22:14                 ` Andi Kleen
  0 siblings, 1 reply; 90+ messages in thread
From: Samuel Thibault @ 2009-05-22 22:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Michael S. Zick, linux-kernel

Andi Kleen, le Fri 22 May 2009 23:59:39 +0200, a écrit :
> > > - You got bus master DMA in your test machine?
> > 
> > That's not related to the LOCK_PREFIX concern, which is about the
> > processor only, not interaction with other devices.
> 
> Actually it's related to other devices; but only very few (most MMIO
> doesn't support atomic cycles and is uncached anyways). But there's no driver 
> for real hardware in Linux that relies on it to my knowledge.

That's what I meant: AIUI, LOCK_PREFIX has always only been used for
inter-processor interaction (atomic variables, spinlocks, etc.), not for
processor-device interaction.

Samuel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 22:00               ` Samuel Thibault
@ 2009-05-22 22:14                 ` Andi Kleen
  2009-05-22 22:14                   ` Samuel Thibault
  0 siblings, 1 reply; 90+ messages in thread
From: Andi Kleen @ 2009-05-22 22:14 UTC (permalink / raw)
  To: Samuel Thibault, Andi Kleen, Michael S. Zick, linux-kernel

> That's what I meant: AIUI, LOCK_PREFIX has always only been used for
> inter-processor interaction (atomic variables, spinlocks, etc.), not for

PCI has a locked transaction, but I don't think it's widely supported.
With normal uncached access it is also not very useful.

> processor-device interaction.

Well in Linux yes, but not architecturally in x86. That is why the CPUs
don't just nop it out with a single core (which Michael assumes they do,
but they don't)

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 22:14                 ` Andi Kleen
@ 2009-05-22 22:14                   ` Samuel Thibault
  0 siblings, 0 replies; 90+ messages in thread
From: Samuel Thibault @ 2009-05-22 22:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Michael S. Zick, linux-kernel

Andi Kleen, le Sat 23 May 2009 00:14:56 +0200, a écrit :
> > That's what I meant: AIUI, LOCK_PREFIX has always only been used for
> > inter-processor interaction (atomic variables, spinlocks, etc.), not for
> 
> PCI has a locked transaction, but I don't think it's widely supported.
> With normal uncached access it is also not very useful.

I'm not talking about the LOCK prefix.  I'm talking about the
LOCK_PREFIX macro.  I'm saying that AIUI it has never been supposed to
be used for processor-device interaction, even if the LOCK prefix could
be used for that.

Samuel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:32         ` Michael S. Zick
  2009-05-22 20:42           ` Andi Kleen
  2009-05-22 20:43           ` Samuel Thibault
@ 2009-05-22 20:45           ` Roland Dreier
  2009-05-24 18:59           ` Robert Hancock
  3 siblings, 0 replies; 90+ messages in thread
From: Roland Dreier @ 2009-05-22 20:45 UTC (permalink / raw)
  To: lkml; +Cc: Samuel Thibault, Andi Kleen, linux-kernel

 > > > Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
 > > > Manual page: 3-590  PDF page: 638
 > > > Summary: Processors prior to P-4 can take an interrupt between
 > > > the read cycle and the write cycle. Which is why opcode 0xF0 exists.

 > > Where do you see page 638/639 talking about interrupts?  It talks about
 > > multi-processor machines.

 > No - it talks about "exclusive memory access" - You got bus master DMA
 > in your test machine? You also have an older than P-4 single processor?

I looked at the page you refer to.  It talks about asserting the LOCK#
signal -- there is absolutely no mention of the lock prefix having any
effect on the execution of an instruction internal to a single CPU.
Could you be more specific about what you are referring to?

 > Look people, I just reported what I found from testing - 
 > Please don't shoot the messenger.

Could you be specific about the test you are doing?  What operation are
you doing that is missing the lock prefix?  What is the expected result,
and what actually happens without the lock prefix?

 - R.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 20:32         ` Michael S. Zick
                             ` (2 preceding siblings ...)
  2009-05-22 20:45           ` Roland Dreier
@ 2009-05-24 18:59           ` Robert Hancock
  3 siblings, 0 replies; 90+ messages in thread
From: Robert Hancock @ 2009-05-24 18:59 UTC (permalink / raw)
  To: lkml; +Cc: Samuel Thibault, Andi Kleen, linux-kernel

Michael S. Zick wrote:
> On Fri May 22 2009, Samuel Thibault wrote:
>> Michael S. Zick, le Fri 22 May 2009 14:53:39 -0500, a écrit :
>>> Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
>>> Manual page: 3-590  PDF page: 638
>>> Summary: Processors prior to P-4 can take an interrupt between
>>> the read cycle and the write cycle. Which is why opcode 0xF0 exists.
>> Where do you see page 638/639 talking about interrupts?  It talks about
>> multi-processor machines.
>>
> 
> No - it talks about "exclusive memory access" - You got bus master DMA
> in your test machine? You also have an older than P-4 single processor?

It means that LOCK is required in multi-processor environment to ensure 
that an instruction executes atomically WRT memory operations being done 
on other CPUs. On a single processor, except for some weird exceptions 
(like rep instructions, which can't be LOCKed anyways), instructions are 
always atomic with respect to interrupts.
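
The distinction can be shown with a toy model. This is only a sketch in
plain C, not real interrupt handling: demo_irq_handler, demo_split_inc,
demo_single_inc and demo_self_check are hypothetical names, and the
"interrupt" is simulated by an ordinary function call placed at the only
points where a uni-processor could take one, i.e. instruction boundaries.

```c
#include <assert.h>

/* Toy model: `shared` is a counter touched by both the mainline code and
 * a pretend interrupt handler running on the same (single) CPU. */
static int shared;

/* Pretend interrupt handler: performs its own increment of the counter. */
static void demo_irq_handler(void)
{
	shared++;
}

/* Multi-instruction increment: load, add, store as three separate steps.
 * If irq_between is nonzero, the handler runs right after the load, which
 * is an instruction boundary and so a legal place for an interrupt. */
int demo_split_inc(int irq_between)
{
	int tmp;

	shared = 0;
	tmp = shared;			/* load */
	if (irq_between)
		demo_irq_handler();	/* lands between two instructions */
	tmp = tmp + 1;			/* add */
	shared = tmp;			/* store: overwrites the handler's update */
	if (!irq_between)
		demo_irq_handler();	/* lands after the whole sequence */
	return shared;
}

/* Single-instruction increment, modelled as one indivisible step: the
 * handler can only run before or after it, never in the middle. */
int demo_single_inc(int irq_before)
{
	shared = 0;
	if (irq_before)
		demo_irq_handler();
	shared++;			/* stands in for a single "incl" */
	if (!irq_before)
		demo_irq_handler();
	return shared;
}

/* Self-check of the four interleavings modelled above. */
void demo_self_check(void)
{
	assert(demo_split_inc(1) == 1);		/* handler's update is lost */
	assert(demo_split_inc(0) == 2);
	assert(demo_single_inc(1) == 2);	/* both updates always survive */
	assert(demo_single_inc(0) == 2);
}
```

Only the three-step version can lose an update, and only because an
interrupt is allowed between its instructions. The model says nothing
about other bus masters or other CPUs, which is where the LOCK prefix
comes in.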

> 
> Look people, I just reported what I found from testing - 
> Please don't shoot the messenger.
> 
> If it: "Does not make a difference" then it "Should not make a difference"
> but it does, try it yourself.  It's safe (if LOCK_PREFIX is in the proper
> places) - the machine will ignore the opcode if it is recent enough to not
> need it - just trust the cpu's micro-code.

What do you mean "recent enough to not need it?" There is no such thing. 
On any x86 machine it does something. It will slow things down, and 
there is no reason it should be required on uni-processor systems.

Quite likely that's the only effect adding the LOCK prefix is having, 
slowing things down, and covering up whatever is causing your issue, 
without having anything to do with the root cause.

> 
> Mike
>> Samuel
>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
@ 2009-05-22 18:50 Michael S. Zick
  2009-05-22 19:24 ` Roland Dreier
  0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 18:50 UTC (permalink / raw)
  To: linux-kernel

On Fri May 22 2009, you wrote:
> "Michael S. Zick" <lkml@morethan.org> writes:
> 
> > Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> 
> Actually uni processor should not use the lock prefix 
> because it doesn't need it; the only exception are some special
> ops used in para-virtualization which are special cased.
> 

Unless you have interrupts enabled, then you have two contexts.
Only xchg is "naturally" atomic.

Mike
> -Andi
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 18:50 Michael S. Zick
@ 2009-05-22 19:24 ` Roland Dreier
  2009-05-22 20:03   ` Michael S. Zick
  0 siblings, 1 reply; 90+ messages in thread
From: Roland Dreier @ 2009-05-22 19:24 UTC (permalink / raw)
  To: Michael S. Zick; +Cc: linux-kernel

 > Unless you have interrupts enabled, then you have two contexts.
 > Only xchg is "naturally" atomic.

Isn't the lock prefix about consistency between multiple processors?
The x86 architecture always handles interrupts on instruction
boundaries.  I'm guessing you're worried about definitions like

static inline void atomic_inc(atomic_t *v)
{
	asm volatile(LOCK_PREFIX "incl %0"
		     : "+m" (v->counter));
}

which compiles to just "incl" (with no lock prefix) on uniprocessor
kernels; but the IA-32 architecture guarantees that the incl instruction
cannot be interrupted between reading the old value and writing the new
value.
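
To make the macro substitution concrete, here is a minimal user-space
sketch of the two expansions. It is not the kernel's header: DEMO_SMP,
DEMO_LOCK_PREFIX, demo_atomic_t, demo_atomic_inc and demo_count_to are
hypothetical stand-ins for CONFIG_SMP, LOCK_PREFIX and atomic_t, and the
non-x86 branch exists only so the file builds elsewhere.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's CONFIG_SMP switch: build with
 * -DDEMO_SMP=1 to keep the "lock; " prefix, or leave it at 0 to drop the
 * prefix the way a uni-processor build does. */
#ifndef DEMO_SMP
#define DEMO_SMP 0
#endif

#if DEMO_SMP
#define DEMO_LOCK_PREFIX "lock; "
#else
#define DEMO_LOCK_PREFIX ""
#endif

typedef struct { int counter; } demo_atomic_t;

static inline void demo_atomic_inc(demo_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
	/* String concatenation yields "lock; incl %0" or plain "incl %0";
	 * either way it is a single read-modify-write instruction. */
	__asm__ __volatile__(DEMO_LOCK_PREFIX "incl %0"
			     : "+m" (v->counter));
#else
	/* Fallback for other architectures; makes no atomicity claim. */
	v->counter++;
#endif
}

/* Increment n times starting from zero and return the final count. */
int demo_count_to(int n)
{
	demo_atomic_t v = { 0 };
	int i;

	for (i = 0; i < n; i++)
		demo_atomic_inc(&v);
	assert(v.counter == n);
	return v.counter;
}
```

With a single thread of execution both settings give the same count; the
prefix only changes how the memory access is seen by other agents on the
bus.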

 - R.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
  2009-05-22 19:24 ` Roland Dreier
@ 2009-05-22 20:03   ` Michael S. Zick
  0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-22 20:03 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel

On Fri May 22 2009, Roland Dreier wrote:
> 
>  > Unless you have interrupts enabled, then you have two contexts.
>  > Only xchg is "naturally" atomic.
> 
> Isn't the lock prefix about consistency between multiple processors?
> The x86 architecture always handles interrupts on instruction
> boundaries.  I'm guessing you're worried about definitions like
> 
> static inline void atomic_inc(atomic_t *v)
> {
> 	asm volatile(LOCK_PREFIX "incl %0"
> 		     : "+m" (v->counter));
> }
> 
> which compiles to just "incl" (with no lock prefix) on uniprocessor
> kernels; but the IA-32 architecture guarantees that the incl instruction
> cannot be interrupted between reading the old value and writing the new
> value.
> 

Not prior to P-4, and since then only "may" be done atomically,
see reference post in my earlier reply.

PS: And yes, that was where I spotted the usage first.  ;)

Mike
>  - R.
> 
> 



^ permalink raw reply	[flat|nested] 90+ messages in thread

end of thread, other threads:[~2009-06-03 11:46 UTC | newest]

Thread overview: 90+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-22 16:39 [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2009-05-22 18:23 ` Andi Kleen
2009-05-22 18:36 ` Ingo Molnar
2009-05-22 18:59   ` H. Peter Anvin
2009-05-22 19:20     ` Michael S. Zick
2009-05-22 22:21     ` Michael S. Zick
2009-05-22 23:30       ` H. Peter Anvin
2009-05-23  0:45         ` Michael S. Zick
2009-05-23  0:51           ` H. Peter Anvin
2009-05-23 10:44             ` Michael S. Zick
2009-05-23 11:18               ` Michael S. Zick
2009-05-24  7:04               ` Harald Welte
2009-05-24 12:48                 ` Michael S. Zick
2009-05-24 15:43                 ` Michael S. Zick
2009-05-27 22:13               ` Roland Dreier
2009-05-27 22:33                 ` Michael S. Zick
2009-05-23 15:52             ` Michael S. Zick
2009-05-23 18:04             ` Michael S. Zick
2009-05-23 23:44               ` H. Peter Anvin
2009-05-24  6:49                 ` Harald Welte
2009-05-24 12:38                   ` Michael S. Zick
2009-05-24 17:31                     ` Harald Welte
2009-05-27 12:18                   ` Re:[VIA Support] was: " Michael S. Zick
2009-05-27 12:22                     ` [VIA " Michael S. Zick
2009-05-27 12:47                     ` Harald Welte
2009-05-27 13:00                       ` Michael S. Zick
2009-05-29 12:06                     ` Michael S. Zick
2009-05-30 15:48                   ` Michael S. Zick
2009-05-24 12:27                 ` Michael S. Zick
2009-05-24 17:22                   ` Harald Welte
2009-05-24 18:00                   ` H. Peter Anvin
2009-05-24 18:32                     ` Michael S. Zick
2009-05-24 18:46                       ` H. Peter Anvin
2009-05-24 19:09                         ` Michael S. Zick
2009-05-25 19:03                         ` Michael S. Zick
2009-05-25 19:18                           ` Michael S. Zick
2009-05-25 19:46                             ` Michael S. Zick
2009-05-25 21:10                               ` Michael S. Zick
2009-05-25 21:17                                 ` H. Peter Anvin
2009-05-25 23:03                                   ` Michael S. Zick
2009-05-25 23:35                                     ` Michael S. Zick
2009-05-26  0:05                                     ` H. Peter Anvin
2009-05-26 12:37                                       ` Michael S. Zick
2009-05-26 17:13                                         ` H. Peter Anvin
2009-05-25  1:31                       ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
2009-05-25 12:54                         ` Michael S. Zick
2009-05-27 13:36                         ` Michael S. Zick
2009-05-25 16:05                       ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2009-05-28 20:30                     ` Pavel Machek
2009-05-28 20:54                       ` Michael S. Zick
2009-05-28 23:15                         ` [Futex RFC] was " Michael S. Zick
2009-05-29  2:00                           ` Michael S. Zick
2009-05-27 17:01                 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
2009-05-27 17:10                   ` Michael S. Zick
2009-05-27 17:19                   ` Thomas Gleixner
2009-05-27 17:25                     ` Michael S. Zick
2009-05-27 18:08                   ` LOCK prefix on uni processor has its use Andi Kleen
2009-05-27 18:22                     ` Michael S. Zick
2009-05-27 18:33                       ` Michael S. Zick
2009-05-27 18:55                         ` Michael S. Zick
2009-05-27 18:38                       ` Andi Kleen
2009-06-02 12:48                     ` Harald Welte
2009-06-02 13:03                       ` Andi Kleen
2009-06-02 13:26                         ` Michael S. Zick
2009-06-02 13:42                           ` Andi Kleen
2009-06-03 11:46                             ` Michael S. Zick
2009-05-28  2:56                   ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) H. Peter Anvin
2009-05-23 20:51             ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2009-05-28 12:48       ` Pavel Machek
2009-05-28 13:29         ` Michael S. Zick
2009-05-28 20:50           ` Pavel Machek
2009-05-28 20:58             ` Michael S. Zick
2009-05-28 21:16               ` Pavel Machek
2009-05-28 21:21                 ` Michael S. Zick
2009-05-22 19:17   ` Michael S. Zick
     [not found] ` <200905221343.30638.lkml@morethan.org>
     [not found]   ` <20090522192329.GF846@one.firstfloor.org>
2009-05-22 19:53     ` Michael S. Zick
2009-05-22 20:05       ` Samuel Thibault
2009-05-22 20:32         ` Michael S. Zick
2009-05-22 20:42           ` Andi Kleen
2009-05-22 20:57             ` Michael S. Zick
2009-05-22 20:43           ` Samuel Thibault
2009-05-22 21:59             ` Andi Kleen
2009-05-22 22:00               ` Samuel Thibault
2009-05-22 22:14                 ` Andi Kleen
2009-05-22 22:14                   ` Samuel Thibault
2009-05-22 20:45           ` Roland Dreier
2009-05-24 18:59           ` Robert Hancock
  -- strict thread matches above, loose matches on Subject: below --
2009-05-22 18:50 Michael S. Zick
2009-05-22 19:24 ` Roland Dreier
2009-05-22 20:03   ` Michael S. Zick
