* [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 16:39 UTC
To: linux-kernel
Found in the bit-rot for 32-bit, x86, Uni-processor builds:
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index f6aa18e..3c790ef 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -35,7 +35,7 @@
"661:\n\tlock; "
#else /* ! CONFIG_SMP */
-#define LOCK_PREFIX ""
+#define LOCK_PREFIX "\n\tlock; "
#endif
/* This must be included *after* the definition of LOCK_PREFIX */
Submitted: M. S. Zick

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Andi Kleen @ 2009-05-22 18:23 UTC
To: lkml; +Cc: linux-kernel

"Michael S. Zick" <lkml@morethan.org> writes:

> Found in the bit-rot for 32-bit, x86, Uni-processor builds:

Actually, uni-processor builds should not use the lock prefix because
they do not need it; the only exceptions are some special ops used in
para-virtualization, which are special-cased.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
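
Editorial sketch of the mechanism Andi describes: the macro expands to a lock prefix on SMP builds and to nothing on UP builds. This is a minimal user-space illustration, not the kernel's code. `SKETCH_SMP`, `SKETCH_LOCK_PREFIX`, `sketch_atomic_t`, `sketch_atomic_inc()` and `sketch_atomic_read()` are hypothetical names, the SMP alternatives section (the `661:` label in the patch context) is left out, and the non-x86 branch exists only so the sketch builds elsewhere.

```c
/*
 * SKETCH_SMP is a hypothetical stand-in for the kernel's CONFIG_SMP.
 * Leave it undefined to model a uni-processor (UP) build.
 */
#ifdef SKETCH_SMP
#define SKETCH_LOCK_PREFIX "lock; "
#else
#define SKETCH_LOCK_PREFIX ""	/* UP: the prefix is squelched */
#endif

typedef struct { int counter; } sketch_atomic_t;

/* Increment the counter; locked only when SKETCH_SMP is defined. */
static inline void sketch_atomic_inc(sketch_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__(SKETCH_LOCK_PREFIX "incl %0"
			     : "+m" (v->counter));
#else
	/* Portability fallback, assumed here only for non-x86 hosts. */
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Read the current value through a volatile access. */
static inline int sketch_atomic_read(const sketch_atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}
```

On a UP build the x86 branch emits a plain `incl` on memory, which is a single instruction either way; the posted patch would make the UP branch emit the prefix as well.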

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Ingo Molnar @ 2009-05-22 18:36 UTC
To: Michael S. Zick, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel

* Michael S. Zick <lkml@morethan.org> wrote:

> Found in the bit-rot for 32-bit, x86, Uni-processor builds:
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index f6aa18e..3c790ef 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -35,7 +35,7 @@
> "661:\n\tlock; "
>
> #else /* ! CONFIG_SMP */
> -#define LOCK_PREFIX ""
> +#define LOCK_PREFIX "\n\tlock; "
> #endif

What is your motivation for this change? At first sight this makes
the UP kernel a bit larger and a bit slower. Are you fixing some
real regression/bug here?

	Ingo

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: H. Peter Anvin @ 2009-05-22 18:59 UTC
To: Ingo Molnar; +Cc: Michael S. Zick, Thomas Gleixner, linux-kernel

Ingo Molnar wrote:
> * Michael S. Zick <lkml@morethan.org> wrote:
>
>> Found in the bit-rot for 32-bit, x86, Uni-processor builds:
>>
>> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
>> index f6aa18e..3c790ef 100644
>> --- a/arch/x86/include/asm/alternative.h
>> +++ b/arch/x86/include/asm/alternative.h
>> @@ -35,7 +35,7 @@
>> "661:\n\tlock; "
>>
>> #else /* ! CONFIG_SMP */
>> -#define LOCK_PREFIX ""
>> +#define LOCK_PREFIX "\n\tlock; "
>> #endif
>
> What is your motivation for this change? At first sight this makes
> the UP kernel a bit larger and a bit slower. Are you fixing some
> real regression/bug here?
>

That looks very odd indeed. The whole point of the LOCK_PREFIX macro is
to squelch it on UP (locks that should not be squelched on UP should not
be annotated LOCK_PREFIX.)

	-hpa

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 19:20 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> That looks very odd indeed. The whole point of the LOCK_PREFIX macro is
> to squelch it on UP (locks that should not be squelched on UP should not
> be annotated LOCK_PREFIX.)
>

OK, will inspect for that possibility.
We may just have a misuse of LOCK_PREFIX.

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 22:21 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> That looks very odd indeed. The whole point of the LOCK_PREFIX macro is
> to squelch it on UP (locks that should not be squelched on UP should not
> be annotated LOCK_PREFIX.)
>

I can only act as a messenger to report the behavior I observe;
but let me see if I can't do a better job in that limited role.

hpa makes the best point of all in the responses here...

What I see: erratic operation, erratic lock-ups of the machine,
and the previously posted lockdep dump.
This may well be misplaced usage of the LOCK_PREFIX macro;
I have already agreed to keep my eyes open for this more specific
problem.

A secondary possibility, hinted at in the context of other replies:
the usage of LOCK_PREFIX may not apply equally to all processors
for which this code gets included. It is possible that I am building
for one of the exceptions.

That tells us nothing, since the CPU technical details are under NDA.

All that can be done in this case is report behavior differences from
the closest publicly described processor (Pentium-M).

For that purpose, I suggest that a single-processor box, with other
hardware that makes memory access independent of the processor's
control, using a processor older than P-4, is a potential test bed.
"Other hardware that makes memory access..." I previously termed
"bus master DMA", which is overly specific. It misleads people
into thinking I am seeing hardware control issues rather than
non-exclusive memory access.

My earlier comments about taking an interrupt between the memory read
and the memory write operations are from a different manual than the
one posted. A manual that only applies to processors older than
the ones supported by the Linux kernel.
Sorry, my bad, grabbed the wrong book, posted the correct link (SH).

Until one or more specific usages of the LOCK_PREFIX macro can be
demonstrated to be incorrect (at least for some of the processors
using this code), making the posted change is a single-point change
that gives a pair of builds (one with, one without) whose behavior
can be compared on the test bed.

It is *not* the preferred change for a general release kernel; the
preferred change would be one that makes a specific rather than
general correction.
Perhaps only for some functions, perhaps only for some of the
processors that currently select this code.

The observation that executing an unnecessary 'lock' opcode in some
cases slows down the machine is not felt by myself to be significant
to duplicating my observations. Note: I have been wrong before.

This is as informative as I can make the message.

PS: *not* a single machine failure; tested on five machines, owned
by four different people, two brands, with different use histories.

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: H. Peter Anvin @ 2009-05-22 23:30 UTC
To: lkml; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

If there is a driver which relies on locked operations to be atomic with
respect to the I/O subsystem, it needs to use true locks, not LOCK_PREFIX.

An interrupt cannot interrupt between two parts of a lockable
instruction even if it isn't locked (there are non-atomic instructions
in the x86 architecture, but they can never be locked.)

The other thing that you might be seeing is that a locked operation may
be slow enough to keep an otherwise-present race condition from being
triggered.

> That tells us nothing, since the CPU technical details are under NDA.

Have you considered that you might be running into a CPU bug or design
error? There was the out-of-order store bug on the Winchip that needed
workarounds (CONFIG_X86_OOSTORE) that I don't think were ever well
tested and might very well have bitrotted?

> All that can be done in this case is report behavior differences from
> the closest publicly described processor (Pentium-M).
>
> For that purpose, I suggest that a single-processor box, with other
> hardware that makes memory access independent of the processor's
> control, using a processor older than P-4, is a potential test bed.
> "Other hardware that makes memory access..." I previously termed
> "bus master DMA", which is overly specific. It misleads people
> into thinking I am seeing hardware control issues rather than
> non-exclusive memory access.
>
> My earlier comments about taking an interrupt between the memory read
> and the memory write operations are from a different manual than the
> one posted. A manual that only applies to processors older than
> the ones supported by the Linux kernel.
> Sorry, my bad, grabbed the wrong book, posted the correct link (SH).
>
> Until one or more specific usages of the LOCK_PREFIX macro can be
> demonstrated to be incorrect (at least for some of the processors
> using this code) - -
>
> Then making the posted change is a single point change that gives a
> pair of builds (one with, one without) to compare the behavior of on
> the test bed.
>
> It is *not* the preferred change for a general release kernel, the
> preferred change would be one that makes a specific rather than
> general correction.
> Perhaps only for some functions, perhaps only for some of the
> processors that currently select this code.
>
> The observation that executing an unnecessary 'lock' opcode in some
> cases slows down the machine is not felt by myself to be significant
> to duplicating my observations. Note: I have been wrong before.

What makes you draw that conclusion, in particular? A lock prefix
typically slows down the following instruction dramatically, on some
processors by many hundreds of cycles.

> This is as informative as I can make the message.
>
> PS: *not* a single machine failure, tested on five machines, owned
> by four different people, two brands, with different use histories.

What do they have in common?

	-hpa
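
Editorial sketch of hpa's first point, reading "true locks" as a lock prefix that is written out explicitly and therefore never compiled away on UP. That reading is an interpretation, not something the thread spells out. `sketch_locked_or()` is a hypothetical helper, not a kernel API, and the non-x86 branch is only a portability fallback.

```c
#include <stdint.h>

/*
 * Hypothetical helper: OR a mask into a word that another bus agent
 * (for example a DMA-capable device) may also update. The lock prefix
 * is hard-coded, so it stays in place even in a build where a
 * LOCK_PREFIX-style macro expands to the empty string.
 */
static inline void sketch_locked_or(volatile uint32_t *word, uint32_t mask)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("lock; orl %1, %0"
			     : "+m" (*word)
			     : "ir" (mask)
			     : "memory");
#else
	/* Portability fallback, assumed here only for non-x86 hosts. */
	__atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST);
#endif
}
```

The design point is that the choice is made per call site: code that only races against other CPUs can use the squelchable macro, while code that races against non-CPU agents keeps the prefix unconditionally.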

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-23 0:45 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> > PS: *not* a single machine failure, tested on five machines, owned
> > by four different people, two brands, with different use histories.
>
> What do they have in common?
>

Same integrated motherboard.

There is very little information to be gained from staring at a
glowing power-on light that only glows back. ;)

The lockdep dump posted is the best source of information.

Other observations -

Here is something which these machines do, which may not be
happening with your choice of test machines:

ACPI: Core revision 20090320
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found apic 0 pin 0) ...
....... works.

Note: This is on a Uni-processor build.
I have not yet examined the code that generates that set of messages.
Might be a broken work-around?

With LOCK_PREFIX == ""

Test conditions (same as the lockdep dump):
- VLC playing streaming audio over the wired net connection (8139too)
- from 4 to 8 ssh remote terminal sessions, each running "top" set
  to use different display intervals (different in 0.1 second steps)
- Fixed cpu speed at half the rated clock (for the purpose of testing).

Now just hang back and listen for 10 minutes to 4 hours -

When the machine stops running -

You will still hear bursts of sound - -
I am *guessing* that this means the chip set and bus clocks are running,
also that DMA is running - with the result that the HD audio driver is
just replaying the same buffer offset.
There is a PCI-to-PCIe bridge in the chip set and the HD audio hardware
(also on chip) is the only thing detected on the PCIe bus.

The "hold down power button to stop" still works -
I presume that means at least that internal timer is still running.

Repeat the above, *with* LOCK_PREFIX == "\n\tlock; "

When the machine stops - with only minutes rather than hours of uptime -

The machine is silent - I presume this means that DMA is not running.

The "hold down power button to stop" still works -
So clocks are not totally off.

= = = =

Either "lock-up" situation acts as if:
*) cpu is halted with interrupts off; or
*) cpu is in a tight loop with interrupts off

The primary difference is that the DMA has been stopped in the second
case. Presuming my two guesses on that subject above are correct.

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: H. Peter Anvin @ 2009-05-23 0:51 UTC
To: lkml; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

Michael S. Zick wrote:
> Same integrated motherboard.

Which means same CPU, same BIOS, same motherboard (none of which you're
telling us.)

cpuinfo and dmidecode would be informative.

	-hpa

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-23 10:44 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > Same integrated motherboard.
>
> Which means same CPU, same BIOS, same motherboard (none of which you're
> telling us.)
>

The only objective information is posted here:
http://lkml.org/lkml/2009/5/20/342
Everything else related to this problem is subjective.

> cpuinfo and dmidecode would be informative.
>

Must have hit "reply" rather than "reply all" at some
critical point along the way.

Browse this directory:
http://hp-umpc.com/ce1200v/
You're looking for:
http://hp-umpc.com/ce1200v/sylvania-g-data.tar.gz
The Everex Cloudbook only varies by some strings in the
dmidecode output.

For logs of speculation and efforts at re-arranging the
deck chairs on the Titanic:
http://forum.netbookuser.com/viewforum.php?id=8

I chose to start with the ce1200v because:
*) It needs the most help;
*) One of the two tech manuals on the cx700 has been
   published since the drivers were touched.
   see: http://linux.via.com.tw/support/downloadFiles.action
   select cx700/vx700 in right-hand box, click the manual.
*) The HP-2133 MiniNote uses the cn896 chipset, which
   has not yet been released from NDA.

Note:
I do not have the C7-M technical reference, it is still
under NDA.
*But* if a developer on this list has a copy of the
manual *and* owns one of these three brands of machine -
they would have fixed their own machine a year ago.
So I am not holding my breath until a person is located
that has both the manual and the machine. ;)

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-23 11:18 UTC
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Sat May 23 2009, Michael S. Zick wrote:
> *But* if a developer on this list has a copy of the
> manual *and* owns one of these three brands of machine -
> they would have fixed their own machine a year ago.
> So I am not holding my breath until a person is located
> that has both the manual and the machine. ;)
>

As to getting a person with the manuals on-hand and
everyone else together, may I point out that the MUC
room is still on-line:

Pick your favorite Jabber MUC client;
JID: cloudbook-group@conference.jabber.cb-chat.com
Which translates to:
Room: cloudbook-group
URL: conference.jabber.cb-chat.com
Password: <none> leave blank in your client, it is a public room.

Note: This is a low-volume server and has "on-demand"
rooms enabled; if you want a room for the "hot topic" of the
moment, just "join chat" or "join group" (whatever
your client calls it) for the new room name - the server
will create it for you.
If creating rooms, I strongly suggest Gajim - it has the
easiest to use control and administrative features.

Note2: Not using any video, so we will not see your
smiling face. ;)

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Harald Welte @ 2009-05-24 7:04 UTC
To: Michael S. Zick
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Sat, May 23, 2009 at 05:44:52AM -0500, Michael S. Zick wrote:

> *) The HP-2133 MiniNote uses the cn896 chipset, which
> has not yet been released from NDA.

I can work towards getting that changed, but I doubt this helps us with the
current problem.

> I do not have the C7-M technical reference, it is still
> under NDA.

I obviously have access to that documentation (which is also on its way
to becoming public, but needs more time) - but believe me, there is nothing
in that documentation that would help you to debug this problem :(

> *But* if a developer on this list has a copy of the
> manual *and* owns one of these three brands of machine -
> they would have fixed their own machine a year ago.

I actually own a 2133 mininote, but I rarely used it for anything but to test
openchrome on it. What do you suggest I try?

I also have some other systems with a C7-M, so I can certainly verify
certain code on a number of them, if a good testcase exists.

--
- Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-24 12:48 UTC
To: Harald Welte; +Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Sun May 24 2009, Harald Welte wrote:
> I actually own a 2133 mininote, but I rarely used it for anything but to test
> openchrome on it. What do you suggest me to try?
>

The HP-2133 (C7-M/CN896) did not fail yesterday.
Find a C7-M/CX700 machine.

You might hook the rss feed at:
http://forum.netbookuser.com/viewforum.php?id=8
where my rants/raves/speculations are logged and the other
people helping me test make their comments.

In particular:
The original instructions (including download url):
http://forum.netbookuser.com/viewtopic.php?pid=6702#p6702
Updated installation instructions:
http://forum.netbookuser.com/viewtopic.php?id=907

Also ignore anything you read in LKML that I have been doing
this in secret - those authors just never got the memo. ;)

> I also have some other systems with a C7-M, so I can certainly verify
> certain code on a number of them, if a good testcase exists.
>

Still working towards a specific test case - the only thing at
this point is the sledgehammer of putting the "lock" back
in, everywhere.

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-24 15:43 UTC
To: Harald Welte
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sun May 24 2009, Harald Welte wrote:
> I actually own a 2133 mininote, but I rarely used it for anything but to test
> openchrome on it. What do you suggest me to try?
>

The {,lk} pair of yesterday - now built against tag 2.6.30-rc7 -
is posted as -09144{,lk}

Details here:
http://forum.netbookuser.com/viewtopic.php?pid=6976#p6976

Try them on a C7-M/CX700 (or newer NetBook system chipset)

(I don't normally test on the HP-2133 (C7-M/CN896) since I
am not (yet) dealing with the Broadcom firmware and SBB driver.)

Mike

> I also have some other systems with a C7-M, so I can certainly verify
> certain code on a number of them, if a good testcase exists.
>

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Roland Dreier @ 2009-05-27 22:13 UTC
To: peterz; +Cc: lkml, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

> The only objective information is posted here:
> http://lkml.org/lkml/2009/5/20/342

Not sure if you've looked at this, but it's a lockdep trace that looks
to be a valid lockdep report due to non-annotated code (I don't *think*
it's a bug). To summarize, there is the code path in
kernel/irq/spurious.c that does:

  poll_spurious_irq_timer ->
    poll_spurious_irqs() [from timer, with hard IRQs on] ->
      poll_all_shared_irqs() [if we think an IRQ got stuck] ->
        try_one_irq() ->
          spin_lock(&desc->lock) [as above -- hard IRQs on]

while kernel/irq/chip.c has:

  handle_level_irq() [called with hard IRQs off] ->
    spin_lock(&desc->lock) [as above -- hard IRQs off]

and lockdep can't tell that the interrupt corresponding to desc has been
disabled if we ever actually reach try_one_irq(), so there's no risk of
the interrupt coming in and deadlocking while the try_one_irq() code
holds desc->lock.

Unfortunately I don't know how to annotate this...
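
Editorial sketch of the kind of check behind the report Roland summarizes: a lock that has been taken from hard-IRQ context and has also been taken with hard IRQs enabled gets flagged, because an interrupt arriving while the lock is held could then deadlock on it. This is a toy user-space model under that assumption, not lockdep's implementation. `struct sketch_lock_class` and `sketch_acquire()` are hypothetical names.

```c
#include <stdbool.h>

/* Toy model of one lock class: remembers the contexts it was taken in. */
struct sketch_lock_class {
	bool taken_in_hardirq;		/* acquired from hard-IRQ context */
	bool taken_with_irqs_on;	/* acquired with hard IRQs enabled */
};

/*
 * Record one acquisition and return true if the usage history is now
 * inconsistent, i.e. both kinds of acquisition have been seen.
 */
static bool sketch_acquire(struct sketch_lock_class *c,
			   bool in_hardirq, bool irqs_enabled)
{
	if (in_hardirq)
		c->taken_in_hardirq = true;
	else if (irqs_enabled)
		c->taken_with_irqs_on = true;

	return c->taken_in_hardirq && c->taken_with_irqs_on;
}
```

Fed the two paths from the message, the handle_level_irq() acquisition (hard-IRQ context) followed by the try_one_irq() acquisition (hard IRQs on) makes the model report an inconsistency. Like the checker Roland describes, the model has no way to know that the interrupt in question is disabled by the time try_one_irq() runs.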

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-27 22:33 UTC
To: Roland Dreier
Cc: peterz, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Wed May 27 2009, Roland Dreier wrote:
>
> > The only objective information is posted here:
> > http://lkml.org/lkml/2009/5/20/342
>
> Not sure if you've looked at this, but it's a lockdep trace that looks
> to be a valid lockdep report due to non-annotated code (I don't *think*
> it's a bug). To summarize, there is the code path in
> kernel/irq/spurious.c that does:
>

I haven't looked at it - it is beyond my skill level.
Still trying to deal with a machine where the only symptom is a deadlock.

So I post these for someone else's eyes until I figure out the deadlock.

Mike
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-23 0:51 ` H. Peter Anvin 2009-05-23 10:44 ` Michael S. Zick @ 2009-05-23 15:52 ` Michael S. Zick 2009-05-23 18:04 ` Michael S. Zick 2009-05-23 20:51 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick 3 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-23 15:52 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel On Fri May 22 2009, H. Peter Anvin wrote: > Michael S. Zick wrote: > > Same integrated motherboard. > > Which means same CPU, same BIOS, same motherboard (none of which you're > telling us.) > > cpuinfo and dmidecode would be informative. > Build: linux-2.6.30-rc6-ce1200v-09143_2.6.30-rc6-ce1200v-09143-22_i386.deb The -09143lk later today. Now also testing on the HP-2133 (C7-M/CN896) in addition to the Everex Cloudbook/Sylvania gBook (C7-M/CX700). Additional details: http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968 Download location: http://hp-umpc.com/ce1200v/ HP-2133 data capture and the Sylvania/Everex data capture: hp-2133-data_cap.tgz sylvania-g-data.tar.gz Summary: On the ce1200v - first test 46 minutes uptime. On the hp-2133 - ?? still running - no results yet. The -09143lk (yyddd) build not yet tested. Mike > -hpa > ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-23 0:51 ` H. Peter Anvin 2009-05-23 10:44 ` Michael S. Zick 2009-05-23 15:52 ` Michael S. Zick @ 2009-05-23 18:04 ` Michael S. Zick 2009-05-23 23:44 ` H. Peter Anvin 2009-05-23 20:51 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick 3 siblings, 1 reply; 90+ messages in thread From: Michael S. Zick @ 2009-05-23 18:04 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel On Fri May 22 2009, H. Peter Anvin wrote: > Michael S. Zick wrote: > > Same integrated motherboard. > > Which means same CPU, same BIOS, same motherboard (none of which you're > telling us.) > > cpuinfo and dmidecode would be informative. > The -09143lk files are posted. Download location: http://hp-umpc.com/ce1200v Details so far today: http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968 Summary: HP-2133 (C7-M/CN896) - 09143 - No results - Still up 3 1/2 hours. Cloudbook (C7-M/CX700) - 09143 - 45 minutes. Cloudbook (C7-M/CX700) - 09143lk - No results - Still up 1 1/2 hours. OK - time to look for the missing "memory" in the clobber lists. ;) Mike > -hpa > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-23 18:04 ` Michael S. Zick @ 2009-05-23 23:44 ` H. Peter Anvin 2009-05-24 6:49 ` Harald Welte ` (2 more replies) 0 siblings, 3 replies; 90+ messages in thread From: H. Peter Anvin @ 2009-05-23 23:44 UTC (permalink / raw) To: Harald Welte; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox Hi Harald, It looks like there might be a problem with the C7-M ... Michael reports that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be necessary for a uniprocessor. I'm wondering if we have to revive the OOSTORE hack, or some other workaround. It is of course hard for me to track this down since (a) I don't have access to the CPU documentation, and (b) I work for Intel now, which limits the amount of time I can realistically spend on this. -hpa [Cc: Alan, who I believed developed the OOSTORE hack back when.] Michael S. Zick wrote: > On Fri May 22 2009, H. Peter Anvin wrote: >> Michael S. Zick wrote: >>> Same integrated motherboard. >> Which means same CPU, same BIOS, same motherboard (none of which you're >> telling us.) >> >> cpuinfo and dmidecode would be informative. >> > > The -09143lk files are posted. > > Download location: > http://hp-umpc.com/ce1200v > > Details so far today: > http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968 > > Summary: > HP-2133 (C7-M/CN896) - 09143 - No results - Still up 3 1/2 hours. > Cloudbook (C7-M/CX700) - 09143 - 45 minutes. > Cloudbook (C7-M/CX700) - 09143lk - No results - Still up 1 1/2 hours. > > OK - time to look for the missing "memory" in the clobber lists. ;) > -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 90+ messages in thread
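For readers following the LOCK_PREFIX discussion: the macro is pasted in front of the read-modify-write instruction inside the atomic helpers, so defining it as "" on a non-SMP build simply drops the bus lock from that instruction. The following is only a minimal user-space sketch of that mechanism, assuming a GCC-style compiler; SKETCH_SMP, SKETCH_LOCK_PREFIX, sketch_atomic_inc() and sketch_inc_n_times() are hypothetical names for illustration, not kernel identifiers, and the kernel's "661:" alternatives bookkeeping is left out.

```c
/*
 * Sketch only: mimics how a LOCK_PREFIX-style macro is spliced into an
 * atomic increment. All SKETCH_/sketch_ names are hypothetical.
 */
#ifdef SKETCH_SMP
#define SKETCH_LOCK_PREFIX "lock; "   /* SMP: assert the bus lock */
#else
#define SKETCH_LOCK_PREFIX ""         /* UP: plain read-modify-write */
#endif

void sketch_atomic_inc(int *v)
{
#if defined(__i386__) || defined(__x86_64__)
	/* "+m" marks *v as read and written; "cc" names the flags. */
	__asm__ __volatile__(SKETCH_LOCK_PREFIX "incl %0"
			     : "+m" (*v)
			     :
			     : "cc");
#else
	/* Fallback so the sketch still builds on non-x86 hosts. */
	(*v)++;
#endif
}

/* Increment a local counter n times and return the result. */
int sketch_inc_n_times(int n)
{
	int counter = 0;
	int i;

	for (i = 0; i < n; i++)
		sketch_atomic_inc(&counter);
	return counter;
}
```

Without the prefix, a single incl on memory is still atomic with respect to interrupts on one CPU, because interrupts are taken on instruction boundaries; that is the reasoning behind compiling the prefix away on UP. Building the sketch with -DSKETCH_SMP shows the locked form.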
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-23 23:44 ` H. Peter Anvin
@ 2009-05-24 6:49 ` Harald Welte
2009-05-24 12:38 ` Michael S. Zick
` (2 more replies)
2009-05-24 12:27 ` Michael S. Zick
2009-05-27 17:01 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
2 siblings, 3 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-24 6:49 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Dear hpa, and others,

On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:
> It looks like there might be a problem with the C7-M ... Michael reports
> that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> necessary for a uniprocessor.
>

I will try my best to help, though I have to admit I'm far from being
an x86 expert, particularly with regard to low-level bits such as atomic
operations.

So please give me some time to research some background about that,
and read up on all the details of the currently reported/described problem.

Once I understand it in full detail, I can talk to the right people inside
CentaurLabs (VIA's CPU division).

If somebody (optionally) can phrase a precise technical question that I can
directly forward to somebody with low-level x86 knowledge but no Linux background,
it would definitely help speed up the process.

> I'm wondering if we have to revive the OOSTORE hack, or some other
> workaround. It is of course hard for me to track this down since (a) I
> don't have access to the CPU documentation,

As far as I know, there really is no such documentation. All documentation
that I've ever seen internally is electrical data sheets and high-level feature
set descriptions, CPUID, MSR and padlock. There are no actual x86 instruction
set documents... Centaur is < 100 people, they don't have the resources to work
on documents along the lines of what Intel has...

> and (b) I work for Intel now, which limits the amount of time I can
> realistically spend on this.

Sure, thanks for letting me know.

--
- Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-24 6:49 ` Harald Welte @ 2009-05-24 12:38 ` Michael S. Zick 2009-05-24 17:31 ` Harald Welte 2009-05-27 12:18 ` Re:[VIA Support] was: " Michael S. Zick 2009-05-30 15:48 ` Michael S. Zick 2 siblings, 1 reply; 90+ messages in thread From: Michael S. Zick @ 2009-05-24 12:38 UTC (permalink / raw) To: Harald Welte Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Sun May 24 2009, Harald Welte wrote: > Dear hpa, and others, > > On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote: > > It looks like there might be a problem with the C7-M ... Michael reports > > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be > > necessary for a uniprocessor. > > > > I will try my best to help, though I have to admit I'm far from being > a x86 expert, and particularly not with regard to low-level bits such as atomic > operations. > > So please give me some time to research some background about that, > and read up all the details on the currently reported/described problem. > > Once I understand it in full detail, I can talk to the right people inside > CentaurLabs (VIA's CPU division). > > If somebody (optionally) can phrase a precise technical question that I can > directly forward to somebody with low-level x86 knowledge but no Linux background, > it would definitely help speeding up the process. > > > I'm wondering if we have to revive the OOSTORE hack, or some other > > workaround. It is of course hard for me to track this down since (a) I > > don't have access to the CPU documentation, > > As far as I know, there really is no such documentation.. all documentation > that I've ever seen internally is electrical data sheets and high-level feature > set descriptiosn, CPUID, MSR and padlock. There are no actual x86 instruction > set documents... Centaur is < 100 people, they don't have the resources to work > on documents along the lines of what Intel has... 
> My background is in the electronic hardware end of things - - Is there someone I can contact for the existing documents - Even under NDA would be fine. For instance, the layout of the CPUID results - they don't currently seem to match what the marketing people claim is inside of the chips. There are some "VIA specific" fields. Also, those funny looking electrical data sheets with the wiggly lines will mean something to me in terms of when to use the "lock" prefix. All you have to do is grow up with such things. ;) Could you also dig around for a tech manual on CN896 similar to the one (of two) CX700 manuals that are publicly posted? Even under NDA is fine. > > and (b) I work for Intel now, which limits the amount of time I can > > realistically spend on this. > I might be able to get you a machine, but if you are scanned at the front door for VIA or AMD hardware. . . ;) Mike > Sure, thanks for letting me know. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 12:38 ` Michael S. Zick
@ 2009-05-24 17:31 ` Harald Welte
0 siblings, 0 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-24 17:31 UTC (permalink / raw)
To: Michael S. Zick
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Hi Michael,

On Sun, May 24, 2009 at 07:38:44AM -0500, Michael S. Zick wrote:
> > As far as I know, there really is no such documentation.. all documentation
> > that I've ever seen internally is electrical data sheets and high-level feature
> > set descriptions, CPUID, MSR and padlock. There are no actual x86 instruction
> > set documents... Centaur is < 100 people, they don't have the resources to work
> > on documents along the lines of what Intel has...
>
> My background is in the electronic hardware end of things - -
> Is there someone I can contact for the existing documents -
> Even under NDA would be fine.

I have inquired right now. The regular NDA process, I would assume, is probably
quite slow. The CPU documentation is already on track to become public at
some point (but a very slooooow track), so I'll see what I can do and contact you
in private mail.

> For instance, the layout of the CPUID results - they don't
> currently seem to match what the marketing people claim is
> inside of the chips. There are some "VIA specific" fields.

There are two versions of the C7-M, an 'A' model (90nm SOI) and a much more
recent 'D' model (90nm conventional process). Their CPUID values are 6-a and
6-d, respectively. The cpu ID string of the former contains Esther, that of
the latter contains C7-M - but in fact any BIOS could override the cpu ID
string (not cpuid!) with whatever it wants using a backdoor in some MSR.

> Could you also dig around for a tech manual on CN896 similar to
> the one (of two) CX700 manuals that are publicly posted?

I've asked about that. The programming guides for chipsets are generally
on the 'open track', whereas the electrical data sheets with pinouts and timing
values are under NDA. The CN896 was just already an "old" component when that
new open-track policy was introduced, and typically VIA is trying to focus on
docs and drivers for new products, rather than old ones. But I have asked if
we can release the CN896 programming manual publicly.

> Even under NDA is fine.

Well, I prefer to make sure that we have the necessary information open. NDAs
are fine and well for the limited number of customers you have, but making
NDAs with various individual programmers really is too painful, there should
be other ways...

--
- Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re:[VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 6:49 ` Harald Welte
2009-05-24 12:38 ` Michael S. Zick
@ 2009-05-27 12:18 ` Michael S. Zick
2009-05-27 12:22 ` [VIA " Michael S. Zick
` (2 more replies)
2009-05-30 15:48 ` Michael S. Zick
2 siblings, 3 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 12:18 UTC (permalink / raw)
To: Harald Welte
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sun May 24 2009, Harald Welte wrote:
>
> Once I understand it in full detail, I can talk to the right people inside
> CentaurLabs (VIA's CPU division).
>
> If somebody (optionally) can phrase a precise technical question that I can
> directly forward to somebody with low-level x86 knowledge but no Linux background,
> it would definitely help speeding up the process.
>

What is the PCI Cache Line size in the CX700? In the CN896?

Ref:
arch/x86/pci/common.c

As in:
	/*
	 * Assume PCI cacheline size of 32 bytes for all x86s except K7/K8
	 * and P4. It's also good for 386/486s (which actually have 16)
	 * as quite a few PCI devices do not support smaller values.
	 */

	pci_cache_line_size = 32 >> 2;
	if (c->x86 >= 6 && c->x86_vendor == X86_VENDOR_AMD)
		pci_cache_line_size = 64 >> 2; /* K7 & K8 */
	else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
		pci_cache_line_size = 128 >> 2; /* P4 */

A problem with cache coherency, alignment, or consistency would explain
the problems I am seeing - and the differences in the test cases.

Mike

^ permalink raw reply [flat|nested] 90+ messages in thread
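A note on the ">> 2" in the quoted snippet: the PCI configuration-space Cache Line Size register (offset 0x0C) is expressed in units of 32-bit words, not bytes, so the kernel stores the byte count divided by four. A minimal sketch of that conversion follows; sketch_bytes_to_pci_cls() and sketch_pci_cls_to_bytes() are hypothetical helper names chosen for illustration, not kernel functions.

```c
/*
 * Sketch only: the PCI Cache Line Size register counts 32-bit words,
 * so a byte count is shifted right by 2 before it is stored.
 * Both helper names are hypothetical.
 */

/* Convert a cache line size in bytes to the register encoding. */
unsigned char sketch_bytes_to_pci_cls(unsigned int bytes)
{
	return (unsigned char)(bytes >> 2);
}

/* Convert a register value back to a size in bytes. */
unsigned int sketch_pci_cls_to_bytes(unsigned char cls)
{
	return (unsigned int)cls << 2;
}
```

With this encoding the three cases in the snippet store 8 (32 bytes, the default), 16 (64 bytes, K7/K8) and 32 (128 bytes, P4). A VIA C7-M matches neither vendor test, so it would keep the 32-byte default.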
* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-27 12:18 ` Re:[VIA Support] was: " Michael S. Zick @ 2009-05-27 12:22 ` Michael S. Zick 2009-05-27 12:47 ` Harald Welte 2009-05-29 12:06 ` Michael S. Zick 2 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-27 12:22 UTC (permalink / raw) To: Harald Welte Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed May 27 2009, Michael S. Zick wrote: > On Sun May 24 2009, Harald Welte wrote: > > > > Once I understand it in full detail, I can talk to the right people inside > > CentaurLabs (VIA's CPU division). > > > > If somebody (optionally) can phrase a precise technical question that I can > > directly forward to somebody with low-level x86 knowledge but no Linux background, > > it would definitely help speeding up the process. > > > > What is the PCI Cache Line size in the CX700? In the CN896? > > Ref: > arch/x86/pci/common.c > > As in: > /* > * Assume PCI cacheline size of 32 bytes for all x86s except K7/K8 > * and P4. It's also good for 386/486s (which actually have 16) > * as quite a few PCI devices do not support smaller values. > */ > > pci_cache_line_size = 32 >> 2; > if (c->x86 >= 6 && c->x86_vendor == X86_VENDOR_AMD) > pci_cache_line_size = 64 >> 2; /* K7 & K8 */ > else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL) > pci_cache_line_size = 128 >> 2; /* P4 */ > > A problem with cache coherency, alignment, or consistency would explain > the problems I am seeing - and the differences in the test cases. > Related speculations: http://forum.netbookuser.com/viewtopic.php?pid=6987#p6987 Mike ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-27 12:18 ` Re:[VIA Support] was: " Michael S. Zick 2009-05-27 12:22 ` [VIA " Michael S. Zick @ 2009-05-27 12:47 ` Harald Welte 2009-05-27 13:00 ` Michael S. Zick 2009-05-29 12:06 ` Michael S. Zick 2 siblings, 1 reply; 90+ messages in thread From: Harald Welte @ 2009-05-27 12:47 UTC (permalink / raw) To: Michael S. Zick Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed, May 27, 2009 at 07:18:08AM -0500, Michael S. Zick wrote: > On Sun May 24 2009, Harald Welte wrote: > > > > Once I understand it in full detail, I can talk to the right people inside > > CentaurLabs (VIA's CPU division). > > > > If somebody (optionally) can phrase a precise technical question that I can > > directly forward to somebody with low-level x86 knowledge but no Linux background, > > it would definitely help speeding up the process. > > > > What is the PCI Cache Line size in the CX700? In the CN896? The chipset documentation doesn't say anything about that, I'd have to inquire inside VIA. I doubt any difference between CX700/CN896. Also, setting the PCI config space register to a too small cache line size (such as 32) on a system that supports more (say 64) doesn't really cause any problems, but just reduces performance - as far as I know. Setting it too big will cause trouble. But since 32 is the default and only on AMD and Intel CPU's it is increased, I see no issue here either. -- - Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/ ============================================================================ VIA Free and Open Source Software Liaison ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-27 12:47 ` Harald Welte
@ 2009-05-27 13:00 ` Michael S. Zick
0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 13:00 UTC (permalink / raw)
To: Harald Welte
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Wed May 27 2009, Harald Welte wrote:
> On Wed, May 27, 2009 at 07:18:08AM -0500, Michael S. Zick wrote:
> > On Sun May 24 2009, Harald Welte wrote:
> > >
> > > Once I understand it in full detail, I can talk to the right people inside
> > > CentaurLabs (VIA's CPU division).
> > >
> > > If somebody (optionally) can phrase a precise technical question that I can
> > > directly forward to somebody with low-level x86 knowledge but no Linux background,
> > > it would definitely help speeding up the process.
> > >
> >
> > What is the PCI Cache Line size in the CX700? In the CN896?
>
> The chipset documentation doesn't say anything about that, I'd have to inquire
> inside VIA. I doubt any difference between CX700/CN896.
>
> Also, setting the PCI config space register to a too small cache line size
> (such as 32) on a system that supports more (say 64) doesn't really cause any
> problems, but just reduces performance - as far as I know.
>
> Setting it too big will cause trouble. But since 32 is the default and
> only on AMD and Intel CPU's it is increased, I see no issue here either.
>

Since the system chip sets were designed for use with the processor -
I am going to poke it up to the processor cache line size - just for fun.

If our assumptions are correct (I do agree with your statements myself) -
then all that will happen is we halve the number of cache line flushes. ;)

If not, perhaps we get another test case data point to consider.

Thanks for the quick reply;
Mike

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [VIA Support] was: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-27 12:18 ` Re:[VIA Support] was: " Michael S. Zick 2009-05-27 12:22 ` [VIA " Michael S. Zick 2009-05-27 12:47 ` Harald Welte @ 2009-05-29 12:06 ` Michael S. Zick 2 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-29 12:06 UTC (permalink / raw) To: Harald Welte Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed May 27 2009, Michael S. Zick wrote: > On Sun May 24 2009, Harald Welte wrote: > > > > Once I understand it in full detail, I can talk to the right people inside > > CentaurLabs (VIA's CPU division). > > The trial build of yesterday's repository head is now posted, details at: http://forum.netbookuser.com/viewtopic.php?pid=7002#p7002 I don't consider the C7-M/CX700 whack-a-bug project finished. . . But I haven't broken my C7-M/CN896 yet either. ;) Mike ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-24 6:49 ` Harald Welte 2009-05-24 12:38 ` Michael S. Zick 2009-05-27 12:18 ` Re:[VIA Support] was: " Michael S. Zick @ 2009-05-30 15:48 ` Michael S. Zick 2 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-30 15:48 UTC (permalink / raw) To: Harald Welte Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Sun May 24 2009, Harald Welte wrote: > Dear hpa, and others, > > On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote: > > It looks like there might be a problem with the C7-M ... Michael reports > > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be > > necessary for a uniprocessor. > > > > I will try my best to help, though I have to admit I'm far from being > a x86 expert, and particularly not with regard to low-level bits such as atomic > operations. > > So please give me some time to research some background about that, > and read up all the details on the currently reported/described problem. > > Once I understand it in full detail, I can talk to the right people inside > CentaurLabs (VIA's CPU division). > > If somebody (optionally) can phrase a precise technical question that I can > directly forward to somebody with low-level x86 knowledge but no Linux background, > it would definitely help speeding up the process. > Does the C7-M instruction set define the 'pause' instruction (0xf3,0x90)? *Defined* since the P-4, but backward compatible with earlier ia32 processors even though it falls into the "don't use rep before non-string instructions". Mike ^ permalink raw reply [flat|nested] 90+ messages in thread
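On the encoding mentioned in the question: "rep; nop" assembles to the bytes 0xf3 0x90, which processors that implement PAUSE decode as a spin-wait hint and which earlier IA-32 processors execute as a plain NOP. Whether the C7-M treats it as anything more than a NOP is exactly what the message asks, and this sketch does not answer it. It only shows, under the assumption of a GCC-style compiler, how a relax primitive of this kind is typically used in a bounded spin loop; every sketch_ name is hypothetical.

```c
/*
 * Sketch only: a cpu_relax()-style helper built on "rep; nop"
 * (bytes 0xf3 0x90). All sketch_ names are hypothetical.
 */
void sketch_cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
	/* The "memory" clobber also makes this a compiler barrier. */
	__asm__ __volatile__("rep; nop" : : : "memory");
#else
	/* Non-x86 fallback: compiler barrier only. */
	__asm__ __volatile__("" : : : "memory");
#endif
}

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations actually spent waiting.
 */
unsigned int sketch_spin_wait(const volatile int *flag, unsigned int max_spins)
{
	unsigned int spins = 0;

	while (!*flag && spins < max_spins) {
		sketch_cpu_relax();
		spins++;
	}
	return spins;
}

/* Demo: the flag never changes, so the loop runs to its bound. */
unsigned int sketch_spin_demo(unsigned int max_spins)
{
	volatile int flag = 0;

	return sketch_spin_wait(&flag, max_spins);
}
```

The bound on the loop is a choice made for the sketch so that it terminates; a real spin-wait would normally be released by another context setting the flag.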
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-23 23:44 ` H. Peter Anvin
2009-05-24 6:49 ` Harald Welte
@ 2009-05-24 12:27 ` Michael S. Zick
2009-05-24 17:22 ` Harald Welte
2009-05-24 18:00 ` H. Peter Anvin
2009-05-27 17:01 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
2 siblings, 2 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 12:27 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sat May 23 2009, H. Peter Anvin wrote:
> Hi Harald,
>
> It looks like there might be a problem with the C7-M ... Michael reports
> that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> necessary for a uniprocessor.
>
> I'm wondering if we have to revive the OOSTORE hack, or some other
> workaround. It is of course hard for me to track this down since (a) I
> don't have access to the CPU documentation, and (b) I work for Intel
> now, which limits the amount of time I can realistically spend on this.
>

@hpa - I still like your suggestion that it is only one (or a few)
uses of atomic ops that are incorrect and in general atomic ops
should compile away on uni-processor.

Let me translate the findings (see further in the included post) -
The C7-M/CN896 (no tech manual released for CN896 yet) and
the C7-M/CX700 (tech manual released since drivers written)

*) I never tested -09143lk on the C7-M/CN896 because -09143 did not
fail all day (a record for 2.6.30 at the moment).

*) The difference on the C7-M/CX700 between the -09143 and -09143lk
I consider significant.

***) But, keep in mind, just because the system chip set is different,
there are other unknowns - -
We can *not* say at the moment that both machines were using the same
execution paths - even though the binaries were identical.

Also, there were probably different external modules loaded in the
two runs - not many, mostly things are built-in.

The truly significant point on the C7-M/CX700 running -09143lk was that
when the ehci-hcd driver got hung in its failure loop, generating a
flood of messages - it did not take down or lock the kernel.

I consider this "forward progress" - it should be possible to build-in
the lock-dep checkers and get something in the message buffer -
rather than just have the machine halt. It's hard to debug a halted
machine with only a glowing power-on light for feed-back. ;)

Mike
> -hpa
>
> [Cc: Alan, who I believed developed the OOSTORE hack back when.]
>
>
> Michael S. Zick wrote:
> > On Fri May 22 2009, H. Peter Anvin wrote:
> >> Michael S. Zick wrote:
> >>> Same integrated motherboard.
> >> Which means same CPU, same BIOS, same motherboard (none of which you're
> >> telling us.)
> >>
> >> cpuinfo and dmidecode would be informative.
> >>
> >
> > The -09143lk files are posted.
> >
> > Download location:
> > http://hp-umpc.com/ce1200v
> >
> > Details so far today:
> > http://forum.netbookuser.com/viewtopic.php?pid=6968#p6968
> >
> > Summary:
> > HP-2133 (C7-M/CN896) - 09143 - No results - Still up 3 1/2 hours.
> > Cloudbook (C7-M/CX700) - 09143 - 45 minutes.
> > Cloudbook (C7-M/CX700) - 09143lk - No results - Still up 1 1/2 hours.
> >
> > OK - time to look for the missing "memory" in the clobber lists. ;)
> >
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-24 12:27 ` Michael S. Zick @ 2009-05-24 17:22 ` Harald Welte 2009-05-24 18:00 ` H. Peter Anvin 1 sibling, 0 replies; 90+ messages in thread From: Harald Welte @ 2009-05-24 17:22 UTC (permalink / raw) To: Michael S. Zick Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Sun, May 24, 2009 at 07:27:27AM -0500, Michael S. Zick wrote: > *) The difference on the C7-M/CX700 between the -09143 and -09143lk > I consider significant. I agree. > ***) But, keep in mind, just because the system chip set is different, > there are other unknowns - - > We can *not* say at the moment that both machines where using the same > execution paths - even though the binaries where identical. yes, of course. > Also, there where probably different external modules loaded in the > two runs - not many, mostly things are built-in. > > The truly significant point on the C7-M/CX700 running -09143lk was that > when the echi-hcd driver got hung in its failure loop, generating a > flood of messages - it did not take down or lock the kernel. > > I consider this "forward progress" - it should be possible to build-in > the lock-dep checkers and get something in the message buffer - > rather than just have the machine halt. Its hard to debug a halted > machine with only a glowing power-on light for feed-back. ;) well, if you're not working with notebooks but actual regular mainboard devices, then you should have a serial console and possibly still have magic sysrq or at least some other interesting information on the console. I personally don't have access to a CX700 based board at the moment, and due to my travel schedule I won't get that before June 6th. However, I do have access to C7-M boards with VX800 and VX855. However, they don't use the VIA Rhine Ethernet chip, so if you are triggering the bug with that driver, it is unlikely to occur there. 
Meanwhile, I will inquire what the CPU guys think should happen with regard to the LOCK prefix. If their view of the world of what they expect from the hardware is already different from our assumptions, we can save ourselves time consuming testing... Regards, -- - Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/ ============================================================================ VIA Open Source Liaison ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-24 12:27 ` Michael S. Zick 2009-05-24 17:22 ` Harald Welte @ 2009-05-24 18:00 ` H. Peter Anvin 2009-05-24 18:32 ` Michael S. Zick 2009-05-28 20:30 ` Pavel Machek 1 sibling, 2 replies; 90+ messages in thread From: H. Peter Anvin @ 2009-05-24 18:00 UTC (permalink / raw) To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox Michael S. Zick wrote: > > @hpa - I still like your suggestion that it is only one (or a few) > uses of atomic ops that is incorrect and in general atomic ops > should compile away on uni-processor. > Actually, the more I think about it the more I suspect there is a race condition either in the chip set or in any VIA-specific drivers (if there are any.) Putting LOCKs in random places will slow the CPU down significantly, so it might resolve the race condition without actually solving the problem. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 18:00 ` H. Peter Anvin
@ 2009-05-24 18:32 ` Michael S. Zick
2009-05-24 18:46 ` H. Peter Anvin
` (2 more replies)
2009-05-28 20:30 ` Pavel Machek
1 sibling, 3 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 18:32 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sun May 24 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> >
> > @hpa - I still like your suggestion that it is only one (or a few)
> > uses of atomic ops that is incorrect and in general atomic ops
> > should compile away on uni-processor.
> >
>
> Actually, the more I think about it the more I suspect there is a race
> condition either in the chip set or in any VIA-specific drivers (if
> there are any.) Putting LOCKs in random places will slow the CPU down
> significantly, so it might resolve the race condition without actually
> solving the problem.
>

They are mostly out of the -09143 and -09144 builds -

No cpufreq (i.e. no e_powersaver).

The padlock-* drivers are modules which must be manually loaded.

The i2c-viapro driver (in spite of its comments) does not work
on CX700 (written before manual was released) - it is reading
the serial number rather than the second data port. ;)
(No access to the chipset temperature/voltage data on SMBus).

The via-fb driver just "doesn't work" - Haven't looked at it yet.

There is a VIA-specific driver for the VIA USB controller, but
it isn't in the x86 part of the tree - Haven't looked at it yet.

There isn't a driver for the hardware watchdog on CX700 -
There isn't a driver for the machine error reporting -

= = = =

Although there may be timing requirement differences on the
CX700 and CN896 - I think more likely a human error (typo)
in the "clobber" lines of the asm - Have not yet audited
that, but it is high on my list.

Note: I seem to recall that newer gcc optimizers presume
that the flags register is preserved across asm -
It didn't use to do that - but there is now a "cc" to deal with
that - Have not yet audited for that, but it is high on my list.

Busy, busy, busy - -
The -09144lk on C7-M/CX700 now up for 3 3/4 hours
close to a new record - but ehci-hcd has not yet gone into
a re-try loop.

Mike
> -hpa
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-24 18:32 ` Michael S. Zick @ 2009-05-24 18:46 ` H. Peter Anvin 2009-05-24 19:09 ` Michael S. Zick 2009-05-25 19:03 ` Michael S. Zick 2009-05-25 1:31 ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte 2009-05-25 16:05 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick 2 siblings, 2 replies; 90+ messages in thread From: H. Peter Anvin @ 2009-05-24 18:46 UTC (permalink / raw) To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox Michael S. Zick wrote: > > Note: I have seem to recall that newer gcc's optimizer presume > that the flags register is preserved across asm - > It didn't use to do that - but there is now a "cc" to deal with > that - Have not yet audited for that, but it is high on my list. > I am pretty sure that's false... if it was true we'd have failures all over the kernel. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 18:46 ` H. Peter Anvin
@ 2009-05-24 19:09 ` Michael S. Zick
2009-05-25 19:03 ` Michael S. Zick
1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-24 19:09 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sun May 24 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> >
> > Note: I have seem to recall that newer gcc's optimizer presume
> > that the flags register is preserved across asm -
> > It didn't use to do that - but there is now a "cc" to deal with
> > that - Have not yet audited for that, but it is high on my list.
> >
>
> I am pretty sure that's false... if it was true we'd have failures all
> over the kernel.
>

Not an issue at the moment, will cover that when I audit my own code.

Mike
> -hpa
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 18:46 ` H. Peter Anvin
2009-05-24 19:09 ` Michael S. Zick
@ 2009-05-25 19:03 ` Michael S. Zick
2009-05-25 19:18 ` Michael S. Zick
1 sibling, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 19:03 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sun May 24 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> >
> > Note: I have seem to recall that newer gcc's optimizer presume
> > that the flags register is preserved across asm -
> > It didn't use to do that - but there is now a "cc" to deal with
> > that - Have not yet audited for that, but it is high on my list.
> >
>
> I am pretty sure that's false... if it was true we'd have failures all
> over the kernel.
>

No information on the above (yet) - but you gotta love this one: ;)

Programmer authors code specifying that the subtraction be done
prior to the addition to avoid over-flow conditions;

GCC's optimizer, in its great wisdom, codes in the overflow case:
( the case of finding the characters used/free in a ring buffer )

extern int diff_umask(int mask, int *cnt1, int *cnt2)
{ return (((mask - *cnt1) + *cnt2) & mask); }

/**
 * gcc -O2 -S -fomit-frame-pointer difftest.c
 *
        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    12(%esp), %eax
        movl    4(%esp), %ecx
        movl    (%eax), %edx
        leal    (%ecx,%edx), %eax
        movl    8(%esp), %edx
        subl    (%edx), %eax
        andl    %ecx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
        .section        .note.GNU-stack,"",@progbits
*/

Note: That is not the compiler version I am building my kernels with.

Don't blame me, I didn't write the compiler. ;)

Mike
> -hpa
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 19:03 ` Michael S. Zick
@ 2009-05-25 19:18 ` Michael S. Zick
2009-05-25 19:46 ` Michael S. Zick
0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 19:18 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Mon May 25 2009, Michael S. Zick wrote:
> On Sun May 24 2009, H. Peter Anvin wrote:
> > Michael S. Zick wrote:
> > >
> > > Note: I have seem to recall that newer gcc's optimizer presume
> > > that the flags register is preserved across asm -
> > > It didn't use to do that - but there is now a "cc" to deal with
> > > that - Have not yet audited for that, but it is high on my list.
> > >
> >
> > I am pretty sure that's false... if it was true we'd have failures all
> > over the kernel.
> >
>
> No information on the above (yet) - but you gotta love this one: ;)
>
> Programmer authors code specifying that the subtraction be done
> prior to the addition to avoid over-flow conditions;
>
> GCC's optimizer, in its great wisdom, codes in the overflow case:
> ( the case of finding the characters used/free in a ring buffer )
>
> extern int diff_umask(int mask, int *cnt1, int *cnt2)
> { return (((mask - *cnt1) + *cnt2) & mask); }
>
> /**
>  * gcc -O2 -S -fomit-frame-pointer difftest.c
>  *
>         .file   "difftest.c"
>         .text
>         .p2align 4,,15
> .globl diff_umask
>         .type   diff_umask, @function
> diff_umask:
>         movl    12(%esp), %eax
>         movl    4(%esp), %ecx
>         movl    (%eax), %edx
>         leal    (%ecx,%edx), %eax
>         movl    8(%esp), %edx
>         subl    (%edx), %eax
>         andl    %ecx, %eax
>         ret
>         .size   diff_umask, .-diff_umask
>         .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
>         .section        .note.GNU-stack,"",@progbits
> */
>
> Note: That is not the compiler version I am building my kernels with.
>

The compiler I am using (Gentoo 4.1.2) gets it correct:

        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    4(%esp), %eax
        movl    8(%esp), %edx
        movl    %eax, %ecx
        subl    (%edx), %ecx
        movl    %ecx, %edx
        movl    12(%esp), %ecx
        addl    (%ecx), %edx
        andl    %edx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (GNU) 4.1.2 (Gentoo 4.1.2 p1.1)"
        .section        .note.GNU-stack,"",@progbits

Mike
> Don't blame me, I didn't write the compiler. ;)
>
> Mike
> > -hpa
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 19:18 ` Michael S. Zick
@ 2009-05-25 19:46 ` Michael S. Zick
2009-05-25 21:10 ` Michael S. Zick
0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 19:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Mon May 25 2009, Michael S. Zick wrote:
> On Mon May 25 2009, Michael S. Zick wrote:
> > On Sun May 24 2009, H. Peter Anvin wrote:
> > > Michael S. Zick wrote:
> > > >
> > > > Note: I have seem to recall that newer gcc's optimizer presume
> > > > that the flags register is preserved across asm -
> > > > It didn't use to do that - but there is now a "cc" to deal with
> > > > that - Have not yet audited for that, but it is high on my list.
> > > >
> > >
> > > I am pretty sure that's false... if it was true we'd have failures all
> > > over the kernel.
> > >
> >
> > No information on the above (yet) - but you gotta love this one: ;)
> >
> > Programmer authors code specifying that the subtraction be done
> > prior to the addition to avoid over-flow conditions;
> >
> > GCC's optimizer, in its great wisdom, codes in the overflow case:
> > ( the case of finding the characters used/free in a ring buffer )
> >
> > extern int diff_umask(int mask, int *cnt1, int *cnt2)
> > { return (((mask - *cnt1) + *cnt2) & mask); }
> >
> > /**
> >  * gcc -O2 -S -fomit-frame-pointer difftest.c
> >  *
> >         .file   "difftest.c"
> >         .text
> >         .p2align 4,,15
> > .globl diff_umask
> >         .type   diff_umask, @function
> > diff_umask:
> >         movl    12(%esp), %eax
> >         movl    4(%esp), %ecx
> >         movl    (%eax), %edx
> >         leal    (%ecx,%edx), %eax
> >         movl    8(%esp), %edx
> >         subl    (%edx), %eax
> >         andl    %ecx, %eax
> >         ret
> >         .size   diff_umask, .-diff_umask
> >         .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
> >         .section        .note.GNU-stack,"",@progbits
> > */
> >
> > Note: That is not the compiler version I am building my kernels with.
> >
>
> The compiler I am using (Gentoo 4.1.2) gets it correct:
>
>         .file   "difftest.c"
>         .text
>         .p2align 4,,15
> .globl diff_umask
>         .type   diff_umask, @function
> diff_umask:
>         movl    4(%esp), %eax
>         movl    8(%esp), %edx
>         movl    %eax, %ecx
>         subl    (%edx), %ecx
>         movl    %ecx, %edx
>         movl    12(%esp), %ecx
>         addl    (%ecx), %edx
>         andl    %edx, %eax
>         ret
>         .size   diff_umask, .-diff_umask
>         .ident  "GCC: (GNU) 4.1.2 (Gentoo 4.1.2 p1.1)"
>         .section        .note.GNU-stack,"",@progbits
>

Gentoo's current 4.3 gets it wrong also:

        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    12(%esp), %eax
        movl    4(%esp), %ecx
        movl    (%eax), %edx
        leal    (%ecx,%edx), %eax
        movl    8(%esp), %edx
        subl    (%edx), %eax
        andl    %ecx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (Gentoo 4.3.2-r3 p1.6, pie-10.1.5) 4.3.2"
        .section        .note.GNU-stack,"",@progbits

= = = =

Might be time to put compiler version checking back into the
build system and/or re-test the sources that do have version
checking in them (hint: the boss's code).

Mike
> Mike
> > Don't blame me, I didn't write the compiler. ;)
> >
> > Mike
> > > -hpa
> > >
> >
> >
> > --

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 19:46 ` Michael S. Zick
@ 2009-05-25 21:10 ` Michael S. Zick
2009-05-25 21:17 ` H. Peter Anvin
0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 21:10 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Mon May 25 2009, Michael S. Zick wrote:

In actual application, this *should not* make a difference.

Mike
> On Mon May 25 2009, Michael S. Zick wrote:
> > On Mon May 25 2009, Michael S. Zick wrote:
> > > On Sun May 24 2009, H. Peter Anvin wrote:
> > > > Michael S. Zick wrote:
> > > > >
> > > > > Note: I have seem to recall that newer gcc's optimizer presume
> > > > > that the flags register is preserved across asm -
> > > > > It didn't use to do that - but there is now a "cc" to deal with
> > > > > that - Have not yet audited for that, but it is high on my list.
> > > > >
> > > >
> > > > I am pretty sure that's false... if it was true we'd have failures all
> > > > over the kernel.
> > > >
> > >
> > > No information on the above (yet) - but you gotta love this one: ;)
> > >
> > > Programmer authors code specifying that the subtraction be done
> > > prior to the addition to avoid over-flow conditions;
> > >
> > > GCC's optimizer, in its great wisdom, codes in the overflow case:
> > > ( the case of finding the characters used/free in a ring buffer )
> > >
> > > extern int diff_umask(int mask, int *cnt1, int *cnt2)
> > > { return (((mask - *cnt1) + *cnt2) & mask); }
> > >
> > > /**
> > >  * gcc -O2 -S -fomit-frame-pointer difftest.c
> > >  *
> > >         .file   "difftest.c"
> > >         .text
> > >         .p2align 4,,15
> > > .globl diff_umask
> > >         .type   diff_umask, @function
> > > diff_umask:
> > >         movl    12(%esp), %eax
> > >         movl    4(%esp), %ecx
> > >         movl    (%eax), %edx
> > >         leal    (%ecx,%edx), %eax
> > >         movl    8(%esp), %edx
> > >         subl    (%edx), %eax
> > >         andl    %ecx, %eax
> > >         ret
> > >         .size   diff_umask, .-diff_umask
> > >         .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
> > >         .section        .note.GNU-stack,"",@progbits
> > > */
> > >
> > > Note: That is not the compiler version I am building my kernels with.
> > >
> >
> > The compiler I am using (Gentoo 4.1.2) gets it correct:
> >
> >         .file   "difftest.c"
> >         .text
> >         .p2align 4,,15
> > .globl diff_umask
> >         .type   diff_umask, @function
> > diff_umask:
> >         movl    4(%esp), %eax
> >         movl    8(%esp), %edx
> >         movl    %eax, %ecx
> >         subl    (%edx), %ecx
> >         movl    %ecx, %edx
> >         movl    12(%esp), %ecx
> >         addl    (%ecx), %edx
> >         andl    %edx, %eax
> >         ret
> >         .size   diff_umask, .-diff_umask
> >         .ident  "GCC: (GNU) 4.1.2 (Gentoo 4.1.2 p1.1)"
> >         .section        .note.GNU-stack,"",@progbits
> >
>
> Gentoo's current 4.3 gets it wrong also:
>
>         .file   "difftest.c"
>         .text
>         .p2align 4,,15
> .globl diff_umask
>         .type   diff_umask, @function
> diff_umask:
>         movl    12(%esp), %eax
>         movl    4(%esp), %ecx
>         movl    (%eax), %edx
>         leal    (%ecx,%edx), %eax
>         movl    8(%esp), %edx
>         subl    (%edx), %eax
>         andl    %ecx, %eax
>         ret
>         .size   diff_umask, .-diff_umask
>         .ident  "GCC: (Gentoo 4.3.2-r3 p1.6, pie-10.1.5) 4.3.2"
>         .section        .note.GNU-stack,"",@progbits
>
> = = = =
>
> Might be time to put compiler version checking back into the
> build system and/or re-test the sources that do have version
> checking in them (hint: the boss's code).
>
> Mike
> > Mike
> > > Don't blame me, I didn't write the compiler. ;)
> > >
> > > Mike
> > > > -hpa
> > > >
> > >
> > >
> > > --
> >
> >
> > --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 21:10 ` Michael S. Zick
@ 2009-05-25 21:17 ` H. Peter Anvin
2009-05-25 23:03 ` Michael S. Zick
0 siblings, 1 reply; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-25 21:17 UTC (permalink / raw)
To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Michael S. Zick wrote:
> On Mon May 25 2009, Michael S. Zick wrote:
>
> In actual application, this *should not* make a difference.
>

No kidding. This is a valid transformation for integers, since it is
all done with 2's-complement arithmetic.

Floating-point numbers is a whole other game.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 21:17 ` H. Peter Anvin
@ 2009-05-25 23:03 ` Michael S. Zick
2009-05-25 23:35 ` Michael S. Zick
2009-05-26 0:05 ` H. Peter Anvin
0 siblings, 2 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 23:03 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Mon May 25 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > On Mon May 25 2009, Michael S. Zick wrote:
> >
> > In actual application, this *should not* make a difference.
> >
>
> No kidding. This is a valid transformation for integers, since it is
> all done with 2's-complement arithmetic.
>

Load Effective Address does two's complement arithmetic?
I'll take your word for it.

For example:

#include <stdio.h>

extern int diff_umask(int mask, int *cnt1, int *cnt2)
{ return (((mask - *cnt1) + *cnt2) & mask); }

int main() {
        int msk = 0x7fffffff; /* max positive */
        int idx1 = 0x7ffffffd; /* max positive - 2 */
        int idx2 = 0x7fffffff; /* max positive */

        int rst;

        rst = diff_umask(msk, &idx1, &idx2);
        printf("\n\t%d\n", rst); /* " 1 " - correct */
}

But that is because when it is compiled as a
single source file, gcc is hardcoding the lea
adjustment when it is not an external file:
(compare to the above listings)
Like I wrote - I don't use 31-bit ring buffers, so I don't care.

objdump -d testdiff:
- - - snip - - -
080483b0 <diff_umask>:
 80483b0:       8b 44 24 0c             mov    0xc(%esp),%eax
 80483b4:       8b 4c 24 04             mov    0x4(%esp),%ecx
 80483b8:       8b 10                   mov    (%eax),%edx
 80483ba:       8d 04 11                lea    (%ecx,%edx,1),%eax
 80483bd:       8b 54 24 08             mov    0x8(%esp),%edx
 80483c1:       2b 02                   sub    (%edx),%eax
 80483c3:       21 c8                   and    %ecx,%eax
 80483c5:       c3                      ret
- - - snip - - -

Mike
> Floating-point numbers is a whole other game.
>
> -hpa
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 23:03 ` Michael S. Zick
@ 2009-05-25 23:35 ` Michael S. Zick
2009-05-26 0:05 ` H. Peter Anvin
1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 23:35 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Mon May 25 2009, Michael S. Zick wrote:

PS: gcc-4.1.2 does compile the function the same within the
main file or as a stand-alone file.
Along with maintaining the programmer specified order of
operations without trying to hardcode corrections to LEA.

I'll stick with 4.1.2 myself. YMMV

Mike
> On Mon May 25 2009, H. Peter Anvin wrote:
> > Michael S. Zick wrote:
> > > On Mon May 25 2009, Michael S. Zick wrote:
> > >
> > > In actual application, this *should not* make a difference.
> > >
> >
> > No kidding. This is a valid transformation for integers, since it is
> > all done with 2's-complement arithmetic.
> >
>
> Load Effective Address does two's complement arithmetic?
> I'll take your word for it.
>
> For example:
>
> #include <stdio.h>
>
> extern int diff_umask(int mask, int *cnt1, int *cnt2)
> { return (((mask - *cnt1) + *cnt2) & mask); }
>
> int main() {
>         int msk = 0x7fffffff; /* max positive */
>         int idx1 = 0x7ffffffd; /* max positive - 2 */
>         int idx2 = 0x7fffffff; /* max positive */
>
>         int rst;
>
>         rst = diff_umask(msk, &idx1, &idx2);
>         printf("\n\t%d\n", rst); /* " 1 " - correct */
> }
>
> But that is because when it is compiled as a
> single source file, gcc is hardcoding the lea
> adjustment when it is not an external file:
> (compare to the above listings)
> Like I wrote - I don't use 31-bit ring buffers, so I don't care.
>
> objdump -d testdiff:
> - - - snip - - -
> 080483b0 <diff_umask>:
>  80483b0:       8b 44 24 0c             mov    0xc(%esp),%eax
>  80483b4:       8b 4c 24 04             mov    0x4(%esp),%ecx
>  80483b8:       8b 10                   mov    (%eax),%edx
>  80483ba:       8d 04 11                lea    (%ecx,%edx,1),%eax
>  80483bd:       8b 54 24 08             mov    0x8(%esp),%edx
>  80483c1:       2b 02                   sub    (%edx),%eax
>  80483c3:       21 c8                   and    %ecx,%eax
>  80483c5:       c3                      ret
> - - - snip - - -
>
> Mike
> > Floating-point numbers is a whole other game.
> >
> > -hpa
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-25 23:03 ` Michael S. Zick
2009-05-25 23:35 ` Michael S. Zick
@ 2009-05-26 0:05 ` H. Peter Anvin
2009-05-26 12:37 ` Michael S. Zick
1 sibling, 1 reply; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-26 0:05 UTC (permalink / raw)
To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Michael S. Zick wrote:
>
> Load Effective Address does two's complement arithmetic?
> I'll take your word for it.
>

LEA, and all other address calculations use 2's-complement arithmetic:

        leal -1(%ebx),%eax
        leal 0xffffffff(%ebx),%eax

... is the same instruction.

However, gcc has been known to optimize out range checks when operating
on signed integers; it is allowed to do this by the C standard, but it
can give surprising results if the user expected wraparound.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-26 0:05 ` H. Peter Anvin
@ 2009-05-26 12:37 ` Michael S. Zick
2009-05-26 17:13 ` H. Peter Anvin
0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-26 12:37 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Mon May 25 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> >
> > Load Effective Address does two's complement arithmetic?
> > I'll take your word for it.
> >
>
> LEA, and all other address calculations use 2's-complement arithmetic:
>
>         leal -1(%ebx),%eax
>         leal 0xffffffff(%ebx),%eax
>
> ... is the same instruction.
>
> However, gcc has been known to optimize out range checks when operating
> on signed integers; it is allowed to do this by the C standard, but it
> can give surprising results if the user expected wraparound.
>

Well, it isn't a range check - -
but this illustrates where my (false) concern came from:

Given this input file:

extern int diff_umask(int mask, int *cnt1, int *cnt2)
{ return (((mask - *cnt1) + *cnt2) & mask); }

Doing: gcc -O2 -S -fomit-frame-pointer difftest.c
Yields (as difftest.s):

        .file   "difftest.c"
        .text
        .p2align 4,,15
.globl diff_umask
        .type   diff_umask, @function
diff_umask:
        movl    12(%esp), %eax
        movl    4(%esp), %ecx
        movl    (%eax), %edx
        leal    (%ecx,%edx), %eax
        movl    8(%esp), %edx
        subl    (%edx), %eax
        andl    %ecx, %eax
        ret
        .size   diff_umask, .-diff_umask
        .ident  "GCC: (Debian 4.3.2-1.1) 4.3.2"
        .section        .note.GNU-stack,"",@progbits

Now follow that up with the commands:
gcc -O2 -c -fomit-frame-pointer difftest.s

Then examine the result with objdump:
objdump -d difftest.o

In relevant part, yields:

difftest.o:     file format elf32-i386

Disassembly of section .text:

00000000 <diff_umask>:
   0:   8b 44 24 0c             mov    0xc(%esp),%eax
   4:   8b 4c 24 04             mov    0x4(%esp),%ecx
   8:   8b 10                   mov    (%eax),%edx
   a:   8d 04 11                lea    (%ecx,%edx,1),%eax
   d:   8b 54 24 08             mov    0x8(%esp),%edx
  11:   2b 02                   sub    (%edx),%eax
  13:   21 c8                   and    %ecx,%eax
  15:   c3                      ret

= = = =

Checking the byte string 0x8d, 0x04, 0x11 against the Intel
documentation shows that the disassembly output of objdump
is incorrect - that bit string does not have an offset field.
That is the byte encoding for the gcc assembly input.

What's a person to do when the tool-chain lies?

Mike
> -hpa
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-26 12:37 ` Michael S. Zick
@ 2009-05-26 17:13 ` H. Peter Anvin
0 siblings, 0 replies; 90+ messages in thread
From: H. Peter Anvin @ 2009-05-26 17:13 UTC (permalink / raw)
To: lkml; +Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Michael S. Zick wrote:
>
> Disassembly of section .text:
>
> 00000000 <diff_umask>:
>    0:   8b 44 24 0c             mov    0xc(%esp),%eax
>    4:   8b 4c 24 04             mov    0x4(%esp),%ecx
>    8:   8b 10                   mov    (%eax),%edx
>    a:   8d 04 11                lea    (%ecx,%edx,1),%eax
>    d:   8b 54 24 08             mov    0x8(%esp),%edx
>   11:   2b 02                   sub    (%edx),%eax
>   13:   21 c8                   and    %ecx,%eax
>   15:   c3                      ret
>
> = = = =
>
> Checking the byte string 0x8d, 0x04, 0x11 against the Intel
> documentation shows that the disassembly output of objdump
> is incorrect - that bit string does not have an offset field.
> That is the byte encoding for the gcc assembly input.
>
> What's a person to do when the tool-chain lies?
>

The ,1 isn't an offset field... it's a scale factor.

-hpa

^ permalink raw reply [flat|nested] 90+ messages in thread
* i2c-viapro / via-fb drivers on VIA CX700
2009-05-24 18:32 ` Michael S. Zick
2009-05-24 18:46 ` H. Peter Anvin
@ 2009-05-25 1:31 ` Harald Welte
2009-05-25 12:54 ` Michael S. Zick
2009-05-27 13:36 ` Michael S. Zick
2009-05-25 16:05 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2 siblings, 2 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-25 1:31 UTC (permalink / raw)
To: Michael S. Zick; +Cc: linux-kernel

On Sun, May 24, 2009 at 01:32:37PM -0500, Michael S. Zick wrote:

> The i2c-viapro driver (in spite of its comments) does not work
> on CX700 (written before manual was released) - it is reading
> the serial number rather than the second data port. ;)
> (No access to the chipset temperature/voltage data on SMBus).

This is surprising. I just manually verified the driver against the
cx700 programming manual, and it seems to do the right thing. Lacking
access to a cx700 board right now, I cannot perform an actual test.

Where exactly is the bug about the wrong register that you mentioned?
I'd rather fix that ASAP.

> The via-fb driver just "doesn't work" - Haven't looked at it yet.

good to know. Seems like I need to get access to a CX700 based board.

> There isn't a driver for the hardware watchdog on CX700 -

JFYI: I wrote one but it doesn't work on the vx800/vx855, and VIA is currently
trying to figure out why.

> There isn't a driver for the machine error reporting -

I think the CPU just claims to report it but in reality doesn't... this was
made to make some proprietary software happy, AFAIR.

--
- Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/
============================================================================
VIA Open Source Liaison

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: i2c-viapro / via-fb drivers on VIA CX700
2009-05-25 1:31 ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
@ 2009-05-25 12:54 ` Michael S. Zick
2009-05-27 13:36 ` Michael S. Zick
1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 12:54 UTC (permalink / raw)
To: Harald Welte; +Cc: linux-kernel

On Sun May 24 2009, Harald Welte wrote:
> On Sun, May 24, 2009 at 01:32:37PM -0500, Michael S. Zick wrote:
>
> > The i2c-viapro driver (in spite of its comments) does not work
> > on CX700 (written before manual was released) - it is reading
> > the serial number rather than the second data port. ;)
> > (No access to the chipset temperature/voltage data on SMBus).
>

None of my comments have been verified sufficiently to be
considered for reporting as a bug. Not yet.
I need to carefully check my initial findings.

Also, both the code base and my configuration have changed
greatly since I just made those first notes.

None of those things on my "to check" list are critical to
solving a "hard lock-up" problem.

Now that I get pairs of builds with vastly different behavior - -
I can start narrowing in on the prime cause -

My plan -
*) Make local changes to printk.c that will behave differently
   under heavy message floods (I have the re-try loop in ehci-hcd
   available to generate the floods with).
*) Build, again, with the lockdep reporting -
   This will do one of two things -
   **) Make true reports - which can be found and fixed
   **) Make false reports due to whatever is being worked-around
       with changing the LOCK_PREFIX macro.

Even in the second case, we will get locations in the source to
eye-ball/test for the effect of the LOCK_PREFIX macro.

> This is surprising. I just manually verified the driver against the
> cx700 programming manual, and it seems to do the right thing. Lacking
> access to a cx700 board right now, I cannot perform an actual test.
>
> Where exactly is the bug about the wrong register that you mentioned?
> I'd rather fix that ASAP.
>

Let me re-check my notes on that against the -rc7 build,
I really, really would like to have SMBus access to those
thermal monitors.

Also, the Everex CloudBook (not the Sylvania gBook) controls
the power to the Wifi card as an SMBus device.
The Wifi card works much better with power applied. ;)

> > The via-fb driver just "doesn't work" - Haven't looked at it yet.
>
> good to know. Seems like I need to get access to a CX700 based board.
>
> > There isn't a driver for the hardware watchdog on CX700 -
>
> JFYI: I wrote one but it doesn't work on the vx800/vx855, and VIA is currently
> trying to figure out why.
>

Not a big issue at this point, as I read the manual, our choices
are to either pull down the chipset "power off" or "reset" lines
with it. Since those choices are probably grown into the silicon...
Now if we had the choice to generate a crash dump...

If you can send me a link to your preliminary code, I will check it
against the CX700 - - I do have a machine that will let it trigger. ;)

> > There isn't a driver for the machine error reporting -
>
> I think the CPU just claims to report it but in reality doesn't... this was
> made to make some proprietary software happy, AFAIR.
>

I think that a working SMBus driver would give enough access to the
chip set for practical purposes - these machines don't support ECC ram -
the BIOS (your demo board BIOS if we believe dmidump) should be
handling other machine check exceptions.

Thanks very much for your time.
Mike

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: i2c-viapro / via-fb drivers on VIA CX700
2009-05-25 1:31 ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
2009-05-25 12:54 ` Michael S. Zick
@ 2009-05-27 13:36 ` Michael S. Zick
1 sibling, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-27 13:36 UTC (permalink / raw)
To: Harald Welte; +Cc: linux-kernel

On Sun May 24 2009, Harald Welte wrote:
> On Sun, May 24, 2009 at 01:32:37PM -0500, Michael S. Zick wrote:
>
> > The i2c-viapro driver (in spite of its comments) does not work
> > on CX700 (written before manual was released) - it is reading
> > the serial number rather than the second data port. ;)
> > (No access to the chipset temperature/voltage data on SMBus).
>
> This is surprising. I just manually verified the driver against the
> cx700 programming manual, and it seems to do the right thing. Lacking
> access to a cx700 board right now, I cannot perform an actual test.
>
> Where exactly is the bug about the wrong register that you mentioned?
> I'd rather fix that ASAP.
>

I'll send you some reference material on how Everex is using the
SMBus on their ce1200v (the original Cloudbook) off-list.

Mike

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 18:32 ` Michael S. Zick
2009-05-24 18:46 ` H. Peter Anvin
2009-05-25 1:31 ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
@ 2009-05-25 16:05 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-25 16:05 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Sun May 24 2009, Michael S. Zick wrote:
> On Sun May 24 2009, H. Peter Anvin wrote:
> > Michael S. Zick wrote:
> > >
> > > @hpa - I still like your suggestion that it is only one (or a few)
> > > uses of atomic ops that is incorrect and in general atomic ops
> > > should compile away on uni-processor.
> > >
> >
> > Actually, the more I think about it the more I suspect there is a race
> > condition either in the chip set or in any VIA-specific drivers (if
> > there are any.) Putting LOCKs in random places will slow the CPU down
> > significantly, so it might resolve the race condition without actually
> > solving the problem.
> >
>
> They are mostly out of the -09143 and -09144 builds -
> No cpufreq (I.E: no e_powersaver).
> The padlock-* drivers are modules which must be manually loaded.
>
> The i2c-viapro driver (in spite of its comments) does not work
> on CX700 (written before manual was released) - it is reading
> the serial number rather than the second data port. ;)
> (No access to the chipset temperature/voltage data on SMBus).
>
> The via-fb driver just "doesn't work" - Haven't looked at it yet.
>
> There is a VIA-specific driver for the VIA USB controller, but it
> isn't in the x86 part of the tree - Haven't looked at it yet.
>
> There isn't a driver for the hardware watchdog on CX700 -
> There isn't a driver for the machine error reporting -
>
> = = = =
>
> Although there may be timing requirement differences on the
> CX700 and CN896 - I think more likely a human error (typo)
> in the "clobber" lines of the asm - Have not yet audited that,
> but it is high on my list.
>
> Note: I have seem to recall that newer gcc's optimizer presume
> that the flags register is preserved across asm -
> It didn't use to do that - but there is now a "cc" to deal with
> that - Have not yet audited for that, but it is high on my list.
>
> Busy, busy, busy - -
> The -09144lk on C7-M/CX700 now up for 3 3/4 hours close to a new
> record - but ehci-hcd has not yet gone into a re-try loop.
>

The -09145{,lk}-db pair is posted now.
Same code-base/config as the -09144{,lk} pair with the addition
of lockdep checking.

Details:
http://forum.netbookuser.com/viewtopic.php?pid=6980#p6980

Mike
> Mike
> > -hpa
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-24 18:00 ` H. Peter Anvin
2009-05-24 18:32 ` Michael S. Zick
@ 2009-05-28 20:30 ` Pavel Machek
2009-05-28 20:54 ` Michael S. Zick
1 sibling, 1 reply; 90+ messages in thread
From: Pavel Machek @ 2009-05-28 20:30 UTC (permalink / raw)
To: H. Peter Anvin
Cc: lkml, Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Hi!

> > @hpa - I still like your suggestion that it is only one (or a few)
> > uses of atomic ops that is incorrect and in general atomic ops
> > should compile away on uni-processor.
> >
>
> Actually, the more I think about it the more I suspect there is a race
> condition either in the chip set or in any VIA-specific drivers (if
> there are any.) Putting LOCKs in random places will slow the CPU down
> significantly, so it might resolve the race condition without actually
> solving the problem.

Which you can verify; replace lock with something slow (pushad,
popad)? And see what happens.

(And if it never ever triggers on hp2133, you have a strong clue that it
may not be cpu-related, but bios-related or chipset-related or something).

Some time ago I was trying to debug mysterious hangs on some
via/fic machines.

We never figured out what was wrong, but we discovered many other bios
bugs, and those were not being fixed; so debugging was
hard/impossible. Unfortunately I no longer have access to that hw.

hp2133 did _not_ have that problem.

Try forcing maximum throttling, then move mouse for like five
seconds. If kbc dies, you have the same buggy bios, and probably are
debugging the same problem....
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic 2009-05-28 20:30 ` Pavel Machek @ 2009-05-28 20:54 ` Michael S. Zick 2009-05-28 23:15 ` [Futex RFC] was " Michael S. Zick 0 siblings, 1 reply; 90+ messages in thread From: Michael S. Zick @ 2009-05-28 20:54 UTC (permalink / raw) To: Pavel Machek Cc: H. Peter Anvin, Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Thu May 28 2009, Pavel Machek wrote: > Hi! > > > > @hpa - I still like your suggestion that it is only one (or a few) > > > uses of atomic ops that is incorrect and in general atomic ops > > > should compile away on uni-processor. > > > > > > > Actually, the more I think about it the more I suspect there is a race > > condition either in the chip set or in any VIA-specific drivers (if > > there are any.) Putting LOCKs in random places will slow the CPU down > > significantly, so it might resolve the race condition without actually > > solving the problem. > > Which you can verify; replace lock with something slow (pushad, > popad)? And see what happens. > > (And if it never ever triggers on hp2133, you have strong clue that it > may not be cpu-related, but bios-related or chipset related or something). > > Some time ago I was trying to debug misterious hangs on some > via/fic machines. > > We never figured out what was wrong, but we discovered many other bios > bugs, and those were not being fixed; so debugging was > hard/impossible. Unfortunately I no longer have access to that hw. > Then I am not losing my mind here - *it is* a difficult problem. ;) > hp2133 did _not_ have that problem. > Today's build has been playing me music for over 8 hours on the HP-2133 (C7M-CN896) but can't get past a couple of hours on the (fic) Everex Cloudbook (C7M-CX700). Also, the distro on the Cloudbook is using pulse-audio - the distro on the HP is not. So I am reviewing the recent bug fixes to kernel/futex for something over-looked. 
;) May be a wild goose chase, but I think pulse-audio uses futexes. Thanks for the other hints. Mike > Try forcing maximum throttling, then move mouse for like five > seconds. If kbc dies, you have same buggy bios, and probably are > debugging same problem.... > Pavel ^ permalink raw reply [flat|nested] 90+ messages in thread
* [Futex RFC] was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-28 20:54 ` Michael S. Zick
@ 2009-05-28 23:15 ` Michael S. Zick
2009-05-29 2:00 ` Michael S. Zick
0 siblings, 1 reply; 90+ messages in thread
From: Michael S. Zick @ 2009-05-28 23:15 UTC (permalink / raw)
To: Pavel Machek
Cc: H. Peter Anvin, Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Thu May 28 2009, Michael S. Zick wrote:
> On Thu May 28 2009, Pavel Machek wrote:
> > Hi!
> >
> > > > @hpa - I still like your suggestion that it is only one (or a few)
> > > > uses of atomic ops that is incorrect and in general atomic ops
> > > > should compile away on uni-processor.
> > > >
> > >
> > > Actually, the more I think about it the more I suspect there is a race
> > > condition either in the chip set or in any VIA-specific drivers (if
> > > there are any.) Putting LOCKs in random places will slow the CPU down
> > > significantly, so it might resolve the race condition without actually
> > > solving the problem.
> >
> > Which you can verify; replace lock with something slow (pushad,
> > popad)? And see what happens.
> >
> > (And if it never ever triggers on hp2133, you have strong clue that it
> > may not be cpu-related, but bios-related or chipset related or something).
> >
> > Some time ago I was trying to debug mysterious hangs on some
> > via/fic machines.
> >
> > We never figured out what was wrong, but we discovered many other bios
> > bugs, and those were not being fixed; so debugging was
> > hard/impossible. Unfortunately I no longer have access to that hw.
> >
>
> Then I am not losing my mind here - *it is* a difficult problem. ;)
>
> > hp2133 did _not_ have that problem.
> >
>
> Today's build has been playing me music for over 8 hours on the
> HP-2133 (C7M-CN896) but can't get past a couple of hours on the
> (fic) Everex Cloudbook (C7M-CX700).
>
> Also, the distro on the Cloudbook is using pulse-audio - the
> distro on the HP is not. So I am reviewing the recent bug
> fixes to kernel/futex for something over-looked. ;)
> May be a wild goose chase, but I think pulse-audio uses futexes.
>

Please, somebody apply an experienced eyeball to this.
It does seem to make a difference, but tests have not run
for very long yet.

diff --git a/arch/x86/include/asm/futex.h b/arch/x86/include/asm/futex.h
index 1f11ce4..da3c801 100644
--- a/arch/x86/include/asm/futex.h
+++ b/arch/x86/include/asm/futex.h
@@ -19,7 +19,8 @@
 "\t.previous\n" \
 _ASM_EXTABLE(1b, 3b) \
 : "=r" (oldval), "=r" (ret), "+m" (*uaddr) \
- : "i" (-EFAULT), "0" (oparg), "1" (0))
+ : "i" (-EFAULT), "0" (oparg), "1" (0) \
+ : "memory")

 #define __futex_atomic_op2(insn, ret, oldval, uaddr, oparg) \
 asm volatile("1:\tmovl %2, %0\n" \
@@ -35,7 +36,8 @@
 _ASM_EXTABLE(2b, 4b) \
 : "=&a" (oldval), "=&r" (ret), \
 "+m" (*uaddr), "=&r" (tem) \
- : "r" (oparg), "i" (-EFAULT), "1" (0))
+ : "r" (oparg), "i" (-EFAULT), "1" (0) \
+ : "memory")

Mike

> Thanks for the other hints.
>
> Mike
> > Try forcing maximum throttling, then move mouse for like five
> > seconds. If kbc dies, you have same buggy bios, and probably are
> > debugging same problem....
> > Pavel
^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [Futex RFC] was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
2009-05-28 23:15 ` [Futex RFC] was " Michael S. Zick
@ 2009-05-29 2:00 ` Michael S. Zick
0 siblings, 0 replies; 90+ messages in thread
From: Michael S. Zick @ 2009-05-29 2:00 UTC (permalink / raw)
To: Pavel Machek
Cc: H. Peter Anvin, Harald Welte, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Thu May 28 2009, Michael S. Zick wrote:
>
> Please, somebody apply an experienced eye-ball to this;
> It does seem to make a difference, but tests have not run
> for very long yet.
>
> diff --git a/arch/x86/include/asm/futex.h b/arch/x86/include/asm/futex.h
> index 1f11ce4..da3c801 100644
> --- a/arch/x86/include/asm/futex.h
> +++ b/arch/x86/include/asm/futex.h
> @@ -19,7 +19,8 @@
>  "\t.previous\n" \
>  _ASM_EXTABLE(1b, 3b) \
>  : "=r" (oldval), "=r" (ret), "+m" (*uaddr) \
> - : "i" (-EFAULT), "0" (oparg), "1" (0))
> + : "i" (-EFAULT), "0" (oparg), "1" (0) \
> + : "memory")
>
>  #define __futex_atomic_op2(insn, ret, oldval, uaddr, oparg) \
>  asm volatile("1:\tmovl %2, %0\n" \
> @@ -35,7 +36,8 @@
>  _ASM_EXTABLE(2b, 4b) \
>  : "=&a" (oldval), "=&r" (ret), \
>  "+m" (*uaddr), "=&r" (tem) \
> - : "r" (oparg), "i" (-EFAULT), "1" (0))
> + : "r" (oparg), "i" (-EFAULT), "1" (0) \
> + : "memory")
>
>
> Mike

Without the above annotations:
C7-M/CX700 uptime while running pulse-audio: 1 1/2 hrs.

With the above annotations:
C7-M/CX700 uptime, same test setup: maximum unknown, test terminated after 3 hours.
C7-M/CN896: maximum unknown, test terminated after 12 hrs.

Sample build to be available tomorrow.

Mike
^ permalink raw reply [flat|nested] 90+ messages in thread
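What the "memory" clobber in the futex patch asks of the compiler can be sketched in user space. This is only an illustration, assuming GCC or Clang extended asm; shared_word, compiler_barrier and bump_with_clobber are hypothetical names and are not taken from the kernel source. The clobber tells the compiler that the asm statement may read or write memory it does not list as an operand, so values cached in registers must be written back before the statement and reloaded after it.

```c
#include <stdint.h>

/* Hypothetical stand-in for a word that another context may also touch. */
uint32_t shared_word;

/*
 * Empty asm statement with a "memory" clobber. It emits no instructions,
 * but the compiler may not keep memory values in registers across it and
 * may not move memory accesses from one side of it to the other.
 */
static inline void compiler_barrier(void)
{
	__asm__ __volatile__("" : : : "memory");
}

/*
 * Read-modify-write bracketed by barriers: the load and the store are
 * really issued between the two barriers. Returns the old value.
 */
uint32_t bump_with_clobber(uint32_t delta)
{
	uint32_t old;

	compiler_barrier();
	old = shared_word;
	shared_word = old + delta;
	compiler_barrier();
	return old;
}
```

Note that this constrains only the compiler. It does not make the update atomic with respect to other bus masters, which is the separate question the LOCK prefix discussion is about.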
* LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
2009-05-23 23:44 ` H. Peter Anvin
2009-05-24 6:49 ` Harald Welte
2009-05-24 12:27 ` Michael S. Zick
@ 2009-05-27 17:01 ` Harald Welte
2009-05-27 17:10 ` Michael S. Zick
` (3 more replies)
2 siblings, 4 replies; 90+ messages in thread
From: Harald Welte @ 2009-05-27 17:01 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Hi hpa and others,

On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:

> It looks like there might be a problem with the C7-M ... Michael reports
> that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> necessary for a uniprocessor.

It seems they are necessary.

Here are some statements from the CPU logic guys at VIA/Centaur:

* A read-modify-write sequence cannot be interrupted.
* All X86 instructions except rep-strings are atomic wrt interrupts.
* The lock prefix has uses on a UP processor: It keeps DMA devices from
interfering with a read-modify-write sequence

Furthermore, they have done some experimentation in the past, making the
CPU simply ignore the LOCK prefix on uni-processor (running a certain popular
proprietary operating system): It doesn't work, presumably because of the
above-mentioned DMA-related conflict.

Also, the engineers believe that it is only a matter of time until a different
CPU/chipset combination would expose the same bug. Since the in-order
single-retire C7-M is more vulnerable than out-of-order, multiple-retire CPUs,
they are not surprised that the issue shows first on the C7-M.

The recommendation from the CPU engineers, unsurprisingly, thus is to put the
LOCK prefixes back where they were.

Hope this helps you.

Now if I understand the issues correctly, it would mean that there is some
driver code that modifies a certain chunk of memory, while DMA of some
peripheral is also accessing that memory. I suppose it would not have to be
the same actual address, but probably being within the same cache line is
already sufficient.

Now the question is: Is this a valid operation of a driver? Should the driver
do such things, or is such a driver broken? When would that occur? I'm trying
to come up with a case, but typically you e.g. allocate some DMA buffer and
then don't touch it until the hardware has processed it.

Regards,
--
- Harald Welte <HaraldWelte@viatech.com> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison
^ permalink raw reply [flat|nested] 90+ messages in thread
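The difference between an unlocked and a locked read-modify-write, which is what an empty versus a non-empty LOCK_PREFIX selects in the kernel, can be sketched in user space. This is a minimal illustration under stated assumptions: it uses the GCC/Clang __atomic_fetch_add builtin rather than the kernel's own macros, and add_unlocked and add_locked are hypothetical names. On x86 the builtin is typically compiled to a lock-prefixed instruction, while the plain version leaves the compiler free to emit an ordinary load, add and store.

```c
#include <stdint.h>

/*
 * Plain read-modify-write. Nothing here asks for bus-level atomicity,
 * which is roughly what a kernel built with an empty LOCK_PREFIX gets.
 * Returns the value seen before the addition.
 */
uint32_t add_unlocked(uint32_t *p, uint32_t v)
{
	uint32_t old = *p;

	*p = old + v;
	return old;
}

/*
 * Atomic read-modify-write through the compiler builtin. On x86 this is
 * typically emitted with a lock prefix, so no other bus agent can slip
 * in between the read and the write. Returns the previous value.
 */
uint32_t add_locked(uint32_t *p, uint32_t v)
{
	return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
```

With a single thread and no other bus master the two functions are indistinguishable, which is why the missing prefix normally goes unnoticed on a uniprocessor build.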
* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) 2009-05-27 17:01 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte @ 2009-05-27 17:10 ` Michael S. Zick 2009-05-27 17:19 ` Thomas Gleixner ` (2 subsequent siblings) 3 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-27 17:10 UTC (permalink / raw) To: Harald Welte Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed May 27 2009, Harald Welte wrote: > Hi hpa and others, > > On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote: > > > It looks like there might be a problem with the C7-M ... Michael reports > > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be > > necessary for a uniprocessor. > > It seems, they are neccessary. > > Here are some statements from the CPU logic guys at VIA/Centaur: > > * A read-modify-write sequence cannot be interupted. > * All X86 instructions except rep-strings are atomic wrt interrupts. > * The lock prefix has uses on a UP processor: It keeps DMA devices from > interfering with a read-modify-write sequence > > Furthermore, they have done some experimentation in the past, making the > CPU simply ignore the LOCK prefix on uni-processor (running a certain popular > proprietary operating system): It doesn't work, presumably of the abovementioned > DMA related conflict. > > Also, the engineers believe that it is only a matter of time until different > CPU/chipset combination would expose the same bug. Since the in-order > single-retire C7-M is more vulnerable than out-of-order, multiple-retire CPU's, > they are not surprised that the issue shows first on the C7-M. > > The recommendation from the CPU engineers, unsurprisingly, thus is to put the > LOCK prefixes back where they were. > > Hope this helps you. 
> > Now if I understand the issues correctly, it would mean that there is some > driver code that modifies a certain chunk of memory, while DMA of some > peripheral is also accessing that memory. I suppose it would not have to be > the same actual address, but probably being within the same cache line is > already sufficient. > I am also testing with the pci cache line size hard-coded to be the same size as the processor cache line size (a WAFG for now) - - It is too soon (only an 1 1/2 hours) to be a significant finding - - but if this was set to twice the physical line length, it would be only flushing every other line - which I think would show up *real* fast. ;) I am noticing some "dropped buffers and/or dropped packets" in my streaming music - - but that is not conclusive of anything other than hd-audio may be using the wrong cache stride also. ;) Mike > Now the question is: Is this a valid operation of a driver? Should the driver > do such things, or is such a driver broken? When would that occur? I'm trying > to come up with a case, but typically you e.g. allocate some DMA buffer and > then don't touch it until the hardware has processed it. > > Regards, ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
2009-05-27 17:01 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
2009-05-27 17:10 ` Michael S. Zick
@ 2009-05-27 17:19 ` Thomas Gleixner
2009-05-27 17:25 ` Michael S. Zick
2009-05-27 18:08 ` LOCK prefix on uni processor has its use Andi Kleen
2009-05-28 2:56 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) H. Peter Anvin
3 siblings, 1 reply; 90+ messages in thread
From: Thomas Gleixner @ 2009-05-27 17:19 UTC (permalink / raw)
To: Harald Welte; +Cc: H. Peter Anvin, lkml, Ingo Molnar, linux-kernel, Alan Cox

On Wed, 27 May 2009, Harald Welte wrote:
> Here are some statements from the CPU logic guys at VIA/Centaur:
>
> * A read-modify-write sequence cannot be interrupted.
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
> interfering with a read-modify-write sequence
...

> Now if I understand the issues correctly, it would mean that there is some
> driver code that modifies a certain chunk of memory, while DMA of some
> peripheral is also accessing that memory. I suppose it would not have to be
> the same actual address, but probably being within the same cache line is
> already sufficient.
>
> Now the question is: Is this a valid operation of a driver? Should the driver
> do such things, or is such a driver broken? When would that occur? I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.

Right, that would be more than stupid, but even then it would not
explain any breakage of the kernel. Such a driver would not be
functional anyway if it relies on some read-modify-write operations in
an active DMA buffer. That would also explode on any other system as
you have no control over whether the access to that memory happens before
or after the DMA operation.

Can you please ask them to clarify that DMA issue further?

Thanks,

tglx
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) 2009-05-27 17:19 ` Thomas Gleixner @ 2009-05-27 17:25 ` Michael S. Zick 0 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-27 17:25 UTC (permalink / raw) To: Thomas Gleixner Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, linux-kernel, Alan Cox On Wed May 27 2009, Thomas Gleixner wrote: > On Wed, 27 May 2009, Harald Welte wrote: > > Here are some statements from the CPU logic guys at VIA/Centaur: > > > > * A read-modify-write sequence cannot be interupted. > > * All X86 instructions except rep-strings are atomic wrt interrupts. > > * The lock prefix has uses on a UP processor: It keeps DMA devices from > > interfering with a read-modify-write sequence > ... > > > Now if I understand the issues correctly, it would mean that there is some > > driver code that modifies a certain chunk of memory, while DMA of some > > peripheral is also accessing that memory. I suppose it would not have to be > > the same actual address, but probably being within the same cache line is > > already sufficient. > > > > Now the question is: Is this a valid operation of a driver? Should the driver > > do such things, or is such a driver broken? When would that occur? I'm trying > > to come up with a case, but typically you e.g. allocate some DMA buffer and > > then don't touch it until the hardware has processed it. > > Right, that would be more than stupid, but even then it would not > explain any breakage of the kernel. Such a driver would not be > functional anyway if it relies on some read/write modify operations in > an active DMA buffer. That would also explode on any other system as > you have no control whether the access to that memory happens before > or after the DMA operation. > IFF your DMA buffer is cache-line aligned and doesn't have an immediately adjacent spin-lock (or some such thing) sharing the cache-line. 
Mike > Can you please ask them to clarify that DMA issue further ? > > Thanks, > > tglx > > ^ permalink raw reply [flat|nested] 90+ messages in thread
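The cache-line sharing concern raised above (a DMA buffer with a lock word immediately adjacent to it) can be made concrete with two structure layouts. This is a sketch only: the 64-byte line size, the SKETCH_CACHE_LINE constant, both structures and the same_cache_line helper are assumptions chosen for illustration, not code from any driver in this thread. The offsets are relative to the start of the structure, so the comparison assumes the structure itself starts on a line boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for the sketch; real hardware must be queried. */
#define SKETCH_CACHE_LINE 64

/* Risky layout: the lock word directly follows the buffer, so the tail
 * of the buffer and the lock can end up in the same cache line. */
struct dma_shared_line {
	uint8_t  buf[100];
	uint32_t lock;
};

/* Safer layout: buffer and lock each start on their own line, and the
 * buffer length is a whole number of lines. */
struct dma_own_lines {
	_Alignas(SKETCH_CACHE_LINE) uint8_t  buf[128];
	_Alignas(SKETCH_CACHE_LINE) uint32_t lock;
};

/* Nonzero when byte offsets a and b fall into the same cache line. */
int same_cache_line(size_t a, size_t b)
{
	return a / SKETCH_CACHE_LINE == b / SKETCH_CACHE_LINE;
}
```

In the first layout the last buffer byte (offset 99) and the lock (offset 100) land in the same assumed line; in the second the lock starts at offset 128, one full line after the buffer ends.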
* Re: LOCK prefix on uni processor has its use
2009-05-27 17:01 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
2009-05-27 17:10 ` Michael S. Zick
2009-05-27 17:19 ` Thomas Gleixner
@ 2009-05-27 18:08 ` Andi Kleen
2009-05-27 18:22 ` Michael S. Zick
2009-06-02 12:48 ` Harald Welte
2009-05-28 2:56 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) H. Peter Anvin
3 siblings, 2 replies; 90+ messages in thread
From: Andi Kleen @ 2009-05-27 18:08 UTC (permalink / raw)
To: Harald Welte
Cc: H. Peter Anvin, lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Harald Welte <HaraldWelte@viatech.com> writes:
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
> interfering with a read-modify-write sequence

In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way
on a UP kernel.

We discussed exactly this in the earlier subthread :)

> Now the question is: Is this a valid operation of a driver? Should the driver
> do such things, or is such a driver broken?

The driver is broken because if it relies on this it will not work on a UP kernel.
Also it's not portable and in general a bad idea.

> When would that occur? I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.

Is it known which driver has this problem?

-Andi (who finds hpa's "timing theory" to be more believable anyways)

--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: LOCK prefix on uni processor has its use 2009-05-27 18:08 ` LOCK prefix on uni processor has its use Andi Kleen @ 2009-05-27 18:22 ` Michael S. Zick 2009-05-27 18:33 ` Michael S. Zick 2009-05-27 18:38 ` Andi Kleen 2009-06-02 12:48 ` Harald Welte 1 sibling, 2 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-27 18:22 UTC (permalink / raw) To: Andi Kleen Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed May 27 2009, Andi Kleen wrote: > Harald Welte <HaraldWelte@viatech.com> writes: > > * All X86 instructions except rep-strings are atomic wrt interrupts. > > * The lock prefix has uses on a UP processor: It keeps DMA devices from > > interfering with a read-modify-write sequence > > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way > on a UP kernel. > > We discussed exactly this in the earlier subthread :) > > > Now the question is: Is this a valid operation of a driver? Should the driver > > do such things, or is such a driver broken? > > The driver is broken because if it relies on this it will not work on a UP kernel. > Also it's not portable and in general a bad idea. > > > When would that occur? I'm trying > > to come up with a case, but typically you e.g. allocate some DMA buffer and > > then don't touch it until the hardware has processed it. > > Is it known which driver has this problem? > > -Andi (who finds hpa's "timing theory" to be more believable anyways) > I still have not come up with a solid, testable, theory to explain the order of magnitude in up-time before the kernel locks with/with-out 'lock'. But we are definitely pecking around the edges of the problem. ;) Today's lockdep build has just passed its previous record by hard-coding the pci cache line size to be the same as the cpu's cache line size. (a WAFG). Until we hear back from the VIA-CPU people, I just guessed that since the chip set was designed for use with the processor... 
Mike ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: LOCK prefix on uni processor has its use 2009-05-27 18:22 ` Michael S. Zick @ 2009-05-27 18:33 ` Michael S. Zick 2009-05-27 18:55 ` Michael S. Zick 2009-05-27 18:38 ` Andi Kleen 1 sibling, 1 reply; 90+ messages in thread From: Michael S. Zick @ 2009-05-27 18:33 UTC (permalink / raw) To: Andi Kleen Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed May 27 2009, Michael S. Zick wrote: > On Wed May 27 2009, Andi Kleen wrote: > > Harald Welte <HaraldWelte@viatech.com> writes: > > > * All X86 instructions except rep-strings are atomic wrt interrupts. > > > * The lock prefix has uses on a UP processor: It keeps DMA devices from > > > interfering with a read-modify-write sequence > > > > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way > > on a UP kernel. > > > > We discussed exactly this in the earlier subthread :) > > > > > Now the question is: Is this a valid operation of a driver? Should the driver > > > do such things, or is such a driver broken? > > > > The driver is broken because if it relies on this it will not work on a UP kernel. > > Also it's not portable and in general a bad idea. > > > > > When would that occur? I'm trying > > > to come up with a case, but typically you e.g. allocate some DMA buffer and > > > then don't touch it until the hardware has processed it. > > > > Is it known which driver has this problem? > > > > -Andi (who finds hpa's "timing theory" to be more believable anyways) > > > > I still have not come up with a solid, testable, theory to explain the > order of magnitude in up-time before the kernel locks with/with-out 'lock'. > > But we are definitely pecking around the edges of the problem. ;) > > Today's lockdep build has just passed its previous record by hard-coding > the pci cache line size to be the same as the cpu's cache line size. (a WAFG). 
> Until we hear back from the VIA-CPU people, I just guessed that since the > chip set was designed for use with the processor... > Ah, so - some information - - - (caused by un-plug/re-plug usb mouse while ehci-hcd was caught in its failure reporting loop) ehci_hcd 0000:00:10.4: port 6 resume error -19 hub 1-0:1.0: hub_port_status failed (err = -32) hub 1-0:1.0: connect-debounce failed, port 6 disabled hub 1-0:1.0: over-current change on port 1 ehci_hcd 0000:00:10.4: HC died; cleaning up irq 23: nobody cared (try booting with the "irqpoll" option) Pid: 2277, comm: syslogd Not tainted 2.6.30-rc7-ce1200v-09147lk-db #29 Call Trace: [<c015de14>] ? __report_bad_irq+0x24/0x90 [<c015dfc5>] ? note_interrupt+0x145/0x180 [<c015e39f>] ? handle_fasteoi_irq+0xaf/0xe0 [<c0104eb7>] ? handle_irq+0x17/0x20 [<c0104daa>] ? do_IRQ+0x3a/0xa0 [<c0145a8b>] ? trace_hardirqs_on_caller+0x6b/0x170 [<c01034ae>] ? common_interrupt+0x2e/0x34 [<c0126082>] ? __do_softirq+0x42/0x110 [<c0141294>] ? tick_program_event+0x14/0x20 [<c01384cc>] ? hrtimer_interrupt+0xcc/0x1d0 [<c0126195>] ? do_softirq+0x45/0x50 [<c01264aa>] ? irq_exit+0x6a/0x80 [<c0111109>] ? smp_apic_timer_interrupt+0x49/0x80 [<c0103517>] ? apic_timer_interrupt+0x2f/0x34 [<c014799e>] ? lock_acquire+0x8e/0xa0 [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0 [<c05307bd>] ? _spin_lock+0x3d/0x70 [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0 [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0 [<c018b598>] ? kmem_cache_alloc+0x98/0x100 [<c0143c06>] ? lockdep_init_map+0x46/0x130 [<c01f4f80>] ? journal_start+0xa0/0x100 [<c0163780>] ? grab_cache_page_write_begin+0x30/0xc0 [<c0138fb4>] ? up_read+0x14/0x30 [<c01da348>] ? ext3_write_begin+0x98/0x200 [<c0163a48>] ? generic_file_buffered_write+0x108/0x300 [<c0125943>] ? current_fs_time+0x13/0x20 [<c016526a>] ? __generic_file_aio_write_nolock+0x24a/0x550 [<c052f520>] ? __mutex_lock_common+0x2f0/0x3f0 [<c01655b9>] ? generic_file_aio_write+0x49/0xd0 [<c01655ce>] ? generic_file_aio_write+0x5e/0xd0 [<c0146359>] ? 
validate_chain+0xe9/0x1000 [<c01d8680>] ? ext3_file_write+0x30/0xc0 [<c01d8650>] ? ext3_file_write+0x0/0xc0 [<c018e47f>] ? do_sync_readv_writev+0xbf/0x100 [<c0144dae>] ? lock_release_holdtime+0x6e/0xf0 [<c0135230>] ? autoremove_wake_function+0x0/0x50 [<c017606f>] ? might_fault+0x4f/0xa0 [<c0225c3c>] ? security_file_permission+0xc/0x10 [<c018e746>] ? rw_verify_area+0x66/0xd0 [<c018e30e>] ? rw_copy_check_uvector+0x7e/0x100 [<c018f30a>] ? do_readv_writev+0xaa/0x190 [<c01d8650>] ? ext3_file_write+0x0/0xc0 [<c018f42c>] ? vfs_writev+0x3c/0x50 [<c018f527>] ? sys_writev+0x47/0x80 [<c0102e08>] ? sysenter_do_call+0x12/0x36 handlers: [<c035d480>] (usb_hcd_irq+0x0/0x90) Disabling IRQ #23 hub 1-0:1.0: hub_port_status failed (err = -19) hub 1-0:1.0: connect-debounce failed, port 1 disabled hub 1-0:1.0: cannot disable port 1 (err = -19) hub 1-0:1.0: hub_port_status failed (err = -19) hub 1-0:1.0: hub_port_status failed (err = -19) hub 1-0:1.0: hub_port_status failed (err = -19) hub 1-0:1.0: hub_port_status failed (err = -19) hub 1-0:1.0: hub_port_status failed (err = -19) usb 1-5: USB disconnect, address 3 ehci_hcd 0000:00:10.4: force halt; handhake dc724014 00004000 00004000 -> -19 ================================= [ INFO: inconsistent lock state ] 2.6.30-rc7-ce1200v-09147lk-db #29 --------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. 
hd-audio0/51 [HC0[0]:SC1[1]:HE1:SE0] takes: (&irq_desc_lock_class){?.-...}, at: [<c015dc81>] try_one_irq+0x21/0x130 {IN-HARDIRQ-W} state was registered at: [<ffffffff>] 0xffffffff irq event stamp: 95629480 hardirqs last enabled at (95629480): [<c05310a0>] _spin_unlock_irq+0x20/0x40 hardirqs last disabled at (95629479): [<c053086d>] _spin_lock_irq+0xd/0x70 softirqs last enabled at (95626922): [<c0126195>] do_softirq+0x45/0x50 softirqs last disabled at (95629475): [<c0126195>] do_softirq+0x45/0x50 other info that might help us debug this: 3 locks held by hd-audio0/51: #0: ((bus->workq_name)){+.+...}, at: [<c0131912>] worker_thread+0x192/0x2d0 #1: (&chip->irq_pending_work){+.+...}, at: [<c0131912>] worker_thread+0x192/0x2d0 #2: (kernel/irq/spurious.c:21){+.-...}, at: [<c012a4d0>] run_timer_softirq+0xe0/0x1f0 stack backtrace: Pid: 51, comm: hd-audio0 Not tainted 2.6.30-rc7-ce1200v-09147lk-db #29 Call Trace: [<c0145254>] ? print_usage_bug+0x174/0x1c0 [<c014583b>] ? mark_lock+0x59b/0x5d0 [<c01463b8>] ? validate_chain+0x148/0x1000 [<c0144fc0>] ? check_usage_backwards+0x0/0x90 [<c01474a7>] ? __lock_acquire+0x237/0x6a0 [<c014798b>] ? lock_acquire+0x7b/0xa0 [<c015dc81>] ? try_one_irq+0x21/0x130 [<c05307bd>] ? _spin_lock+0x3d/0x70 [<c015dc81>] ? try_one_irq+0x21/0x130 [<c015dc81>] ? try_one_irq+0x21/0x130 [<c015ddd3>] ? poll_spurious_irqs+0x43/0x60 [<c012a55b>] ? run_timer_softirq+0x16b/0x1f0 [<c012a4d0>] ? run_timer_softirq+0xe0/0x1f0 [<c015dd90>] ? poll_spurious_irqs+0x0/0x60 [<c01260a8>] ? __do_softirq+0x68/0x110 [<c0141294>] ? tick_program_event+0x14/0x20 [<c01384cc>] ? hrtimer_interrupt+0xcc/0x1d0 [<c0126195>] ? do_softirq+0x45/0x50 [<c01264aa>] ? irq_exit+0x6a/0x80 [<c0111109>] ? smp_apic_timer_interrupt+0x49/0x80 [<c0103517>] ? apic_timer_interrupt+0x2f/0x34 [<c05310a6>] ? _spin_unlock_irq+0x26/0x40 [<c0419422>] ? azx_irq_pending_work+0x92/0x120 [<c0131912>] ? worker_thread+0x192/0x2d0 [<c0419390>] ? azx_irq_pending_work+0x0/0x120 [<c0131975>] ? 
worker_thread+0x1f5/0x2d0 [<c0131912>] ? worker_thread+0x192/0x2d0 [<c0135230>] ? autoremove_wake_function+0x0/0x50 [<c0131780>] ? worker_thread+0x0/0x2d0 [<c0134ee7>] ? kthread+0x47/0x80 [<c0134ea0>] ? kthread+0x0/0x80 [<c0103627>] ? kernel_thread_helper+0x7/0x10 Enjoy ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: LOCK prefix on uni processor has its use 2009-05-27 18:33 ` Michael S. Zick @ 2009-05-27 18:55 ` Michael S. Zick 0 siblings, 0 replies; 90+ messages in thread From: Michael S. Zick @ 2009-05-27 18:55 UTC (permalink / raw) To: Andi Kleen Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox On Wed May 27 2009, Michael S. Zick wrote: > On Wed May 27 2009, Michael S. Zick wrote: > > On Wed May 27 2009, Andi Kleen wrote: > > > Harald Welte <HaraldWelte@viatech.com> writes: > > > > * All X86 instructions except rep-strings are atomic wrt interrupts. > > > > * The lock prefix has uses on a UP processor: It keeps DMA devices from > > > > interfering with a read-modify-write sequence > > > > > > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in any way > > > on a UP kernel. > > > Note that there are there spin_{lock,unlock} symbols in the stack trace. This build has the 'lock' in LOCK_PREFIX. Mike > > > We discussed exactly this in the earlier subthread :) > > > > > > > Now the question is: Is this a valid operation of a driver? Should the driver > > > > do such things, or is such a driver broken? > > > > > > The driver is broken because if it relies on this it will not work on a UP kernel. > > > Also it's not portable and in general a bad idea. > > > > > > > When would that occur? I'm trying > > > > to come up with a case, but typically you e.g. allocate some DMA buffer and > > > > then don't touch it until the hardware has processed it. > > > > > > Is it known which driver has this problem? > > > > > > -Andi (who finds hpa's "timing theory" to be more believable anyways) > > > > > > > I still have not come up with a solid, testable, theory to explain the > > order of magnitude in up-time before the kernel locks with/with-out 'lock'. > > > > But we are definitely pecking around the edges of the problem. 
;) > > > > Today's lockdep build has just passed its previous record by hard-coding > > the pci cache line size to be the same as the cpu's cache line size. (a WAFG). > > Until we hear back from the VIA-CPU people, I just guessed that since the > > chip set was designed for use with the processor... > > > > Ah, so - some information - - - > (caused by un-plug/re-plug usb mouse while ehci-hcd was caught in its failure > reporting loop) > > ehci_hcd 0000:00:10.4: port 6 resume error -19 > hub 1-0:1.0: hub_port_status failed (err = -32) > hub 1-0:1.0: connect-debounce failed, port 6 disabled > hub 1-0:1.0: over-current change on port 1 > ehci_hcd 0000:00:10.4: HC died; cleaning up > irq 23: nobody cared (try booting with the "irqpoll" option) > Pid: 2277, comm: syslogd Not tainted 2.6.30-rc7-ce1200v-09147lk-db #29 > Call Trace: > [<c015de14>] ? __report_bad_irq+0x24/0x90 > [<c015dfc5>] ? note_interrupt+0x145/0x180 > [<c015e39f>] ? handle_fasteoi_irq+0xaf/0xe0 > [<c0104eb7>] ? handle_irq+0x17/0x20 > [<c0104daa>] ? do_IRQ+0x3a/0xa0 > [<c0145a8b>] ? trace_hardirqs_on_caller+0x6b/0x170 > [<c01034ae>] ? common_interrupt+0x2e/0x34 > [<c0126082>] ? __do_softirq+0x42/0x110 > [<c0141294>] ? tick_program_event+0x14/0x20 > [<c01384cc>] ? hrtimer_interrupt+0xcc/0x1d0 > [<c0126195>] ? do_softirq+0x45/0x50 > [<c01264aa>] ? irq_exit+0x6a/0x80 > [<c0111109>] ? smp_apic_timer_interrupt+0x49/0x80 > [<c0103517>] ? apic_timer_interrupt+0x2f/0x34 > [<c014799e>] ? lock_acquire+0x8e/0xa0 > [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0 > [<c05307bd>] ? _spin_lock+0x3d/0x70 > [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0 > [<c01f4b8a>] ? start_this_handle+0x6a/0x3c0 > [<c018b598>] ? kmem_cache_alloc+0x98/0x100 > [<c0143c06>] ? lockdep_init_map+0x46/0x130 > [<c01f4f80>] ? journal_start+0xa0/0x100 > [<c0163780>] ? grab_cache_page_write_begin+0x30/0xc0 > [<c0138fb4>] ? up_read+0x14/0x30 > [<c01da348>] ? ext3_write_begin+0x98/0x200 > [<c0163a48>] ? 
generic_file_buffered_write+0x108/0x300 > [<c0125943>] ? current_fs_time+0x13/0x20 > [<c016526a>] ? __generic_file_aio_write_nolock+0x24a/0x550 > [<c052f520>] ? __mutex_lock_common+0x2f0/0x3f0 > [<c01655b9>] ? generic_file_aio_write+0x49/0xd0 > [<c01655ce>] ? generic_file_aio_write+0x5e/0xd0 > [<c0146359>] ? validate_chain+0xe9/0x1000 > [<c01d8680>] ? ext3_file_write+0x30/0xc0 > [<c01d8650>] ? ext3_file_write+0x0/0xc0 > [<c018e47f>] ? do_sync_readv_writev+0xbf/0x100 > [<c0144dae>] ? lock_release_holdtime+0x6e/0xf0 > [<c0135230>] ? autoremove_wake_function+0x0/0x50 > [<c017606f>] ? might_fault+0x4f/0xa0 > [<c0225c3c>] ? security_file_permission+0xc/0x10 > [<c018e746>] ? rw_verify_area+0x66/0xd0 > [<c018e30e>] ? rw_copy_check_uvector+0x7e/0x100 > [<c018f30a>] ? do_readv_writev+0xaa/0x190 > [<c01d8650>] ? ext3_file_write+0x0/0xc0 > [<c018f42c>] ? vfs_writev+0x3c/0x50 > [<c018f527>] ? sys_writev+0x47/0x80 > [<c0102e08>] ? sysenter_do_call+0x12/0x36 > handlers: > [<c035d480>] (usb_hcd_irq+0x0/0x90) > Disabling IRQ #23 > hub 1-0:1.0: hub_port_status failed (err = -19) > hub 1-0:1.0: connect-debounce failed, port 1 disabled > hub 1-0:1.0: cannot disable port 1 (err = -19) > hub 1-0:1.0: hub_port_status failed (err = -19) > hub 1-0:1.0: hub_port_status failed (err = -19) > hub 1-0:1.0: hub_port_status failed (err = -19) > hub 1-0:1.0: hub_port_status failed (err = -19) > hub 1-0:1.0: hub_port_status failed (err = -19) > usb 1-5: USB disconnect, address 3 > ehci_hcd 0000:00:10.4: force halt; handhake dc724014 00004000 00004000 -> -19 > > ================================= > [ INFO: inconsistent lock state ] > 2.6.30-rc7-ce1200v-09147lk-db #29 > --------------------------------- > inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. 
> hd-audio0/51 [HC0[0]:SC1[1]:HE1:SE0] takes: > (&irq_desc_lock_class){?.-...}, at: [<c015dc81>] try_one_irq+0x21/0x130 > {IN-HARDIRQ-W} state was registered at: > [<ffffffff>] 0xffffffff > irq event stamp: 95629480 > hardirqs last enabled at (95629480): [<c05310a0>] _spin_unlock_irq+0x20/0x40 > hardirqs last disabled at (95629479): [<c053086d>] _spin_lock_irq+0xd/0x70 > softirqs last enabled at (95626922): [<c0126195>] do_softirq+0x45/0x50 > softirqs last disabled at (95629475): [<c0126195>] do_softirq+0x45/0x50 > > other info that might help us debug this: > 3 locks held by hd-audio0/51: > #0: ((bus->workq_name)){+.+...}, at: [<c0131912>] worker_thread+0x192/0x2d0 > #1: (&chip->irq_pending_work){+.+...}, at: [<c0131912>] worker_thread+0x192/0x2d0 > #2: (kernel/irq/spurious.c:21){+.-...}, at: [<c012a4d0>] run_timer_softirq+0xe0/0x1f0 > > stack backtrace: > Pid: 51, comm: hd-audio0 Not tainted 2.6.30-rc7-ce1200v-09147lk-db #29 > Call Trace: > [<c0145254>] ? print_usage_bug+0x174/0x1c0 > [<c014583b>] ? mark_lock+0x59b/0x5d0 > [<c01463b8>] ? validate_chain+0x148/0x1000 > [<c0144fc0>] ? check_usage_backwards+0x0/0x90 > [<c01474a7>] ? __lock_acquire+0x237/0x6a0 > [<c014798b>] ? lock_acquire+0x7b/0xa0 > [<c015dc81>] ? try_one_irq+0x21/0x130 > [<c05307bd>] ? _spin_lock+0x3d/0x70 > [<c015dc81>] ? try_one_irq+0x21/0x130 > [<c015dc81>] ? try_one_irq+0x21/0x130 > [<c015ddd3>] ? poll_spurious_irqs+0x43/0x60 > [<c012a55b>] ? run_timer_softirq+0x16b/0x1f0 > [<c012a4d0>] ? run_timer_softirq+0xe0/0x1f0 > [<c015dd90>] ? poll_spurious_irqs+0x0/0x60 > [<c01260a8>] ? __do_softirq+0x68/0x110 > [<c0141294>] ? tick_program_event+0x14/0x20 > [<c01384cc>] ? hrtimer_interrupt+0xcc/0x1d0 > [<c0126195>] ? do_softirq+0x45/0x50 > [<c01264aa>] ? irq_exit+0x6a/0x80 > [<c0111109>] ? smp_apic_timer_interrupt+0x49/0x80 > [<c0103517>] ? apic_timer_interrupt+0x2f/0x34 > [<c05310a6>] ? _spin_unlock_irq+0x26/0x40 > [<c0419422>] ? azx_irq_pending_work+0x92/0x120 > [<c0131912>] ? 
worker_thread+0x192/0x2d0 > [<c0419390>] ? azx_irq_pending_work+0x0/0x120 > [<c0131975>] ? worker_thread+0x1f5/0x2d0 > [<c0131912>] ? worker_thread+0x192/0x2d0 > [<c0135230>] ? autoremove_wake_function+0x0/0x50 > [<c0131780>] ? worker_thread+0x0/0x2d0 > [<c0134ee7>] ? kthread+0x47/0x80 > [<c0134ea0>] ? kthread+0x0/0x80 > [<c0103627>] ? kernel_thread_helper+0x7/0x10 > > Enjoy > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: LOCK prefix on uni processor has its use
From: Andi Kleen @ 2009-05-27 18:38 UTC
To: Michael S. Zick
Cc: Andi Kleen, Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

> I still have not come up with a solid, testable, theory to explain the
> order of magnitude in up-time before the kernel locks with/with-out 'lock'.

What I would do is to try to track down in which file it happens.
Compile individual subdirectories of the kernel with LOCK prefix,
then down to files.

Also always double check the results.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

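The bisection Andi suggests could be driven by a per-file compile-time
switch. What follows is only a sketch under assumptions: FORCE_UP_LOCK,
lock_prefix_enabled() and counter_inc() are made-up names, not anything
in the kernel tree, and the non-x86 branch exists only so the sketch
builds elsewhere.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical switch, not a real kernel option: build the files under
 * suspicion with -DFORCE_UP_LOCK to keep the lock prefix there, and
 * build everything else without it.
 */
#ifdef FORCE_UP_LOCK
#define LOCK_PREFIX "\n\tlock; "
#else
#define LOCK_PREFIX ""
#endif

/* Report whether this translation unit was built with the prefix. */
static int lock_prefix_enabled(void)
{
	return strcmp(LOCK_PREFIX, "") != 0;
}

/* Increment *v using whatever LOCK_PREFIX expands to. */
static void counter_inc(int *v)
{
	assert(v != NULL);
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__(LOCK_PREFIX "incl %0"
			     : "+m" (*v) : : "memory", "cc");
#else
	(*v)++;	/* non-x86 fallback so the sketch still builds */
#endif
}
```

Narrowing the set of objects built with the define, from directories
down to single files, would then point at the code whose behaviour
changes with the prefix. As Andi says, each step should be re-checked.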
* Re: LOCK prefix on uni processor has its use
From: Harald Welte @ 2009-06-02 12:48 UTC
To: Andi Kleen
Cc: H. Peter Anvin, lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Wed, May 27, 2009 at 08:08:27PM +0200, Andi Kleen wrote:
> Harald Welte <HaraldWelte@viatech.com> writes:
> > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> >   interfering with a read-modify-write sequence
>
> In theory yes, but not in Linux -- normal drivers simply don't use LOCK in
> any way on a UP kernel.

well, they might have inadvertently used LOCK as part of regular spinlocks,
until LOCK_PREFIX was removed, right?

> > Now the question is: Is this a valid operation of a driver? Should the driver
> > do such things, or is such a driver broken?
>
> The driver is broken because if it relies on this it will not work on a UP kernel.
> Also it's not portable and in general a bad idea.

I agree. I was not referring to any real/known driver. I was just trying to
figure out what kind of problem the VIA/Centaur CPU guys tried to describe when
indicating that the LOCK prefix should be used on UP to avoid DMA interfering
with read-modify-write CPU instructions.

--
- Harald Welte <HaraldWelte@viatech.com>	http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

* Re: LOCK prefix on uni processor has its use
From: Andi Kleen @ 2009-06-02 13:03 UTC
To: Harald Welte
Cc: Andi Kleen, H. Peter Anvin, lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Tue, Jun 02, 2009 at 02:48:54PM +0200, Harald Welte wrote:
> On Wed, May 27, 2009 at 08:08:27PM +0200, Andi Kleen wrote:
> > Harald Welte <HaraldWelte@viatech.com> writes:
> > > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> > >   interfering with a read-modify-write sequence
> >
> > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in
> > any way on a UP kernel.
>
> well, they might have inadvertently used LOCK as part of regular spinlocks,
> until LOCK_PREFIX was removed, right?

LOCK_PREFIX was always defined away on UP kernels. That dates back
to the initial Linux 2.0 SMP implementation.

On newer SMP kernels they also patch away the lock prefix even
if they are running UP, so if you only have a single core you'll
never get lock.

So I think it's pretty unlikely any driver relied on this.

There are some special bit functions that always have LOCK, but these
are only used by the Xen drivers afaik (that is needed when a UP
kernel talks to a SMP hypervisor over shared memory)

> I agree. I was not referring to any real/known driver. I was just trying to
> figure out what kind of problem the VIA/Centaur CPU guys tried to describe when
> indicating that the LOCK prefix should be used on UP to avoid DMA interfering
> with read-modify-write CPU instructions.

It locks the cache line. That's a valid case in the x86 architecture,
it's just that the Linux driver model doesn't use it.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

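The "patch away the lock prefix" step Andi mentions can be pictured with
a small user-space simulation. This is a sketch only, not the kernel's
code: patch_out_lock() and the offset table are hypothetical, and using
a one-byte NOP (0x90) as the replacement byte is an assumption made for
illustration. The "661:" label in the patch at the top of the thread is
the sort of marker that lets an SMP build record where each prefix
lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_OPCODE 0xF0u	/* x86 LOCK prefix byte */
#define NOP_OPCODE  0x90u	/* one-byte NOP, assumed replacement */

/*
 * Overwrite each recorded LOCK prefix in `code` with a NOP.
 * `sites` holds the byte offsets of the prefixes (hypothetical layout).
 * Offsets that are out of range or do not hold 0xF0 are skipped.
 * Returns the number of bytes patched.
 */
static size_t patch_out_lock(uint8_t *code, size_t len,
			     const size_t *sites, size_t nsites)
{
	size_t patched = 0;

	assert(code != NULL && sites != NULL);
	for (size_t i = 0; i < nsites; i++) {
		size_t off = sites[i];

		if (off < len && code[off] == LOCK_OPCODE) {
			code[off] = NOP_OPCODE;
			patched++;
		}
	}
	return patched;
}
```

Run over a buffer holding two lock-prefixed instructions, the helper
leaves the instructions themselves intact and only neutralises the
prefix bytes; a second pass finds nothing left to patch.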
* Re: LOCK prefix on uni processor has its use
From: Michael S. Zick @ 2009-06-02 13:26 UTC
To: Andi Kleen
Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Tue June 2 2009, Andi Kleen wrote:
> On Tue, Jun 02, 2009 at 02:48:54PM +0200, Harald Welte wrote:
> > On Wed, May 27, 2009 at 08:08:27PM +0200, Andi Kleen wrote:
> > > Harald Welte <HaraldWelte@viatech.com> writes:
> > > > * All X86 instructions except rep-strings are atomic wrt interrupts.
> > > > * The lock prefix has uses on a UP processor: It keeps DMA devices from
> > > >   interfering with a read-modify-write sequence
> > >
> > > In theory yes, but not in Linux -- normal drivers simply don't use LOCK in
> > > any way on a UP kernel.
> >
> > well, they might have inadvertently used LOCK as part of regular spinlocks,
> > until LOCK_PREFIX was removed, right?
>
> LOCK_PREFIX was always defined away on UP kernels. That dates back
> to the initial Linux 2.0 SMP implementation.
>
> On newer SMP kernels they also patch away the lock prefix even
> if they are running UP, so if you only have a single core you'll
> never get lock.
>

After another week of chasing this - -

My favorite theory is still: "human coding error" - somewhere.
The LOCK_PREFIX is used or not used or mis-used by something.

My second favorite theory (related to the "some sort of timing problem"
suggestion):
Another difference is FSB speed on the two machines -
The "trouble free" case is twice as fast as the "problem" case.

Such a thing should be totally transparent to the kernel, but...
we do have humans writing the code. ;)

> So I think it's pretty unlikely any driver relied on this.
>

The kernel assumes I/O coherency, but perhaps something is
breaking that assumption. Not by intent, but by oversight.

I posed a couple of questions to H.W. off list to pass on
to the silicon grower's department. Will see what they recommend.

At the moment, I am stuck with brute-force code reading.
Nothing very elegant going on here.

Mike

> There are some special bit functions that always have LOCK, but these
> are only used by the Xen drivers afaik (that is needed when a UP
> kernel talks to a SMP hypervisor over shared memory)
>
> > I agree. I was not referring to any real/known driver. I was just trying to
> > figure out what kind of problem the VIA/Centaur CPU guys tried to describe when
> > indicating that the LOCK prefix should be used on UP to avoid DMA interfering
> > with read-modify-write CPU instructions.
>
> It locks the cache line. That's a valid case in the x86 architecture,
> it's just that the Linux driver model doesn't use it.
>
> -Andi
>

* Re: LOCK prefix on uni processor has its use
From: Andi Kleen @ 2009-06-02 13:42 UTC
To: Michael S. Zick
Cc: Andi Kleen, Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

> After another week of chasing this - -

Did you use the "compile part of the kernel with LOCK and others without"
technique I described earlier?

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

* Re: LOCK prefix on uni processor has its use
From: Michael S. Zick @ 2009-06-03 11:46 UTC
To: Andi Kleen
Cc: Harald Welte, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

On Tue June 2 2009, Andi Kleen wrote:
> > After another week of chasing this - -
>
> Did you use the "compile part of the kernel with LOCK and others without"
> technique I described earlier?
>

That would only help if it were a single-point failure.

Although there are some assembly language things that can be done
to help in finding what to examine, like:

#define LOCK_PREFIX "\n### Lock pre-fix removed:\n\t"

Or whatever might help your favorite text search program.

Which yields asm expansion in your *.s file (gcc -S) as:

#APP
# 33 "test_bytelock.c" 1
1:	xchgb %ah, %al
	test %al,%al
	jz 3f
### Lock pre-fix removed:
	incb splock+1
2:	xchgw %ax, %ax
	cmpb $1, splock
	je 2b
### Lock pre-fix removed:
	decb splock+1
	jmp 1b
3:
# 0 "" 2
#NO_APP

Note: For the readers not familiar with (g)as;
#APP -> Assembler Pre-Process (gcc generated)
  <ragged whitespace and comments allowed>
#NO_APP -> No Assembler Pre-Process (gcc generated)

If ambitious, you can add a comment to each asm-macro to note
the line and source filename of where it is defined.
(the line number and name gcc put there is where it was
expanded, not where it was defined).

Not really too ambitious - there are only 140 files of interest
(with asm-macros) in a x86, uni-processor build.

Mike

> -Andi

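For anyone wanting to try the marker trick outside the kernel tree, a
minimal stand-alone sketch might look like the following. The
assumptions: MARK_LOCK_SITES and waiters_inc() are made-up names, the
two-byte splock layout is only guessed from the listing above, and the
'#' comment syntax applies to (g)as on x86.

```c
#include <assert.h>

/*
 * With MARK_LOCK_SITES defined, LOCK_PREFIX expands to an assembler
 * comment instead of a lock prefix, so every use site becomes easy to
 * find in the output of `gcc -S`. MARK_LOCK_SITES is a made-up name.
 */
#ifdef MARK_LOCK_SITES
#define LOCK_PREFIX "\n### Lock pre-fix removed:\n\t"
#else
#define LOCK_PREFIX "\n\tlock; "
#endif

/* [0] lock byte, [1] waiter count; layout guessed from the listing. */
static unsigned char splock[2];

/* Bump the waiter count using whatever LOCK_PREFIX expands to. */
static void waiters_inc(void)
{
	assert(splock[1] < 255);	/* byte counter must not wrap */
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__(LOCK_PREFIX "incb %0"
			     : "+m" (splock[1]) : : "memory", "cc");
#else
	splock[1]++;	/* non-x86 fallback so the sketch still builds */
#endif
}
```

Compiling this with `gcc -S -DMARK_LOCK_SITES` and searching the *.s
file for the marker text should list each place where the prefix would
otherwise have been emitted.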
* Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
From: H. Peter Anvin @ 2009-05-28 2:56 UTC
To: Harald Welte
Cc: lkml, Ingo Molnar, Thomas Gleixner, linux-kernel, Alan Cox

Harald Welte wrote:
> * A read-modify-write sequence cannot be interrupted.
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
>   interfering with a read-modify-write sequence

Correct.

> Now the question is: Is this a valid operation of a driver? Should the driver
> do such things, or is such a driver broken? When would that occur? I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.

The Linux driver model does not permit this as a *lot* of hardware
doesn't support this correctly, and even on x86 there are lots of
chipset bugs in this regard. It is of course possible to write x86-only
drivers that would do this anyway, but those should not use LOCK_PREFIX
instructions.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-23 20:51 UTC
To: H. Peter Anvin
Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri May 22 2009, H. Peter Anvin wrote:
> Michael S. Zick wrote:
> > Same integrated motherboard.
>
> Which means same CPU, same BIOS, same motherboard (none of which you're
> telling us.)
>

HP-2133 (C7-M/CN896) - 09143 -
No results - Still up - 6 hours.
A "personal best" for 2.6.30 on VIA hardware.

Cloudbook (C7-M/CX700) - 09143 -
45 minutes.

Cloudbook (C7-M/CX700) - 09143lk -
Partial results - Still up - 4 hours.
Sometime recently, the ehci (USB-2.0) driver went into
its failure loop but the kernel lived, and the music
plays on (less the external mouse).

I think I will put it out of its misery now and take
some time off myself.

Mike

> cpuinfo and dmidecode would be informative.
>
> 	-hpa
>

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Pavel Machek @ 2009-05-28 12:48 UTC
To: Michael S. Zick
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

Hi!

> The observation that executing an unnecessary 'lock' opcode in some
> cases slows down the machine is not felt by myself to be significant
> to duplicating my observations. Note: I have been wrong before.
>
> This is as informative as I can make the message.
>
> PS: *not* a single machine failure, tested on five machines, owned
> by four different people, two brands, with different use histories.

I have seen some problems on via c7m based machines, where some 'smart
bios person' implemented EC access in AML (normally, it is accessed
from ec.c driver). Maybe you have similarly bad bios?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-28 13:29 UTC
To: Pavel Machek
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu May 28 2009, Pavel Machek wrote:
> Hi!
>
> > The observation that executing an unnecessary 'lock' opcode in some
> > cases slows down the machine is not felt by myself to be significant
> > to duplicating my observations. Note: I have been wrong before.
> >
> > This is as informative as I can make the message.
> >
> > PS: *not* a single machine failure, tested on five machines, owned
> > by four different people, two brands, with different use histories.
>
> I have seen some problems on via c7m based machines, where some 'smart
> bios person' implemented EC access in AML (normally, it is accessed
> from ec.c driver). Maybe you have similarly bad bios?
>

How to tell or distinguish?
Did your looking at the dmidecode output show you that?

Mike

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Pavel Machek @ 2009-05-28 20:50 UTC
To: Michael S. Zick
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu 2009-05-28 08:29:13, Michael S. Zick wrote:
> On Thu May 28 2009, Pavel Machek wrote:
> > Hi!
> >
> > > The observation that executing an unnecessary 'lock' opcode in some
> > > cases slows down the machine is not felt by myself to be significant
> > > to duplicating my observations. Note: I have been wrong before.
> > >
> > > This is as informative as I can make the message.
> > >
> > > PS: *not* a single machine failure, tested on five machines, owned
> > > by four different people, two brands, with different use histories.
> >
> > I have seen some problems on via c7m based machines, where some 'smart
> > bios person' implemented EC access in AML (normally, it is accessed
> > from ec.c driver). Maybe you have similarly bad bios?
> >
>
> How to tell or distinguish?
> Did your looking at the dmidecode output show you that?

Disassemble DSDT, and if you see strange code duplicating kernel's
ec.c driver, you have similar problem...

	Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-28 20:58 UTC
To: Pavel Machek
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu May 28 2009, Pavel Machek wrote:
> On Thu 2009-05-28 08:29:13, Michael S. Zick wrote:
> > On Thu May 28 2009, Pavel Machek wrote:
> > > Hi!
> > >
> > > > The observation that executing an unnecessary 'lock' opcode in some
> > > > cases slows down the machine is not felt by myself to be significant
> > > > to duplicating my observations. Note: I have been wrong before.
> > > >
> > > > This is as informative as I can make the message.
> > > >
> > > > PS: *not* a single machine failure, tested on five machines, owned
> > > > by four different people, two brands, with different use histories.
> > >
> > > I have seen some problems on via c7m based machines, where some 'smart
> > > bios person' implemented EC access in AML (normally, it is accessed
> > > from ec.c driver). Maybe you have similarly bad bios?
> > >
> >
> > How to tell or distinguish?
> > Did your looking at the dmidecode output show you that?
>
> Disassemble DSDT, and if you see strange code duplicating kernel's
> ec.c driver, you have similar problem...

Someone did that but wasn't looking for "strange code" - just fixing
some entry size errors.
You can find the replacement DSDT here:
http://forum.netbookuser.com/viewtopic.php?pid=6512#p6512

(Which I am not using, since it is mostly cosmetic.)

Mike

> 	Pavel

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Pavel Machek @ 2009-05-28 21:16 UTC
To: Michael S. Zick
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

Hi!

> > > > I have seen some problems on via c7m based machines, where some 'smart
> > > > bios person' implemented EC access in AML (normally, it is accessed
> > > > from ec.c driver). Maybe you have similarly bad bios?
> > >
> > > How to tell or distinguish?
> > > Did your looking at the dmidecode output show you that?
> >
> > Disassemble DSDT, and if you see strange code duplicating kernel's
> > ec.c driver, you have similar problem...
>
> Someone did that but wasn't looking for "strange code" - just fixing
> some entry size errors.
> You can find the replacement DSDT here:
> http://forum.netbookuser.com/viewtopic.php?pid=6512#p6512
>
> (Which I am not using, since it is mostly cosmetic.)

Ok, it does not seem to have braindead EC implementation. The DSDT
does not look familiar, so it may be different issue. (Or it is same
issue and we were not able to debug it due to all the BIOS problems.)

	Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-28 21:21 UTC
To: Pavel Machek
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu May 28 2009, Pavel Machek wrote:
> Hi!
>
> > > > > I have seen some problems on via c7m based machines, where some 'smart
> > > > > bios person' implemented EC access in AML (normally, it is accessed
> > > > > from ec.c driver). Maybe you have similarly bad bios?
> > > >
> > > > How to tell or distinguish?
> > > > Did your looking at the dmidecode output show you that?
> > >
> > > Disassemble DSDT, and if you see strange code duplicating kernel's
> > > ec.c driver, you have similar problem...
> >
> > Someone did that but wasn't looking for "strange code" - just fixing
> > some entry size errors.
> > You can find the replacement DSDT here:
> > http://forum.netbookuser.com/viewtopic.php?pid=6512#p6512
> >
> > (Which I am not using, since it is mostly cosmetic.)
>
> Ok, it does not seem to have braindead EC implementation. The DSDT
> does not look familiar, so it may be different issue. (Or it is same
> issue and we were not able to debug it due to all the BIOS problems.)
>

Thanks for taking a look, it would have meant nothing to me.

Mike

> 	Pavel

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 19:17 UTC
To: Ingo Molnar
Cc: linux-kernel

On Fri May 22 2009, you wrote:
>
> * Michael S. Zick <lkml@morethan.org> wrote:
>
> > Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> >
> > diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> > index f6aa18e..3c790ef 100644
> > --- a/arch/x86/include/asm/alternative.h
> > +++ b/arch/x86/include/asm/alternative.h
> > @@ -35,7 +35,7 @@
> > "661:\n\tlock; "
> >
> > #else /* ! CONFIG_SMP */
> > -#define LOCK_PREFIX ""
> > +#define LOCK_PREFIX "\n\tlock; "
> > #endif
>
> What is your motivation for this change? At first sight this makes
> the UP kernel a bit larger and a bit smaller. Are you fixing some
> real regression/bug here?
>

Yes - but not easy to test for unless you have hardware that can
generate an interrupt flood for a long enough period of time to
catch the atomic ops in between the read bus cycle and the write
bus cycle - a very small window.

As luck (good? bad? ugly?) would have it, I have a SDHC card
and machine organization that will trigger a flood from the
ehci_hcd driver. A poor man's test setup.

Even with that bit of luck, it takes from minutes to hours
to hit the window. The single lockdep dump I posted was the
result of nearly a month's testing. It is a _small_ window. ;)

Mike

> 	Ingo
>

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 19:53 UTC
To: Andi Kleen
Cc: linux-kernel

On Fri May 22 2009, Andi Kleen wrote:
> On Fri, May 22, 2009 at 01:43:27PM -0500, Michael S. Zick wrote:
> > On Fri May 22 2009, you wrote:
> > > "Michael S. Zick" <lkml@morethan.org> writes:
> > >
> > > > Found in the bit-rot for 32-bit, x86, Uni-processor builds:
> > >
> > > Actually uni processor should not use the lock prefix
> > > because it doesn't need it; the only exception are some special
> > > ops used in para-virtualization which are special cased.
> > >
> >
> > Unless you have interrupts enabled, then you have two contexts.
>
> Interrupts on the local CPU don't interrupt instructions, only
> in between.
>

Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
Manual page: 3-590 PDF page: 638
Summary: Processors prior to P-4 can take an interrupt between
the read cycle and the write cycle. Which is why opcode 0xF0 exists.

Mike

> -Andi

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Samuel Thibault @ 2009-05-22 20:05 UTC
To: Michael S. Zick
Cc: Andi Kleen, linux-kernel

Michael S. Zick, on Fri 22 May 2009 14:53:39 -0500, wrote:
> Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> Manual page: 3-590 PDF page: 638
> Summary: Processors prior to P-4 can take an interrupt between
> the read cycle and the write cycle. Which is why opcode 0xF0 exists.

Where do you see page 638/639 talking about interrupts? It talks about
multi-processor machines.

Samuel

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 20:32 UTC
To: Samuel Thibault
Cc: Andi Kleen, linux-kernel

On Fri May 22 2009, Samuel Thibault wrote:
> Michael S. Zick, on Fri 22 May 2009 14:53:39 -0500, wrote:
> > Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> > Manual page: 3-590 PDF page: 638
> > Summary: Processors prior to P-4 can take an interrupt between
> > the read cycle and the write cycle. Which is why opcode 0xF0 exists.
>
> Where do you see page 638/639 talking about interrupts? It talks about
> multi-processor machines.
>

No - it talks about "exclusive memory access" -
You got bus master DMA in your test machine?
You also have an older than P-4 single processor?

Look people, I just reported what I found from testing -
Please don't shoot the messenger.

If it: "Does not make a difference" then it "Should not make a difference"
but it does, try it yourself. It's safe (if LOCK_PREFIX is in the proper
places) - the machine will ignore the opcode if it is recent enough to not
need it - just trust the cpu's micro-code.

Mike

> Samuel
>

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Andi Kleen @ 2009-05-22 20:42 UTC
To: Michael S. Zick
Cc: Samuel Thibault, Andi Kleen, linux-kernel

> If it: "Does not make a difference" then it "Should not make a difference"
> but it does, try it yourself. It's safe (if LOCK_PREFIX is in the proper
> places) - the machine will ignore the opcode if it is recent enough to not
> need it - just trust the cpu's micro-code.

It doesn't ignore it, in fact it's extremely slow on some older systems
where all atomic operations are very costly.
That is why LOCK is avoided as much as possible.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 20:57 UTC
To: Andi Kleen
Cc: Samuel Thibault, linux-kernel

On Fri May 22 2009, Andi Kleen wrote:
> > If it: "Does not make a difference" then it "Should not make a difference"
> > but it does, try it yourself. It's safe (if LOCK_PREFIX is in the proper
> > places) - the machine will ignore the opcode if it is recent enough to not
> > need it - just trust the cpu's micro-code.
>
> It doesn't ignore it, in fact it's extremely slow on some older systems
> where all atomic operations are very costly.
> That is why LOCK is avoided as much as possible.
>

I'm only the messenger.

Mike

> -Andi

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Samuel Thibault @ 2009-05-22 20:43 UTC
To: Michael S. Zick
Cc: Andi Kleen, linux-kernel

Michael S. Zick, on Fri 22 May 2009 15:32:41 -0500, wrote:
> On Fri May 22 2009, Samuel Thibault wrote:
> > Michael S. Zick, on Fri 22 May 2009 14:53:39 -0500, wrote:
> > > Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> > > Manual page: 3-590 PDF page: 638
> > > Summary: Processors prior to P-4 can take an interrupt between
> > > the read cycle and the write cycle. Which is why opcode 0xF0 exists.
> >
> > Where do you see page 638/639 talking about interrupts? It talks about
> > multi-processor machines.
>
> No - it talks about "exclusive memory access"

Right, that's still not interrupts.

> - You got bus master DMA in your test machine?

That's not related to the LOCK_PREFIX concern, which is about the
processor only, not interaction with other devices.

> Look people, I just reported what I found from testing -

What did you test, precisely?

Samuel

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Andi Kleen @ 2009-05-22 21:59 UTC
To: Samuel Thibault, Michael S. Zick, Andi Kleen, linux-kernel

> > - You got bus master DMA in your test machine?
>
> That's not related to the LOCK_PREFIX concern, which is about the
> processor only, not interaction with other devices.

Actually it's related to other devices; but only very few (most MMIO
doesn't support atomic cycles and is uncached anyways). But there's no driver
for real hardware in Linux that relies on it to my knowledge.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Samuel Thibault @ 2009-05-22 22:00 UTC
To: Andi Kleen
Cc: Michael S. Zick, linux-kernel

Andi Kleen, on Fri 22 May 2009 23:59:39 +0200, wrote:
> > > - You got bus master DMA in your test machine?
> >
> > That's not related to the LOCK_PREFIX concern, which is about the
> > processor only, not interaction with other devices.
>
> Actually it's related to other devices; but only very few (most MMIO
> doesn't support atomic cycles and is uncached anyways). But there's no driver
> for real hardware in Linux that relies on it to my knowledge.

That's what I meant: AIUI, LOCK_PREFIX has always only been used for
inter-processor interaction (atomic variables, spinlocks, etc.), not for
processor-device interaction.

Samuel

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Andi Kleen @ 2009-05-22 22:14 UTC
To: Samuel Thibault, Andi Kleen, Michael S. Zick, linux-kernel

> That's what I meant: AIUI, LOCK_PREFIX has always only been used for
> inter-processor interaction (atomic variables, spinlocks, etc.), not for

PCI has a locked transaction, but I don't think it's widely supported.
With normal uncached access it is also not very useful.

> processor-device interaction.

Well in Linux yes, but not architecturally in x86. That is why
the CPUs don't just nop it out with a single core (which Michael assumes
they do, but they don't)

-Andi
--
ak@linux.intel.com -- Speaking for myself only.

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Samuel Thibault @ 2009-05-22 22:14 UTC
To: Andi Kleen
Cc: Michael S. Zick, linux-kernel

Andi Kleen, on Sat 23 May 2009 00:14:56 +0200, wrote:
> > That's what I meant: AIUI, LOCK_PREFIX has always only been used for
> > inter-processor interaction (atomic variables, spinlocks, etc.), not for
>
> PCI has a locked transaction, but I don't think it's widely supported.
> With normal uncached access it is also not very useful.

I'm not talking about the LOCK prefix. I'm talking about the
LOCK_PREFIX macro. I'm saying that AIUI it has never been supposed to
be used for processor-device interaction, even if the LOCK prefix could
be used for that.

Samuel

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Roland Dreier @ 2009-05-22 20:45 UTC
To: lkml
Cc: Samuel Thibault, Andi Kleen, linux-kernel

> > > Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
> > > Manual page: 3-590 PDF page: 638
> > > Summary: Processors prior to P-4 can take an interrupt between
> > > the read cycle and the write cycle. Which is why opcode 0xF0 exists.

> > Where do you see page 638/639 talking about interrupts? It talks about
> > multi-processor machines.

> No - it talks about "exclusive memory access" - You got bus master DMA
> in your test machine? You also have an older than P-4 single processor?

I looked at the page you refer to. It talks about asserting the LOCK#
signal -- there is absolutely no mention of the lock prefix having any
effect on the execution of an instruction internal to a single CPU.
Could you be more specific about what you are referring to?

> Look people, I just reported what I found from testing -
> Please don't shoot the messenger.

Could you be specific about the test you are doing? What operation are
you doing that is missing the lock prefix? What is the expected result,
and what actually happens without the lock prefix?

 - R.

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Robert Hancock @ 2009-05-24 18:59 UTC
To: lkml
Cc: Samuel Thibault, Andi Kleen, linux-kernel

Michael S. Zick wrote:
> On Fri May 22 2009, Samuel Thibault wrote:
>> Michael S. Zick, on Fri 22 May 2009 14:53:39 -0500, wrote:
>>> Ref: http://developer.intel.com/Assets/PDF/manual/253666.pdf
>>> Manual page: 3-590 PDF page: 638
>>> Summary: Processors prior to P-4 can take an interrupt between
>>> the read cycle and the write cycle. Which is why opcode 0xF0 exists.
>> Where do you see page 638/639 talking about interrupts? It talks about
>> multi-processor machines.
>>
>
> No - it talks about "exclusive memory access" - You got bus master DMA
> in your test machine? You also have an older than P-4 single processor?

It means that LOCK is required in a multi-processor environment to ensure
that an instruction executes atomically WRT memory operations being done
on other CPUs. On a single processor, except for some weird exceptions
(like rep instructions, which can't be LOCKed anyways), instructions are
always atomic with respect to interrupts.

>
> Look people, I just reported what I found from testing -
> Please don't shoot the messenger.
>
> If it: "Does not make a difference" then it "Should not make a difference"
> but it does, try it yourself. It's safe (if LOCK_PREFIX is in the proper
> places) - the machine will ignore the opcode if it is recent enough to not
> need it - just trust the cpu's micro-code.

What do you mean "recent enough to not need it?" There is no such thing.
On any x86 machine it does something. It will slow things down, and
there is no reason it should be required on uni-processor systems.
Quite likely that's the only effect adding the LOCK prefix is having,
slowing things down, and covering up whatever is causing your issue,
without having anything to do with the root cause.

>
> Mike
>> Samuel
>>
>>
>
>

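The distinction argued in the replies above (LOCK serializes against other
CPUs, while a single CPU only takes interrupts on instruction boundaries) can
be sketched as a toy model in plain C. This is only an illustration of the
claim as stated in the thread, not a test of real hardware: the names
`struct cpu`, `rmw_read`, `rmw_write` and the three scenario functions are
hypothetical, and the "bus cycles" are ordinary function calls.

```c
#include <assert.h>

/* Toy model (an assumption for illustration): an "incl mem" is split
 * into the two memory cycles it performs, a read and a write. */
struct cpu {
    int latched; /* value read from memory, not yet written back */
};

static void rmw_read(struct cpu *c, const int *mem) { c->latched = *mem; }
static void rmw_write(const struct cpu *c, int *mem) { *mem = c->latched + 1; }

/* Two CPUs, no LOCK: CPU b's read slips in between CPU a's read and
 * write, so both write back the same value and one increment is lost. */
static int smp_unlocked(void)
{
    int mem = 0;
    struct cpu a = {0}, b = {0};

    rmw_read(&a, &mem);
    rmw_read(&b, &mem);
    rmw_write(&a, &mem);
    rmw_write(&b, &mem);
    return mem;
}

/* Two CPUs, LOCK: the bus is held for the whole read-modify-write,
 * so the two sequences cannot interleave. */
static int smp_locked(void)
{
    int mem = 0;
    struct cpu a = {0}, b = {0};

    rmw_read(&a, &mem);
    rmw_write(&a, &mem);
    rmw_read(&b, &mem);
    rmw_write(&b, &mem);
    return mem;
}

/* One CPU, no LOCK: the interrupt handler runs on the same CPU and,
 * per the replies above, only starts on an instruction boundary,
 * i.e. after the mainline increment has fully completed. */
static int up_with_interrupt(void)
{
    int mem = 0;
    struct cpu c = {0};

    rmw_read(&c, &mem);
    rmw_write(&c, &mem);   /* mainline incl completes */
    /* interrupt is taken here, between instructions */
    rmw_read(&c, &mem);
    rmw_write(&c, &mem);   /* handler's incl */
    return mem;
}

/* Self-check of the model: only the unlocked SMP case loses an update. */
static void model_self_check(void)
{
    assert(smp_unlocked() == 1);
    assert(smp_locked() == 2);
    assert(up_with_interrupt() == 2);
}
```

In this model the locked SMP case and the uniprocessor case execute the same
sequence; what differs is where the ordering guarantee comes from (the bus
lock versus the instruction-boundary rule), which is the point under dispute
in the thread.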
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 18:50 UTC
To: linux-kernel

On Fri May 22 2009, you wrote:
> "Michael S. Zick" <lkml@morethan.org> writes:
>
> > Found in the bit-rot for 32-bit, x86, Uni-processor builds:
>
> Actually uni processor should not use the lock prefix
> because it doesn't need it; the only exception are some special
> ops used in para-virtualization which are special cased.
>

Unless you have interrupts enabled, then you have two contexts.
Only xchg is "naturally" atomic.

Mike

> -Andi
>

* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Roland Dreier @ 2009-05-22 19:24 UTC
To: Michael S. Zick
Cc: linux-kernel

> Unless you have interrupts enabled, then you have two contexts.
> Only xchg is "naturally" atomic.

Isn't the lock prefix about consistency between multiple processors?
The x86 architecture always handles interrupts on instruction
boundaries. I'm guessing you're worried about definitions like

	static inline void atomic_inc(atomic_t *v)
	{
		asm volatile(LOCK_PREFIX "incl %0"
			     : "+m" (v->counter));
	}

which compiles to just "incl" (with no lock prefix) on uniprocessor
kernels; but the IA-32 architecture guarantees that the incl instruction
cannot be interrupted between reading the old value and writing the new
value.

 - R.

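How the quoted atomic_inc() ends up as a bare "incl" on uniprocessor builds
follows from C string-literal concatenation: LOCK_PREFIX is a string, and the
compiler pastes it in front of the instruction text. The sketch below shows
only that mechanism. It is a simplified stand-in, not the kernel header: the
SMP branch here drops the "661:" label visible in the patch context, and
`atomic_inc_template` and `check_up_template` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the macro touched by the patch (assumption:
 * the real SMP definition carries more than the bare prefix). */
#ifdef CONFIG_SMP
#define LOCK_PREFIX "lock; "
#else /* ! CONFIG_SMP */
#define LOCK_PREFIX ""
#endif

/* Adjacent string literals are concatenated at compile time, so this
 * returns the same template text that asm volatile() would receive. */
static const char *atomic_inc_template(void)
{
    return LOCK_PREFIX "incl %0";
}

/* On a build without CONFIG_SMP the prefix is empty and only the
 * instruction itself remains. */
static void check_up_template(void)
{
#ifndef CONFIG_SMP
    assert(strcmp(atomic_inc_template(), "incl %0") == 0);
#endif
}
```

Compiling the same file with -DCONFIG_SMP makes atomic_inc_template() return
the prefixed form instead, which is the difference the proposed patch would
remove.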
* Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic
From: Michael S. Zick @ 2009-05-22 20:03 UTC
To: Roland Dreier
Cc: linux-kernel

On Fri May 22 2009, Roland Dreier wrote:
> > Unless you have interrupts enabled, then you have two contexts.
> > Only xchg is "naturally" atomic.
>
> Isn't the lock prefix about consistency between multiple processors?
> The x86 architecture always handles interrupts on instruction
> boundaries. I'm guessing you're worried about definitions like
>
> 	static inline void atomic_inc(atomic_t *v)
> 	{
> 		asm volatile(LOCK_PREFIX "incl %0"
> 			     : "+m" (v->counter));
> 	}
>
> which compiles to just "incl" (with no lock prefix) on uniprocessor
> kernels; but the IA-32 architecture guarantees that the incl instruction
> cannot be interrupted between reading the old value and writing the new
> value.
>

Not prior to P-4, and since then only "may" be done atomically,
see reference post in my earlier reply.

PS: And yes, that was where I spotted the usage first. ;)

Mike

> - R.
>
>

Thread overview: 90+ messages
2009-05-22 16:39 [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2009-05-22 18:23 ` Andi Kleen
2009-05-22 18:36 ` Ingo Molnar
2009-05-22 18:59 ` H. Peter Anvin
2009-05-22 19:20 ` Michael S. Zick
2009-05-22 22:21 ` Michael S. Zick
2009-05-22 23:30 ` H. Peter Anvin
2009-05-23 0:45 ` Michael S. Zick
2009-05-23 0:51 ` H. Peter Anvin
2009-05-23 10:44 ` Michael S. Zick
2009-05-23 11:18 ` Michael S. Zick
2009-05-24 7:04 ` Harald Welte
2009-05-24 12:48 ` Michael S. Zick
2009-05-24 15:43 ` Michael S. Zick
2009-05-27 22:13 ` Roland Dreier
2009-05-27 22:33 ` Michael S. Zick
2009-05-23 15:52 ` Michael S. Zick
2009-05-23 18:04 ` Michael S. Zick
2009-05-23 23:44 ` H. Peter Anvin
2009-05-24 6:49 ` Harald Welte
2009-05-24 12:38 ` Michael S. Zick
2009-05-24 17:31 ` Harald Welte
2009-05-27 12:18 ` Re:[VIA Support] was: " Michael S. Zick
2009-05-27 12:22 ` [VIA " Michael S. Zick
2009-05-27 12:47 ` Harald Welte
2009-05-27 13:00 ` Michael S. Zick
2009-05-29 12:06 ` Michael S. Zick
2009-05-30 15:48 ` Michael S. Zick
2009-05-24 12:27 ` Michael S. Zick
2009-05-24 17:22 ` Harald Welte
2009-05-24 18:00 ` H. Peter Anvin
2009-05-24 18:32 ` Michael S. Zick
2009-05-24 18:46 ` H. Peter Anvin
2009-05-24 19:09 ` Michael S. Zick
2009-05-25 19:03 ` Michael S. Zick
2009-05-25 19:18 ` Michael S. Zick
2009-05-25 19:46 ` Michael S. Zick
2009-05-25 21:10 ` Michael S. Zick
2009-05-25 21:17 ` H. Peter Anvin
2009-05-25 23:03 ` Michael S. Zick
2009-05-25 23:35 ` Michael S. Zick
2009-05-26 0:05 ` H. Peter Anvin
2009-05-26 12:37 ` Michael S. Zick
2009-05-26 17:13 ` H. Peter Anvin
2009-05-25 1:31 ` i2c-viapro / via-fb drivers on VIA CX700 Harald Welte
2009-05-25 12:54 ` Michael S. Zick
2009-05-27 13:36 ` Michael S. Zick
2009-05-25 16:05 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2009-05-28 20:30 ` Pavel Machek
2009-05-28 20:54 ` Michael S. Zick
2009-05-28 23:15 ` [Futex RFC] was " Michael S. Zick
2009-05-29 2:00 ` Michael S. Zick
2009-05-27 17:01 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) Harald Welte
2009-05-27 17:10 ` Michael S. Zick
2009-05-27 17:19 ` Thomas Gleixner
2009-05-27 17:25 ` Michael S. Zick
2009-05-27 18:08 ` LOCK prefix on uni processor has its use Andi Kleen
2009-05-27 18:22 ` Michael S. Zick
2009-05-27 18:33 ` Michael S. Zick
2009-05-27 18:55 ` Michael S. Zick
2009-05-27 18:38 ` Andi Kleen
2009-06-02 12:48 ` Harald Welte
2009-06-02 13:03 ` Andi Kleen
2009-06-02 13:26 ` Michael S. Zick
2009-06-02 13:42 ` Andi Kleen
2009-06-03 11:46 ` Michael S. Zick
2009-05-28 2:56 ` LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic) H. Peter Anvin
2009-05-23 20:51 ` [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic Michael S. Zick
2009-05-28 12:48 ` Pavel Machek
2009-05-28 13:29 ` Michael S. Zick
2009-05-28 20:50 ` Pavel Machek
2009-05-28 20:58 ` Michael S. Zick
2009-05-28 21:16 ` Pavel Machek
2009-05-28 21:21 ` Michael S. Zick
2009-05-22 19:17 ` Michael S. Zick
[not found] ` <200905221343.30638.lkml@morethan.org>
[not found] ` <20090522192329.GF846@one.firstfloor.org>
2009-05-22 19:53 ` Michael S. Zick
2009-05-22 20:05 ` Samuel Thibault
2009-05-22 20:32 ` Michael S. Zick
2009-05-22 20:42 ` Andi Kleen
2009-05-22 20:57 ` Michael S. Zick
2009-05-22 20:43 ` Samuel Thibault
2009-05-22 21:59 ` Andi Kleen
2009-05-22 22:00 ` Samuel Thibault
2009-05-22 22:14 ` Andi Kleen
2009-05-22 22:14 ` Samuel Thibault
2009-05-22 20:45 ` Roland Dreier
2009-05-24 18:59 ` Robert Hancock
-- strict thread matches above, loose matches on Subject: below --
2009-05-22 18:50 Michael S. Zick
2009-05-22 19:24 ` Roland Dreier
2009-05-22 20:03 ` Michael S. Zick