* PPC upstream kernel ignored DABR bug @ 2007-11-26 22:02 Jan Kratochvil 2007-11-27 22:35 ` Arnd Bergmann 0 siblings, 1 reply; 28+ messages in thread From: Jan Kratochvil @ 2007-11-26 22:02 UTC (permalink / raw) To: Paul Mackerras, linuxppc-dev; +Cc: Roland McGrath Hi, this testcase: http://people.redhat.com/jkratoch/dabr-lost.c reproduces a PPC DABR kernel bug. The variable `variable' should not get modified, as the thread modifying it should be caught by its DABR: $ ./dabr-lost TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318 TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318 TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318 TID 30914: hitting the variable TID 30915: hitting the variable TID 30916: hitting the variable variable found = 30916, caught TID = 30914 TID 30916: DABR 0x10012a77 Variable got modified by a thread which has DABR still set! At the `variable found =' line the parent ptracer found that the thread with TID 30916 wrote its value into the variable, even though it already had DABR set beforehand. As the behavior is nondeterministic (it seems to depend on the current weather), I expect scheduling matters here. It is important that the target thread is in the `nanosleep' syscall. If you define WORKAROUND_SET_DABR_IN_SYSCALL in the testcase, it busyloops in userland and the bug no longer reproduces. I reproduced it on a utrace-patched kernel on a dual-CPU Power5, and Roland McGrath reported that it reproduces on the vanilla upstream kernel on a Mac G5. Regards, Jan Kratochvil ^ permalink raw reply [flat|nested] 28+ messages in thread
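For readers unfamiliar with the values in the output above: a DABR value such as 0x10012a77 is usually read as a doubleword-aligned address plus three flag bits in the low end (read, write, translation). The sketch below only decodes such a value. The `DABR_MODEL_*` names and both helper functions are hypothetical and made up for illustration; the bit layout is an assumption based on the usual description of the register, not something stated in this thread.

```c
/* Hypothetical helpers for reading a DABR value as printed by the testcase.
 * Assumed layout: low three bits are flags, the rest is an 8-byte-aligned
 * effective address. These names are illustrative, not kernel symbols. */
#define DABR_MODEL_READ        0x1UL  /* trap on data reads */
#define DABR_MODEL_WRITE       0x2UL  /* trap on data writes */
#define DABR_MODEL_TRANSLATION 0x4UL  /* breakpoint translation bit */
#define DABR_MODEL_FLAGS       0x7UL  /* mask covering all three flags */

/* Address part of the value: everything above the flag bits. */
unsigned long dabr_address(unsigned long dabr)
{
    return dabr & ~DABR_MODEL_FLAGS;
}

/* Flag part of the value: the low three bits. */
unsigned long dabr_flags(unsigned long dabr)
{
    return dabr & DABR_MODEL_FLAGS;
}
```

Under that assumption, the 0x10012a77 shown above would watch the doubleword at 0x10012a70 with all three flag bits set.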
* Re: PPC upstream kernel ignored DABR bug 2007-11-26 22:02 PPC upstream kernel ignored DABR bug Jan Kratochvil @ 2007-11-27 22:35 ` Arnd Bergmann 2007-11-28 8:59 ` Jan Kratochvil 2007-11-28 22:59 ` Geoff Levand 0 siblings, 2 replies; 28+ messages in thread From: Arnd Bergmann @ 2007-11-27 22:35 UTC (permalink / raw) To: linuxppc-dev; +Cc: Paul Mackerras, Jan Kratochvil, Roland McGrath On Monday 26 November 2007, Jan Kratochvil wrote: > Hi, > > this testcase: > http://people.redhat.com/jkratoch/dabr-lost.c > > reproduces a PPC DABR kernel bug. The variable `variable' should not get > modified as the thread modifying it should be caught by its DABR: > > $ ./dabr-lost > TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318 > TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318 > TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318 > TID 30914: hitting the variable > TID 30915: hitting the variable > TID 30916: hitting the variable > variable found = 30916, caught TID = 30914 > TID 30916: DABR 0x10012a77 > Variable got modified by a thread which has DABR still set! > This sounds like a bug recently reported by Uli Weigand. BenH said he'd take a look, but it probably fell under the table. The problem found by Uli is that on certain processors (Cell/B.E. in his case), the DABRX register needs to be set in order for the DABR to take effect. Arnd <>< ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2007-11-27 22:35 ` Arnd Bergmann @ 2007-11-28 8:59 ` Jan Kratochvil 2007-11-28 12:28 ` Arnd Bergmann 2007-11-28 22:59 ` Geoff Levand 1 sibling, 1 reply; 28+ messages in thread From: Jan Kratochvil @ 2007-11-28 8:59 UTC (permalink / raw) To: Arnd Bergmann; +Cc: linuxppc-dev, Paul Mackerras, Roland McGrath On Tue, 27 Nov 2007 23:35:36 +0100, Arnd Bergmann wrote: > On Monday 26 November 2007, Jan Kratochvil wrote: > > Hi, > > > > this testcase: > > http://people.redhat.com/jkratoch/dabr-lost.c > > > > reproduces a PPC DABR kernel bug. The variable `variable' should not get > > modified as the thread modifying it should be caught by its DABR: > > > > $ ./dabr-lost > > TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318 > > TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318 > > TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318 > > TID 30914: hitting the variable > > TID 30915: hitting the variable > > TID 30916: hitting the variable > > variable found = 30916, caught TID = 30914 > > TID 30916: DABR 0x10012a77 > > Variable got modified by a thread which has DABR still set! > > > > This sounds like a bug recently reported by Uli Weigand. BenH > said he'd take a look, but it probably fell under the table. > The problem found by Uli is that on certain processors (Cell/B.E. > in his case), the DABRX register needs to be set in order for > the DABR to take effect. Please be aware DABR works fine if the same code runs just 1 (always) or 2 (sometimes) threads. It starts failing with too many threads running: $ ./dabr-lost TID 32725: DABR 0x1001279f NIP 0xfecf41c TID 32726: DABR 0x1001279f NIP 0xfecf41c TID 32725: hitting the variable variable found = -1, caught TID = 32725 TID 32726: hitting the variable variable found = -1, caught TID = 32726 The kernel bug did not get reproduced - increase THREADS. As I did not find any code in that kernel touching DABRX its value should not be dependent on the number of threads running. 
Regards, Lace ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2007-11-28 8:59 ` Jan Kratochvil @ 2007-11-28 12:28 ` Arnd Bergmann 2007-11-28 12:45 ` Jan Kratochvil 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2007-11-28 12:28 UTC (permalink / raw) To: Jan Kratochvil; +Cc: linuxppc-dev, Paul Mackerras, Roland McGrath On Wednesday 28 November 2007, Jan Kratochvil wrote: > Please be aware DABR works fine if the same code runs just 1 (always) or > 2 (sometimes) threads. It starts failing with too many threads running: > > $ ./dabr-lost > TID 32725: DABR 0x1001279f NIP 0xfecf41c > TID 32726: DABR 0x1001279f NIP 0xfecf41c > TID 32725: hitting the variable > variable found = -1, caught TID = 32725 > TID 32726: hitting the variable > variable found = -1, caught TID = 32726 > The kernel bug did not get reproduced - increase THREADS. > > As I did not find any code in that kernel touching DABRX its value should not > be dependent on the number of threads running. > Right, this is a different problem from the one reported by Uli. From what I can tell, your problem is that you set the DABR only in one thread, so the other threads don't see it. DABR is saved in the thread_struct, so setting it in one thread doesn't have an impact on any other thread. Arnd <>< ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2007-11-28 12:28 ` Arnd Bergmann @ 2007-11-28 12:45 ` Jan Kratochvil 0 siblings, 0 replies; 28+ messages in thread From: Jan Kratochvil @ 2007-11-28 12:45 UTC (permalink / raw) To: Arnd Bergmann; +Cc: linuxppc-dev, Paul Mackerras, Roland McGrath On Wed, 28 Nov 2007 13:28:48 +0100, Arnd Bergmann wrote: > On Wednesday 28 November 2007, Jan Kratochvil wrote: > > Please be aware DABR works fine if the same code runs just 1 (always) or > > 2 (sometimes) threads. It starts failing with too many threads running: > > > > $ ./dabr-lost > > TID 32725: DABR 0x1001279f NIP 0xfecf41c > > TID 32726: DABR 0x1001279f NIP 0xfecf41c > > TID 32725: hitting the variable > > variable found = -1, caught TID = 32725 > > TID 32726: hitting the variable > > variable found = -1, caught TID = 32726 > > The kernel bug did not get reproduced - increase THREADS. > > > > As I did not find any code in that kernel touching DABRX its value should not > > be dependent on the number of threads running. > > > > Right, this is a different problem from the one reported by Uli. > From what I can tell, your problem is that you set the DABR only > in one thread, so the other threads don't see it. DABR is saved > in the thread_struct, so setting it in one thread doesn't have > an impact on any other thread. It even prints out above: TID 32725: DABR 0x1001279f NIP 0xfecf41c TID 32726: DABR 0x1001279f NIP 0xfecf41c that it wrote DABR in both the threads and it has also successfully read it back from each thread specifically (according to its thread-specific TID). for (threadi = 0; threadi < THREADS; threadi++) { pid_t tid = thread[threadi]; setup (tid); ... } static void setup (pid_t tid) { ... l = ptrace (PTRACE_SET_DEBUGREG, tid, NULL, (void *) dabr); ... } Also if I would not set DABR specifically for each thread it would not work in 90% of cases for `THREADS == 2'. And it would not work for `THREADS == 4' if they are busylooping (therefore not in a syscall). 
TID 596: DABR 0x100127a7 NIP 0x10000dbc TID 597: DABR 0x100127a7 NIP 0x10000db0 TID 598: DABR 0x100127a7 NIP 0x10000dac TID 599: DABR 0x100127a7 NIP 0x10000dbc TID 596: hitting the variable variable found = -1, caught TID = 596 TID 599: hitting the variable variable found = -1, caught TID = 599 TID 597: hitting the variable variable found = -1, caught TID = 597 TID 598: hitting the variable variable found = -1, caught TID = 598 The kernel bug got workarounded by WORKAROUND_SET_DABR_IN_SYSCALL. (I found out now WORKAROUND_SET_DABR_IN_SYSCALL only reduces the probability of the failure, it is not a 100% workaround of the problem in the testcase.) There is some tricky kernel code around it but I did not try to debug it: struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *new) { ... if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) { set_dabr(new->thread.dabr); __get_cpu_var(current_dabr) = new->thread.dabr; } ... } Regards, Jan ^ permalink raw reply [flat|nested] 28+ messages in thread
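The `__switch_to()` fragment quoted in the message above relies on a per-CPU cached copy of the last DABR value written. A minimal userspace model of that logic may make the discussion easier to follow. Everything here (`model_thread`, `model_cpu`, `model_set_dabr`, `model_switch_to`) is hypothetical and only mirrors the shape of the quoted snippet; it is not kernel code and makes no claim about where the actual bug lies.

```c
/* Hypothetical userspace model of the lazy DABR update in __switch_to().
 * Field names loosely mirror the quoted kernel snippet. */
struct model_thread {
    unsigned long dabr;          /* stands in for thread.dabr */
};

struct model_cpu {
    unsigned long hw_dabr;       /* what the DABR register would hold */
    unsigned long current_dabr;  /* per-CPU cached copy of the last write */
    unsigned int  spr_writes;    /* number of simulated register writes */
};

/* Stands in for set_dabr(): the only place the "hardware" value changes. */
void model_set_dabr(struct model_cpu *cpu, unsigned long dabr)
{
    cpu->hw_dabr = dabr;
    cpu->spr_writes++;
}

/* Same comparison as the quoted code: write the register only when the
 * incoming thread's value differs from the cached per-CPU value. */
void model_switch_to(struct model_cpu *cpu, const struct model_thread *next)
{
    if (cpu->current_dabr != next->dabr) {
        model_set_dabr(cpu, next->dabr);
        cpu->current_dabr = next->dabr;
    }
}
```

In this model, switching between threads that all carry the same DABR value (as in the testcase) writes the register once and then skips every later write. The scheme is only correct as long as the cached value and the register never diverge; that is an observation about the model, not a diagnosis of the reported failure.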
* Re: PPC upstream kernel ignored DABR bug 2007-11-27 22:35 ` Arnd Bergmann 2007-11-28 8:59 ` Jan Kratochvil @ 2007-11-28 22:59 ` Geoff Levand 2007-11-29 0:13 ` Arnd Bergmann 1 sibling, 1 reply; 28+ messages in thread From: Geoff Levand @ 2007-11-28 22:59 UTC (permalink / raw) To: Arnd Bergmann Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Jan Kratochvil Arnd Bergmann wrote: > On Monday 26 November 2007, Jan Kratochvil wrote: >> Hi, >> >> this testcase: >> http://people.redhat.com/jkratoch/dabr-lost.c >> >> reproduces a PPC DABR kernel bug. The variable `variable' should not get >> modified as the thread modifying it should be caught by its DABR: >> >> $ ./dabr-lost >> TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318 >> TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318 >> TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318 >> TID 30914: hitting the variable >> TID 30915: hitting the variable >> TID 30916: hitting the variable >> variable found = 30916, caught TID = 30914 >> TID 30916: DABR 0x10012a77 >> Variable got modified by a thread which has DABR still set! >> > > This sounds like a bug recently reported by Uli Weigand. BenH > said he'd take a look, but it probably fell under the table. > The problem found by Uli is that on certain processors (Cell/B.E. > in his case), the DABRX register needs to be set in order for > the DABR to take effect. Just as a note, the PS3's lv1_set_dabr(), which we used for ppc_md.set_dabr sets up both the DABRX and DABR registers. -Geoff ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2007-11-28 22:59 ` Geoff Levand @ 2007-11-29 0:13 ` Arnd Bergmann 2008-03-10 0:53 ` Luis Machado 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2007-11-29 0:13 UTC (permalink / raw) To: Geoff Levand; +Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Jan Kratochvil On Wednesday 28 November 2007 23:59:36 Geoff Levand wrote: > > This sounds like a bug recently reported by Uli Weigand. BenH > > said he'd take a look, but it probably fell under the table. > > The problem found by Uli is that on certain processors (Cell/B.E. > > in his case), the DABRX register needs to be set in order for > > the DABR to take effect. > > Just as a note, the PS3's lv1_set_dabr(), which we used for > ppc_md.set_dabr sets up both the DABRX and DABR registers. Yes, I know. I tried it on the PS3 first and couldn't reproduce the bug he saw on the blade. Arnd <>< ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2007-11-29 0:13 ` Arnd Bergmann @ 2008-03-10 0:53 ` Luis Machado 2008-03-10 14:01 ` Jens Osterkamp 0 siblings, 1 reply; 28+ messages in thread From: Luis Machado @ 2008-03-10 0:53 UTC (permalink / raw) To: Arnd Bergmann Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Jan Kratochvil > Yes, I know. I tried it on the PS3 first and couldn't reproduce > the bug he saw on the blade. Arnd, Do we have any news on this topic? I've seen this happening quite often within GDB when using hardware watchpoints on a shared variable in a threaded (7+ threads) binary. Sometimes the watchpoint won't trigger, even though the monitored variable's value was modified. Appreciate your feedback. Best regards, -- Luis Machado LoP Toolchain Software Engineer IBM Linux Technology Center ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 0:53 ` Luis Machado @ 2008-03-10 14:01 ` Jens Osterkamp 2008-03-10 15:13 ` Luis Machado ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Jens Osterkamp @ 2008-03-10 14:01 UTC (permalink / raw) To: linuxppc-dev, luisgpm Cc: Jan Kratochvil, Paul Mackerras, Roland McGrath, Arnd Bergmann On Monday 10 March 2008, Luis Machado wrote: > > Yes, I know. I tried it on the PS3 first and couldn't reproduce > > the bug he saw on the blade. > > Arnd, > > Do we have any news on this topic? > > I've seen this happening quite often within GDB when using hardware > watchpoints on a shared variable in a threaded (7+ threads) binary. > Sometimes the watchpoint won't trigger, even though the monitored > variable's value was modified. On the Blade DABRX had to be set additional to DABR. PS3 and Celleb already did this. Uli Weigand found this back in November. I submitted a patch for this which went into 2.6.25-rc4. Can you please try again with rc4 ? Gruß, Jens IBM Deutschland Entwicklung GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Herbert Kircher Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 14:01 ` Jens Osterkamp @ 2008-03-10 15:13 ` Luis Machado 2008-03-10 19:19 ` Roland McGrath 2008-03-12 17:51 ` Luis Machado 2 siblings, 0 replies; 28+ messages in thread From: Luis Machado @ 2008-03-10 15:13 UTC (permalink / raw) To: Jens Osterkamp Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Arnd Bergmann, Jan Kratochvil > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb > already did this. Uli Weigand found this back in November. I submitted > a patch for this which went into 2.6.25-rc4. > Can you please try again with rc4 ? I will try it and will post the results back. Thanks Jens. Regards, -- Luis Machado Software Engineer IBM Linux Technology Center ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 14:01 ` Jens Osterkamp 2008-03-10 15:13 ` Luis Machado @ 2008-03-10 19:19 ` Roland McGrath 2008-03-10 19:36 ` Luis Machado 2008-03-10 22:06 ` Segher Boessenkool 2008-03-12 17:51 ` Luis Machado 2 siblings, 2 replies; 28+ messages in thread From: Roland McGrath @ 2008-03-10 19:19 UTC (permalink / raw) To: Jens Osterkamp Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb > already did this. Uli Weigand found this back in November. I submitted > a patch for this which went into 2.6.25-rc4. > Can you please try again with rc4 ? This is not the problem. This came up before and everyone seems to have forgotten. This bug has been reproduced on G5s, which do not have DABRX, as I understand it. Thanks, Roland ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 19:19 ` Roland McGrath @ 2008-03-10 19:36 ` Luis Machado 2008-03-10 19:50 ` Olof Johansson 2008-03-10 22:06 ` Segher Boessenkool 1 sibling, 1 reply; 28+ messages in thread From: Luis Machado @ 2008-03-10 19:36 UTC (permalink / raw) To: Roland McGrath Cc: Jan Kratochvil, Paul Mackerras, Arnd Bergmann, linuxppc-dev On Mon, 2008-03-10 at 12:19 -0700, Roland McGrath wrote: > > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb > > already did this. Uli Weigand found this back in November. I submitted > > a patch for this which went into 2.6.25-rc4. > > Can you please try again with rc4 ? > > This is not the problem. This came up before and everyone seems have > forgotten. This bug has been reproduced on G5's, which do not have DABRX > as I understand it. Yes, now that you mention it, I've been able to reproduce this on 970FX blades, which I don't think have DABRX registers. I guess it's almost the same CPU as the G5's. Regards, -- Luis Machado Software Engineer IBM Linux Technology Center ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 19:36 ` Luis Machado @ 2008-03-10 19:50 ` Olof Johansson 2008-03-10 19:54 ` Roland McGrath 0 siblings, 1 reply; 28+ messages in thread From: Olof Johansson @ 2008-03-10 19:50 UTC (permalink / raw) To: Luis Machado Cc: Paul Mackerras, Jan Kratochvil, Arnd Bergmann, Roland McGrath, linuxppc-dev On Mon, Mar 10, 2008 at 04:36:37PM -0300, Luis Machado wrote: > On Mon, 2008-03-10 at 12:19 -0700, Roland McGrath wrote: > > > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb > > > already did this. Uli Weigand found this back in November. I submitted > > > a patch for this which went into 2.6.25-rc4. > > > Can you please try again with rc4 ? > > > > This is not the problem. This came up before and everyone seems have > > forgotten. This bug has been reproduced on G5's, which do not have DABRX > > as I understand it. > > Yes, now that you mentioned, i've been able to reproduce this on 970FX's > blades, which i don't think have DABRX registers. I guess it's the > almost the same CPU as G5's. What Apple called G5 were during the production runs three different CPUs: 970 970FX 970MP 970 was only used in the very first models. 970MP was used in the last (the models with pci-express and up to 4 cpus). 970FX was used on almost everything else inbetween. -Olof ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 19:50 ` Olof Johansson @ 2008-03-10 19:54 ` Roland McGrath 0 siblings, 0 replies; 28+ messages in thread From: Roland McGrath @ 2008-03-10 19:54 UTC (permalink / raw) To: Olof Johansson Cc: linuxppc-dev, Paul Mackerras, Jan Kratochvil, Arnd Bergmann The G5 that I have says: cpu : PPC970FX, altivec supported revision : 3.0 (pvr 003c 0300) and it does indeed reproduce this bug. It would also be strange for this to be the DABRX issue given the failure mode. That is, it works sometimes but unreliably (as if the context switch sometimes fails to install the value). Thanks, Roland ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 19:19 ` Roland McGrath 2008-03-10 19:36 ` Luis Machado @ 2008-03-10 22:06 ` Segher Boessenkool 1 sibling, 0 replies; 28+ messages in thread From: Segher Boessenkool @ 2008-03-10 22:06 UTC (permalink / raw) To: Roland McGrath Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann >> On the Blade DABRX had to be set additional to DABR. PS3 and Celleb >> already did this. Uli Weigand found this back in November. I submitted >> a patch for this which went into 2.6.25-rc4. >> Can you please try again with rc4 ? > > This is not the problem. This came up before and everyone seems have > forgotten. This bug has been reproduced on G5's, which do not have > DABRX > as I understand it. 970 (all versions) _does_ have a DABRX register. Dunno if it has the same register definition (I cannot find DABRX in the Cell docs). Segher ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-10 14:01 ` Jens Osterkamp 2008-03-10 15:13 ` Luis Machado 2008-03-10 19:19 ` Roland McGrath @ 2008-03-12 17:51 ` Luis Machado 2008-03-12 22:30 ` Jens Osterkamp 2 siblings, 1 reply; 28+ messages in thread From: Luis Machado @ 2008-03-12 17:51 UTC (permalink / raw) To: Jens Osterkamp Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Arnd Bergmann, Jan Kratochvil Hi, > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb > already did this. Uli Weigand found this back in November. I submitted > a patch for this which went into 2.6.25-rc4. > Can you please try again with rc4 ? > Gruß, > > Jens Just to make sure, i tested the binary against the 2.6.25-rc4 kernel. It still fails. So this is really an open bug for PPC. -- Luis Machado Software Engineer IBM Linux Technology Center ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-12 17:51 ` Luis Machado @ 2008-03-12 22:30 ` Jens Osterkamp 2008-03-13 1:47 ` Roland McGrath 2008-03-13 13:13 ` Luis Machado 0 siblings, 2 replies; 28+ messages in thread From: Jens Osterkamp @ 2008-03-12 22:30 UTC (permalink / raw) To: luisgpm Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Arnd Bergmann, Jan Kratochvil > Just to make sure, i tested the binary against the 2.6.25-rc4 kernel. It > still fails. So this is really an open bug for PPC. On a Cell- or 970-based machine ? Gruß, Jens IBM Deutschland Entwicklung GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Herbert Kircher Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-12 22:30 ` Jens Osterkamp @ 2008-03-13 1:47 ` Roland McGrath 2008-03-13 22:20 ` Segher Boessenkool 2008-03-26 20:57 ` Josh Boyer 2008-03-13 13:13 ` Luis Machado 1 sibling, 2 replies; 28+ messages in thread From: Roland McGrath @ 2008-03-13 1:47 UTC (permalink / raw) To: Jens Osterkamp Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann AFAICT the DABRX register just has two global bits that enable paying attention to the DABR register. It only needs to be set once at boot time (as the cell code does). I don't see how missing that initialization could ever have explained the behavior we see where DABR matches are intermittent. If those DABRX bits weren't set then no DABR match would have happened. (Apparently they are set before boot on an Apple G5.) What we actually see is that DABR matches seem to be reliable when things are slow, and get intermittent when there are enough threads with DABR set. I searched the web trying to figure out what a DABRX register does so I could just go try it myself rather than waiting another n months for powerpc folks to forget about it again. (I did try it, and mtspr(SPRN_DABRX, DABRX_KERNEL | DABRX_USER); makes no difference to the test on my machine, even done in set_dabr every time we set SPRN_DABR.) I happened across: http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/79B6E24422AA101287256E93006C957E/$file/PowerPC_970FX_errata_DD3.X_V1.7.pdf which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X" and contains "Erratum #8: DABRX register might not always be updated correctly": Projected Impact The data address breakpoint function might not always work. Workaround None. Status A fix is not planned at this time for the PowerPC 970FX. The only machine I have at home for testing powerpc is an Apple G5, supplied to me by IBM. 
It says: cpu : PPC970FX, altivec supported revision : 3.0 (pvr 003c 0300) so I am guessing this document applies to the chips I have. Since I can't test on other chips myself, it is plausible from what I've seen that there is no mysterious kernel problem and only this hardware problem. The description of the hardware problem would not make me think that it would behave this way, but it is not very detailed or precise, or at least does not seem so to a reader not expert on powerpc. So, uh, go IBM! I'm in the minority in this conversation as someone not expert on powerpc, and as someone not employed by IBM. (I don't really mind finding public IBM documents about powerpc on the web and telling IBM powerpc folks about them. But, well.) I don't know what I can do next to tell whether this processor erratum is in fact what's happening in the test case. If it is, I don't know if there might be some arcane way to work around it despite "None" cited above. Thanks, Roland ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-13 1:47 ` Roland McGrath @ 2008-03-13 22:20 ` Segher Boessenkool 2008-03-13 22:42 ` Roland McGrath 2008-03-16 20:37 ` Benjamin Herrenschmidt 2008-03-26 20:57 ` Josh Boyer 1 sibling, 2 replies; 28+ messages in thread From: Segher Boessenkool @ 2008-03-13 22:20 UTC (permalink / raw) To: Roland McGrath Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann > AFAICT the DABRX register just has two global bits that enable paying > attention to the DABR register. It has four bits: 01 match in user mode 02 match in supervisor mode 04 match in hypervisor mode 08 ignore translation field in DABR If the kernel can write to DABRX, it is running in hypervisor mode, so it should set 07 instead of 03 (as it currently does) if it wants to match in kernel mode; or 01, if it doesn't. OTOH, the Apple version of the 970 is special (it has no separate hypervisor mode); still, 07 should always work. > It only needs to be set once at boot time > (as the cell code does). I don't see how missing that initialization > could > ever have explained the behavior we see where DABR matches are > intermittent. > If those DABRX bits weren't set then no DABR match would have happened. > (Apparently they are set before boot on an Apple G5.) I don't see the Apple boot code initialising DABRX; maybe the bootup state for DABRX is 07, dunno. Either way, it would be good if the kernel set it properly, esp. if it wants to enable or disable matches in the kernel itself. > What we actually see is that DABR matches seem to be reliable when > things > are slow, and get intermittent when there are enough threads with DABR > set. 
> I happened across: > > http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/ > 79B6E24422AA101287256E93006C957E/$file/ > PowerPC_970FX_errata_DD3.X_V1.7.pdf > > which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X" > and contains "Erratum #8: DABRX register might not always be updated > correctly": > The only machine I have at home for testing powerpc is an Apple G5, > supplied to me by IBM. It says: > cpu : PPC970FX, altivec supported > revision : 3.0 (pvr 003c 0300) > so I am guessing this document applies to the chips I have. Indeed. > Since I can't > test on other chips myself, it is plausible from what I've seen that > there > is no mysterious kernel problem and only this hardware problem. The > description of the hardware problem would not make me think that it > would > behave this way, but it is not very detailed or precise, or at least > does > not seem so to a reader not expert on powerpc. Since the 970 kernel never sets DABRX currently, #8 cannot explain _intermittent_ problems: either it always works, or never does. You could be happening upon #5, if the non-triggering data breakpoints are with vector loads/stores in strange code. > I don't know what I can do next to tell whether this processor erratum > is in > fact what's happening in the test case. If it is, I don't know if > there > might be some arcane way to work around it despite "None" cited above. It would help if you could give us the disassembly of some code where the breakpoint did not trigger; say, that insn and the previous 20 or so insns. Segher ^ permalink raw reply [flat|nested] 28+ messages in thread
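The four DABRX bits listed in the message above can be written down as a small sketch. The bit values (01, 02, 04, 08) are taken from that message; the `DABRX_MODEL_*` names, the `cpu_mode` enum, and `dabrx_matches_in()` are hypothetical and exist only for illustration. Real hardware behavior may have further conditions that this ignores.

```c
/* Hypothetical model of the DABRX bits as described in the thread. */
enum dabrx_model_bits {
    DABRX_MODEL_USER       = 0x1,  /* match in user mode */
    DABRX_MODEL_SUPERVISOR = 0x2,  /* match in supervisor mode */
    DABRX_MODEL_HYPERVISOR = 0x4,  /* match in hypervisor mode */
    DABRX_MODEL_IGNORE_BT  = 0x8   /* ignore translation field in DABR */
};

enum cpu_mode { MODE_USER, MODE_SUPERVISOR, MODE_HYPERVISOR };

/* Returns 1 if, in this simplified model, a DABR match would be enabled
 * for an access made in the given mode, and 0 otherwise. */
int dabrx_matches_in(unsigned int dabrx, enum cpu_mode mode)
{
    switch (mode) {
    case MODE_USER:
        return (dabrx & DABRX_MODEL_USER) != 0;
    case MODE_SUPERVISOR:
        return (dabrx & DABRX_MODEL_SUPERVISOR) != 0;
    case MODE_HYPERVISOR:
        return (dabrx & DABRX_MODEL_HYPERVISOR) != 0;
    }
    return 0;
}
```

With the value 03 mentioned above, this model matches in user and supervisor mode but not in hypervisor mode, which is the reason given in the message for preferring 07 when the kernel itself runs in hypervisor mode.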
* Re: PPC upstream kernel ignored DABR bug 2008-03-13 22:20 ` Segher Boessenkool @ 2008-03-13 22:42 ` Roland McGrath 2008-03-14 2:11 ` Segher Boessenkool 2008-03-16 20:37 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 28+ messages in thread From: Roland McGrath @ 2008-03-13 22:42 UTC (permalink / raw) To: Segher Boessenkool Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann > Since the 970 kernel never sets DABRX currently, #8 cannot explain > _intermittent_ problems: either it always works, or never does. That's kind of what I thought, but I couldn't make enough sense of the #8 text to be very sure. > You could be happening upon #5, if the non-triggering data breakpoints > are with vector loads/stores in strange code. They are not. > It would help if you could give us the disassembly of some code where the > breakpoint did not trigger; say, that insn and the previous 20 or so insns. The pointer to the test case was given here before. http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/ppc-dabr-race.c?cvsroot=systemtap -m32 Dump of assembler code for function child_thread: 0x10000950 <child_thread+0>: stwu r1,-32(r1) 0x10000954 <child_thread+4>: li r3,207 0x10000958 <child_thread+8>: mflr r0 0x1000095c <child_thread+12>: stw r29,20(r1) 0x10000960 <child_thread+16>: stw r0,36(r1) 0x10000964 <child_thread+20>: crclr 4*cr1+eq 0x10000968 <child_thread+24>: bl 0x10001680 <syscall> 0x1000096c <child_thread+28>: lis r11,4097 0x10000970 <child_thread+32>: mr r29,r3 0x10000974 <child_thread+36>: li r3,1 0x10000978 <child_thread+40>: lwz r9,7800(r11) 0x1000097c <child_thread+44>: addi r9,r9,1 0x10000980 <child_thread+48>: stw r9,7800(r11) 0x10000984 <child_thread+52>: bl 0x10001750 <sleep> 0x10000988 <child_thread+56>: lis r9,4097 ---> 0x1000098c <child_thread+60>: stw r29,7792(r9) 0x10000990 <child_thread+64>: bl 0x10001760 <pause> 0x10000994 <child_thread+68>: bl 0x10001760 <pause> 0x10000998 <child_thread+72>: b 0x10000990 
<child_thread+64> End of assembler dump. -m64 Dump of assembler code for function child_thread: 0x0000000010000d10 <child_thread+0>: mflr r0 0x0000000010000d14 <child_thread+4>: std r29,-24(r1) 0x0000000010000d18 <child_thread+8>: li r3,207 0x0000000010000d1c <child_thread+12>: std r0,16(r1) 0x0000000010000d20 <child_thread+16>: stdu r1,-144(r1) 0x0000000010000d24 <child_thread+20>: bl 0x10000b68 0x0000000010000d28 <child_thread+24>: ld r2,40(r1) 0x0000000010000d2c <child_thread+28>: ld r11,-32696(r2) 0x0000000010000d30 <child_thread+32>: mr r29,r3 0x0000000010000d34 <child_thread+36>: li r3,1 0x0000000010000d38 <child_thread+40>: extsw r29,r29 0x0000000010000d3c <child_thread+44>: lwz r9,0(r11) 0x0000000010000d40 <child_thread+48>: addi r9,r9,1 0x0000000010000d44 <child_thread+52>: clrldi r9,r9,32 0x0000000010000d48 <child_thread+56>: stw r9,0(r11) 0x0000000010000d4c <child_thread+60>: bl 0x10000a88 0x0000000010000d50 <child_thread+64>: ld r2,40(r1) 0x0000000010000d54 <child_thread+68>: ld r9,-32688(r2) ---> 0x0000000010000d58 <child_thread+72>: std r29,0(r9) 0x0000000010000d5c <child_thread+76>: nop 0x0000000010000d60 <child_thread+80>: bl 0x100009a8 0x0000000010000d64 <child_thread+84>: ld r2,40(r1) 0x0000000010000d68 <child_thread+88>: b 0x10000d60 <child_thread+80> 0x0000000010000d6c <child_thread+92>: .long 0x0 0x0000000010000d70 <child_thread+96>: .long 0x1 0x0000000010000d74 <child_thread+100>: lwz r0,0(r3) End of assembler dump. Thanks, Roland ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug 2008-03-13 22:42 ` Roland McGrath @ 2008-03-14 2:11 ` Segher Boessenkool 2008-03-14 7:45 ` Roland McGrath 0 siblings, 1 reply; 28+ messages in thread From: Segher Boessenkool @ 2008-03-14 2:11 UTC (permalink / raw) To: Roland McGrath Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann > The pointer to the test case was given here before. Oh, I missed that. Anyway, I wanted to see the asm, and who knows, with different compiler versions and all that. > 0x10000984 <child_thread+52>: bl 0x10001750 <sleep> > 0x10000988 <child_thread+56>: lis r9,4097 > ---> 0x1000098c <child_thread+60>: stw r29,7792(r9) > 0x0000000010000d4c <child_thread+60>: bl 0x10000a88 > 0x0000000010000d50 <child_thread+64>: ld r2,40(r1) > 0x0000000010000d54 <child_thread+68>: ld r9,-32688(r2) > ---> 0x0000000010000d58 <child_thread+72>: std r29,0(r9) In both these cases, the storage access goes to LSU0, so you're not hitting the errata. I noticed set_dabr() doesn't do proper synchronisation insns, could you try this patch? I doubt it helps, but it changes the code to do "the right thing".

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4846bf5..ee925f5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -250,7 +250,9 @@ int set_dabr(unsigned long dabr)
 	/* XXX should we have a CPU_FTR_HAS_DABR ? */
 #if defined(CONFIG_PPC64) || defined(CONFIG_6xx)
+	asm("sync");
 	mtspr(SPRN_DABR, dabr);
+	asm("isync");
 #endif
 	return 0;
 }

(badly copy/pasted, please apply by hand. Will send a real patch later ;-) ) If this doesn't help, and the failures stay intermittent, I don't think there is a close-to-the-hardware problem here. Segher ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: PPC upstream kernel ignored DABR bug
  2008-03-14  2:11 ` Segher Boessenkool
@ 2008-03-14  7:45   ` Roland McGrath
  2008-03-14  8:42     ` Segher Boessenkool
From: Roland McGrath @ 2008-03-14 7:45 UTC
To: Segher Boessenkool
Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann

> In both these cases, the storage access goes to LSU0, so you're
> not hitting the errata.

I'll take your word for it.

> If this doesn't help, and the failures stay intermittent, I don't think
> there is a close-to-the-hardware problem here.

I saw no effect from that change.  So now we're back to pure mystery,
I guess.

Thanks,
Roland
* Re: PPC upstream kernel ignored DABR bug
  2008-03-14  7:45 ` Roland McGrath
@ 2008-03-14  8:42   ` Segher Boessenkool
  2008-03-16 20:38     ` Benjamin Herrenschmidt
From: Segher Boessenkool @ 2008-03-14 8:42 UTC
To: Roland McGrath
Cc: linuxppc-dev, Jan Kratochvil, Paul Mackerras, Arnd Bergmann

>> If this doesn't help, and the failures stay intermittent, I don't think
>> there is a close-to-the-hardware problem here.
>
> I saw no effect from that change.  So now we're back to pure mystery,
> I guess.

Hey, we know something now: it's "just" a problem in the kernel :-)

Segher
* Re: PPC upstream kernel ignored DABR bug
  2008-03-14  8:42 ` Segher Boessenkool
@ 2008-03-16 20:38   ` Benjamin Herrenschmidt
From: Benjamin Herrenschmidt @ 2008-03-16 20:38 UTC
To: Segher Boessenkool
Cc: linuxppc-dev, Paul Mackerras, Jan Kratochvil, Arnd Bergmann, Roland McGrath

On Fri, 2008-03-14 at 09:42 +0100, Segher Boessenkool wrote:
> > I saw no effect from that change.  So now we're back to pure mystery,
> > I guess.
>
> Hey, we know something now: it's "just" a problem in the kernel :-)

We don't know that for sure.  The DABR context switching code is trivial
enough...

Ben.
* Re: PPC upstream kernel ignored DABR bug
  2008-03-13 22:20 ` Segher Boessenkool
  2008-03-13 22:42   ` Roland McGrath
@ 2008-03-16 20:37   ` Benjamin Herrenschmidt
From: Benjamin Herrenschmidt @ 2008-03-16 20:37 UTC
To: Segher Boessenkool
Cc: linuxppc-dev, Paul Mackerras, Jan Kratochvil, Arnd Bergmann, Roland McGrath

> Since the 970 kernel never sets DABRX currently, #8 cannot explain
> _intermittent_ problems: either it always works, or never does.

Uh... could be the boot code setting it, the setting happening on LSU0
but not LSU1.  No?

Ben.
* Re: PPC upstream kernel ignored DABR bug
  2008-03-13  1:47 ` Roland McGrath
  2008-03-13 22:20   ` Segher Boessenkool
@ 2008-03-26 20:57   ` Josh Boyer
  2008-03-27  1:47     ` Josh Boyer
From: Josh Boyer @ 2008-03-26 20:57 UTC
To: Roland McGrath
Cc: Arnd Bergmann, Jan Kratochvil, linuxppc-dev, Paul Mackerras

On Wed, 12 Mar 2008 18:47:45 -0700 (PDT)
Roland McGrath <roland@redhat.com> wrote:

> The only machine I have at home for testing powerpc is an Apple G5,
> supplied to me by IBM.  It says:
>         cpu             : PPC970FX, altivec supported
>         revision        : 3.0 (pvr 003c 0300)
> so I am guessing this document applies to the chips I have.  Since I can't
> test on other chips myself, it is plausible from what I've seen that there
> is no mysterious kernel problem and only this hardware problem.  The
> description of the hardware problem would not make me think that it would
> behave this way, but it is not very detailed or precise, or at least does
> not seem so to a reader not expert on powerpc.

I ran the testcase on my older G5 today with:

        cpu             : PPC970, altivec supported
        revision        : 2.2 (pvr 0039 0202)

and it also failed after a few iterations.  This was with
2.6.25-0.121.rc5.git4.fc9 as the kernel, which is fairly close to
mainline.  At the least, this doesn't seem to be 970FX related.  I'll
try building a vanilla 2.6.25-rc7 later this evening to see if that
makes a difference.

josh
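For readers comparing the two `revision` lines quoted in this message: the PVR is a 32-bit register whose upper halfword identifies the processor version and whose lower halfword carries the revision. The sketch below shows one way to split it. The function names are hypothetical, and the major/minor split of the lower halfword is an assumption that happens to match the "3.0" and "2.2" strings shown in the thread; other CPU families may encode the revision differently.

```c
#include <stdint.h>

/* Upper 16 bits: processor version (0x003c and 0x0039 in this thread). */
static unsigned pvr_version(uint32_t pvr)
{
    return pvr >> 16;
}

/* Assumed layout of the lower 16 bits: major revision in bits 8..15. */
static unsigned pvr_rev_major(uint32_t pvr)
{
    return (pvr >> 8) & 0xff;
}

/* Assumed layout of the lower 16 bits: minor revision in bits 0..7. */
static unsigned pvr_rev_minor(uint32_t pvr)
{
    return pvr & 0xff;
}
```

Under that assumption, "pvr 003c 0300" decodes to version 0x003c, revision 3.0, and "pvr 0039 0202" to version 0x0039, revision 2.2, which is how the two test machines can be told apart.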
* Re: PPC upstream kernel ignored DABR bug
  2008-03-26 20:57 ` Josh Boyer
@ 2008-03-27  1:47   ` Josh Boyer
From: Josh Boyer @ 2008-03-27 1:47 UTC
To: Josh Boyer
Cc: Arnd Bergmann, linuxppc-dev, Paul Mackerras, Jan Kratochvil, Roland McGrath

On Wed, 26 Mar 2008 15:57:32 -0500
Josh Boyer <jwboyer@linux.vnet.ibm.com> wrote:

> On Wed, 12 Mar 2008 18:47:45 -0700 (PDT)
> Roland McGrath <roland@redhat.com> wrote:
>
> > The only machine I have at home for testing powerpc is an Apple G5,
> > supplied to me by IBM.  It says:
> >         cpu             : PPC970FX, altivec supported
> >         revision        : 3.0 (pvr 003c 0300)
> > so I am guessing this document applies to the chips I have.  Since I can't
> > test on other chips myself, it is plausible from what I've seen that there
> > is no mysterious kernel problem and only this hardware problem.  The
> > description of the hardware problem would not make me think that it would
> > behave this way, but it is not very detailed or precise, or at least does
> > not seem so to a reader not expert on powerpc.
>
> I ran the testcase on my older G5 today with:
>
>         cpu             : PPC970, altivec supported
>         revision        : 2.2 (pvr 0039 0202)
>
> and it also failed after a few iterations.  This was with
> 2.6.25-0.121.rc5.git4.fc9 as the kernel, which is fairly close to
> mainline.  At the least, this doesn't seem to be 970FX related.  I'll
> try building a vanilla 2.6.25-rc7 later this evening to see if that
> makes a difference.

Still failed with a -vanilla build of 2.6.25-rc7.

josh
* Re: PPC upstream kernel ignored DABR bug
  2008-03-12 22:30 ` Jens Osterkamp
  2008-03-13  1:47   ` Roland McGrath
@ 2008-03-13 13:13   ` Luis Machado
From: Luis Machado @ 2008-03-13 13:13 UTC
To: Jens Osterkamp
Cc: linuxppc-dev, Paul Mackerras, Roland McGrath, Arnd Bergmann, Jan Kratochvil

On Wed, 2008-03-12 at 23:30 +0100, Jens Osterkamp wrote:
> > Just to make sure, i tested the binary against the 2.6.25-rc4 kernel.  It
> > still fails.  So this is really an open bug for PPC.
>
> On a Cell- or 970-based machine ?
>
> Gruß,
> 	Jens

On a 970-based machine.

Regards,

-- 
Luis Machado
Software Engineer
IBM Linux Technology Center