* Re: [parisc-linux] Mysterious hangs with parisc [not found] ` <200606240241.k5O2fALQ001885@hiauly1.hia.nrc.ca> @ 2006-07-05 16:47 ` Kyle McMartin 2006-07-06 18:22 ` Joel Soete 0 siblings, 1 reply; 11+ messages in thread From: Kyle McMartin @ 2006-07-05 16:47 UTC (permalink / raw) To: John David Anglin; +Cc: James.Bottomley, parisc-linux On Fri, Jun 23, 2006 at 10:41:09PM -0400, John David Anglin wrote: > This is not to say that the locking code can't be improved. As far > as I can tell, hpux doesn't spin. It has pre-abitration, priorities > for waiters, etc. When a lock is released, suwaiters is called to > pass the lock in a fair way to another waiter if there is one. > HPUX spinlocks /do/ spin, afaict. They just keep track of how long they've been spinning for (and panic after 60 seconds.) On release of a spinlock, they also donate it to another cpu, which is, I guess, the arbitration thing. I can back this up with disassembly, but I wouldn't be comfortable doing it over mail, due to copyright infringement and blah blah. Cheers, Kyle M. _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc 2006-07-05 16:47 ` [parisc-linux] Mysterious hangs with parisc Kyle McMartin @ 2006-07-06 18:22 ` Joel Soete 2006-07-06 18:34 ` Kyle McMartin 0 siblings, 1 reply; 11+ messages in thread From: Joel Soete @ 2006-07-06 18:22 UTC (permalink / raw) To: Kyle McMartin; +Cc: James.Bottomley, John David Anglin, parisc-linux Kyle McMartin wrote: > On Fri, Jun 23, 2006 at 10:41:09PM -0400, John David Anglin wrote: > >>This is not to say that the locking code can't be improved. As far >>as I can tell, hpux doesn't spin. It has pre-abitration, priorities >>for waiters, etc. When a lock is released, suwaiters is called to >>pass the lock in a fair way to another waiter if there is one. >> > > > HPUX spinlocks /do/ spin, afaict. They just keep track of how long > they've been spinning for (and panic after 60 seconds.) On release of > a spinlock, they also donate it to another cpu, which is, I guess, the > arbitration thing. I can back this up with disassembly, but I wouldn't > be comfortable doing it over mail, due to copyright infringement and > blah blah. > > Cheers, > Kyle M. Cool ;-) That said to followup another related thread <http://lists.parisc-linux.org/pipermail/parisc-linux/2006-July/029481.html>, the do_gettimeofday() jejb's patch doesn't help: after just some hours of my stress test: top - 17:52:25 up 21:57, 3 users, load average: 8.16, 7.18, 6.91 Tasks: 84 total, 5 running, 79 sleeping, 0 stopped, 0 zombie Cpu0 : 53.7% us, 35.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 11.0% si Cpu1 : 86.1% us, 10.2% sy, 0.0% ni, 0.0% id, 0.7% wa, 0.0% hi, 2.9% si Mem: 4114224k total, 4018628k used, 95596k free, 483816k buffers Swap: 250872k total, 4k used, 250868k free, 436632k cached I still encountered the well know BUG: soft lockup detected BUG: soft lockup detected on CPU#1! Backtrace: [<00000000101122b0>] dump_stack+0x18/0x28 [<0000000010171c60>] softlockup_tick+0x128/0x158 [<0000000010151a00>] run_local_timers+0x28/0x38 [<0000000010152770>] update_process_times+0x58/0xd8 [<000000001011cc08>] smp_do_timer+0x70/0x80 [<00000000101134b4>] timer_interrupt+0xdc/0x1e0 [<0000000010171e04>] handle_IRQ_event+0x74/0xd0 [<0000000010171f1c>] __do_IRQ+0xbc/0x268 [<0000000010113e74>] do_cpu_irq_mask+0x114/0x1e0 [<0000000010104074>] intr_return+0x0/0x1c BUG: soft lockup detected on CPU#0! Backtrace: [<00000000101122b0>] dump_stack+0x18/0x28 [<0000000010171c60>] softlockup_tick+0x128/0x158 [<0000000010151a00>] run_local_timers+0x28/0x38 [<0000000010152770>] update_process_times+0x58/0xd8 [<000000001011cc08>] smp_do_timer+0x70/0x80 [<00000000101134b4>] timer_interrupt+0xdc/0x1e0 [<0000000010171e04>] handle_IRQ_event+0x74/0xd0 [<0000000010171f1c>] __do_IRQ+0xbc/0x268 [<0000000010113e74>] do_cpu_irq_mask+0x114/0x1e0 [<0000000010104074>] intr_return+0x0/0x1c (only a small diff with previous failures: here it shows also 'CPU#1!') Otoh previous test seems to show that older gcc-3.3 would produce a more operational kernel but I haven't any idea where to look for? All guide line would be welcome. Thanks, Joel _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc 2006-07-06 18:22 ` Joel Soete @ 2006-07-06 18:34 ` Kyle McMartin 2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete 0 siblings, 1 reply; 11+ messages in thread From: Kyle McMartin @ 2006-07-06 18:34 UTC (permalink / raw) To: Joel Soete Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin On Thu, Jul 06, 2006 at 06:22:40PM +0000, Joel Soete wrote: > [<000000001011cc08>] smp_do_timer+0x70/0x80 You're running an SMP kernel. time.c: static inline unsigned long gettimeoffset (void) { #ifndef CONFIG_SMP /* * FIXME: This won't work on smp because jiffies are updated by cpu 0. * Once parisc-linux learns the cr16 difference between processors, * this could be made to work. */ long last_tick; long elapsed_cycles; /* it_value is the intended time of the next tick */ last_tick = cpu_data[smp_processor_id()].it_value; /* Subtract one tick and account for possible difference between * when we expected the tick and when it actually arrived. * (aka wall vs real) */ last_tick -= clocktick * (jiffies - wall_jiffies + 1); elapsed_cycles = mfctl(16) - last_tick; /* the precision of this math could be improved */ return elapsed_cycles / (PAGE0->mem_10msec / 10000); #else return 0; #endif } Now let's look at the comment for James' change: <quote> Apparently gettimeoffset can return small negative values (usually in the 100us range). If xtime.tv_nsec is accidentally less than this, though (a fortunately unlikely event) it triggers the loop forever. I've added a test and correct adjustment for this case. It has a warning printk in there which I'd like to leave for the time being just in case this problem implicates some other part of the kernel. </quote> I don't see 0 being a small negative value. ;-) James' fix only applied to UP builds. :) Cheers, Kyle M. _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-06 18:34 ` Kyle McMartin @ 2006-07-21 20:05 ` Joel Soete 2006-07-21 20:21 ` Michael S. Zick ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Joel Soete @ 2006-07-21 20:05 UTC (permalink / raw) To: Kyle McMartin; +Cc: James.Bottomley, John David Anglin, parisc-linux Hello all, Kyle McMartin wrote: --- snip --- > > James' fix only applied to UP builds. :) > Continuing investigation on this: BUG: soft lockup detected on CPU#0! Backtrace: [<0000000010112e70>] dump_stack+0x18/0x28 [<0000000010176088>] softlockup_tick+0x120/0x160 [<00000000101533b0>] run_local_timers+0x28/0x38 [<0000000010153c48>] update_process_times+0x58/0xd8 [<000000001011d898>] smp_do_timer+0x70/0x80 [<0000000010114094>] timer_interrupt+0xd4/0x1e0 [<0000000010176204>] handle_IRQ_event+0x74/0xd0 [<0000000010176318>] __do_IRQ+0xb8/0x270 [<0000000010114a6c>] do_cpu_irq_mask+0x11c/0x1e8 [<0000000010104074>] intr_return+0x0/0x1c smp n4k pb. I figure out that with the kernel 2.6.17-pa3 compiled with gcc-3.3 this system run my stress test during 8 full days without pb. Otoh when the same src were compiled with gcc-4.x the system vanished (panicing?) eracticaly after between 1 to 3 tree days of test. So either gcc-3.3 produced the right stuff or hiden a bug? So I managed to produce *.s files for gcc-3.3 and gcc-4.1 to attempt comparison. But this Backtrace didn't show me any cpu status (even after I updated the firmware of pdc and gsp to their latest releases and tried the latest mingo's patch for lock validator?). Even thought Mike recalled me this old report: <http://lists.parisc-linux.org/pipermail/parisc-linux/2006-March/028603.html> IAOQ[0]: _read_lock+0x18/0x30 IAOQ[1]: _read_lock+0x8/0x30 RP(r2): send_group_sig_info+0x3c/0xb0 I so used this as starting point of investigation: --- snip --- /* * This is the entry point for "process-wide" signals. * They will go to an appropriate thread in the thread group. */ int send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p) { int ret; read_lock(&tasklist_lock); ret = group_send_sig_info(sig, info, p); read_unlock(&tasklist_lock); return ret; } --- snip --- ./include/linux/spinlock.h: --- snip --- #define read_lock(lock) _read_lock(lock) /* * We inline the unlock functions in the nondebug case: */ #if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || !defined(CONFIG_SMP) # define spin_unlock(lock) _spin_unlock(lock) # define read_unlock(lock) _read_unlock(lock) # define write_unlock(lock) _write_unlock(lock) #else # define spin_unlock(lock) __raw_spin_unlock(&(lock)->raw_lock) # define read_unlock(lock) __raw_read_unlock(&(lock)->raw_lock) # define write_unlock(lock) __raw_write_unlock(&(lock)->raw_lock) #endif --- snip --- Q1: according to this comment why don't we 'inline' read_lock() as read_unlock? Lets have a look to the pre-compiled (.i) file: precompiled: --- snip --- int send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p) { int ret; _read_lock(&tasklist_lock); ret = group_send_sig_info(sig, info, p); __raw_read_unlock(&(&tasklist_lock)->raw_lock); return ret; } --- snip --- static __inline__ __attribute__((always_inline)) void __raw_read_unlock(raw_rwlock_t *rw) { __raw_spin_lock_flags(&rw->lock, 0); rw->counter--; __raw_spin_unlock(&rw->lock); } --- snip --- static inline __attribute__((always_inline)) void __raw_spin_lock_flags(raw_spinlock_t *x, unsigned long flags) { volatile unsigned int *a; __asm__ __volatile__("":::"memory"); a = ((volatile unsigned int *)x); while (({ unsigned __ret; __asm__ __volatile__("ldcw,co" " 0(%1),%0" : "=r" (__ret) : "r" (a)); __ret; }) == 0) __asm__("; flag_0"); while (*a == 0) if (flags & 0x00000001) { __asm__ __volatile__("ssm %0,%%r0\n" : : "i" (0x00000001) : "memory" ); __asm__ __volatile__("":::"memory"); __asm__ __volatile__("rsm %0,%%r0\n" : : "i" (0x00000001) : "memory" ); } else __asm__ __volatile__("; cpu_relax\n\tnop\n\tnop\n" : : ); __asm__ __volatile__("":::"memory"); } --- snip --- to be sure am i on the right place and make some asm highlighting: # diff -Nau spinlock.h.Orig spinlock.h --- spinlock.h.Orig 2006-07-20 07:54:26.000000000 +0200 +++ spinlock.h 2006-07-21 19:37:16.000000000 +0200 @@ -20,8 +20,10 @@ { volatile unsigned int *a; - mb(); +/* mb(); */ + __asm__ __volatile__("; raw_spin_lock_flags_in":::"memory"); a = __ldcw_align(x); + while (__ldcw(a) == 0) while (*a == 0) if (flags & PSW_SM_I) { @@ -30,16 +32,19 @@ local_irq_disable(); } else cpu_relax(); - mb(); +/* mb(); */ + __asm__ __volatile__("; raw_spin_lock_flags_out":::"memory"); } static inline void __raw_spin_unlock(raw_spinlock_t *x) { volatile unsigned int *a; - mb(); +/* mb(); */ + __asm__ __volatile__("; raw_spin_unlock_in":::"memory"); a = __ldcw_align(x); *a = 1; - mb(); +/* mb(); */ + __asm__ __volatile__("; raw_spin_unlock_out":::"memory"); } static inline int __raw_spin_trylock(raw_spinlock_t *x) --- processor.h.Orig 2006-07-20 07:54:46.000000000 +0200 +++ processor.h 2006-07-20 08:14:30.000000000 +0200 @@ -355,7 +355,11 @@ } #endif +/* #define cpu_relax() barrier() + */ +#define cpu_relax() __asm__ __volatile__("; cpu_relax\n\tnop\n\tnop\n" : : ) + #endif /* __ASSEMBLY__ */ which obvioulsy produced different s files: # diff -y SGSI_gcc33.s.ban_noFlag0 SGSI_gcc41.s.ban_noFlag1 .section .text.send_group_sig_info,"ax",@progb .section .text.send_group_sig_info,"ax",@progb .align 8 .align 8 .globl send_group_sig_info .globl send_group_sig_info .type send_group_sig_info, @function .type send_group_sig_info, @function send_group_sig_info: send_group_sig_info: .PROC .PROC .CALLINFO FRAME=144,CALLS,SAVE_RP,ENTRY_GR=6 | .CALLINFO FRAME=160,CALLS,SAVE_RP,ENTRY_GR=7 .ENTRY .ENTRY > addil LT'tasklist_lock,%r27 std %r2,-16(%r30) std %r2,-16(%r30) std,ma %r7,144(%r30) | std,ma %r8,160(%r30) std %r6,-136(%r30) | copy %r24,%r8 std %r5,-128(%r30) < std %r4,-120(%r30) < copy %r27,%r4 < ldo -48(%r30),%r29 ldo -48(%r30),%r29 copy %r25,%r6 | std %r7,-152(%r30) extrd,s %r26,63,32,%r5 | copy %r25,%r7 addil LT'tasklist_lock,%r27 | std %r6,-144(%r30) ldd RT'tasklist_lock(%r1),%r1 | extrd,s %r26,63,32,%r6 copy %r1,%r26 | std %r5,-136(%r30) > ldd RT'tasklist_lock(%r1),%r5 > copy %r5,%r26 > std %r4,-128(%r30) b,l _read_lock,%r2 b,l _read_lock,%r2 copy %r24,%r7 | copy %r27,%r4 copy %r6,%r25 < copy %r7,%r24 < copy %r4,%r27 copy %r4,%r27 > copy %r6,%r26 > copy %r27,%r4 > copy %r7,%r25 ldo -48(%r30),%r29 ldo -48(%r30),%r29 b,l group_send_sig_info,%r2 b,l group_send_sig_info,%r2 copy %r5,%r26 | copy %r8,%r24 addil LT'tasklist_lock,%r4 | copy %r4,%r27 ldd RT'tasklist_lock(%r1),%r20 | copy %r28,%r19 #APP #APP ; raw_spin_lock_flags_in ; raw_spin_lock_flags_in ldcw,co 0(%r20),%r19 < #NO_APP #NO_APP cmpib,<>,n 0,%r19,.L895 | .L944: .L890: | #APP ldw 0(%r20),%r19 | ldcw,co 0(%r5),%r28 cmpib,<>,n 0,%r19,.L897 | #NO_APP .L889: | cmpib,<>,n 0,%r28,.L946 > .L945: > ldw 0(%r5),%r28 > cmpb,*<> %r0,%r28,.L944 > nop #APP #APP ; cpu_relax ; cpu_relax nop nop nop nop #NO_APP #NO_APP ldw 0(%r20),%r19 | b,n .L945 cmpib,= 0,%r19,.L889 | .L946: nop < .L897: < #APP < ldcw,co 0(%r20),%r19 < #NO_APP < cmpib,= 0,%r19,.L890 < nop < .L895: < #APP #APP ; raw_spin_lock_flags_out ; raw_spin_lock_flags_out #NO_APP #NO_APP ldw 4(%r20),%r19 | addil LT'tasklist_lock,%r27 ldo -1(%r19),%r19 | ldd RT'tasklist_lock(%r1),%r31 stw %r19,4(%r20) | ldw 4(%r31),%r28 > ldo -1(%r28),%r28 > stw %r28,4(%r31) #APP #APP ; raw_spin_unlock_in ; raw_spin_unlock_in #NO_APP #NO_APP ldi 1,%r19 | ldi 1,%r28 stw %r19,0(%r20) | stw %r28,0(%r5) #APP #APP ; raw_spin_unlock_out ; raw_spin_unlock_out #NO_APP #NO_APP ldd -120(%r30),%r4 | copy %r19,%r28 ldd -128(%r30),%r5 | ldd -176(%r30),%r2 ldd -160(%r30),%r2 | ldd -152(%r30),%r7 ldd -136(%r30),%r6 | ldd -144(%r30),%r6 > ldd -136(%r30),%r5 > ldd -128(%r30),%r4 bve (%r2) bve (%r2) ldd,mb -144(%r30),%r7 | ldd,mb -160(%r30),%r8 .EXIT .EXIT .PROCEND .PROCEND .size send_group_sig_info, .-send_group_sig_info .size send_group_sig_info, .-send_group_sig_info And here are a lot of questions: Q2: some diff in routine like 2 times ldcw with gcc-3 and only one with gcc-4 (may be ok?) Q3: a 64bit comparison (cmpb,*<> %r0,%r28,.L944) introduced with gcc-4? Q4: imho the most important: with 2 gcc where is the couple of ssm/rsm that jejb introduced in his patch: <http://lists.parisc-linux.org/pipermail/parisc-linux-cvs/2005-October/036211.html> --- linux-2.6/include/asm-parisc/spinlock.h 2005/09/14 14:42:11 1.16 +++ linux-2.6/include/asm-parisc/spinlock.h 2005/10/20 16:07:04 1.17 @@ -11,18 +11,25 @@ return *a == 0; } -#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock) +#define __raw_spin_lock(lock) __raw_spin_lock_flags(lock, 0) #define __raw_spin_unlock_wait(x) \ do { cpu_relax(); } while (__raw_spin_is_locked(x)) -static inline void __raw_spin_lock(raw_spinlock_t *x) +static inline void __raw_spin_lock_flags(raw_spinlock_t *x, + unsigned long flags) { volatile unsigned int *a; mb(); a = __ldcw_align(x); while (__ldcw(a) == 0) - while (*a == 0); + while (*a == 0) + if (flags & PSW_SM_I) { + local_irq_enable(); + cpu_relax(); + local_irq_disable(); + } else + cpu_relax(); mb(); } ? Mike, feel free to complete this summary if something looks like bad, not complete or not clear ;-) Thanks for help, Joel _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete @ 2006-07-21 20:21 ` Michael S. Zick 2006-07-22 17:23 ` Carlos O'Donell 2006-07-22 17:29 ` Carlos O'Donell 2 siblings, 0 replies; 11+ messages in thread From: Michael S. Zick @ 2006-07-21 20:21 UTC (permalink / raw) To: parisc-linux On Fri July 21 2006 15:05, Joel Soete wrote: > Hello all, > - - - snip > > Mike, feel free to complete this summary if something > looks like bad, not complete or not clear ;-) > Nothing to add. That is a good summary Joel. We know that routine fails, we know its different, we can't find the problem. > Thanks for help, > Joel Mike _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete 2006-07-21 20:21 ` Michael S. Zick @ 2006-07-22 17:23 ` Carlos O'Donell 2006-07-22 19:11 ` Michael S. Zick 2006-07-22 17:29 ` Carlos O'Donell 2 siblings, 1 reply; 11+ messages in thread From: Carlos O'Donell @ 2006-07-22 17:23 UTC (permalink / raw) To: Joel Soete Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin > And here are a lot of questions: > Q2: some diff in routine like 2 times ldcw with gcc-3 and only one with gcc-4 (may be ok?) > Analyze the code. Is it doing functionally the same thing? I assert that yes, it's doing exactly the same thing. The older gcc has unwrapped one of the ldcw's, and thus has two copies of the inner loop in the assembly. The code on the right does exactly the same thing, but in less instructions. GCC has gotten better. There isn't a missing lock, we only needed one lock, the older gcc was doing a poorer job of code generation. There may be other parts of the kernel that have problems, so keep looking! Cheers, Carlos. _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-22 17:23 ` Carlos O'Donell @ 2006-07-22 19:11 ` Michael S. Zick 0 siblings, 0 replies; 11+ messages in thread From: Michael S. Zick @ 2006-07-22 19:11 UTC (permalink / raw) To: parisc-linux; +Cc: Kyle McMartin, James.Bottomley, John David Anglin On Sat July 22 2006 12:23, Carlos O'Donell wrote: > > And here are a lot of questions: > > Q2: some diff in routine like 2 times ldcw with gcc-3 and only one with gcc-4 (may be ok?) > > > > Analyze the code. Is it doing functionally the same thing? > > I assert that yes, it's doing exactly the same thing. The older gcc > has unwrapped one of the ldcw's, and thus has two copies of the inner > loop in the assembly. The code on the right does exactly the same > thing, but in less instructions. GCC has gotten better. > > There isn't a missing lock, we only needed one lock, the older gcc was > doing a poorer job of code generation. > OK - will go with that theory. > There may be other parts of the kernel that have problems, so keep looking! > Now that portion of the work is easy (not quick, easy) to reproduce. Pick a fairly recent kernel (I think Joel has used both 2.6.16 & 2.6.17); Use gcc-3.3 to build 64-bit-smp; Use gcc-4.1 to build 64-bit-smp; Pick a machine with two or more processors (Joel has PA8000 & PA8700); Pick yourself a never-ending script that keeps all processors busy; Run the gcc-4.1 built kernel; observe that machine locks up after some time period ranging from hours to a few days; Boot the gcc-3.3 built kernel; observe that machine runs for weeks. Yes, there is something getting compiled differently than intended. Q) Is the following an expected difference? send_group_sig_info: send_group_sig_info: .PROC .PROC .CALLINFO FRAME=144,CALLS,SAVE_RP,ENTRY_GR=6 | .CALLINFO FRAME=160,CALLS,SAVE_RP,ENTRY_GR=7 Mike _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete 2006-07-21 20:21 ` Michael S. Zick 2006-07-22 17:23 ` Carlos O'Donell @ 2006-07-22 17:29 ` Carlos O'Donell 2006-07-22 22:10 ` Joel Soete 2 siblings, 1 reply; 11+ messages in thread From: Carlos O'Donell @ 2006-07-22 17:29 UTC (permalink / raw) To: Joel Soete Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin > Q4: imho the most important: with 2 gcc where is the couple of ssm/rsm that jejb introduced in his patch: The *most* important piece of imformation is the pre-compiled source. You *must* determine that both paths of the if/else are different, because if they are the same, then GCC is right to remove one of the paths, and unconditionally call cpu_relax. Where is the pre-compiled source to prove that the paths aren't identical? Remember to work towards a solution! Cheers, Carlos. _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-22 17:29 ` Carlos O'Donell @ 2006-07-22 22:10 ` Joel Soete 2006-07-22 23:49 ` Michael S. Zick 0 siblings, 1 reply; 11+ messages in thread From: Joel Soete @ 2006-07-22 22:10 UTC (permalink / raw) To: Carlos O'Donell Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin Carlos O'Donell wrote: >> Q4: imho the most important: with 2 gcc where is the couple of >> ssm/rsm that jejb introduced in his patch: > > > The *most* important piece of imformation is the pre-compiled source. if (flags & 0x00000001) { __asm__ __volatile__("ssm %0,%%r0\n" : : "i" (0x00000001) : "memory" ); __asm__ __volatile__("":::"memory"); __asm__ __volatile__("rsm %0,%%r0\n" : : "i" (0x00000001) : "memory" ); } else __asm__ __volatile__("":::"memory"); > You *must* determine that both paths of the if/else are different, > because if they are the same, then GCC is right to remove one of the > paths, and unconditionally call cpu_relax. > and flags is a parameter of __raw_spin_lock_flags(): static inline __attribute__((always_inline)) void __raw_spin_lock_flags(raw_spinlock_t *x, unsigned long flags) So i didn't see how gcc could know at compile time that (flags & 0x00000001) was always false? > Where is the pre-compiled source to prove that the paths aren't identical? > Well I don't think that pre-compile stuff would help more because doesn't inlining stuff like as #define would? That said I think to have found what you want to learn me: isn't it exactely this inlining into: static __inline__ __attribute__((always_inline)) void __raw_read_unlock(raw_rwlock_t *rw) { __raw_spin_lock_flags(&rw->lock, 0); --- snip --- where obvioulsy (0 & whatelse) == 0 (i.e. false) Is it well what you expected we found? > Remember to work towards a solution! > So this: IAOQ[0]: _read_lock+0x18/0x30 IAOQ[1]: _read_lock+0x8/0x30 RP(r2): send_group_sig_info+0x3c/0x88 was 2.6.14-rc related and not anymore to 2.6.17 and greater? But I haven't anymore IAOQ report when this system hang now (even TOC doesn't report any usefull info). So where to start other investigattion? Thanks, Joel PS: would it make sence to test this nucked hunk I mean afaik this "__raw_spin_lock_flags(&rw->lock, 0);" comes from this definition: #define __raw_spin_lock(lock) __raw_spin_lock_flags(lock, 0) Tbh I am a bit confused by this patch: write some stuff and say don't use it? > Cheers, > Carlos. > > _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-22 22:10 ` Joel Soete @ 2006-07-22 23:49 ` Michael S. Zick 2006-07-24 4:16 ` Grant Grundler 0 siblings, 1 reply; 11+ messages in thread From: Michael S. Zick @ 2006-07-22 23:49 UTC (permalink / raw) To: parisc-linux On Sat July 22 2006 17:10, Joel Soete wrote: > > Carlos O'Donell wrote: > >> Q4: imho the most important: with 2 gcc where is the couple of > >> ssm/rsm that jejb introduced in his patch: > > > > > > The *most* important piece of imformation is the pre-compiled source. > if (flags & 0x00000001) { > __asm__ __volatile__("ssm %0,%%r0\n" : : "i" (0x00000001) : "memory" ); > __asm__ __volatile__("":::"memory"); > __asm__ __volatile__("rsm %0,%%r0\n" : : "i" (0x00000001) : "memory" ); > } else > __asm__ __volatile__("":::"memory"); > > > You *must* determine that both paths of the if/else are different, > > because if they are the same, then GCC is right to remove one of the > > paths, and unconditionally call cpu_relax. > > Note: That code is the same when either compiler is used; _But_ as a separate problem... My reading of the intent of that added code was: Busy_Loop: If (Interrupts are disabled) { Enable interrupts; Cpu_relax; /* Long enough to recognize pending interrupts */ Disable interrupts; } else { Cpu_relax; } And my complaint was a matter of structure... The assumption here is that the flag bit in an external variable is always in-sync with the interrupt status bit in the register. Obviously a false assumption, this routine is an example... Inside the first if block, the status bit is changed but the external variable bit is not. A general purpose, like we don't know what state the interrupts are in, and we don't want to trust an external flag bit; Looks like: Busy_Loop: Save current interrupt status to the external variable; Enable interrupts; /* a nop if already enabled */ Cpu_relax; /* Long enough to recognize pending interrupts */ Restore interrupt status from external variable; Our redefining Cpu_relax to a pair of nops was for machines that delay recognizing interrupts for an instruction after changing the status bit. We also included an asm comment in Cpu_relax just in case we wanted to see that it had not gotten optimized away. Same reason the mb() was redefined as an asm comment that could be identified by inspection. Mike _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) 2006-07-22 23:49 ` Michael S. Zick @ 2006-07-24 4:16 ` Grant Grundler 0 siblings, 0 replies; 11+ messages in thread From: Grant Grundler @ 2006-07-24 4:16 UTC (permalink / raw) To: Michael S. Zick; +Cc: parisc-linux On Sat, Jul 22, 2006 at 06:49:18PM -0500, Michael S. Zick wrote: > The assumption here is that the flag bit in an external variable > is always in-sync with the interrupt status bit in the register. The flag bit is generally the value _before_ we attempted to acquire a lock. I'm really not comfortable with passing in a zero flags value. I'd need to review alot more code to be happy we aren't breaking something else that uses __raw_spin_lock. ... > A general purpose, like we don't know what state the interrupts > are in, and we don't want to trust an external flag bit; > > Looks like: > > Busy_Loop: > Save current interrupt status to the external variable; > Enable interrupts; /* a nop if already enabled */ > Cpu_relax; /* Long enough to recognize pending interrupts */ > Restore interrupt status from external variable; I'm ok with this for two reasons: 1) I'm not sure looking at the _previous_ state is the intent 2) branch in perf path is always a bad thing (after cacheline misses). However, I don't know if there is a reason to instead look at the previous state. > Our redefining Cpu_relax to a pair of nops was for machines that > delay recognizing interrupts for an instruction after changing > the status bit. We also included an asm comment in Cpu_relax > just in case we wanted to see that it had not gotten optimized away. the asm comment is a good idea. > Same reason the mb() was redefined as an asm comment that could > be identified by inspection. *nod*. thanks, grant _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2006-07-24 4:16 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20060624015935.GJ12481@quicksilver.road.mcmartin.ca>
[not found] ` <200606240241.k5O2fALQ001885@hiauly1.hia.nrc.ca>
2006-07-05 16:47 ` [parisc-linux] Mysterious hangs with parisc Kyle McMartin
2006-07-06 18:22 ` Joel Soete
2006-07-06 18:34 ` Kyle McMartin
2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete
2006-07-21 20:21 ` Michael S. Zick
2006-07-22 17:23 ` Carlos O'Donell
2006-07-22 19:11 ` Michael S. Zick
2006-07-22 17:29 ` Carlos O'Donell
2006-07-22 22:10 ` Joel Soete
2006-07-22 23:49 ` Michael S. Zick
2006-07-24 4:16 ` Grant Grundler
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.