* Re: [parisc-linux] Mysterious hangs with parisc
[not found] ` <200606240241.k5O2fALQ001885@hiauly1.hia.nrc.ca>
@ 2006-07-05 16:47 ` Kyle McMartin
2006-07-06 18:22 ` Joel Soete
0 siblings, 1 reply; 11+ messages in thread
From: Kyle McMartin @ 2006-07-05 16:47 UTC (permalink / raw)
To: John David Anglin; +Cc: James.Bottomley, parisc-linux
On Fri, Jun 23, 2006 at 10:41:09PM -0400, John David Anglin wrote:
> This is not to say that the locking code can't be improved. As far
> as I can tell, hpux doesn't spin. It has pre-abitration, priorities
> for waiters, etc. When a lock is released, suwaiters is called to
> pass the lock in a fair way to another waiter if there is one.
>
HPUX spinlocks /do/ spin, afaict. They just keep track of how long
they've been spinning for (and panic after 60 seconds.) On release of
a spinlock, they also donate it to another cpu, which is, I guess, the
arbitration thing. I can back this up with disassembly, but I wouldn't
be comfortable doing it over mail, due to copyright infringement and
blah blah.
Cheers,
Kyle M.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc
2006-07-05 16:47 ` [parisc-linux] Mysterious hangs with parisc Kyle McMartin
@ 2006-07-06 18:22 ` Joel Soete
2006-07-06 18:34 ` Kyle McMartin
0 siblings, 1 reply; 11+ messages in thread
From: Joel Soete @ 2006-07-06 18:22 UTC (permalink / raw)
To: Kyle McMartin; +Cc: James.Bottomley, John David Anglin, parisc-linux
Kyle McMartin wrote:
> On Fri, Jun 23, 2006 at 10:41:09PM -0400, John David Anglin wrote:
>
>>This is not to say that the locking code can't be improved. As far
>>as I can tell, hpux doesn't spin. It has pre-abitration, priorities
>>for waiters, etc. When a lock is released, suwaiters is called to
>>pass the lock in a fair way to another waiter if there is one.
>>
>
>
> HPUX spinlocks /do/ spin, afaict. They just keep track of how long
> they've been spinning for (and panic after 60 seconds.) On release of
> a spinlock, they also donate it to another cpu, which is, I guess, the
> arbitration thing. I can back this up with disassembly, but I wouldn't
> be comfortable doing it over mail, due to copyright infringement and
> blah blah.
>
> Cheers,
> Kyle M.
Cool ;-)
That said to followup another related thread <http://lists.parisc-linux.org/pipermail/parisc-linux/2006-July/029481.html>, the
do_gettimeofday() jejb's patch doesn't help:
after just some hours of my stress test:
top - 17:52:25 up 21:57, 3 users, load average: 8.16, 7.18, 6.91
Tasks: 84 total, 5 running, 79 sleeping, 0 stopped, 0 zombie
Cpu0 : 53.7% us, 35.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 11.0% si
Cpu1 : 86.1% us, 10.2% sy, 0.0% ni, 0.0% id, 0.7% wa, 0.0% hi, 2.9% si
Mem: 4114224k total, 4018628k used, 95596k free, 483816k buffers
Swap: 250872k total, 4k used, 250868k free, 436632k cached
I still encountered the well know BUG: soft lockup detected
BUG: soft lockup detected on CPU#1!
Backtrace:
[<00000000101122b0>] dump_stack+0x18/0x28
[<0000000010171c60>] softlockup_tick+0x128/0x158
[<0000000010151a00>] run_local_timers+0x28/0x38
[<0000000010152770>] update_process_times+0x58/0xd8
[<000000001011cc08>] smp_do_timer+0x70/0x80
[<00000000101134b4>] timer_interrupt+0xdc/0x1e0
[<0000000010171e04>] handle_IRQ_event+0x74/0xd0
[<0000000010171f1c>] __do_IRQ+0xbc/0x268
[<0000000010113e74>] do_cpu_irq_mask+0x114/0x1e0
[<0000000010104074>] intr_return+0x0/0x1c
BUG: soft lockup detected on CPU#0!
Backtrace:
[<00000000101122b0>] dump_stack+0x18/0x28
[<0000000010171c60>] softlockup_tick+0x128/0x158
[<0000000010151a00>] run_local_timers+0x28/0x38
[<0000000010152770>] update_process_times+0x58/0xd8
[<000000001011cc08>] smp_do_timer+0x70/0x80
[<00000000101134b4>] timer_interrupt+0xdc/0x1e0
[<0000000010171e04>] handle_IRQ_event+0x74/0xd0
[<0000000010171f1c>] __do_IRQ+0xbc/0x268
[<0000000010113e74>] do_cpu_irq_mask+0x114/0x1e0
[<0000000010104074>] intr_return+0x0/0x1c
(only a small diff with previous failures: here it shows also 'CPU#1!')
Otoh previous test seems to show that older gcc-3.3 would produce a more operational kernel but I haven't any idea where to look for?
All guide line would be welcome.
Thanks,
Joel
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc
2006-07-06 18:22 ` Joel Soete
@ 2006-07-06 18:34 ` Kyle McMartin
2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete
0 siblings, 1 reply; 11+ messages in thread
From: Kyle McMartin @ 2006-07-06 18:34 UTC (permalink / raw)
To: Joel Soete
Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin
On Thu, Jul 06, 2006 at 06:22:40PM +0000, Joel Soete wrote:
> [<000000001011cc08>] smp_do_timer+0x70/0x80
You're running an SMP kernel.
time.c:
static inline unsigned long
gettimeoffset (void)
{
#ifndef CONFIG_SMP
/*
* FIXME: This won't work on smp because jiffies are updated by cpu 0.
* Once parisc-linux learns the cr16 difference between processors,
* this could be made to work.
*/
long last_tick;
long elapsed_cycles;
/* it_value is the intended time of the next tick */
last_tick = cpu_data[smp_processor_id()].it_value;
/* Subtract one tick and account for possible difference between
* when we expected the tick and when it actually arrived.
* (aka wall vs real)
*/
last_tick -= clocktick * (jiffies - wall_jiffies + 1);
elapsed_cycles = mfctl(16) - last_tick;
/* the precision of this math could be improved */
return elapsed_cycles / (PAGE0->mem_10msec / 10000);
#else
return 0;
#endif
}
Now let's look at the comment for James' change:
<quote>
Apparently gettimeoffset can return small negative values (usually in
the 100us range). If xtime.tv_nsec is accidentally less than this,
though (a fortunately unlikely event) it triggers the loop forever.
I've added a test and correct adjustment for this case. It has a
warning printk in there which I'd like to leave for the time being
just in case this problem implicates some other part of the kernel.
</quote>
I don't see 0 being a small negative value. ;-)
James' fix only applied to UP builds. :)
Cheers,
Kyle M.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-06 18:34 ` Kyle McMartin
@ 2006-07-21 20:05 ` Joel Soete
2006-07-21 20:21 ` Michael S. Zick
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Joel Soete @ 2006-07-21 20:05 UTC (permalink / raw)
To: Kyle McMartin; +Cc: James.Bottomley, John David Anglin, parisc-linux
Hello all,
Kyle McMartin wrote:
--- snip ---
>
> James' fix only applied to UP builds. :)
>
Continuing investigation on this:
BUG: soft lockup detected on CPU#0!
Backtrace:
[<0000000010112e70>] dump_stack+0x18/0x28
[<0000000010176088>] softlockup_tick+0x120/0x160
[<00000000101533b0>] run_local_timers+0x28/0x38
[<0000000010153c48>] update_process_times+0x58/0xd8
[<000000001011d898>] smp_do_timer+0x70/0x80
[<0000000010114094>] timer_interrupt+0xd4/0x1e0
[<0000000010176204>] handle_IRQ_event+0x74/0xd0
[<0000000010176318>] __do_IRQ+0xb8/0x270
[<0000000010114a6c>] do_cpu_irq_mask+0x11c/0x1e8
[<0000000010104074>] intr_return+0x0/0x1c
smp n4k pb.
I figure out that with the kernel 2.6.17-pa3 compiled with gcc-3.3 this system run my stress test during 8 full days without pb.
Otoh when the same src were compiled with gcc-4.x the system vanished (panicing?) eracticaly after between 1 to 3 tree days of test.
So either gcc-3.3 produced the right stuff or hiden a bug?
So I managed to produce *.s files for gcc-3.3 and gcc-4.1 to attempt comparison.
But this Backtrace didn't show me any cpu status (even after I updated the firmware of pdc and gsp to their latest releases and
tried the latest mingo's patch for lock validator?). Even thought Mike recalled me this old report:
<http://lists.parisc-linux.org/pipermail/parisc-linux/2006-March/028603.html>
IAOQ[0]: _read_lock+0x18/0x30
IAOQ[1]: _read_lock+0x8/0x30
RP(r2): send_group_sig_info+0x3c/0xb0
I so used this as starting point of investigation:
--- snip ---
/*
* This is the entry point for "process-wide" signals.
* They will go to an appropriate thread in the thread group.
*/
int
send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
int ret;
read_lock(&tasklist_lock);
ret = group_send_sig_info(sig, info, p);
read_unlock(&tasklist_lock);
return ret;
}
--- snip ---
./include/linux/spinlock.h:
--- snip ---
#define read_lock(lock) _read_lock(lock)
/*
* We inline the unlock functions in the nondebug case:
*/
#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || !defined(CONFIG_SMP)
# define spin_unlock(lock) _spin_unlock(lock)
# define read_unlock(lock) _read_unlock(lock)
# define write_unlock(lock) _write_unlock(lock)
#else
# define spin_unlock(lock) __raw_spin_unlock(&(lock)->raw_lock)
# define read_unlock(lock) __raw_read_unlock(&(lock)->raw_lock)
# define write_unlock(lock) __raw_write_unlock(&(lock)->raw_lock)
#endif
--- snip ---
Q1: according to this comment why don't we 'inline' read_lock() as read_unlock?
Lets have a look to the pre-compiled (.i) file:
precompiled:
--- snip ---
int
send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
int ret;
_read_lock(&tasklist_lock);
ret = group_send_sig_info(sig, info, p);
__raw_read_unlock(&(&tasklist_lock)->raw_lock);
return ret;
}
--- snip ---
static __inline__ __attribute__((always_inline)) void __raw_read_unlock(raw_rwlock_t *rw)
{
__raw_spin_lock_flags(&rw->lock, 0);
rw->counter--;
__raw_spin_unlock(&rw->lock);
}
--- snip ---
static inline __attribute__((always_inline)) void __raw_spin_lock_flags(raw_spinlock_t *x,
unsigned long flags)
{
volatile unsigned int *a;
__asm__ __volatile__("":::"memory");
a = ((volatile unsigned int *)x);
while (({ unsigned __ret; __asm__ __volatile__("ldcw,co" " 0(%1),%0" : "=r" (__ret) : "r" (a)); __ret; }) == 0)
__asm__("; flag_0");
while (*a == 0)
if (flags & 0x00000001) {
__asm__ __volatile__("ssm %0,%%r0\n" : : "i" (0x00000001) : "memory" );
__asm__ __volatile__("":::"memory");
__asm__ __volatile__("rsm %0,%%r0\n" : : "i" (0x00000001) : "memory" );
} else
__asm__ __volatile__("; cpu_relax\n\tnop\n\tnop\n" : : );
__asm__ __volatile__("":::"memory");
}
--- snip ---
to be sure am i on the right place and make some asm highlighting:
# diff -Nau spinlock.h.Orig spinlock.h
--- spinlock.h.Orig 2006-07-20 07:54:26.000000000 +0200
+++ spinlock.h 2006-07-21 19:37:16.000000000 +0200
@@ -20,8 +20,10 @@
{
volatile unsigned int *a;
- mb();
+/* mb(); */
+ __asm__ __volatile__("; raw_spin_lock_flags_in":::"memory");
a = __ldcw_align(x);
+
while (__ldcw(a) == 0)
while (*a == 0)
if (flags & PSW_SM_I) {
@@ -30,16 +32,19 @@
local_irq_disable();
} else
cpu_relax();
- mb();
+/* mb(); */
+ __asm__ __volatile__("; raw_spin_lock_flags_out":::"memory");
}
static inline void __raw_spin_unlock(raw_spinlock_t *x)
{
volatile unsigned int *a;
- mb();
+/* mb(); */
+ __asm__ __volatile__("; raw_spin_unlock_in":::"memory");
a = __ldcw_align(x);
*a = 1;
- mb();
+/* mb(); */
+ __asm__ __volatile__("; raw_spin_unlock_out":::"memory");
}
static inline int __raw_spin_trylock(raw_spinlock_t *x)
--- processor.h.Orig 2006-07-20 07:54:46.000000000 +0200
+++ processor.h 2006-07-20 08:14:30.000000000 +0200
@@ -355,7 +355,11 @@
}
#endif
+/*
#define cpu_relax() barrier()
+ */
+#define cpu_relax() __asm__ __volatile__("; cpu_relax\n\tnop\n\tnop\n" : : )
+
#endif /* __ASSEMBLY__ */
which obvioulsy produced different s files:
# diff -y SGSI_gcc33.s.ban_noFlag0 SGSI_gcc41.s.ban_noFlag1
.section .text.send_group_sig_info,"ax",@progb .section .text.send_group_sig_info,"ax",@progb
.align 8 .align 8
.globl send_group_sig_info .globl send_group_sig_info
.type send_group_sig_info, @function .type send_group_sig_info, @function
send_group_sig_info: send_group_sig_info:
.PROC .PROC
.CALLINFO FRAME=144,CALLS,SAVE_RP,ENTRY_GR=6 | .CALLINFO FRAME=160,CALLS,SAVE_RP,ENTRY_GR=7
.ENTRY .ENTRY
> addil LT'tasklist_lock,%r27
std %r2,-16(%r30) std %r2,-16(%r30)
std,ma %r7,144(%r30) | std,ma %r8,160(%r30)
std %r6,-136(%r30) | copy %r24,%r8
std %r5,-128(%r30) <
std %r4,-120(%r30) <
copy %r27,%r4 <
ldo -48(%r30),%r29 ldo -48(%r30),%r29
copy %r25,%r6 | std %r7,-152(%r30)
extrd,s %r26,63,32,%r5 | copy %r25,%r7
addil LT'tasklist_lock,%r27 | std %r6,-144(%r30)
ldd RT'tasklist_lock(%r1),%r1 | extrd,s %r26,63,32,%r6
copy %r1,%r26 | std %r5,-136(%r30)
> ldd RT'tasklist_lock(%r1),%r5
> copy %r5,%r26
> std %r4,-128(%r30)
b,l _read_lock,%r2 b,l _read_lock,%r2
copy %r24,%r7 | copy %r27,%r4
copy %r6,%r25 <
copy %r7,%r24 <
copy %r4,%r27 copy %r4,%r27
> copy %r6,%r26
> copy %r27,%r4
> copy %r7,%r25
ldo -48(%r30),%r29 ldo -48(%r30),%r29
b,l group_send_sig_info,%r2 b,l group_send_sig_info,%r2
copy %r5,%r26 | copy %r8,%r24
addil LT'tasklist_lock,%r4 | copy %r4,%r27
ldd RT'tasklist_lock(%r1),%r20 | copy %r28,%r19
#APP #APP
; raw_spin_lock_flags_in ; raw_spin_lock_flags_in
ldcw,co 0(%r20),%r19 <
#NO_APP #NO_APP
cmpib,<>,n 0,%r19,.L895 | .L944:
.L890: | #APP
ldw 0(%r20),%r19 | ldcw,co 0(%r5),%r28
cmpib,<>,n 0,%r19,.L897 | #NO_APP
.L889: | cmpib,<>,n 0,%r28,.L946
> .L945:
> ldw 0(%r5),%r28
> cmpb,*<> %r0,%r28,.L944
> nop
#APP #APP
; cpu_relax ; cpu_relax
nop nop
nop nop
#NO_APP #NO_APP
ldw 0(%r20),%r19 | b,n .L945
cmpib,= 0,%r19,.L889 | .L946:
nop <
.L897: <
#APP <
ldcw,co 0(%r20),%r19 <
#NO_APP <
cmpib,= 0,%r19,.L890 <
nop <
.L895: <
#APP #APP
; raw_spin_lock_flags_out ; raw_spin_lock_flags_out
#NO_APP #NO_APP
ldw 4(%r20),%r19 | addil LT'tasklist_lock,%r27
ldo -1(%r19),%r19 | ldd RT'tasklist_lock(%r1),%r31
stw %r19,4(%r20) | ldw 4(%r31),%r28
> ldo -1(%r28),%r28
> stw %r28,4(%r31)
#APP #APP
; raw_spin_unlock_in ; raw_spin_unlock_in
#NO_APP #NO_APP
ldi 1,%r19 | ldi 1,%r28
stw %r19,0(%r20) | stw %r28,0(%r5)
#APP #APP
; raw_spin_unlock_out ; raw_spin_unlock_out
#NO_APP #NO_APP
ldd -120(%r30),%r4 | copy %r19,%r28
ldd -128(%r30),%r5 | ldd -176(%r30),%r2
ldd -160(%r30),%r2 | ldd -152(%r30),%r7
ldd -136(%r30),%r6 | ldd -144(%r30),%r6
> ldd -136(%r30),%r5
> ldd -128(%r30),%r4
bve (%r2) bve (%r2)
ldd,mb -144(%r30),%r7 | ldd,mb -160(%r30),%r8
.EXIT .EXIT
.PROCEND .PROCEND
.size send_group_sig_info, .-send_group_sig_info .size send_group_sig_info, .-send_group_sig_info
And here are a lot of questions:
Q2: some diff in routine like 2 times ldcw with gcc-3 and only one with gcc-4 (may be ok?)
Q3: a 64bit comparison (cmpb,*<> %r0,%r28,.L944) introduced with gcc-4?
Q4: imho the most important: with 2 gcc where is the couple of ssm/rsm that jejb introduced in his patch:
<http://lists.parisc-linux.org/pipermail/parisc-linux-cvs/2005-October/036211.html>
--- linux-2.6/include/asm-parisc/spinlock.h 2005/09/14 14:42:11 1.16
+++ linux-2.6/include/asm-parisc/spinlock.h 2005/10/20 16:07:04 1.17
@@ -11,18 +11,25 @@
return *a == 0;
}
-#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
+#define __raw_spin_lock(lock) __raw_spin_lock_flags(lock, 0)
#define __raw_spin_unlock_wait(x) \
do { cpu_relax(); } while (__raw_spin_is_locked(x))
-static inline void __raw_spin_lock(raw_spinlock_t *x)
+static inline void __raw_spin_lock_flags(raw_spinlock_t *x,
+ unsigned long flags)
{
volatile unsigned int *a;
mb();
a = __ldcw_align(x);
while (__ldcw(a) == 0)
- while (*a == 0);
+ while (*a == 0)
+ if (flags & PSW_SM_I) {
+ local_irq_enable();
+ cpu_relax();
+ local_irq_disable();
+ } else
+ cpu_relax();
mb();
}
?
Mike, feel free to complete this summary if something looks like bad, not complete or not clear ;-)
Thanks for help,
Joel
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete
@ 2006-07-21 20:21 ` Michael S. Zick
2006-07-22 17:23 ` Carlos O'Donell
2006-07-22 17:29 ` Carlos O'Donell
2 siblings, 0 replies; 11+ messages in thread
From: Michael S. Zick @ 2006-07-21 20:21 UTC (permalink / raw)
To: parisc-linux
On Fri July 21 2006 15:05, Joel Soete wrote:
> Hello all,
>
- - - snip
>
> Mike, feel free to complete this summary if something
> looks like bad, not complete or not clear ;-)
>
Nothing to add. That is a good summary Joel.
We know that routine fails,
we know its different,
we can't find the problem.
> Thanks for help,
> Joel
Mike
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete
2006-07-21 20:21 ` Michael S. Zick
@ 2006-07-22 17:23 ` Carlos O'Donell
2006-07-22 19:11 ` Michael S. Zick
2006-07-22 17:29 ` Carlos O'Donell
2 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2006-07-22 17:23 UTC (permalink / raw)
To: Joel Soete
Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin
> And here are a lot of questions:
> Q2: some diff in routine like 2 times ldcw with gcc-3 and only one with gcc-4 (may be ok?)
>
Analyze the code. Is it doing functionally the same thing?
I assert that yes, it's doing exactly the same thing. The older gcc
has unwrapped one of the ldcw's, and thus has two copies of the inner
loop in the assembly. The code on the right does exactly the same
thing, but in less instructions. GCC has gotten better.
There isn't a missing lock, we only needed one lock, the older gcc was
doing a poorer job of code generation.
There may be other parts of the kernel that have problems, so keep looking!
Cheers,
Carlos.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete
2006-07-21 20:21 ` Michael S. Zick
2006-07-22 17:23 ` Carlos O'Donell
@ 2006-07-22 17:29 ` Carlos O'Donell
2006-07-22 22:10 ` Joel Soete
2 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2006-07-22 17:29 UTC (permalink / raw)
To: Joel Soete
Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin
> Q4: imho the most important: with 2 gcc where is the couple of ssm/rsm that jejb introduced in his patch:
The *most* important piece of imformation is the pre-compiled source.
You *must* determine that both paths of the if/else are different,
because if they are the same, then GCC is right to remove one of the
paths, and unconditionally call cpu_relax.
Where is the pre-compiled source to prove that the paths aren't identical?
Remember to work towards a solution!
Cheers,
Carlos.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-22 17:23 ` Carlos O'Donell
@ 2006-07-22 19:11 ` Michael S. Zick
0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Zick @ 2006-07-22 19:11 UTC (permalink / raw)
To: parisc-linux; +Cc: Kyle McMartin, James.Bottomley, John David Anglin
On Sat July 22 2006 12:23, Carlos O'Donell wrote:
> > And here are a lot of questions:
> > Q2: some diff in routine like 2 times ldcw with gcc-3 and only one with gcc-4 (may be ok?)
> >
>
> Analyze the code. Is it doing functionally the same thing?
>
> I assert that yes, it's doing exactly the same thing. The older gcc
> has unwrapped one of the ldcw's, and thus has two copies of the inner
> loop in the assembly. The code on the right does exactly the same
> thing, but in less instructions. GCC has gotten better.
>
> There isn't a missing lock, we only needed one lock, the older gcc was
> doing a poorer job of code generation.
>
OK - will go with that theory.
> There may be other parts of the kernel that have problems, so keep looking!
>
Now that portion of the work is easy (not quick, easy) to reproduce.
Pick a fairly recent kernel (I think Joel has used both 2.6.16 & 2.6.17);
Use gcc-3.3 to build 64-bit-smp;
Use gcc-4.1 to build 64-bit-smp;
Pick a machine with two or more processors (Joel has PA8000 & PA8700);
Pick yourself a never-ending script that keeps all processors busy;
Run the gcc-4.1 built kernel; observe that machine locks up after some
time period ranging from hours to a few days;
Boot the gcc-3.3 built kernel; observe that machine runs for weeks.
Yes, there is something getting compiled differently than intended.
Q) Is the following an expected difference?
send_group_sig_info: send_group_sig_info:
.PROC .PROC
.CALLINFO FRAME=144,CALLS,SAVE_RP,ENTRY_GR=6 | .CALLINFO FRAME=160,CALLS,SAVE_RP,ENTRY_GR=7
Mike
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-22 17:29 ` Carlos O'Donell
@ 2006-07-22 22:10 ` Joel Soete
2006-07-22 23:49 ` Michael S. Zick
0 siblings, 1 reply; 11+ messages in thread
From: Joel Soete @ 2006-07-22 22:10 UTC (permalink / raw)
To: Carlos O'Donell
Cc: Kyle McMartin, James.Bottomley, parisc-linux, John David Anglin
Carlos O'Donell wrote:
>> Q4: imho the most important: with 2 gcc where is the couple of
>> ssm/rsm that jejb introduced in his patch:
>
>
> The *most* important piece of imformation is the pre-compiled source.
if (flags & 0x00000001) {
__asm__ __volatile__("ssm %0,%%r0\n" : : "i" (0x00000001) : "memory" );
__asm__ __volatile__("":::"memory");
__asm__ __volatile__("rsm %0,%%r0\n" : : "i" (0x00000001) : "memory" );
} else
__asm__ __volatile__("":::"memory");
> You *must* determine that both paths of the if/else are different,
> because if they are the same, then GCC is right to remove one of the
> paths, and unconditionally call cpu_relax.
>
and flags is a parameter of __raw_spin_lock_flags():
static inline __attribute__((always_inline)) void __raw_spin_lock_flags(raw_spinlock_t *x,
unsigned long flags)
So i didn't see how gcc could know at compile time that (flags & 0x00000001) was always false?
> Where is the pre-compiled source to prove that the paths aren't identical?
>
Well I don't think that pre-compile stuff would help more because doesn't inlining stuff like as #define would?
That said I think to have found what you want to learn me: isn't it exactely this inlining into:
static __inline__ __attribute__((always_inline)) void __raw_read_unlock(raw_rwlock_t *rw)
{
__raw_spin_lock_flags(&rw->lock, 0);
--- snip ---
where obvioulsy (0 & whatelse) == 0 (i.e. false)
Is it well what you expected we found?
> Remember to work towards a solution!
>
So this:
IAOQ[0]: _read_lock+0x18/0x30
IAOQ[1]: _read_lock+0x8/0x30
RP(r2): send_group_sig_info+0x3c/0x88
was 2.6.14-rc related and not anymore to 2.6.17 and greater?
But I haven't anymore IAOQ report when this system hang now (even TOC doesn't report any usefull info).
So where to start other investigattion?
Thanks,
Joel
PS: would it make sence to test this nucked hunk
I mean afaik this "__raw_spin_lock_flags(&rw->lock, 0);" comes from this definition:
#define __raw_spin_lock(lock) __raw_spin_lock_flags(lock, 0)
Tbh I am a bit confused by this patch: write some stuff and say don't use it?
> Cheers,
> Carlos.
>
>
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-22 22:10 ` Joel Soete
@ 2006-07-22 23:49 ` Michael S. Zick
2006-07-24 4:16 ` Grant Grundler
0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Zick @ 2006-07-22 23:49 UTC (permalink / raw)
To: parisc-linux
On Sat July 22 2006 17:10, Joel Soete wrote:
>
> Carlos O'Donell wrote:
> >> Q4: imho the most important: with 2 gcc where is the couple of
> >> ssm/rsm that jejb introduced in his patch:
> >
> >
> > The *most* important piece of imformation is the pre-compiled source.
> if (flags & 0x00000001) {
> __asm__ __volatile__("ssm %0,%%r0\n" : : "i" (0x00000001) : "memory" );
> __asm__ __volatile__("":::"memory");
> __asm__ __volatile__("rsm %0,%%r0\n" : : "i" (0x00000001) : "memory" );
> } else
> __asm__ __volatile__("":::"memory");
>
> > You *must* determine that both paths of the if/else are different,
> > because if they are the same, then GCC is right to remove one of the
> > paths, and unconditionally call cpu_relax.
> >
Note: That code is the same when either compiler is used;
_But_ as a separate problem...
My reading of the intent of that added code was:
Busy_Loop:
If (Interrupts are disabled) {
Enable interrupts;
Cpu_relax; /* Long enough to recognize pending interrupts */
Disable interrupts;
} else {
Cpu_relax;
}
And my complaint was a matter of structure...
The assumption here is that the flag bit in an external variable
is always in-sync with the interrupt status bit in the register.
Obviously a false assumption, this routine is an example...
Inside the first if block, the status bit is changed but the external
variable bit is not.
A general purpose, like we don't know what state the interrupts
are in, and we don't want to trust an external flag bit;
Looks like:
Busy_Loop:
Save current interrupt status to the external variable;
Enable interrupts; /* a nop if already enabled */
Cpu_relax; /* Long enough to recognize pending interrupts */
Restore interrupt status from external variable;
Our redefining Cpu_relax to a pair of nops was for machines that
delay recognizing interrupts for an instruction after changing
the status bit. We also included an asm comment in Cpu_relax
just in case we wanted to see that it had not gotten optimized away.
Same reason the mb() was redefined as an asm comment that could
be identified by inspection.
Mike
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis)
2006-07-22 23:49 ` Michael S. Zick
@ 2006-07-24 4:16 ` Grant Grundler
0 siblings, 0 replies; 11+ messages in thread
From: Grant Grundler @ 2006-07-24 4:16 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
On Sat, Jul 22, 2006 at 06:49:18PM -0500, Michael S. Zick wrote:
> The assumption here is that the flag bit in an external variable
> is always in-sync with the interrupt status bit in the register.
The flag bit is generally the value _before_ we attempted to acquire a lock.
I'm really not comfortable with passing in a zero flags value.
I'd need to review alot more code to be happy we aren't breaking
something else that uses __raw_spin_lock.
...
> A general purpose, like we don't know what state the interrupts
> are in, and we don't want to trust an external flag bit;
>
> Looks like:
>
> Busy_Loop:
> Save current interrupt status to the external variable;
> Enable interrupts; /* a nop if already enabled */
> Cpu_relax; /* Long enough to recognize pending interrupts */
> Restore interrupt status from external variable;
I'm ok with this for two reasons:
1) I'm not sure looking at the _previous_ state is the intent
2) branch in perf path is always a bad thing (after cacheline misses).
However, I don't know if there is a reason to instead look at
the previous state.
> Our redefining Cpu_relax to a pair of nops was for machines that
> delay recognizing interrupts for an instruction after changing
> the status bit. We also included an asm comment in Cpu_relax
> just in case we wanted to see that it had not gotten optimized away.
the asm comment is a good idea.
> Same reason the mb() was redefined as an asm comment that could
> be identified by inspection.
*nod*.
thanks,
grant
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2006-07-24 4:16 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20060624015935.GJ12481@quicksilver.road.mcmartin.ca>
[not found] ` <200606240241.k5O2fALQ001885@hiauly1.hia.nrc.ca>
2006-07-05 16:47 ` [parisc-linux] Mysterious hangs with parisc Kyle McMartin
2006-07-06 18:22 ` Joel Soete
2006-07-06 18:34 ` Kyle McMartin
2006-07-21 20:05 ` [parisc-linux] Mysterious hangs with parisc (a send_group_sig_info() analysis) Joel Soete
2006-07-21 20:21 ` Michael S. Zick
2006-07-22 17:23 ` Carlos O'Donell
2006-07-22 19:11 ` Michael S. Zick
2006-07-22 17:29 ` Carlos O'Donell
2006-07-22 22:10 ` Joel Soete
2006-07-22 23:49 ` Michael S. Zick
2006-07-24 4:16 ` Grant Grundler
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.