* [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6)
@ 2007-07-10  7:53 BERTRAND Joël
  2007-07-10 17:49 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and David Miller
                   ` (33 more replies)
  0 siblings, 34 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-10  7:53 UTC (permalink / raw)
  To: sparclinux

	Hello,

	For a month, I have been trying to debug a very strange interaction 
between the 2.6 kernel and glibc on sparc64 boxes. All boxes run debian/testing.
I have tested all kernels between 2.6.20.3 and 2.6.21.6 with glibc 2.3, 
2.5 and 2.6 (2.6exp3 from debian/experimental).

Observations:
1/ with all kernels and glibc 2.3, all boxes run fine;
2/ with all kernels and glibc 2.5 or 2.6, my U2/smp works fine too, but 
on U60/smp and U80/smp several daemons randomly remain in sleep state 
and never wake up (named, clamd, milter-greylist, portmap...).

	I have tried to isolate the parameters and found that the trouble 
comes from the thread support introduced by the new libc, so I first 
thought the bug was in glibc. Now I think it comes from the kernel, 
because I have never seen it (with, of course, the same configuration) on 
my U2.

Steps to reproduce:
1/ install a sendmail server with mimedefang that calls clamav-daemon 
(in mimedefang.pl.conf : $Features{'Virus:CLAMD'}      = 1; $ClamdSock = 
"/var/run/clamav/clamd.ctl";)
2/ send or receive (by script ;-) ) a huge amount of mail
3/ wait... and see ;-)

	When clamav-daemon (or any other affected daemon) remains in sleep mode, 
I cannot quickly restart it with /etc/init.d/$(daemon) restart; I 
have to wait for a timeout.

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
@ 2007-07-10 17:49 ` David Miller
  2007-07-10 18:03 ` BERTRAND Joël
                   ` (32 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-10 17:49 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Tue, 10 Jul 2007 09:53:12 +0200

> 	When clamav-daemon (or all other daemons) remain in sleep mode, I 
> cannot quickly restart this daemon by /etc/init.d/$(daemon) restart, I 
> have to wait for a timeout.

Use the kernel sysrq keystrokes to get a task dump to see where
clamav-daemon is sleeping, my suspicion is that it is stuck
in the kernel futex code somewhere.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
  2007-07-10 17:49 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and David Miller
@ 2007-07-10 18:03 ` BERTRAND Joël
  2007-07-10 18:05 ` BERTRAND Joël
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-10 18:03 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Tue, 10 Jul 2007 09:53:12 +0200
> 
>> 	When clamav-daemon (or all other daemons) remain in sleep mode, I 
>> cannot quickly restart this daemon by /etc/init.d/$(daemon) restart, I 
>> have to wait for a timeout.
> 
> Use the kernel sysrq keystrokes to get a task dump to see where
> clamav-daemon is sleeping,

	I will try tomorrow. I first have to build a 2.6.22 kernel (to see 
whether NIS works, because it doesn't with 2.6.22-rc7).

> my suspicion is that it is stuck
> in the kernel futex code somewhere.

	I think so too.

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
  2007-07-10 17:49 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and David Miller
  2007-07-10 18:03 ` BERTRAND Joël
@ 2007-07-10 18:05 ` BERTRAND Joël
  2007-07-11 10:24 ` BERTRAND Joël
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-10 18:05 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Tue, 10 Jul 2007 09:53:12 +0200
> 
>> 	When clamav-daemon (or all other daemons) remain in sleep mode, I 
>> cannot quickly restart this daemon by /etc/init.d/$(daemon) restart, I 
>> have to wait for a timeout.
> 
> Use the kernel sysrq keystrokes to get a task dump to see where
> clamav-daemon is sleeping,

	I will try tomorrow. I first have to build a 2.6.22 kernel (to see 
whether NIS works, because it doesn't with 2.6.22-rc7).

> my suspicion is that it is stuck
> in the kernel futex code somewhere.

	I think so too.

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (2 preceding siblings ...)
  2007-07-10 18:05 ` BERTRAND Joël
@ 2007-07-11 10:24 ` BERTRAND Joël
  2007-07-11 18:56 ` BERTRAND Joël
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-11 10:24 UTC (permalink / raw)
  To: sparclinux

BERTRAND Joël wrote:
> David Miller wrote:
>> From: BERTRAND_Joël <joel.bertrand@systella.fr>
>> Date: Tue, 10 Jul 2007 09:53:12 +0200
>>
>>>     When clamav-daemon (or all other daemons) remain in sleep mode, I 
>>> cannot quickly restart this daemon by /etc/init.d/$(daemon) restart, 
>>> I have to wait for a timeout.
>>
>> Use the kernel sysrq keystrokes to get a task dump to see where
>> clamav-daemon is sleeping,
> 
>     I will try tomorrow. I have to build a 2.6.22 (to see if nis works 
> because it doesn't with 2.6.22-rc7).

	These tests were made with 2.6.22.1 (NIS works with 2.6.22.1 and not 
with 2.6.22-rc7, but I don't know why... ;-) )

>> my suspicion is that it is stuck
>> in the kernel futex code somewhere.
> 
>     I think too.

	As I write these lines, my mail server is blocked because of trouble 
with the clamd socket:

Root rayleigh:[/etc/mail] > ps -eLf | grep clamd
clamav    3502     1  3502  1    2 10:53 ?        00:01:23 /usr/sbin/clamd
clamav    3502     1  7226  0    2 12:00 ?        00:00:00 /usr/sbin/clamd
root      8091  5893  8091  0    1 12:18 pts/0    00:00:00 grep clamd

	clamd remains in S state.

	I have tried to obtain more information with sysrq+w, without 
success: sysrq+w reports no blocked task. In this example clamd is 
blocked, but other tasks can be blocked too.
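
	A likely reason sysrq+w shows nothing here: the kernel's sysrq 
documentation describes 'w' as dumping tasks in uninterruptible (blocked, D) 
state, while ps shows clamd in interruptible sleep (S). A full task dump 
('t') is the one that should include it. Besides the keyboard, a command 
can be sent by writing one character to /proc/sysrq-trigger as root. The 
following is only a minimal sketch under those assumptions; the helper name 
sysrq_send and its path parameter are hypothetical, introduced for 
illustration.

```c
#include <stdio.h>

/*
 * Write a single SysRq command character to a trigger file.
 * On a real system the path would be "/proc/sysrq-trigger" and the caller
 * must be root; the path is a parameter (an assumption of this sketch) so
 * the helper can be tried on a scratch file first.
 * Returns 0 on success, -1 on failure.
 */
static int sysrq_send(const char *trigger_path, char command)
{
    FILE *trigger = fopen(trigger_path, "w");

    if (trigger == NULL)
        return -1;
    if (fputc(command, trigger) == EOF) {
        fclose(trigger);
        return -1;
    }
    /* fclose() flushes the buffered byte; report a failed flush as an error. */
    return fclose(trigger) == 0 ? 0 : -1;
}
```

	Calling sysrq_send("/proc/sysrq-trigger", 't') would request a dump of 
all tasks; the result goes to the kernel log, not to the caller.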

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (3 preceding siblings ...)
  2007-07-11 10:24 ` BERTRAND Joël
@ 2007-07-11 18:56 ` BERTRAND Joël
  2007-07-11 20:43 ` David Miller
                   ` (28 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-11 18:56 UTC (permalink / raw)
  To: sparclinux

BERTRAND Joël wrote:
> BERTRAND Joël wrote:
>> David Miller wrote:
>>> From: BERTRAND_Joël <joel.bertrand@systella.fr>
>>> Date: Tue, 10 Jul 2007 09:53:12 +0200
>>>
>>>>     When clamav-daemon (or all other daemons) remain in sleep mode, 
>>>> I cannot quickly restart this daemon by /etc/init.d/$(daemon) 
>>>> restart, I have to wait for a timeout.
>>>
>>> Use the kernel sysrq keystrokes to get a task dump to see where
>>> clamav-daemon is sleeping,
>>
>>     I will try tomorrow. I have to build a 2.6.22 (to see if nis works 
>> because it doesn't with 2.6.22-rc7).
> 
>     Tests made with a 2.6.22.1 (NIS works with 2.6.22.1 and not with 
> 2.6.22-rc7, but I don't know why... ;-) )
> 
>>> my suspicion is that it is stuck
>>> in the kernel futex code somewhere.
>>
>>     I think too.
> 
>     When I write these lines, my mail server is blocked due to clamd 
> socket trouble :
> 
> Root rayleigh:[/etc/mail] > ps -eLf | grep clamd
> clamav    3502     1  3502  1    2 10:53 ?        00:01:23 /usr/sbin/clamd
> clamav    3502     1  7226  0    2 12:00 ?        00:00:00 /usr/sbin/clamd
> root      8091  5893  8091  0    1 12:18 pts/0    00:00:00 grep clamd
> 
>     clamd remains in S mode.
> 
>     I have tried to obtain mode information with sysrq+w without any 
> success, sysrq+w returns no blocked task. In this example, clamd is 
> blocked, but some other tasks can be blocked.

	David, I cannot obtain more information with sysrq, but I have found 
another program that randomly stops: seamonkey (or rather iceape, its 
debian package). Iceape randomly stops in sleep state. I have launched 
iceape under strace:

mprotect(0xf3ad6000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ad8000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ada000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3adc000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ade000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ae0000, 16384, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ae4000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ae6000, 8192, PROT_READ|PROT_WRITE) = 0
mprotect(0xf3ae8000, 8192, PROT_READ|PROT_WRITE) = 0
gettimeofday({1184179749, 741572}, NULL) = 0
gettimeofday({1184179749, 741853}, NULL) = 0
futex(0x278dac, FUTEX_WAKE, 1)          = 1
access("/usr/lib/iceape/chrome/modern.jar", F_OK) = 0
gettimeofday({1184179749, 767087}, NULL) = 0
gettimeofday({1184179749, 767300}, NULL) = 0
gettimeofday({1184179749, 767530}, NULL) = 0
futex(0x278dac, FUTEX_WAKE, 1)          = 1
futex(0x278da8, FUTEX_WAKE, 1)          = 1
access("/usr/lib/iceape/chrome/modern.jar", F_OK) = 0
gettimeofday({1184179749, 771232}, NULL) = 0
gettimeofday({1184179749, 771447}, NULL) = 0
gettimeofday({1184179749, 771677}, NULL) = 0
futex(0x278dac, FUTEX_WAKE, 1)          = 1
futex(0xf3a00010, FUTEX_WAIT, 2, NULL <unfinished ...>
rayleigh:[~] >

(aborted by CTRL+C)

	If I restart iceape, it always stops with the same kind of message.

	I don't understand why, with the same kernel and the same glibc, my 
U2/SMP works fine and my U60/SMP does not.

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (4 preceding siblings ...)
  2007-07-11 18:56 ` BERTRAND Joël
@ 2007-07-11 20:43 ` David Miller
  2007-07-12  7:17 ` BERTRAND Joël
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-11 20:43 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Wed, 11 Jul 2007 12:24:07 +0200

> Root rayleigh:[/etc/mail] > ps -eLf | grep clamd
> clamav    3502     1  3502  1    2 10:53 ?        00:01:23 /usr/sbin/clamd
> clamav    3502     1  7226  0    2 12:00 ?        00:00:00 /usr/sbin/clamd
> root      8091  5893  8091  0    1 12:18 pts/0    00:00:00 grep clamd
> 
> 	clamd remains in S mode.
> 
> 	I have tried to obtain mode information with sysrq+w without any 
> success, sysrq+w returns no blocked task. In this example, clamd is 
> blocked, but some other tasks can be blocked.

This is not enough information to debug this problem, sorry.

You'll need to do some thinking about what kind of other information
you can fetch from this stuck process.  Perhaps running
clamd under strace will provide a good debugging trace so we can
see exactly what kind of socket it is stuck on, and in what way.

Otherwise, I wish you luck in debugging this problem :-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (5 preceding siblings ...)
  2007-07-11 20:43 ` David Miller
@ 2007-07-12  7:17 ` BERTRAND Joël
  2007-07-12  9:11 ` BERTRAND Joël
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-12  7:17 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Wed, 11 Jul 2007 12:24:07 +0200
> 
>> Root rayleigh:[/etc/mail] > ps -eLf | grep clamd
>> clamav    3502     1  3502  1    2 10:53 ?        00:01:23 /usr/sbin/clamd
>> clamav    3502     1  7226  0    2 12:00 ?        00:00:00 /usr/sbin/clamd
>> root      8091  5893  8091  0    1 12:18 pts/0    00:00:00 grep clamd
>>
>> 	clamd remains in S mode.
>>
>> 	I have tried to obtain mode information with sysrq+w without any 
>> success, sysrq+w returns no blocked task. In this example, clamd is 
>> blocked, but some other tasks can be blocked.
> 
> This is not enough information to debug this problem, sorry.
> 
> You'll need to do some thinking about what kind of other information
> you can fetch from this stuck process.  Perhaps running
> clamd under strace will provide a good debugging trace so we can
> see exactly what kind of socket it is stuck on, and in what way.

	OK. I am now running clamd under strace. I hope it hangs quickly ;-) 
When clamd hangs, I will post its strace output here. Are there 
significant differences between futex handling on SBus and PCI 
sparc64 machines? I have tried to reproduce this bug on an SBus workstation 
without success, but I see it on every sparc64/PCI machine I have (U60, U80, U420)...

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (6 preceding siblings ...)
  2007-07-12  7:17 ` BERTRAND Joël
@ 2007-07-12  9:11 ` BERTRAND Joël
  2007-07-12  9:38 ` David Miller
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-12  9:11 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]

BERTRAND Joël wrote:
> David Miller wrote:
>> From: BERTRAND_Joël <joel.bertrand@systella.fr>
>> Date: Wed, 11 Jul 2007 12:24:07 +0200
>>
>>> Root rayleigh:[/etc/mail] > ps -eLf | grep clamd
>>> clamav    3502     1  3502  1    2 10:53 ?        00:01:23 
>>> /usr/sbin/clamd
>>> clamav    3502     1  7226  0    2 12:00 ?        00:00:00 
>>> /usr/sbin/clamd
>>> root      8091  5893  8091  0    1 12:18 pts/0    00:00:00 grep clamd
>>>
>>>     clamd remains in S mode.
>>>
>>>     I have tried to obtain mode information with sysrq+w without any 
>>> success, sysrq+w returns no blocked task. In this example, clamd is 
>>> blocked, but some other tasks can be blocked.
>>
>> This is not enough information to debug this problem, sorry.
>>
>> You'll need to do some thinking about what kind of other information
>> you can fetch from this stuck process.  Perhaps running
>> clamd under strace will provide a good debugging trace so we can
>> see exactly what kind of socket it is stuck on, and in what way.
> 
>     OK. Now, I run clamd under strace. I hope that it will quickly hang 
> ;-) When,clamd hangs, I will post here its strace output. Are there 
> significative differences between futex management on sbus and pci 
> sparc64 ? I have tried to reproduce this bug on sbus workstation without 
> any success, but I can see it on all sparc64/PCI I have (U60, U80, U420)...

	I'm lucky ;-) The first mail went through clamd and clamd sleeps... The 
complete strace log (strace -ff ...) is attached.

	Regards,

	JKB

[-- Attachment #2: clamav-strace.bz2 --]
[-- Type: application/octet-stream, Size: 277171 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (7 preceding siblings ...)
  2007-07-12  9:11 ` BERTRAND Joël
@ 2007-07-12  9:38 ` David Miller
  2007-07-12  9:46 ` David Miller
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-12  9:38 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Thu, 12 Jul 2007 09:17:04 +0200

> Are there significative differences between futex management on sbus
> and pci sparc64 ?

Absolutely none.

If it is some race, the timing differences of the two cpus
can be enough to make it hard if not impossible to trigger
on one machine vs. another.  I see bugs like this all the
time.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (8 preceding siblings ...)
  2007-07-12  9:38 ` David Miller
@ 2007-07-12  9:46 ` David Miller
  2007-07-12  9:50 ` BERTRAND Joël
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-12  9:46 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Thu, 12 Jul 2007 11:11:32 +0200

> 	I'm lucky ;-) First mail in clamd and clamd sleeps... You can find in 
> attachement complete strace log (strace -ff ...).

It is looping at the end servicing SIGINT over and over, are
you pressing Ctrl-C at the terminal where clamav is running
or sending it signals with "kill"?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (9 preceding siblings ...)
  2007-07-12  9:46 ` David Miller
@ 2007-07-12  9:50 ` BERTRAND Joël
  2007-07-25  6:19 ` David Miller
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-12  9:50 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Thu, 12 Jul 2007 11:11:32 +0200
> 
>> 	I'm lucky ;-) First mail in clamd and clamd sleeps... You can find in 
>> attachement complete strace log (strace -ff ...).
> 
> It is looping at the end servicing SIGINT over and over, are
> you pressing Ctrl-C at the terminal where clamav is running
> or sending it signals with "kill"?
> 

	The process stops at line 17384 of clamav-strace:

futex(0x25944f0, FUTEX_WAIT

The end of the line ("2, NULL)   = ? ERESTARTSYS (To be 
restarted)") was written when I pressed ctrl+C, but the process remained 
in sleep state. I tried kill -15 without success. Only kill -9 kills 
clamd.

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (10 preceding siblings ...)
  2007-07-12  9:50 ` BERTRAND Joël
@ 2007-07-25  6:19 ` David Miller
  2007-07-25  6:58 ` BERTRAND Joël
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-25  6:19 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Thu, 12 Jul 2007 11:50:38 +0200

> 	Process stops at line 17384 in clamav-strace:
> 
> futex(0x25944f0, FUTEX_WAIT
> 
> Line terminaison was written ("2, NULL)   = ? ERESTARTSYS (To be 
> restarted)") when I have pressed ctrl+C, but process remains in sleep 
> state. I have tried kill -15 (without any success). Only kill -9 kills 
> clamd.

Can you give this patch a try?

diff --git a/include/asm-sparc64/futex.h b/include/asm-sparc64/futex.h
index 876312f..3b5797e 100644
--- a/include/asm-sparc64/futex.h
+++ b/include/asm-sparc64/futex.h
@@ -14,6 +14,7 @@
 	"	cmp	%2, %1\n"			\
 	"	bne,pn	%%icc, 1b\n"			\
 	"	 mov	0, %0\n"			\
+	"	sra	%1, 0, %1\n"			\
 	"3:\n"						\
 	"	.section .fixup,#alloc,#execinstr\n"	\
 	"	.align	4\n"				\
@@ -88,6 +89,7 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 {
 	__asm__ __volatile__(
 	"\n1:	casa	[%3] %%asi, %2, %0\n"
+	"	sra	%0, 0, %0\n"
 	"2:\n"
 	"	.section .fixup,#alloc,#execinstr\n"
 	"	.align	4\n"

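For readers unfamiliar with the instruction: on SPARC V9, "sra %reg, 0, %reg"
sign-extends the low 32 bits of a 64-bit register. Both hunks apply it to the
32-bit futex value returned by the atomic operation, presumably so that it
compares equal to a sign-extended C int. The sketch below only illustrates
that arithmetic in portable C, assuming the value arrives zero-extended in
the register; the function names are hypothetical and this is not the kernel
code.

```c
#include <stdint.h>

/* A 32-bit value as it sits in a 64-bit register after zero-extension. */
static int64_t zero_extend32(uint32_t value)
{
    return (int64_t)(uint64_t)value;
}

/*
 * The same value after sign-extension, i.e. the effect of
 * "sra %reg, 0, %reg".  Flipping bit 31 and subtracting 2^31 avoids any
 * implementation-defined unsigned-to-signed conversion.
 */
static int64_t sign_extend32(uint32_t value)
{
    return (int64_t)(value ^ 0x80000000u) - 0x80000000LL;
}
```

A futex word holding 0x80000000 reads as 2147483648 when zero-extended but as
-2147483648 once sign-extended, which is the value an int comparison expects.
For words with bit 31 clear, the two forms are identical.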
^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (11 preceding siblings ...)
  2007-07-25  6:19 ` David Miller
@ 2007-07-25  6:58 ` BERTRAND Joël
  2007-07-25  8:31 ` BERTRAND Joël
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-25  6:58 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Thu, 12 Jul 2007 11:50:38 +0200
> 
>> 	Process stops at line 17384 in clamav-strace:
>>
>> futex(0x25944f0, FUTEX_WAIT
>>
>> Line terminaison was written ("2, NULL)   = ? ERESTARTSYS (To be 
>> restarted)") when I have pressed ctrl+C, but process remains in sleep 
>> state. I have tried kill -15 (without any success). Only kill -9 kills 
>> clamd.
> 
> Can you give this patch a try?
> 
> diff --git a/include/asm-sparc64/futex.h b/include/asm-sparc64/futex.h
> index 876312f..3b5797e 100644
> --- a/include/asm-sparc64/futex.h
> +++ b/include/asm-sparc64/futex.h
> @@ -14,6 +14,7 @@
>  	"	cmp	%2, %1\n"			\
>  	"	bne,pn	%%icc, 1b\n"			\
>  	"	 mov	0, %0\n"			\
> +	"	sra	%1, 0, %1\n"			\
>  	"3:\n"						\
>  	"	.section .fixup,#alloc,#execinstr\n"	\
>  	"	.align	4\n"				\
> @@ -88,6 +89,7 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
>  {
>  	__asm__ __volatile__(
>  	"\n1:	casa	[%3] %%asi, %2, %0\n"
> +	"	sra	%0, 0, %0\n"
>  	"2:\n"
>  	"	.section .fixup,#alloc,#execinstr\n"
>  	"	.align	4\n"

	Applied. My U60 is rebuilding a 2.6.22.1 kernel with your patch. I will 
test it and come back with feedback.

	Thanks,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (12 preceding siblings ...)
  2007-07-25  6:58 ` BERTRAND Joël
@ 2007-07-25  8:31 ` BERTRAND Joël
  2007-07-25  8:33 ` David Miller
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-25  8:31 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Thu, 12 Jul 2007 11:50:38 +0200
> 
>> 	Process stops at line 17384 in clamav-strace:
>>
>> futex(0x25944f0, FUTEX_WAIT
>>
>> Line terminaison was written ("2, NULL)   = ? ERESTARTSYS (To be 
>> restarted)") when I have pressed ctrl+C, but process remains in sleep 
>> state. I have tried kill -15 (without any success). Only kill -9 kills 
>> clamd.
> 
> Can you give this patch a try?
> 
> diff --git a/include/asm-sparc64/futex.h b/include/asm-sparc64/futex.h
> index 876312f..3b5797e 100644
> --- a/include/asm-sparc64/futex.h
> +++ b/include/asm-sparc64/futex.h
> @@ -14,6 +14,7 @@
>  	"	cmp	%2, %1\n"			\
>  	"	bne,pn	%%icc, 1b\n"			\
>  	"	 mov	0, %0\n"			\
> +	"	sra	%1, 0, %1\n"			\
>  	"3:\n"						\
>  	"	.section .fixup,#alloc,#execinstr\n"	\
>  	"	.align	4\n"				\
> @@ -88,6 +89,7 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
>  {
>  	__asm__ __volatile__(
>  	"\n1:	casa	[%3] %%asi, %2, %0\n"
> +	"	sra	%0, 0, %0\n"
>  	"2:\n"
>  	"	.section .fixup,#alloc,#execinstr\n"
>  	"	.align	4\n"

	David,

	I have tried your patch with icemonkey and it hangs (in a futex). 
The strace output ends with:

futex(0xf2d3b0, FUTEX_WAKE, 1)          = 0
write(5, "\372", 1)                     = 1
ioctl(3, 0x4004667f, 0xfffe5304)        = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN, revents=POLLIN}], 7, -1) = 1
gettimeofday({1185351978, 620970}, NULL) = 0
gettimeofday({1185351978, 621167}, NULL) = 0
gettimeofday({1185351978, 621487}, NULL) = 0
gettimeofday({1185351978, 621671}, NULL) = 0
futex(0x923ec, FUTEX_WAKE, 1)           = 1
futex(0x923e8, FUTEX_WAKE, 1)           = 1
read(4, "\372", 1)                      = 1
futex(0xf9504, FUTEX_WAKE, 1)           = 1
ioctl(3, 0x4004667f, 0xfffe5304)        = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}], 6, 0) = 0
write(3, "5\30\0\4\1 \30!\0\0\0>\0\22\0\22F\3\0\5\1 \30!\1 \n\236"..., 404) = 404
ioctl(3, 0x4004667f, 0xfffe5304)        = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN, revents=POLLIN}], 7, -1) = 1
futex(0xf300d4e4, FUTEX_WAKE, 1)        = 1
futex(0xf300d4e0, FUTEX_WAKE, 1)        = 1
futex(0xf277c8, FUTEX_WAKE, 1)          = 1
futex(0xf2d050, FUTEX_WAKE, 1)          = 1
futex(0xf2d054, FUTEX_WAIT, 1, NULL)    = -1 EAGAIN (Resource temporarily unavailable)
futex(0xf277c8, FUTEX_WAKE, 1)          = 0
write(5, "\372", 1)                     = 1
ioctl(3, 0x4004667f, 0xfffe5304)        = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN, revents=POLLIN}], 7, -1) = 1
gettimeofday({1185351978, 680377}, NULL) = 0
write(3, "(\30\0\4\1 \3:\0\0\0>\0\0\0\0", 16) = 16
read(3, "\1\1\277\310\0\0\0\0\0@\0\372\0\1\0\27\0\0\0\1\0\0\0\0"..., 32) = 32
write(3, "(\30\0\4\1 \3:\0\0\0>\0\0\0\0", 16) = 16
read(3, "\1\1\277\311\0\0\0\0\0@\0\372\0\1\0\27\0\0\0\1\0\0\0\0"..., 32) = 32
write(3, "(\30\0\4\1 \3:\0\0\0>\0\0\0\0", 16) = 16
read(3, "\1\1\277\312\0\0\0\0\0@\0\372\0\1\0\27\0\0\0\1\0\0\0\0"..., 32) = 32
write(3, "(\30\0\4\1 \3:\0\0\0>\0\0\0\0", 16) = 16
read(3, "\1\1\277\313\0\0\0\0\0@\0\372\0\1\0\27\0\0\0\1\0\0\0\0"..., 32) = 32
write(3, "(\30\0\4\1 \3:\0\0\0>\0\0\0\0", 16) = 16
read(3, "\1\1\277\314\0\0\0\0\0@\0\372\0\1\0\27\0\0\0\1\0\0\0\0"..., 32) = 32
write(3, "(\30\0\4\1 \3:\0\0\0>\0\0\0\0", 16) = 16
read(3, "\1\1\277\315\0\0\0\0\0@\0\372\0\1\0\27\0\0\0\1\0\0\0\0"..., 32) = 32
gettimeofday({1185351978, 688447}, NULL) = 0
gettimeofday({1185351978, 688650}, NULL) = 0
gettimeofday({1185351978, 688958}, NULL) = 0
gettimeofday({1185351978, 689146}, NULL) = 0
futex(0x923ec, FUTEX_WAKE, 1)           = 1
futex(0x923e8, FUTEX_WAKE, 1)           = 1
write(5, "\372", 1)                     = 1
ioctl(3, 0x4004667f, 0xfffe5304)        = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}], 6, 0) = 0
write(3, "5\30\0\4\1 \30%\0\0\0>\4\t\3U;\3\0\7\1 \27\220\0\0\0\0"..., 7948) = 7948
ioctl(3, 0x4004667f, 0xfffe5304)        = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN, revents=POLLIN}], 7, -1) = 1
futex(0xf72cb648, FUTEX_WAIT, 2, NULL)  = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
futex(0xf72cb648, FUTEX_WAIT, 2, NULL)  = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
futex(0xf72cb648, FUTEX_WAIT, 2, NULL

	The last line is not terminated.
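
	As background for reading the trace: per futex(2), FUTEX_WAIT puts the 
caller to sleep only if the futex word still holds the expected value and 
otherwise fails with EAGAIN (seen once in the trace), while FUTEX_WAKE 
returns the number of waiters it woke (0 or 1 in the trace). Below is a 
minimal sketch of the two calls through the raw system call, assuming Linux 
and a libc that provides syscall(); the wrapper names futex_wait and 
futex_wake are hypothetical, not glibc functions.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Sleep until woken, but only if *futex_word still equals expected.
 * Returns 0 when woken, or -1 with errno set (EAGAIN if the word had
 * already changed, EINTR if a signal interrupted the wait).
 */
static long futex_wait(int *futex_word, int expected)
{
    return syscall(SYS_futex, futex_word, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake at most max_waiters sleepers; returns how many were woken. */
static long futex_wake(int *futex_word, int max_waiters)
{
    return syscall(SYS_futex, futex_word, FUTEX_WAKE, max_waiters, NULL, NULL, 0);
}
```

	In the hang, the final FUTEX_WAIT on 0xf72cb648 expects the value 2 and 
never returns, which means no other thread ever issued a matching wake on 
that address while this thread was waiting.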

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (13 preceding siblings ...)
  2007-07-25  8:31 ` BERTRAND Joël
@ 2007-07-25  8:33 ` David Miller
  2007-07-28  7:59 ` BERTRAND Joël
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-25  8:33 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Wed, 25 Jul 2007 10:31:50 +0200

> 	I have tried your patch with icemonkey and it hangs (in a futex). 
> Strace output ends with :

Thanks for testing, I'll keep digging.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (14 preceding siblings ...)
  2007-07-25  8:33 ` David Miller
@ 2007-07-28  7:59 ` BERTRAND Joël
  2007-07-28  8:07 ` David Miller
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-28  7:59 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Wed, 25 Jul 2007 10:31:50 +0200
> 
>> 	I have tried your patch with icemonkey and it hangs (in a futex). 
>> Strace output ends with :
> 
> Thanks for testing, I'll keep digging.

	I have tested your patch for two days. I can now confirm that it does 
not fix the futex trouble. I have seen exactly the same bug reported on 
the sparc-debian mailing list.

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (15 preceding siblings ...)
  2007-07-28  7:59 ` BERTRAND Joël
@ 2007-07-28  8:07 ` David Miller
  2007-07-28  8:25 ` BERTRAND Joël
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-28  8:07 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Sat, 28 Jul 2007 09:59:44 +0200

> 	I have tested your patch for two days. Now, I can affirm that
> it does not fix futex trouble. I have seen exactly same bug report
> on sparc-debian mailing list.

Thanks for your continual feedback.

I want to reproduce this here, that will get this fixed most
quickly.

What version of debian do you have installed on the machine that shows
this, and what exact kind of system and cpu configuration does it
have?

I think I can reproduce this most easily using firefox, or whatever
they have renamed it to under debian, do you just startup the web
browser and it hangs soon?

Under Ubuntu gutsy I ran the same test, saw the same futex calls under
strace, and it worked fine even after hours of browsing many sites.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (16 preceding siblings ...)
  2007-07-28  8:07 ` David Miller
@ 2007-07-28  8:25 ` BERTRAND Joël
  2007-07-30  5:19 ` David Miller
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-28  8:25 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Sat, 28 Jul 2007 09:59:44 +0200
> 
>> 	I have tested your patch for two days. Now, I can affirm that
>> it does not fix futex trouble. I have seen exactly same bug report
>> on sparc-debian mailing list.
> 
> Thanks for your continual feedback.

	You're welcome.

> I want to reproduce this here, that will get this fixed most
> quickly.
> 
> What version of debian do you have installed on the machine that shows
> this, and what exact kind of system and cpu configuration does it
> have?

	I have three workstations available for testing, all running an 
up-to-date debian/testing.

1/ U2 with two Uii@296 MHz, 2 GB -> I cannot reproduce this bug
fermat:[~] > uname -a
Linux fermat 2.6.22.1 #2 SMP PREEMPT Wed Jul 11 11:01:38 CEST 2007 
sparc64 GNU/Linux

2/ U60 with two Uii@450 MHz, 1 GB -> bug
rayleigh:[~] > uname -a
Linux rayleigh 2.6.22.1 #2 SMP Wed Jul 25 09:48:50 CEST 2007 sparc64 
GNU/Linux

3/ U80 with four Uii@450 MHz, 2 GB -> bug
kant:[~] > uname -a
Linux kant 2.6.20.11-netfilter-route-patch #1 SMP Wed May 9 12:27:15 
CEST 2007 sparc64 GNU/Linux
I have patched this kernel with the netfilter route patch (and nothing else).

All stations run libc 2.6-2.
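
To compare the three kernels at a glance, the uname lines above can be parsed as in the following minimal sketch. The labels, the `parse_uname` helper, and the regular expression are illustrative assumptions, not an existing tool; the input strings are the uname lines quoted above with the mail-client line wraps rejoined.

```python
import re

# uname -a lines as reported above (wrapped lines rejoined).
UNAME_LINES = {
    "fermat (U2)": "Linux fermat 2.6.22.1 #2 SMP PREEMPT Wed Jul 11 11:01:38 CEST 2007 sparc64 GNU/Linux",
    "rayleigh (U60)": "Linux rayleigh 2.6.22.1 #2 SMP Wed Jul 25 09:48:50 CEST 2007 sparc64 GNU/Linux",
    "kant (U80)": "Linux kant 2.6.20.11-netfilter-route-patch #1 SMP Wed May 9 12:27:15 CEST 2007 sparc64 GNU/Linux",
}


def parse_uname(line):
    """Extract host, kernel release, and the SMP/PREEMPT build flags."""
    match = re.match(r"Linux (\S+) (\S+) #\d+ ((?:SMP |PREEMPT )*)", line)
    if match is None:
        raise ValueError(f"unrecognized uname line: {line!r}")
    host, release, flags = match.groups()
    return {"host": host, "release": release, "flags": set(flags.split())}


if __name__ == "__main__":
    for label, line in UNAME_LINES.items():
        info = parse_uname(line)
        print(f"{label}: {info['release']} {sorted(info['flags'])}")
```

On these three lines, the only build-flag difference the sketch surfaces is that the U2 kernel was built with PREEMPT while the U60 and U80 kernels were not. Whether that matters for the hang is not established here.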

> I think I can reproduce this most easily using firefox, or whatever
> they have renamed it to under debian, do you just startup the web
> browser and it hangs soon?

	I have seen a bug report involving Iceweasel (the Debian name for 
Firefox), so I think you can reproduce this bug with Firefox. On the U60 
or U80, Seamonkey is unusable (it hangs quickly).

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (17 preceding siblings ...)
  2007-07-28  8:25 ` BERTRAND Joël
@ 2007-07-30  5:19 ` David Miller
  2007-07-30  7:04 ` BERTRAND Joël
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-30  5:19 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Sat, 28 Jul 2007 10:25:36 +0200

> 1/ U2 with two Uii@296 MHz, 2 GB -> I cannot reproduced this bug
> fermat:[~] > uname -a
> Linux fermat 2.6.22.1 #2 SMP PREEMPT Wed Jul 11 11:01:38 CEST 2007 
> sparc64 GNU/Linux
> 
> 2/ U60 with two Uii@450 MHz, 1 GB -> bug
> rayleigh:[~] > uname -a
> Linux rayleigh 2.6.22.1 #2 SMP Wed Jul 25 09:48:50 CEST 2007 sparc64 
> GNU/Linux
> 
> 3/ U80 with four Uii@450 MHz, 2 GB -> bug
> kant:[~] > uname -a
> Linux kant 2.6.20.11-netfilter-route-patch #1 SMP Wed May 9 12:27:15 
> CEST 2007 sparc64 GNU/Linux
> I have patched this kernel with netfilter route patch (and no one other).
 ...
> 	I have see a bug report with Iceweasel (debian name for firefox). Thus, 
> I think you can reproduce this bug with firefox. On U60 or U80, 
> Seamonkey is unusable (it quickly hangs).

Try as I might, I cannot reproduce this on my Ultra60
with 360MHz Ultra-II cpus.

You seem to see the problem only on the systems that have the 450MHz
cpus. Any chance you can test the ultra60 with a different cpu
variant?  Perhaps those chips are part of the problem.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (18 preceding siblings ...)
  2007-07-30  5:19 ` David Miller
@ 2007-07-30  7:04 ` BERTRAND Joël
  2007-07-30  7:07 ` David Miller
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-30  7:04 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Sat, 28 Jul 2007 10:25:36 +0200
> 
>> 1/ U2 with two Uii@296 MHz, 2 GB -> I cannot reproduced this bug
>> fermat:[~] > uname -a
>> Linux fermat 2.6.22.1 #2 SMP PREEMPT Wed Jul 11 11:01:38 CEST 2007 
>> sparc64 GNU/Linux
>>
>> 2/ U60 with two Uii@450 MHz, 1 GB -> bug
>> rayleigh:[~] > uname -a
>> Linux rayleigh 2.6.22.1 #2 SMP Wed Jul 25 09:48:50 CEST 2007 sparc64 
>> GNU/Linux
>>
>> 3/ U80 with four Uii@450 MHz, 2 GB -> bug
>> kant:[~] > uname -a
>> Linux kant 2.6.20.11-netfilter-route-patch #1 SMP Wed May 9 12:27:15 
>> CEST 2007 sparc64 GNU/Linux
>> I have patched this kernel with netfilter route patch (and no one other).
>  ...
>> 	I have see a bug report with Iceweasel (debian name for firefox). Thus, 
>> I think you can reproduce this bug with firefox. On U60 or U80, 
>> Seamonkey is unusable (it quickly hangs).
> 
> Try as I might I cannot reproduce this on my Ultra60
> with 360Mhz Ultra-II cpus.

	Very strange.

> You seem to see the problem only on the systems that have the 450Mhz
> cpus, any chance you can test the ultra60 with a different cpu
> variant?  Perhaps those chips are part of the problem.

	That would explain why I cannot reproduce the bug on my U2/SMP 
(Uii@296 MHz). I only have Uii@296 MHz and Uii@450 MHz CPUs, and I cannot 
swap CPUs between my U2 and one of my U60s (I need a running U2). If you 
want, I can open SSH access to one of my U60s so you can run some tests.
What is the cache size on your CPUs? The 450 has 4 MB, the 296 only 2 MB.

	Gavin, what are your CPUs?

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (19 preceding siblings ...)
  2007-07-30  7:04 ` BERTRAND Joël
@ 2007-07-30  7:07 ` David Miller
  2007-07-30  7:48 ` BERTRAND Joël
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-30  7:07 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Mon, 30 Jul 2007 09:04:27 +0200

> If you want, I can open a ssh access on one of my U60 to do some
> tests.

Thank you kindly for the offer, but this bug is stuck in the
mud until I can reproduce it here locally as that's the only
reasonable way I can work on a bug of this nature.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (20 preceding siblings ...)
  2007-07-30  7:07 ` David Miller
@ 2007-07-30  7:48 ` BERTRAND Joël
  2007-07-30  8:17 ` David Miller
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-30  7:48 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Mon, 30 Jul 2007 09:04:27 +0200
> 
>> If you want, I can open a ssh access on one of my U60 to do some
>> tests.
> 
> Thank you kindly for the offer, but this bug is stuck in the
> mud until I can reproduce it here locally as that's the only
> reasonable way I can work on a bug of this nature.

	I know. How can I help you? This bug is causing real problems here. I 
can patch the kernel on my U60 to debug futex if that would help you.

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (21 preceding siblings ...)
  2007-07-30  7:48 ` BERTRAND Joël
@ 2007-07-30  8:17 ` David Miller
  2007-07-30  9:16 ` gavin duley
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-30  8:17 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Mon, 30 Jul 2007 09:48:30 +0200

> I know. How can I help you ? This bug is very problematic. I can
> patch kernel of my U60 to debug futex if that can help you.

I really need to confirm that the 450MHz cpus are the type
that uniquely trigger this problem.

We need to find someone who can test alternate cpu types to
see if that makes the bug go away.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (22 preceding siblings ...)
  2007-07-30  8:17 ` David Miller
@ 2007-07-30  9:16 ` gavin duley
  2007-07-30  9:22 ` David Miller
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: gavin duley @ 2007-07-30  9:16 UTC (permalink / raw)
  To: sparclinux

On Mon, Jul 30, 2007 at 09:04:27AM +0200, BERTRAND Joël <joel.bertrand@systella.fr> wrote:
[snip]
>> You seem to see the problem only on the systems that have the 450Mhz
>> cpus, any chance you can test the ultra60 with a different cpu
>> variant?  Perhaps those chips are part of the problem.
>
> 	And we can explain why on my U2/SMP (Uii@296MHz), I cannot reproduce the 
> bug. I only have Uii@296 MHz and Uii@450 MHz, and I cannot switch CPU's 
> between my U2 and one of my U60 (I need a running U2). If you want, I can 
> open a ssh access on one of my U60 to do some tests.
> What's cache size on your CPU's ? 450 has 4 MB, 296, only 2 MB.
>
> 	Gavin, what are your CPU's ?

My Sun box is a Sun Ultra 5 with just one processor: a TI UltraSparc IIi 
(Sabre). From memory, it is 333MHz with 2MB cache.

gavin,


[Please cc: me as I'm not on this list. Thanks.]
-- 
Wonder is the beginning of all science.
	-- Aristotle

Gavin Duley
<gduley@une.edu.au>   <gduley@turing.une.edu.au>
http://www-personal.une.edu.au/~gduley/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (23 preceding siblings ...)
  2007-07-30  9:16 ` gavin duley
@ 2007-07-30  9:22 ` David Miller
  2007-07-30  9:45 ` gavin duley
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-30  9:22 UTC (permalink / raw)
  To: sparclinux

From: gavin duley <gduley@une.edu.au>
Date: Mon, 30 Jul 2007 09:16:53 +0000

> My Sun box is a Sun Ultra 5 with just one processor: a TI UltraSparc
> IIi (Sabre). From memory, it is 333MHz with 2MB cache.

What exactly is your test case that fails?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (24 preceding siblings ...)
  2007-07-30  9:22 ` David Miller
@ 2007-07-30  9:45 ` gavin duley
  2007-07-31  9:21 ` David Miller
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: gavin duley @ 2007-07-30  9:45 UTC (permalink / raw)
  To: sparclinux

On Mon, Jul 30, 2007 at 02:22:35AM -0700, David Miller <davem@davemloft.net> wrote:
> From: gavin duley <gduley@une.edu.au>
> Date: Mon, 30 Jul 2007 09:16:53 +0000
> 
> > My Sun box is a Sun Ultra 5 with just one processor: a TI UltraSparc
> > IIi (Sabre). From memory, it is 333MHz with 2MB cache.
> 
> What exactly is your test case that fails?

The application that hangs on my machine is iceweasel. The machine is 
running Debian testing:

%:gpd@aragorn:~> uname -a                      [(pts/3) 9:26 on 07-07-30]
Linux aragorn 2.6.18-4-sparc64 #1 Mon Mar 26 11:16:07 UTC 2007 sparc64 GNU/Linux
%:gpd@aragorn:~>                               [(pts/3) 9:40 on 07-07-30]

If I run iceweasel under strace, it gives the following output when it hangs 
(this also shows that I used kill to terminate iceweasel after it hung):

gettimeofday({1185286472, 656275}, NULL) = 0
futex(0xc71d84, FUTEX_WAKE, 1)          = 1
futex(0xc71d80, FUTEX_WAKE, 1)          = 1
read(5, 0xff04d257, 1)                  = -1 EAGAIN (Resource temporarily unavailable)
ioctl(3, 0x4004667f, 0xff04d344)        = 0
poll([{fd=3, events=POLLIN}, {fd=10, events=POLLIN}, {fd=14, events=POLLIN|POLLPRI}, {fd=16, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN, revents=POLLIN}], 7, -1) = 1
gettimeofday({1185286474, 657891}, NULL) = 0
futex(0xccfc54, FUTEX_WAIT, 1, NULL)    = -1 EINTR (Interrupted system call)
--- SIGHUP (Hangup) @ 0 (0) ---
unlink("/home/gpd/.mozilla/firefox/7em5vrni.default/lock") = 0
rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 0xf7dcccf8, 0) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP], NULL, 8) = 0
tgkill(10471, 10471, SIGHUP)            = 0
--- SIGHUP (Hangup) @ 0 (0) ---
+++ killed by SIGHUP +++
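
For longer traces, the futex lines can be picked out mechanically. The following is a minimal sketch, assuming the trace has been saved as text with the mail-client line wraps rejoined; `FUTEX_RE`, `futex_calls`, and `interrupted_waits` are hypothetical helper names, and the pattern only covers the simple futex forms seen in the trace above.

```python
import re

# The three futex lines from the trace above, with wrapped lines rejoined.
SAMPLE = """\
futex(0xc71d84, FUTEX_WAKE, 1)          = 1
futex(0xc71d80, FUTEX_WAKE, 1)          = 1
futex(0xccfc54, FUTEX_WAIT, 1, NULL)    = -1 EINTR (Interrupted system call)
"""

# Captures: address, operation, value, return code, optional errno name.
FUTEX_RE = re.compile(
    r"futex\((0x[0-9a-f]+), (FUTEX_\w+), (\d+)(?:, [^)]*)?\)\s*= (-?\d+)(?: (E[A-Z]+))?"
)


def futex_calls(trace):
    """Return (address, op, value, result, errno_name) for each completed futex call."""
    return [match.groups() for match in FUTEX_RE.finditer(trace)]


def interrupted_waits(trace):
    """Addresses of FUTEX_WAIT calls that returned only because a signal arrived."""
    return [
        addr
        for addr, op, _value, _result, err in futex_calls(trace)
        if op == "FUTEX_WAIT" and err == "EINTR"
    ]


if __name__ == "__main__":
    print(interrupted_waits(SAMPLE))
```

A wait that ends with EINTR, as here, returned because of the signal rather than a wake-up. Note that a trace of a single thread cannot show wake-ups issued by other threads, so this only narrows down where the process was sleeping.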

gavin,

[Please cc: me as I'm not on this list. Thanks.]
-- 
Wonder is the beginning of all science.
	-- Aristotle

Gavin Duley
<gduley@une.edu.au>   <gduley@turing.une.edu.au>
http://www-personal.une.edu.au/~gduley/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (25 preceding siblings ...)
  2007-07-30  9:45 ` gavin duley
@ 2007-07-31  9:21 ` David Miller
  2007-07-31  9:37 ` BERTRAND Joël
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-31  9:21 UTC (permalink / raw)
  To: sparclinux

From: gavin duley <gduley@une.edu.au>
Date: Mon, 30 Jul 2007 09:45:12 +0000

> On Mon, Jul 30, 2007 at 02:22:35AM -0700, David Miller <davem@davemloft.net> wrote:
> > From: gavin duley <gduley@une.edu.au>
> > Date: Mon, 30 Jul 2007 09:16:53 +0000
> > 
> > > My Sun box is a Sun Ultra 5 with just one processor: a TI UltraSparc
> > > IIi (Sabre). From memory, it is 333MHz with 2MB cache.
> > 
> > What exactly is your test case that fails?
> 
> The application that hangs on my machine is iceweasel. The machine is 
> running Debian testing:

So I installed debian/testing on my ultra5, and ran iceweasel
both remotely over SSH and locally under gnome desktop,
I could get neither to hang.

Now, my ultra5 has a 270MHz Ultra-IIi compared to your higher
speed chip, but I'm thinking that the cpu type is a red herring.
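
The reports gathered so far in this thread can be tabulated to check the clock-speed theory. This is a rough sketch only: the `Report` record and `clock_separates` helper are hypothetical, "Uii" is read as UltraSPARC II, and Gavin's 333MHz figure was given from memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    owner: str
    machine: str
    cpu: str
    mhz: int
    hangs: bool


# One entry per machine mentioned in the thread, in the order reported.
REPORTS = [
    Report("Joël", "U2", "UltraSPARC II", 296, False),
    Report("Joël", "U60", "UltraSPARC II", 450, True),
    Report("Joël", "U80", "UltraSPARC II", 450, True),
    Report("David", "Ultra60", "UltraSPARC II", 360, False),
    Report("Gavin", "Ultra 5", "UltraSPARC IIi", 333, True),
    Report("David", "ultra5", "UltraSPARC IIi", 270, False),
]


def clock_separates(reports):
    """True if every hanging machine is clocked faster than every healthy one."""
    hanging = [r.mhz for r in reports if r.hangs]
    healthy = [r.mhz for r in reports if not r.hangs]
    if not hanging or not healthy:
        return False
    return min(hanging) > max(healthy)


if __name__ == "__main__":
    print("first four reports only:", clock_separates(REPORTS[:4]))
    print("all six reports:", clock_separates(REPORTS))
```

With only the first four machines, clock speed cleanly separates the hanging systems from the healthy ones. Once the two Ultra 5 reports are added it no longer does, since a 333MHz machine hangs while a 360MHz one does not.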

I think you guys have something enabled or in use in your
installations that I don't.

Do you happen to be using NIS, or NFS, or something unique like that?
If you have some firewalling enabled, can you retest with it disabled?

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (26 preceding siblings ...)
  2007-07-31  9:21 ` David Miller
@ 2007-07-31  9:37 ` BERTRAND Joël
  2007-07-31  9:42 ` David Miller
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-31  9:37 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: gavin duley <gduley@une.edu.au>
> Date: Mon, 30 Jul 2007 09:45:12 +0000
> 
>> On Mon, Jul 30, 2007 at 02:22:35AM -0700, David Miller <davem@davemloft.net> wrote:
>>> From: gavin duley <gduley@une.edu.au>
>>> Date: Mon, 30 Jul 2007 09:16:53 +0000
>>>
>>>> My Sun box is a Sun Ultra 5 with just one processor: a TI UltraSparc
>>>> IIi (Sabre). From memory, it is 333MHz with 2MB cache.
>>> What exactly is your test case that fails?
>> The application that hangs on my machine is iceweasel. The machine is 
>> running Debian testing:
> 
> So I installed debian/testing on my ultra5, and ran iceweasel
> both remotely over SSH and locally under gnome desktop,
> I could get neither to hang.
> 
> Now, my ultra5 has a 270MHZ Ultra-IIi compared to your higher
> speed chip, but I'm thinking that the cpu type is a red herring.
> 
> I think you guys have something enabled or in use in your
> installations that I don't.
> 
> Do you happen to be using NIS, or NFS, or something unique like that?
> If you have some firewalling enabled, can you retest with it disabled?

	David,

	My U2 is a NIS and NFS client and iceape works fine on it. The NIS and 
NFS server is my U60 (and on this workstation, iceape hangs). All accounts 
on the U60 are local.

	For information, my U60 runs in runlevel 3:
Root rayleigh:[/etc/rc3.d] > ls
README               S20cupsys      S20smartmontools      S50wu-ftpd
S09wd_keepalive      S20dbus        S20snmpd              S89anacron
S10sysklogd          S20dbus-1      S20xfs                S89atd
S11klogd             S20greylist    S20xinetd             S89cron
S15bind9             S20jabber      S20xprint             S89watchdog
S16ssh               S20mailman     S21fam                S90apache
S17iproute2          S20makedev     S21hddtemp            S99fail2ban
S18portmap           S20mimedefang  S21sendmail           S99fetchmail
S19mysql             S20mplayer     S23ntp                S99rc.local
S19mysql-ndb-mgm     S20mysql-ndb   S25mdadm              S99rmnologin
S19nis               S20nfs-common  S25nfs-kernel-server  S99stop-bootlogd
S20clamav-daemon     S20QoS         S30gdm                S99udev-fixes
S20clamav-freshclam  S20saslauthd   S30iptables

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (27 preceding siblings ...)
  2007-07-31  9:37 ` BERTRAND Joël
@ 2007-07-31  9:42 ` David Miller
  2007-07-31  9:52 ` BERTRAND Joël
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-31  9:42 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Tue, 31 Jul 2007 11:37:00 +0200

> 	My U2 is a NIS and NFS client and iceape works fine. NIS and
> NFS server is on my U60 (and on this workstation, iceape hangs). All
> accounts on U60 are local.

You mentioned you are using netfilter, can you reboot with your
filtering rules disabled and retest?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (28 preceding siblings ...)
  2007-07-31  9:42 ` David Miller
@ 2007-07-31  9:52 ` BERTRAND Joël
  2007-07-31  9:54 ` David Miller
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-31  9:52 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Tue, 31 Jul 2007 11:37:00 +0200
> 
>> 	My U2 is a NIS and NFS client and iceape works fine. NIS and
>> NFS server is on my U60 (and on this workstation, iceape hangs). All
>> accounts on U60 are local.
> 
> You mentioned you are using netfilter, can you reboot with your
> filtering rules disabled and retest?

	I cannot reboot this workstation now. Is it possible to stop only 
netfilter?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (29 preceding siblings ...)
  2007-07-31  9:52 ` BERTRAND Joël
@ 2007-07-31  9:54 ` David Miller
  2007-07-31 10:28 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) glibc gavin duley
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-31  9:54 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Tue, 31 Jul 2007 11:52:59 +0200

> 	I cannot reboot this workstation now. Is it possible to only
> stop netfilter ?

It would be an interesting test to just stop netfilter, but not
conclusive: if netfilter has already corrupted something, only a
reboot would rule that out.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) glibc
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (30 preceding siblings ...)
  2007-07-31  9:54 ` David Miller
@ 2007-07-31 10:28 ` gavin duley
  2007-07-31 11:04 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and BERTRAND Joël
  2007-07-31 20:12 ` David Miller
  33 siblings, 0 replies; 35+ messages in thread
From: gavin duley @ 2007-07-31 10:28 UTC (permalink / raw)
  To: sparclinux

On 31 Jul 2007, at 19:21, David Miller wrote:

> From: gavin duley <gduley@une.edu.au>
> Date: Mon, 30 Jul 2007 09:45:12 +0000
>
>> On Mon, Jul 30, 2007 at 02:22:35AM -0700, David Miller  
>> <davem@davemloft.net> wrote:
>>> From: gavin duley <gduley@une.edu.au>
>>> Date: Mon, 30 Jul 2007 09:16:53 +0000
>>>
>>>> My Sun box is a Sun Ultra 5 with just one processor: a TI  
>>>> UltraSparc
>>>> IIi (Sabre). From memory, it is 333MHz with 2MB cache.
>>>
>>> What exactly is your test case that fails?
>>
>> The application that hangs on my machine is iceweasel. The machine is
>> running Debian testing:
>
> So I installed debian/testing on my ultra5, and ran iceweasel
> both remotely over SSH and locally under gnome desktop,
> I could get neither to hang.

From memory, the times iceweasel has hung have generally been after
it had been running for over a day, though with only a few tabs open.
Maybe it needs to be left running for a while?

I'm trying again now, running iceweasel under strace.

> Now, my ultra5 has a 270MHZ Ultra-IIi compared to your higher
> speed chip, but I'm thinking that the cpu type is a red herring.
>
> I think you guys have something enabled or in use in your
> installations that I don't.
>
> Do you happen to be using NIS, or NFS, or something unique like that?
> If you have some firewalling enabled, can you retest with it disabled?

I'm not using NFS or NIS. At the moment, I don't have a firewall on  
that machine (something I should fix soon, but it's only accessible  
to people on the same network as me so it's not a major priority).  
It's just running a fairly standard install of Debian testing. The  
only slightly unusual thing is that I am using lvm, but I don't think  
that's the problem.

Network-wise, everything is fairly standard, with the IP address
obtained via DHCP.

gavin,

--
You can't go around building a better world for people. Only people
can build a better world for people. Otherwise it's just a cage.
                 -- Terry Pratchett, Witches Abroad

Gavin Duley
<gduley@une.edu.au>   <gduley@turing.une.edu.au>
http://www-personal.une.edu.au/~gduley/



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (31 preceding siblings ...)
  2007-07-31 10:28 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) glibc gavin duley
@ 2007-07-31 11:04 ` BERTRAND Joël
  2007-07-31 20:12 ` David Miller
  33 siblings, 0 replies; 35+ messages in thread
From: BERTRAND Joël @ 2007-07-31 11:04 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: BERTRAND_Joël <joel.bertrand@systella.fr>
> Date: Tue, 31 Jul 2007 11:52:59 +0200
> 
>> 	I cannot reboot this workstation now. Is it possible to only
>> stop netfilter ?
> 
> It would be an interesting test to just stop netfilter, but not
> conclusive because if netfilter has corrupted something already only a
> reboot would ensure it did not occur.

	David,

	I have rebooted my U60 without fail2ban and without iptables. iceape 
still hangs quickly. The last line of the strace output is

	futex(0xf3e00010, FUTEX_WAIT, 2, NULL

	I use CONFIG_DEFAULT_CFQ=y.
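
A futex line with no closing parenthesis and no return value, like the one above, is how strace shows a call that was entered and never returned. Below is a minimal sketch for spotting that at the end of a saved trace; `unfinished_call` is a hypothetical helper, and the check for " = " is a simplification that assumes the arguments themselves contain no such sequence.

```python
def unfinished_call(trace):
    """Return the last line of a trace if its syscall never returned, else None."""
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    if not lines:
        return None
    last = lines[-1]
    # Signal ("---") and exit ("+++") markers are not syscalls.
    if last.startswith(("---", "+++")):
        return None
    # strace prints the name and arguments on entry and appends
    # ") = result" only once the call returns.
    if "(" in last and " = " not in last:
        return last
    return None


if __name__ == "__main__":
    hung = "futex(0xf3e00010, FUTEX_WAIT, 2, NULL"
    print(unfinished_call(hung))
```

Run against a trace of a process that is still blocked, this returns the pending call. For a trace whose last syscall completed, it returns None.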

	Regards,

	JKB

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [sparc64] Strange interaction between 2.6 kernel and 2.5 (and
  2007-07-10  7:53 [sparc64] Strange interaction between 2.6 kernel and 2.5 (and 2.6) BERTRAND Joël
                   ` (32 preceding siblings ...)
  2007-07-31 11:04 ` [sparc64] Strange interaction between 2.6 kernel and 2.5 (and BERTRAND Joël
@ 2007-07-31 20:12 ` David Miller
  33 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-07-31 20:12 UTC (permalink / raw)
  To: sparclinux

From: BERTRAND_Joël <joel.bertrand@systella.fr>
Date: Tue, 31 Jul 2007 13:04:44 +0200

> 	I have rebooted my U60 without fail2ban and wihtout iptables. iceape 
> quickly hangs too. Last line in strace output is
> 
> 	futex(0xf3e00010, FUTEX_WAIT, 2, NULL
> 
> 	I use CONFIG_DEFAULT_CFQ=y.

Thanks for testing.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2007-07-31 20:12 UTC | newest]

