* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
@ 2007-03-06  6:46 Corey Shields
  2007-03-06 15:22 ` Jason Wever
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Corey Shields @ 2007-03-06  6:46 UTC (permalink / raw)
  To: sparclinux

On 3/5/07, Michael Marineau <mike@marineau.org> wrote:
> Built the kernel with gcc 4.1.1 and I got the oops in just a few
> minutes, trying a second time now and it is lasting longer this time
> around.

Why not try and recreate this on bender (Gentoo's T2000)?  Weeve?

-C

-- 
Corey Shields - OSU Open Source Lab
One of InfoWorld's Top100 Projects of 2006!
http://osuosl.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
@ 2007-03-06 15:22 ` Jason Wever
  2007-03-12 23:58 ` Narayan Newton
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jason Wever @ 2007-03-06 15:22 UTC (permalink / raw)
  To: sparclinux

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 5 Mar 2007, Corey Shields wrote:

> On 3/5/07, Michael Marineau <mike@marineau.org> wrote:
>>  Built the kernel with gcc 4.1.1 and I got the oops in just a few
>>  minutes, trying a second time now and it is lasting longer this time
>>  around.
>
> Why not try and recreate this on bender (Gentoo's T2000)?  Weeve?

We're currently building release material on it but I'll schedule some 
time to try and see if Gustavo and I can replicate this.

We've been running bender with gcc-3.4.6 compiled kernels (currently 
running 2.6.20-gentoo) and we haven't encountered this issue (yet).

Cheers,
- -- 
Jason Wever
Gentoo/Sparc Team Co-Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF7YdVdKvgdVioq28RAm7WAKC6tJ/o0Yg+URqXSpIP2XdXuIuf2QCeKmqA
Y25gg0Eje2VQa5VbczN0upg=9Ptj
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
  2007-03-06 15:22 ` Jason Wever
@ 2007-03-12 23:58 ` Narayan Newton
  2007-03-13  2:50 ` David Miller
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Narayan Newton @ 2007-03-12 23:58 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]

Hi,

I have been working on the same server/issue as Mike. We have found that
our kernel without Netfilter support does not have this issue, but the
moment you enable it in the kernel config this bug is triggered.
Attached are the two kernel configs. The only difference is
CONFIG_NETFILTER=y

Kernel version: 2.6.21-rc2

Let me know if I can send you any more information.

--Narayan Newton

David Miller wrote:
> From: "Michael Marineau" <mike@marineau.org>
> Date: Tue, 6 Mar 2007 11:48:50 -0800
> 
>> I twiddled with my kernel configuration a bit to remove the need for
>> any modules to make building elsewhere and copying it over easier.
>> With this new config I am no longer able to trigger the bug (ran the
>> test over night).  I have no idea what change did this, I'll fiddle
>> more with it as soon as I have time in the next day or two.
> 
> Thanks for continuing to help track this down.
> 
> Probably it's some module that has some "use after free" memory
> allocation bug.
> _______________________________________________
> Systems mailing list
> Systems@osuosl.org
> http://lists.osuosl.org/mailman/listinfo/systems


[-- Attachment #2: config-bad.gz --]
[-- Type: application/x-gzip, Size: 6845 bytes --]

[-- Attachment #3: config-good.gz --]
[-- Type: application/x-gzip, Size: 6759 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
  2007-03-06 15:22 ` Jason Wever
  2007-03-12 23:58 ` Narayan Newton
@ 2007-03-13  2:50 ` David Miller
  2007-03-17 20:23 ` David Miller
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-13  2:50 UTC (permalink / raw)
  To: sparclinux

From: Narayan Newton <nnewton@osuosl.org>
Date: Mon, 12 Mar 2007 16:58:56 -0700

> I have been working on the same server/issue as Mike. We have found that
> our kernel without Netfilter support does not have this issue, but the
> moment you enable it in the kernel config this bug is triggered.
> Attached are the two kernel configs. The only difference is
> CONFIG_NETFILTER=y
> 
> Kernel version: 2.6.21-rc2
> 
> Let me know if I can send you any more information.

Thanks for the datapoint.

I have a hugetlbfs lockup and a few networking things to attend
to, but after that I'll try again to reproduce the problem
locally.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
                   ` (2 preceding siblings ...)
  2007-03-13  2:50 ` David Miller
@ 2007-03-17 20:23 ` David Miller
  2007-03-19  7:41 ` David Miller
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-17 20:23 UTC (permalink / raw)
  To: sparclinux

From: Narayan Newton <nnewton@osuosl.org>
Date: Mon, 12 Mar 2007 16:58:56 -0700

> I have been working on the same server/issue as Mike. We have found that
> our kernel without Netfilter support does not have this issue, but the
> moment you enable it in the kernel config this bug is triggered.
> Attached are the two kernel configs. The only difference is
> CONFIG_NETFILTER=y
> 
> Kernel version: 2.6.21-rc2
> 
> Let me know if I can send you any more information.

I can finally reproduce this bug!  What a relief.
Thanks for tracking it down to this config difference.

Hopefully I can figure out the cause and fix this soon.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
                   ` (3 preceding siblings ...)
  2007-03-17 20:23 ` David Miller
@ 2007-03-19  7:41 ` David Miller
  2007-03-19 16:40 ` Gustavo Zacarias
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-19  7:41 UTC (permalink / raw)
  To: sparclinux

From: Narayan Newton <nnewton@osuosl.org>
Date: Mon, 12 Mar 2007 16:58:56 -0700

> I have been working on the same server/issue as Mike. We have found that
> our kernel without Netfilter support does not have this issue, but the
> moment you enable it in the kernel config this bug is triggered.
> Attached are the two kernel configs. The only difference is
> CONFIG_NETFILTER=y
> 
> Kernel version: 2.6.21-rc2

Ok, I think the following patch is the bug fix.  I'm running a bunch
of further stress testing to make sure this is indeed the cause of
these crashes.

Let me know if you can still trigger the bug with this patch
applied, thanks!

Assuming all goes well I'll push this upstream to Linus and
also to the -stable 2.6.x branches.

[SPARC64]: store-init needs trailing membar.

The manual says that it is required and we actually have crash reports
where loads see stale data due to not having membars here.

In one case the networking does:

	memset(skb, 0, offsetof(struct sk_buff, truesize));

and then some code later checks skb->nohdr for zero, but it's still
the value that was there before the memset().

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/sparc64/lib/NGbzero.S b/arch/sparc64/lib/NGbzero.S
index e86baec..f10e452 100644
--- a/arch/sparc64/lib/NGbzero.S
+++ b/arch/sparc64/lib/NGbzero.S
@@ -88,6 +88,7 @@ NGbzero_loop:
 	bne,pt		%xcc, NGbzero_loop
 	 add		%o0, 64, %o0
 
+	membar		#Sync
 	wr		%o4, 0x0, %asi
 	brz,pn		%o1, NGbzero_done
 NGbzero_medium:
diff --git a/arch/sparc64/lib/NGmemcpy.S b/arch/sparc64/lib/NGmemcpy.S
index 8e522b3..66063a9 100644
--- a/arch/sparc64/lib/NGmemcpy.S
+++ b/arch/sparc64/lib/NGmemcpy.S
@@ -247,6 +247,8 @@ FUNC_NAME:	/* %o0=dst, %o1=src, %o2=len */
 	/* fall through */
 
 60:	
+	membar		#Sync
+
 	/* %o2 contains any final bytes still needed to be copied
 	 * over. If anything is left, we copy it one byte at a time.
 	 */
diff --git a/arch/sparc64/lib/NGpage.S b/arch/sparc64/lib/NGpage.S
index 7d7c3bb..8ce3a0c 100644
--- a/arch/sparc64/lib/NGpage.S
+++ b/arch/sparc64/lib/NGpage.S
@@ -41,6 +41,7 @@ NGcopy_user_page:	/* %o0=dst, %o1=src, %o2=vaddr */
 	subcc		%g7, 64, %g7
 	bne,pt		%xcc, 1b
 	 add		%o0, 32, %o0
+	membar		#Sync
 	retl
 	 nop
 
@@ -63,6 +64,7 @@ NGclear_user_page:	/* %o0=dst, %o1=vaddr */
 	subcc		%g7, 64, %g7
 	bne,pt		%xcc, 1b
 	 add		%o0, 32, %o0
+	membar		#Sync
 	retl
 	 nop
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
                   ` (4 preceding siblings ...)
  2007-03-19  7:41 ` David Miller
@ 2007-03-19 16:40 ` Gustavo Zacarias
  2007-03-19 18:58 ` David Miller
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Gustavo Zacarias @ 2007-03-19 16:40 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:

> Ok, I think the following patch is the bug fix.  I'm running a bunch
> of further stress testing to make sure this is indeed the cause of
> these crashes.
> 
> Let me know if you can still trigger the bug with this patch
> applied, thanks!
> 
> Assuming all goes well I'll push this upstream to Linus and
> also to the -stable 2.6.x branches.
> 
> [SPARC64]: store-init needs trailing membar.
> 
> The manual says that it is required and we actually have crash reports
> where loads see stale data due to not having membars here.
> 
> In one case the networking does:
> 
> 	memset(skb, 0, offsetof(struct sk_buff, truesize));
> 
> and then some code later checks skb->nohdr for zero, but it's still
> the value that was there before the memset().

Been running some intensive network loads on our T2000 for the last 
couple of hours and indeed this patch seems to fix it. Previously it 
would trigger an oops in less than 10 minutes.
Thanks.

-- 
Gustavo Zacarias
Gentoo/SPARC monkey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
                   ` (5 preceding siblings ...)
  2007-03-19 16:40 ` Gustavo Zacarias
@ 2007-03-19 18:58 ` David Miller
  2007-03-20 20:41 ` Narayan Newton
  2007-03-20 22:47 ` David Miller
  8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-19 18:58 UTC (permalink / raw)
  To: sparclinux

From: Gustavo Zacarias <gustavoz@gentoo.org>
Date: Mon, 19 Mar 2007 13:40:36 -0300

> Been running some intensive network loads on our T2000 for the last 
> couple of hours and indeed this patch seems to fix it. Previously it 
> would trigger an oops in less than 10 minutes.
> Thanks.

Thanks for testing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
                   ` (6 preceding siblings ...)
  2007-03-19 18:58 ` David Miller
@ 2007-03-20 20:41 ` Narayan Newton
  2007-03-20 22:47 ` David Miller
  8 siblings, 0 replies; 10+ messages in thread
From: Narayan Newton @ 2007-03-20 20:41 UTC (permalink / raw)
  To: sparclinux

Hi,

I have been running netcat since around 10 AM yesterday morning on a
patched kernel with netfilter enabled and have been unable to trigger
this bug. Thank you for your work on this issue!

-- 
Narayan Newton
OSU Open Source Lab



David Miller wrote:
> From: Narayan Newton <nnewton@osuosl.org>
> Date: Mon, 12 Mar 2007 16:58:56 -0700
> 
>> I have been working on the same server/issue as Mike. We have found that
>> our kernel without Netfilter support does not have this issue, but the
>> moment you enable it in the kernel config this bug is triggered.
>> Attached are the two kernel configs. The only difference is
>> CONFIG_NETFILTER=y
>>
>> Kernel version: 2.6.21-rc2
> 
> Ok, I think the following patch is the bug fix.  I'm running a bunch
> of further stress testing to make sure this is indeed the cause of
> these crashes.
> 
> Let me know if you can still trigger the bug with this patch
> applied, thanks!
> 
> Assuming all goes well I'll push this upstream to Linus and
> also to the -stable 2.6.x branches.
> 
> [SPARC64]: store-init needs trailing membar.
> 
> The manual says that it is required and we actually have crash reports
> where loads see stale data due to not having membars here.
> 
> In one case the networking does:
> 
> 	memset(skb, 0, offsetof(struct sk_buff, truesize));
> 
> and then some code later checks skb->nohdr for zero, but it's still
> the value that was there before the memset().
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/arch/sparc64/lib/NGbzero.S b/arch/sparc64/lib/NGbzero.S
> index e86baec..f10e452 100644
> --- a/arch/sparc64/lib/NGbzero.S
> +++ b/arch/sparc64/lib/NGbzero.S
> @@ -88,6 +88,7 @@ NGbzero_loop:
>  	bne,pt		%xcc, NGbzero_loop
>  	 add		%o0, 64, %o0
>  
> +	membar		#Sync
>  	wr		%o4, 0x0, %asi
>  	brz,pn		%o1, NGbzero_done
>  NGbzero_medium:
> diff --git a/arch/sparc64/lib/NGmemcpy.S b/arch/sparc64/lib/NGmemcpy.S
> index 8e522b3..66063a9 100644
> --- a/arch/sparc64/lib/NGmemcpy.S
> +++ b/arch/sparc64/lib/NGmemcpy.S
> @@ -247,6 +247,8 @@ FUNC_NAME:	/* %o0=dst, %o1=src, %o2=len */
>  	/* fall through */
>  
>  60:	
> +	membar		#Sync
> +
>  	/* %o2 contains any final bytes still needed to be copied
>  	 * over. If anything is left, we copy it one byte at a time.
>  	 */
> diff --git a/arch/sparc64/lib/NGpage.S b/arch/sparc64/lib/NGpage.S
> index 7d7c3bb..8ce3a0c 100644
> --- a/arch/sparc64/lib/NGpage.S
> +++ b/arch/sparc64/lib/NGpage.S
> @@ -41,6 +41,7 @@ NGcopy_user_page:	/* %o0=dst, %o1=src, %o2=vaddr */
>  	subcc		%g7, 64, %g7
>  	bne,pt		%xcc, 1b
>  	 add		%o0, 32, %o0
> +	membar		#Sync
>  	retl
>  	 nop
>  
> @@ -63,6 +64,7 @@ NGclear_user_page:	/* %o0=dst, %o1=vaddr */
>  	subcc		%g7, 64, %g7
>  	bne,pt		%xcc, 1b
>  	 add		%o0, 32, %o0
> +	membar		#Sync
>  	retl
>  	 nop
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
  2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
                   ` (7 preceding siblings ...)
  2007-03-20 20:41 ` Narayan Newton
@ 2007-03-20 22:47 ` David Miller
  8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-20 22:47 UTC (permalink / raw)
  To: sparclinux

From: Narayan Newton <nnewton@osuosl.org>
Date: Tue, 20 Mar 2007 13:41:10 -0700

> Hi,
> 
> I have been running netcat since around 10 AM yesterday morning on a
> patched kernel with netfilter enabled and have been unable to trigger
> this bug. Thank you for your work on this issue!

Thank you for testing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

Thread overview: 10+ messages
2007-03-06  6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
2007-03-06 15:22 ` Jason Wever
2007-03-12 23:58 ` Narayan Newton
2007-03-13  2:50 ` David Miller
2007-03-17 20:23 ` David Miller
2007-03-19  7:41 ` David Miller
2007-03-19 16:40 ` Gustavo Zacarias
2007-03-19 18:58 ` David Miller
2007-03-20 20:41 ` Narayan Newton
2007-03-20 22:47 ` David Miller
