* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
@ 2007-03-06 6:46 Corey Shields
2007-03-06 15:22 ` Jason Wever
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Corey Shields @ 2007-03-06 6:46 UTC (permalink / raw)
To: sparclinux
On 3/5/07, Michael Marineau <mike@marineau.org> wrote:
> Built the kernel with gcc 4.1.1 and I got the oops in just a few
> minutes, trying a second time now and it is lasting longer this time
> around.
Why not try and recreate this on bender (Gentoo's T2000)? Weeve?
-C
--
Corey Shields - OSU Open Source Lab
One of InfoWorld's Top100 Projects of 2006!
http://osuosl.org
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
@ 2007-03-06 15:22 ` Jason Wever
2007-03-12 23:58 ` Narayan Newton
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jason Wever @ 2007-03-06 15:22 UTC (permalink / raw)
To: sparclinux
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Mon, 5 Mar 2007, Corey Shields wrote:
> On 3/5/07, Michael Marineau <mike@marineau.org> wrote:
>> Built the kernel with gcc 4.1.1 and I got the oops in just a few
>> minutes, trying a second time now and it is lasting longer this time
>> around.
>
> Why not try and recreate this on bender (Gentoo's T2000)? Weeve?
We're currently building release material on it but I'll schedule some
time to try and see if Gustavo and I can replicate this.
We've been running bender with gcc-3.4.6 compiled kernels (currently
running 2.6.20-gentoo) and we haven't encountered this issue (yet).
Cheers,
- --
Jason Wever
Gentoo/Sparc Team Co-Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFF7YdVdKvgdVioq28RAm7WAKC6tJ/o0Yg+URqXSpIP2XdXuIuf2QCeKmqA
Y25gg0Eje2VQa5VbczN0upg=9Ptj
-----END PGP SIGNATURE-----
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
2007-03-06 15:22 ` Jason Wever
@ 2007-03-12 23:58 ` Narayan Newton
2007-03-13 2:50 ` David Miller
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Narayan Newton @ 2007-03-12 23:58 UTC (permalink / raw)
To: sparclinux
[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]
Hi,
I have been working on the same server/issue as Mike. We have found that
our kernel without Netfilter support does not have this issue, but the
moment you enable it in the kernel config this bug is triggered.
Attached are the two kernel configs. The only difference is
CONFIG_NETFILTER=y
Kernel version: 2.6.21-rc2
Let me know if I can send you any more information.
--Narayan Newton
David Miller wrote:
> From: "Michael Marineau" <mike@marineau.org>
> Date: Tue, 6 Mar 2007 11:48:50 -0800
>
>> I twiddled with my kernel configuration a bit to remove the need for
>> any modules to make building elsewhere and copying it over easier.
>> With this new config I am no longer able to trigger the bug (ran the
>> test over night). I have no idea what change did this, I'll fiddle
>> more with it as soon as I have time in the next day or two.
>
> Thanks for continuing to help track this down.
>
> Probably it's some module that has some "use after free" memory
> allocation bug.
> _______________________________________________
> Systems mailing list
> Systems@osuosl.org
> http://lists.osuosl.org/mailman/listinfo/systems
[-- Attachment #2: config-bad.gz --]
[-- Type: application/x-gzip, Size: 6845 bytes --]
[-- Attachment #3: config-good.gz --]
[-- Type: application/x-gzip, Size: 6759 bytes --]
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
2007-03-06 15:22 ` Jason Wever
2007-03-12 23:58 ` Narayan Newton
@ 2007-03-13 2:50 ` David Miller
2007-03-17 20:23 ` David Miller
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-13 2:50 UTC (permalink / raw)
To: sparclinux
From: Narayan Newton <nnewton@osuosl.org>
Date: Mon, 12 Mar 2007 16:58:56 -0700
> I have been working on the same server/issue as Mike. We have found that
> our kernel without Netfilter support does not have this issue, but the
> moment you enable it in the kernel config this bug is triggered.
> Attached are the two kernel configs. The only difference is
> CONFIG_NETFILTER=y
>
> Kernel version: 2.6.21-rc2
>
> Let me know if I can send you any more information.
Thanks for the datapoint.
I have a hugetlbfs lockup and a few networking things to attend
to, but after that I'll try again to reproduce the problem
locally.
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
` (2 preceding siblings ...)
2007-03-13 2:50 ` David Miller
@ 2007-03-17 20:23 ` David Miller
2007-03-19 7:41 ` David Miller
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-17 20:23 UTC (permalink / raw)
To: sparclinux
From: Narayan Newton <nnewton@osuosl.org>
Date: Mon, 12 Mar 2007 16:58:56 -0700
> I have been working on the same server/issue as Mike. We have found that
> our kernel without Netfilter support does not have this issue, but the
> moment you enable it in the kernel config this bug is triggered.
> Attached are the two kernel configs. The only difference is
> CONFIG_NETFILTER=y
>
> Kernel version: 2.6.21-rc2
>
> Let me know if I can send you any more information.
I can finally reproduce this bug! What a relief.
Thanks for tracking it down to this config difference.
Hopefully I can figure out the cause and fix this soon.
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
` (3 preceding siblings ...)
2007-03-17 20:23 ` David Miller
@ 2007-03-19 7:41 ` David Miller
2007-03-19 16:40 ` Gustavo Zacarias
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-19 7:41 UTC (permalink / raw)
To: sparclinux
From: Narayan Newton <nnewton@osuosl.org>
Date: Mon, 12 Mar 2007 16:58:56 -0700
> I have been working on the same server/issue as Mike. We have found that
> our kernel without Netfilter support does not have this issue, but the
> moment you enable it in the kernel config this bug is triggered.
> Attached are the two kernel configs. The only difference is
> CONFIG_NETFILTER=y
>
> Kernel version: 2.6.21-rc2
Ok, I think the following patch is the bug fix. I'm running a bunch
of further stress testing to make sure this is indeed the cause of
these crashes.
Let me know if you can still trigger the bug with this patch
applied, thanks!
Assuming all goes well I'll push this upstream to Linus and
also to the -stable 2.6.x branches.
[SPARC64]: store-init needs trailing membar.
The manual says that it is required and we actually have crash reports
where loads see stale data due to not having membars here.
In one case the networking does:
memset(skb, 0, offsetof(struct sk_buff, truesize));
and then some code later checks skb->nohdr for zero, but it's still
the value that was there before the memset().
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/lib/NGbzero.S b/arch/sparc64/lib/NGbzero.S
index e86baec..f10e452 100644
--- a/arch/sparc64/lib/NGbzero.S
+++ b/arch/sparc64/lib/NGbzero.S
@@ -88,6 +88,7 @@ NGbzero_loop:
bne,pt %xcc, NGbzero_loop
add %o0, 64, %o0
+ membar #Sync
wr %o4, 0x0, %asi
brz,pn %o1, NGbzero_done
NGbzero_medium:
diff --git a/arch/sparc64/lib/NGmemcpy.S b/arch/sparc64/lib/NGmemcpy.S
index 8e522b3..66063a9 100644
--- a/arch/sparc64/lib/NGmemcpy.S
+++ b/arch/sparc64/lib/NGmemcpy.S
@@ -247,6 +247,8 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
/* fall through */
60:
+ membar #Sync
+
/* %o2 contains any final bytes still needed to be copied
* over. If anything is left, we copy it one byte at a time.
*/
diff --git a/arch/sparc64/lib/NGpage.S b/arch/sparc64/lib/NGpage.S
index 7d7c3bb..8ce3a0c 100644
--- a/arch/sparc64/lib/NGpage.S
+++ b/arch/sparc64/lib/NGpage.S
@@ -41,6 +41,7 @@ NGcopy_user_page: /* %o0=dst, %o1=src, %o2=vaddr */
subcc %g7, 64, %g7
bne,pt %xcc, 1b
add %o0, 32, %o0
+ membar #Sync
retl
nop
@@ -63,6 +64,7 @@ NGclear_user_page: /* %o0=dst, %o1=vaddr */
subcc %g7, 64, %g7
bne,pt %xcc, 1b
add %o0, 32, %o0
+ membar #Sync
retl
nop
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
` (4 preceding siblings ...)
2007-03-19 7:41 ` David Miller
@ 2007-03-19 16:40 ` Gustavo Zacarias
2007-03-19 18:58 ` David Miller
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Gustavo Zacarias @ 2007-03-19 16:40 UTC (permalink / raw)
To: sparclinux
David Miller wrote:
> Ok, I think the following patch is the bug fix. I'm running a bunch
> of further stress testing to make sure this is indeed the cause of
> these crashes.
>
> Let me know if you can still trigger the bug with this patch
> applied, thanks!
>
> Assuming all goes well I'll push this upstream to Linus and
> also to the -stable 2.6.x branches.
>
> [SPARC64]: store-init needs trailing membar.
>
> The manual says that it is required and we actually have crash reports
> where loads see stale data due to not having membars here.
>
> In one case the networking does:
>
> memset(skb, 0, offsetof(struct sk_buff, truesize));
>
> and then some code later checks skb->nohdr for zero, but it's still
> the value that was there before the memset().
Been running some intensive network loads on our T2000 for the last
couple of hours and indeed this patch seems to fix it. Previously it
would trigger an oops in less than 10 minutes.
Thanks.
--
Gustavo Zacarias
Gentoo/SPARC monkey
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
` (5 preceding siblings ...)
2007-03-19 16:40 ` Gustavo Zacarias
@ 2007-03-19 18:58 ` David Miller
2007-03-20 20:41 ` Narayan Newton
2007-03-20 22:47 ` David Miller
8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-19 18:58 UTC (permalink / raw)
To: sparclinux
From: Gustavo Zacarias <gustavoz@gentoo.org>
Date: Mon, 19 Mar 2007 13:40:36 -0300
> Been running some intensive network loads on our T2000 for the last
> couple of hours and indeed this patch seems to fix it. Previously it
> would trigger an oops in less than 10 minutes.
> Thanks.
Thanks for testing.
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
` (6 preceding siblings ...)
2007-03-19 18:58 ` David Miller
@ 2007-03-20 20:41 ` Narayan Newton
2007-03-20 22:47 ` David Miller
8 siblings, 0 replies; 10+ messages in thread
From: Narayan Newton @ 2007-03-20 20:41 UTC (permalink / raw)
To: sparclinux
Hi,
I have been running netcat since around 10 AM yesterday morning on a
patched kernel with netfilter enabled and have been unable to trigger
this bug. Thank you for your work on this issue!
--
Narayan Newton
OSU Open Source Lab
David Miller wrote:
> From: Narayan Newton <nnewton@osuosl.org>
> Date: Mon, 12 Mar 2007 16:58:56 -0700
>
>> I have been working on the same server/issue as Mike. We have found that
>> our kernel without Netfilter support does not have this issue, but the
>> moment you enable it in the kernel config this bug is triggered.
>> Attached are the two kernel configs. The only difference is
>> CONFIG_NETFILTER=y
>>
>> Kernel version: 2.6.21-rc2
>
> Ok, I think the following patch is the bug fix. I'm running a bunch
> of further stress testing to make sure this is indeed the cause of
> these crashes.
>
> Let me know if you can still trigger the bug with this patch
> applied, thanks!
>
> Assuming all goes well I'll push this upstream to Linus and
> also to the -stable 2.6.x branches.
>
> [SPARC64]: store-init needs trailing membar.
>
> The manual says that it is required and we actually have crash reports
> where loads see stale data due to not having membars here.
>
> In one case the networking does:
>
> memset(skb, 0, offsetof(struct sk_buff, truesize));
>
> and then some code later checks skb->nohdr for zero, but it's still
> the value that was there before the memset().
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/arch/sparc64/lib/NGbzero.S b/arch/sparc64/lib/NGbzero.S
> index e86baec..f10e452 100644
> --- a/arch/sparc64/lib/NGbzero.S
> +++ b/arch/sparc64/lib/NGbzero.S
> @@ -88,6 +88,7 @@ NGbzero_loop:
> bne,pt %xcc, NGbzero_loop
> add %o0, 64, %o0
>
> + membar #Sync
> wr %o4, 0x0, %asi
> brz,pn %o1, NGbzero_done
> NGbzero_medium:
> diff --git a/arch/sparc64/lib/NGmemcpy.S b/arch/sparc64/lib/NGmemcpy.S
> index 8e522b3..66063a9 100644
> --- a/arch/sparc64/lib/NGmemcpy.S
> +++ b/arch/sparc64/lib/NGmemcpy.S
> @@ -247,6 +247,8 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
> /* fall through */
>
> 60:
> + membar #Sync
> +
> /* %o2 contains any final bytes still needed to be copied
> * over. If anything is left, we copy it one byte at a time.
> */
> diff --git a/arch/sparc64/lib/NGpage.S b/arch/sparc64/lib/NGpage.S
> index 7d7c3bb..8ce3a0c 100644
> --- a/arch/sparc64/lib/NGpage.S
> +++ b/arch/sparc64/lib/NGpage.S
> @@ -41,6 +41,7 @@ NGcopy_user_page: /* %o0=dst, %o1=src, %o2=vaddr */
> subcc %g7, 64, %g7
> bne,pt %xcc, 1b
> add %o0, 32, %o0
> + membar #Sync
> retl
> nop
>
> @@ -63,6 +64,7 @@ NGclear_user_page: /* %o0=dst, %o1=vaddr */
> subcc %g7, 64, %g7
> bne,pt %xcc, 1b
> add %o0, 32, %o0
> + membar #Sync
> retl
> nop
>
* Re: [Systems] Re: Oops in tcp_sendmsg on T[12]000
2007-03-06 6:46 [Systems] Re: Oops in tcp_sendmsg on T[12]000 Corey Shields
` (7 preceding siblings ...)
2007-03-20 20:41 ` Narayan Newton
@ 2007-03-20 22:47 ` David Miller
8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2007-03-20 22:47 UTC (permalink / raw)
To: sparclinux
From: Narayan Newton <nnewton@osuosl.org>
Date: Tue, 20 Mar 2007 13:41:10 -0700
> Hi,
>
> I have been running netcat since around 10 AM yesterday morning on a
> patched kernel with netfilter enabled and have been unable to trigger
> this bug. Thank you for your work on this issue!
Thank you for testing.