netdev.vger.kernel.org archive mirror
From: Vernon Mauery <vernux@us.ibm.com>
To: netdev <netdev@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Dhananjay Phadke <dhananjay@netxen.com>
Subject: NetXen driver causing slab corruption in -RT kernels
Date: Tue, 18 Sep 2007 11:17:58 -0700
Message-ID: <200709181117.59041.vernux@us.ibm.com>

While stress testing the NetXen driver, I found that my machine was dying in 
all sorts of strange ways: several different crashes, plus BUG messages and 
assertion failures in the TCP stack.  I really didn't think there could be six 
different bugs in the TCP/IP stack all at once, so I started looking for 
memory corruption.

I first saw this on 2.6.16-rt22 with the netxen driver backported from 2.6.22.  
I then tried the latest kernel, 2.6.23-rc6-git7, but could not trigger any 
slab corruption messages there with CONFIG_DEBUG_SLAB enabled, so I figured 
the race must exist only in the -RT kernels.  Next I tried 2.6.23-rc6-git7-rt1 
(patch-2.6.23-rc4-rt1 applied to 2.6.23-rc6-git7, with the 5 failing hunks 
fixed up).  After an hour or so, lots of slab corruption messages showed up:

Slab corruption: size-2048 start=f40c4670, len=2048
Slab corruption: size-2048 start=f313cf48, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<c0166be4>](kfree+0x80/0x95)
010: 6b 6b 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
020: 45 00 05 dc 92 ab 40 00 40 11 8a 5b 0a 02 02 03
030: 0a 02 02 04 80 0c 80 0d 05 c8 dc 39 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f313c730, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
Next obj: start=f313d760, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
Slab corruption: size-2048 start=f395a6f0, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<c0166be4>](kfree+0x80/0x95)
010: 6b 6b 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
020: 45 00 05 dc 92 ac 40 00 40 11 8a 5a 0a 02 02 03
030: 0a 02 02 04 80 0c 80 0d 05 c8 dc 39 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=f395af08, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<c0166be4>](kfree+0x80/0x95)
010: 6b 6b 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
020: 45 00 05 dc 92 aa 40 00 40 11 8a 5c 0a 02 02 03
030: 0a 02 02 04 80 0c 80 0d 05 c8 dc 39 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=f40c4e88, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00

The stress test I am running is basically a mixed bag of loads I threw 
together.  It runs eight concurrent netperf TCP streams and two concurrent 
UDP streams in both directions on both 10GbE interfaces, ping -f in both 
directions, some additional disk and CPU loads in the background, and a little 
NFS traffic thrown in for good measure.

--Vernon

Thread overview:
2007-09-18 18:17 Vernon Mauery [this message]
2007-09-19  6:43 Dhananjay Phadke, "NetXen driver causing slab corruption in -RT kernels"