From: Paul Barrette <paul.barrette-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
To: Stefan Baranoff
<sbaranoff-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
<dev-VfR2kkLFssw@public.gmane.org>
Subject: Re: Random mbuf corruption
Date: Fri, 20 Jun 2014 09:59:58 -0400
Message-ID: <53A43E5E.3030809@windriver.com>
In-Reply-To: <CAHzKxpZUOVKbCYTb66D8cQbm0ceSt7rfYo6VU3f2qhi2ZBvytQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 06/20/2014 07:20 AM, Stefan Baranoff wrote:
> All,
>
> We are seeing 'random' memory corruption in mbufs coming from the ixgbe UIO
> driver and I am looking for some pointers on debugging it. Our software was
> running flawlessly for weeks at a time on our old Westmere systems (CentOS
> 6.4) but since moving to a new Sandy Bridge v2 server (also CentOS 6.4) it
> runs for 1-2 minutes and then at least one mbuf is overwritten with
> arbitrary data (pointers/lengths/RSS value/num segs/etc. are all
> ridiculous). Both servers are using the 82599EB chipset (x520) and the DPDK
> version (1.6.0r2) is identical. We recently also tested on a third server
> running RHEL 6.4 with the same hardware as the failing Sandy Bridge based
> system and it is fine (days of runtime no failures).
>
> Running all of this in GDB with 'record' enabled and setting a watchpoint
> on the address which contains the corrupted data and executing a
> 'reverse-continue' never hits the watchpoint [GDB newbie here -- assuming
> 'watch *(uint64_t*)0x7FB.....' should work]. My first thought was memory
> corruption but the BIOS memcheck on the ECC RAM shows no issues.
>
> Also looking at mbuf->pkt.data, as an example, the corrupt value was the
> same in 6 of 12 trials, but I could not find that value elsewhere in the
> process's memory. This doesn't seem "random" and points to a software bug, but I
> cannot for the life of me get GDB to tell me where the program is when that
> memory is written to. Incidentally trying this with the PCAP driver and
> --no-huge to run valgrind shows no memory access errors/uninitialized
> values/etc.
>
> Thoughts? Pointers? Ways to rule in/out hardware other than going 1 by 1
> removing each of the 24 DIMMs?
>
> Thanks so much in advance!
> Stefan
Run memtest to rule out bad RAM.
Pb