netfilter-devel.vger.kernel.org archive mirror
From: Jon Masters <jonathan@jonmasters.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	netfilter-devel <netfilter-devel@vger.kernel.org>,
	Patrick McHardy <kaber@trash.net>
Subject: Re: debug: nt_conntrack and KVM crash
Date: Mon, 01 Feb 2010 04:32:25 -0500
Message-ID: <1265016745.7499.144.camel@tonnant>
In-Reply-To: <1264834704.2919.3.camel@edumazet-laptop>

On Sat, 2010-01-30 at 07:58 +0100, Eric Dumazet wrote:
> On Friday, 29 January 2010 at 20:59 -0500, Jon Masters wrote:
> > On Fri, 2010-01-29 at 20:57 -0500, Jon Masters wrote:
> > 
> > > Ah so I should have realized before but I wasn't looking at valid values
> > > for the range of the hashtable yet, nf_conntrack_htable_size is getting
> > > wildly out of whack. It goes from:
> > > 
> > > (gdb) print nf_conntrack_hash_rnd
> > > $1 = 2688505299
> > > (gdb) print nf_conntrack_htable_size
> > > $2 = 16384
> > > 
> > > nf_conntrack_events: 1
> > > nf_conntrack_max: 65536
> > > 
> > > Shortly after booting, before being NULLed shortly after starting some
> > > virtual machines (the hash isn't reset, whereas it is recomputed if the
> > > hashtable is re-initialized after an intentional resizing operation):
> > 
> > I mean the *seed* isn't changed, so I don't think it was resized
> > intentionally. I wonder where else htable_size is fiddled with.

> This rings a bell here, since another crash analysis on another problem
> suggested to me a potential problem with read_mostly and modules, but I
> had no time to confirm the thing yet.

It gets more interesting. This occurs with the code built in anyway
(I build it in to make it easier to kgdb the result), so I don't think
that's the issue...but...

I hacked up a per-namespace version of the hashtables (this needs doing
anyway, since the global stuff is just waiting to break). In that
version, nf_conntrack_default_htable_size is a direct rename of the
existing htable_size and is now simply the initial size for new
hashtables, which can then have their own sizes independently of this
global. I then noticed that the built kernel always ends up linked
roughly like this:

00000000000074c8 l     O .data.read_mostly      0000000000000008 nf_conntrack_cachep
00000000000074d0 g     O .data.read_mostly      0000000000000198 nf_conntrack_untracked
0000000000007668 g     O .data.read_mostly      0000000000000004 nf_conntrack_default_htable_size
000000000000766c g     O .data.read_mostly      0000000000000004 nf_conntrack_default_max

In some of my runs, I've been seeing nf_conntrack_default_htable_size
get corrupted with a value that just happens to be the address of
nf_conntrack_cachep. I looked over the RCU handling and the cache
allocation/de-allocation, but haven't seen anything yet, and I'm not
sure why this address would happen to get written there. The corrupted
variable immediately follows nf_conntrack_untracked, so I looked over
what happens to that struct (including the memset, etc.) and didn't see
anything there either.
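The adjacency claim can be checked with plain arithmetic on the symbol
table above. The following is only a sketch: the addresses and sizes are
copied from the listing, while the struct and the helper names (sym_end,
sym_adjacent, store_hits) are made up for illustration and are not
kernel code. It shows that nf_conntrack_untracked ends exactly where
nf_conntrack_default_htable_size begins, and that a hypothetical 8-byte
store at that address would also cover nf_conntrack_default_max.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One row of the symbol table: start address and size in bytes. */
struct sym {
    const char *name;
    uint64_t addr;
    uint64_t size;
};

/* Values copied from the listing; nothing here is measured independently. */
static const struct sym cachep    = { "nf_conntrack_cachep", 0x74c8, 0x8 };
static const struct sym untracked = { "nf_conntrack_untracked", 0x74d0, 0x198 };
static const struct sym htable_sz = { "nf_conntrack_default_htable_size", 0x7668, 0x4 };
static const struct sym dflt_max  = { "nf_conntrack_default_max", 0x766c, 0x4 };

/* First address past the end of a symbol. */
static uint64_t sym_end(const struct sym *s)
{
    assert(s->size <= UINT64_MAX - s->addr);
    return s->addr + s->size;
}

/* True when b starts exactly where a ends, i.e. there is no padding. */
static bool sym_adjacent(const struct sym *a, const struct sym *b)
{
    return sym_end(a) == b->addr;
}

/* True when a store of len bytes at addr touches any byte of s. */
static bool store_hits(uint64_t addr, uint64_t len, const struct sym *s)
{
    return addr < sym_end(s) && s->addr < addr + len;
}
```

With these values, sym_adjacent(&untracked, &htable_sz) holds, so any
write that runs even one word past the end of the untracked struct would
land on the size variable. That narrows where to look, but it does not
by itself say which code path does the write.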

Like I said, I dumped the memory with kgdb in a number of runs, both
"before" and "after", for the entire page surrounding the corruption,
and the only real difference is this change to the value immediately
following nf_conntrack_untracked. There was also a decrement of the
reference count on untracked (I think that's normal? It's like a
catchall for when a connection isn't being tracked anywhere else), so
I'm still looking for weird freeing.
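For anyone repeating the before/after comparison, a byte-wise diff of
two saved page dumps is enough to spot a change like this. The function
below is a minimal user-space sketch, assuming the two dumps have
already been read into buffers of equal length; diff_snapshots is a
hypothetical helper name, not something kgdb provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two snapshots of the same memory region and record the offsets
 * of differing bytes. Returns the total number of differing bytes; at most
 * max_out offsets are stored in out (out may be NULL to only count).
 */
static size_t diff_snapshots(const uint8_t *before, const uint8_t *after,
                             size_t len, size_t *out, size_t max_out)
{
    size_t ndiff = 0;

    assert(before != NULL && after != NULL);
    for (size_t i = 0; i < len; i++) {
        if (before[i] != after[i]) {
            if (out != NULL && ndiff < max_out)
                out[ndiff] = i;
            ndiff++;
        }
    }
    return ndiff;
}
```

Adding each reported offset to the base address of the dumped page gives
the address to look up in the symbol table.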

Anyway. It looks like we have a few issues:

1). The conntrack code needs to be looked at for namespaces. I have some
work-in-progress patches for hashing that I can send along later, but
that's really just a start for someone who knows that piece a little
better.

2). Some other weird memory corruption of that specific address. Most of
the other people who've had this problem don't have dumps or kgdb.
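To make point 1 concrete, here is a rough user-space sketch of the idea
of per-namespace table sizes seeded from a global default. It is not the
work-in-progress patch: struct ct_ns, ct_ns_init, ct_ns_resize and
ct_ns_bucket are hypothetical names, the 16384 default is just the value
seen in the gdb output earlier in the thread, and the modulo bucket
mapping is a simplification rather than what the conntrack code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global: only the starting size for newly created tables. */
static unsigned int default_htable_size = 16384;

/* Hypothetical per-namespace state; not the kernel's own structure. */
struct ct_ns {
    unsigned int htable_size;   /* size of this namespace's table */
};

/* A new namespace starts from the global default. */
static void ct_ns_init(struct ct_ns *ns)
{
    assert(ns != NULL);
    ns->htable_size = default_htable_size;
}

/* Resizing touches only this namespace, never the global default. */
static void ct_ns_resize(struct ct_ns *ns, unsigned int new_size)
{
    assert(ns != NULL && new_size > 0);
    ns->htable_size = new_size;
}

/* Bucket lookups use the namespace's own size (modulo, for simplicity). */
static unsigned int ct_ns_bucket(const struct ct_ns *ns, unsigned int hash)
{
    return hash % ns->htable_size;
}
```

The point of the split is that every lookup takes its bound from the
same namespace that owns the table, so resizing one namespace cannot
leave another indexing with a stale or mismatched size.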

Jon.


