Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0

From: Eric Dumazet <eric.dumazet@gmail.com>
To: Changli Gao <xiaosuo@gmail.com>
Cc: "Oleg A. Arkhangelsky" <sysoleg@yandex.ru>,
	Patrick McHardy <kaber@trash.net>,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	Paul E McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0
Date: Fri, 01 Apr 2011 07:30:40 +0200
Message-ID: <1301635840.2881.19.camel@edumazet-laptop>
In-Reply-To: <AANLkTinBn+dKCaHN2Hq7kOocfMsvCO8wBpOgWd3VBV3g@mail.gmail.com>

On Friday, 01 April 2011 at 10:02 +0800, Changli Gao wrote:
> On Thu, Mar 31, 2011 at 10:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I wonder if this is not hiding another bug.
> >
> > Adding an RCU grace period might reduce the probability window.
> 
> What bug? This one?
> 

Yes. I am saying your patch is a brown paper bag.

The real bug is elsewhere, and we must find it, not just make it happen
less frequently.

Maybe it's a missing barrier, and adding a full RCU grace period is a
waste (of CPU cache space, since call_rcu() fills a potentially big list).

> >
> > By the time nf_conntrack_free(ct) is called, no other cpu/thread
> > could/should use ct, or ct->ext ?
> 
> nat->ct may refer to it.
> 

Then conntrack must have a real refcounting problem.

Each time an object A has a reference to object B, we should increase B's
refcount, so B cannot disappear.

nat->ct _cannot_ refer to ct if we are freeing ct. That's quite a simple rule.
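
To make the rule concrete, here is a minimal userspace sketch using C11
atomics. The names (struct conn, struct nat_ext, conn_get, conn_put,
nat_attach, nat_detach) are hypothetical and only illustrate the general
"A pins B" rule. They are not the conntrack code, where the NAT extension
lives inside the conntrack itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for object B (think: a conntrack entry). */
struct conn {
	atomic_int use;	/* reference count, in the spirit of ct_general.use */
	bool freed;	/* set instead of calling free(), so the effect is visible */
};

/* Hypothetical stand-in for object A, which points at B. */
struct nat_ext {
	struct conn *ct;	/* must always be backed by a reference */
};

static void conn_get(struct conn *c)
{
	atomic_fetch_add(&c->use, 1);
}

static void conn_put(struct conn *c)
{
	int old = atomic_fetch_sub(&c->use, 1);

	assert(old > 0);	/* dropping a reference we never held is a bug */
	if (old == 1)
		c->freed = true;	/* last reference gone: B may disappear */
}

/* A starts pointing at B: take a reference first. */
static void nat_attach(struct nat_ext *nat, struct conn *c)
{
	conn_get(c);
	nat->ct = c;
}

/* A stops pointing at B: clear the pointer, then drop the reference. */
static void nat_detach(struct nat_ext *nat)
{
	struct conn *c = nat->ct;

	nat->ct = NULL;
	conn_put(c);
}
```

With this discipline, a conn_put() by the original owner cannot free the
object while nat->ct still points at it; only nat_detach() drops the last
reference.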

> >
> > Sure, another thread can find/pass_on ct in a lookup but should not use
> > it, since its refcount (ct_general.use) should be 0.
> >
> >
> 
> As SLAB_DESTROY_BY_RCU is used, we should validate ct->orig_tuple too.
> 
> There is another minor problem.
> 

Minor or serious?


> nf_nat_core.c:
> 
>         rcu_read_lock();
>         hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
>                 ct = nat->ct;
>                 if (same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
>                         /* Copy source part from reply tuple. */
>                         nf_ct_invert_tuplepr(result,
>                                 &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>                         result->dst = tuple->dst;
>
>                         if (in_range(result, range)) {
>                                 rcu_read_unlock();
>                                 return 1;
>                         }
>                 }
>         }
>         rcu_read_unlock();
> 
> If the ct is reused, the NAT mapping will be wrong. It isn't a serious
> problem, but it can't be fixed: even if we check the reference
> counter before using it, we can't validate it against the original
> tuple.
> 

I call this a serious problem. netfilter is not fuzzy logic.

> Maybe we can do it in this way
> 
>                 ct = nat->ct;
>                 if (!nf_ct_is_dying(ct) &&
>                     atomic_inc_not_zero(&ct->ct_general.use)) {
>                         if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat) {
>                                 /* ct is valid, do something... */
>                         }
>                         nf_ct_put(ct);
>                 }
> 
> I think two additional atomic operations are expensive. It isn't a good idea.
> 

It depends. This is better than taking the conntrack lock.

SLAB_DESTROY_BY_RCU does not allow for fuzzy logic. The rules are that we
_must_ take a reference on the object after finding it, and _recheck_ the
validity of the object before using it.
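
As a rough illustration of those two rules, here is a userspace sketch with
C11 atomics. Everything in it (struct obj, get_unless_zero, obj_put, lookup,
the flat table) is hypothetical. It stands in for the RCU hash walk and
assumes the object memory itself stays valid while it is being inspected,
which is what SLAB_DESTROY_BY_RCU guarantees inside an RCU read-side
section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object living in a type-stable slot that may be recycled. */
struct obj {
	atomic_int use;	/* 0 means the object is being freed */
	atomic_int key;	/* identity; changes if the slot is reused */
};

/* Take a reference only if the count is not already zero. */
static bool get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->use);

	while (old != 0) {
		/* On failure, old is reloaded and the zero check is redone. */
		if (atomic_compare_exchange_weak(&o->use, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->use, 1);

	assert(old > 0);
}

/* Return a referenced object matching key, or NULL if there is none. */
static struct obj *lookup(struct obj *table, size_t n, int key)
{
	for (size_t i = 0; i < n; i++) {
		struct obj *o = &table[i];

		if (atomic_load(&o->key) != key)
			continue;
		if (!get_unless_zero(o))	/* rule 1: take a reference */
			continue;		/* object is being freed */
		if (atomic_load(&o->key) == key)	/* rule 2: recheck identity */
			return o;
		obj_put(o);	/* slot was recycled under us, drop it */
	}
	return NULL;
}
```

The caller owns a reference on whatever lookup() returns and must call
obj_put() when done. A real lookup would typically restart the walk after a
failed recheck; this sketch simply moves on to the next slot.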

Another way to avoid atomic operations in find_appropriate_src() is to
use a seqcount (or seqlock), and change the seqcount every time
something is changed in ct.
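
A minimal userspace sketch of the seqcount idea, again with C11 atomics and
made-up names (struct entry, entry_write, entry_read). It assumes a single
writer, or writers serialized by some outside lock, and uses sequentially
consistent atomics for simplicity where the kernel seqcount API would use
lighter barriers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical record protected by a sequence counter. */
struct entry {
	atomic_uint seq;	/* odd while an update is in progress */
	atomic_int src;		/* example payload fields */
	atomic_int dst;
};

/* Writer: bump the counter before and after changing the payload. */
static void entry_write(struct entry *e, int src, int dst)
{
	/* Single-writer assumption: no update may already be in flight. */
	assert((atomic_load(&e->seq) & 1u) == 0);

	atomic_fetch_add(&e->seq, 1);	/* counter becomes odd */
	atomic_store(&e->src, src);
	atomic_store(&e->dst, dst);
	atomic_fetch_add(&e->seq, 1);	/* counter becomes even again */
}

/* Reader: no atomic read-modify-write, just retry if a writer interfered. */
static void entry_read(struct entry *e, int *src, int *dst)
{
	unsigned int start;

	do {
		/* Wait for an even counter, i.e. no update in progress. */
		while ((start = atomic_load(&e->seq)) & 1u)
			;
		*src = atomic_load(&e->src);
		*dst = atomic_load(&e->dst);
		/* A changed counter means the snapshot may be torn: retry. */
	} while (atomic_load(&e->seq) != start);
}
```

The reader never writes to shared memory, so it avoids the two extra atomic
operations of the get/put approach, at the price of a retry whenever the
entry changes during the read.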



