From: Eric Dumazet <eric.dumazet@gmail.com>
To: Changli Gao <xiaosuo@gmail.com>
Cc: "Oleg A. Arkhangelsky" <sysoleg@yandex.ru>,
Patrick McHardy <kaber@trash.net>,
netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
Paul E McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0
Date: Fri, 01 Apr 2011 07:30:40 +0200
Message-ID: <1301635840.2881.19.camel@edumazet-laptop>
In-Reply-To: <AANLkTinBn+dKCaHN2Hq7kOocfMsvCO8wBpOgWd3VBV3g@mail.gmail.com>
On Friday, 01 April 2011 at 10:02 +0800, Changli Gao wrote:
> On Thu, Mar 31, 2011 at 10:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I wonder if this is not hiding another bug.
> >
> > Adding an RCU grace period might reduce the probability window.
>
> What bug? This one?
>
Yes. I am saying your patch is a brown paper bag.
The real bug is elsewhere, and we must find it, not make it happen less
frequently.
Maybe it's a missing barrier, and adding a full RCU grace period is a
waste (of CPU cache space, since call_rcu() fills a potentially big list).
> >
> > By the time nf_conntrack_free(ct) is called, no other cpu/thread
> > could/should use ct, or ct->ext ?
>
> nat->ct may refer to it.
>
conntrack must have a real refcounting problem then.
Each time an object A holds a reference to an object B, we should increase
B's refcount, so that B cannot disappear.
nat->ct _cannot_ refer to ct if we are freeing ct. That's quite a simple rule.
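
To make the rule concrete, here is a minimal userspace sketch, not the
conntrack code. The names obj_a, obj_b, b_get(), b_put(), a_attach() and
a_detach() are hypothetical; obj_b stands in for the referenced object and
its "use" field plays the role of ct_general.use. The "freed" flag replaces
a real free() only so the effect can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object (the "B" of the rule). */
struct obj_b {
    atomic_int use;  /* reference count */
    int freed;       /* set when the last reference is dropped */
};

/* Hypothetical object that points at B (the "A" of the rule). */
struct obj_a {
    struct obj_b *b;
};

/* Take one more reference on B. */
static void b_get(struct obj_b *b)
{
    atomic_fetch_add(&b->use, 1);
}

/* Drop one reference; the last put "frees" B. */
static void b_put(struct obj_b *b)
{
    int old = atomic_fetch_sub(&b->use, 1);

    assert(old > 0);    /* dropping a reference nobody held is a bug */
    if (old == 1)
        b->freed = 1;
}

/* A starts pointing at B, so A must own a reference on B. */
static void a_attach(struct obj_a *a, struct obj_b *b)
{
    b_get(b);
    a->b = b;
}

/* A stops pointing at B and gives its reference back. */
static void a_detach(struct obj_a *a)
{
    struct obj_b *b = a->b;

    a->b = NULL;
    if (b)
        b_put(b);
}
```

With this discipline, the creator of B can drop its own reference while A is
still attached, and B stays alive until a_detach() runs.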
> >
> > Sure, another thread can find/pass_on ct in a lookup but should not use
> > it, since its refcount (ct_general.use) should be 0.
> >
> >
>
> As SLAB_DESTROY_BY_RCU is used, we should validate ct->orig_tuple too.
>
> There is another minor problem.
>
Minor or serious?
> nf_nat_core.c:
>
>         rcu_read_lock();
>         hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
>                 ct = nat->ct;
>                 if (same_src(ct, tuple) && nf_ct_zone(ct) == zone) {
>                         /* Copy source part from reply tuple. */
>                         nf_ct_invert_tuplepr(result,
>                                 &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>                         result->dst = tuple->dst;
>
>                         if (in_range(result, range)) {
>                                 rcu_read_unlock();
>                                 return 1;
>                         }
>                 }
>         }
>         rcu_read_unlock();
>
> If the ct is reused, the NAT mapping will be wrong. It isn't a serious
> problem, but it can't be fixed: even if we check the reference
> counter before using the ct, we can't validate it against the original
> tuple.
>
I call this a serious problem. Netfilter is not fuzzy logic.
> Maybe we can do it in this way
>
>         ct = nat->ct;
>         if (!nf_ct_is_dying(ct) &&
>             atomic_inc_not_zero(&ct->ct_general.use)) {
>                 if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat) {
>                         /* ct is valid, do something... */
>                 }
>                 nf_ct_put(ct);
>         }
>
> I think two additional atomic operations are expensive. It isn't a good idea.
>
It depends. This is better than taking the conntrack lock.
SLAB_DESTROY_BY_RCU does not allow for fuzzy logic. The rules are that we
_must_ take a reference on the object after finding it, and _recheck_ the
validity of the object before using it.
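
A rough userspace sketch of that "take a reference, then recheck" pattern
follows. It is an illustration under assumptions, not kernel code: struct
entry, entry_get_not_zero(), entry_put() and lookup() are hypothetical names,
a plain array stands in for the hash chain, and C11 atomics stand in for the
kernel's atomic_inc_not_zero(). The key point is the second comparison of the
key, made only after the reference is held, because a recycled slot may now
describe a different object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot whose memory may be recycled for another object. */
struct entry {
    atomic_int use;  /* 0 means free or being freed */
    atomic_int key;  /* identity; changes if the slot is recycled */
};

/* Take a reference only if the count is not zero. */
static bool entry_get_not_zero(struct entry *e)
{
    int old = atomic_load(&e->use);

    while (old != 0) {
        /* On failure, old is reloaded and the loop re-tests it. */
        if (atomic_compare_exchange_weak(&e->use, &old, old + 1))
            return true;
    }
    return false;
}

/* Give a reference back. */
static void entry_put(struct entry *e)
{
    int old = atomic_fetch_sub(&e->use, 1);

    assert(old > 0);
}

/* Return a referenced entry matching key, or NULL. */
static struct entry *lookup(struct entry *table, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        struct entry *e = &table[i];

        if (atomic_load(&e->key) != key)
            continue;           /* unlocked peek, may be stale */
        if (!entry_get_not_zero(e))
            continue;           /* object is being freed */
        if (atomic_load(&e->key) == key)
            return e;           /* rechecked while holding a reference */
        entry_put(e);           /* slot was recycled under us */
    }
    return NULL;
}
```

A caller that gets a non-NULL result owns one reference and must release it
with entry_put() when done.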
Another way to avoid atomic operations in find_appropriate_src() is to
use a seqcount (or seqlock), and change the seqcount every time
something is changed in ct.
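
The seqcount idea can be sketched in userspace as below. This is only an
illustration under assumptions: struct versioned, write_pair() and
read_pair() are hypothetical names, a single writer is assumed (or writers
serialized by some external lock), and sequentially consistent C11 atomics
are used for simplicity where the kernel's seqcount uses lighter barriers.
The reader performs only loads and retries if the counter was odd or changed
while it was reading.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical data protected by a sequence counter. */
struct versioned {
    atomic_uint seq;  /* odd while an update is in progress */
    atomic_int src;
    atomic_int dst;
};

/* Writer side: assumes writers never run concurrently. */
static void write_pair(struct versioned *v, int src, int dst)
{
    unsigned int s = atomic_load(&v->seq);

    assert((s & 1u) == 0);          /* no update already in progress */
    atomic_store(&v->seq, s + 1);   /* mark the update as started */
    atomic_store(&v->src, src);
    atomic_store(&v->dst, dst);
    atomic_store(&v->seq, s + 2);   /* mark the update as finished */
}

/* Reader side: no read-modify-write, only loads and a retry loop. */
static void read_pair(struct versioned *v, int *src, int *dst)
{
    unsigned int begin, end;

    do {
        begin = atomic_load(&v->seq);
        *src = atomic_load(&v->src);
        *dst = atomic_load(&v->dst);
        end = atomic_load(&v->seq);
    } while ((begin & 1u) || begin != end);
}
```

The trade-off is that readers may loop while a writer is active, so this
suits data that is read often and changed rarely.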