From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [PATCH] conntrack: use SLAB_DESTROY_BY_RCU for nf_conn structs
Date: Wed, 25 Mar 2009 20:17:36 +0100
Message-ID: <49CA8350.5040407@cosmosbay.com>
References: <49C77D71.8090709@trash.net> <49C780AD.70704@trash.net>
	<49C7CB9B.1040409@trash.net> <49C8A415.1090606@cosmosbay.com>
	<49C8CCF4.5050104@cosmosbay.com>
	<1237907850.12351.80.camel@sakura.staff.proxad.net>
	<49C8FBCA.40402@cosmosbay.com> <49CA6F9A.9010806@cosmosbay.com>
	<49CA7255.20807@trash.net> <49CA74CA.1040603@cosmosbay.com>
	<49CA76C4.2090409@trash.net> <49CA7DAF.9070207@cosmosbay.com>
	<49CA7F45.5020800@trash.net>
In-Reply-To: <49CA7F45.5020800@trash.net>
To: Patrick McHardy
Cc: mbizon@freebox.fr, "Paul E. McKenney", Joakim Tjernlund,
	avorontsov@ru.mvista.com, netdev@vger.kernel.org,
	Netfilter Developers

Patrick McHardy wrote:
> Eric Dumazet wrote:
>> Here is take 2 of the patch with proper ref counting on dumping.
>
> Thanks, one final question about the seq-file handling:
>
>> diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> index 6ba5c55..0b870b9 100644
>> --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> @@ -25,30 +25,30 @@ struct ct_iter_state {
>>  	unsigned int bucket;
>>  };
>>
>> -static struct hlist_node *ct_get_first(struct seq_file *seq)
>> +static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
>>  {
>>  	struct net *net = seq_file_net(seq);
>>  	struct ct_iter_state *st = seq->private;
>> -	struct hlist_node *n;
>> +	struct hlist_nulls_node *n;
>>
>>  	for (st->bucket = 0;
>>  	     st->bucket < nf_conntrack_htable_size;
>>  	     st->bucket++) {
>>  		n = rcu_dereference(net->ct.hash[st->bucket].first);
>> -		if (n)
>> +		if (!is_a_nulls(n))
>>  			return n;
>>  	}
>>  	return NULL;
>>  }
>>
>> -static struct hlist_node *ct_get_next(struct seq_file *seq,
>> -				      struct hlist_node *head)
>> +static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
>> +				      struct hlist_nulls_node *head)
>>  {
>>  	struct net *net = seq_file_net(seq);
>>  	struct ct_iter_state *st = seq->private;
>>
>>  	head = rcu_dereference(head->next);
>> -	while (head == NULL) {
>> +	while (is_a_nulls(head)) {
>>  		if (++st->bucket >= nf_conntrack_htable_size)
>>  			return NULL;
>>  		head = rcu_dereference(net->ct.hash[st->bucket].first);
>> @@ -56,9 +56,9 @@ static struct hlist_node *ct_get_next(struct
>> seq_file *seq,
>>  	return head;
>>  }
>>
>> -static struct hlist_node *ct_get_idx(struct seq_file *seq, loff_t pos)
>> +static struct hlist_nulls_node *ct_get_idx(struct seq_file *seq,
>> loff_t pos)
>>  {
>> -	struct hlist_node *head = ct_get_first(seq);
>> +	struct hlist_nulls_node *head = ct_get_first(seq);
>>
>>  	if (head)
>>  		while (pos && (head = ct_get_next(seq, head)))
>> @@ -87,69 +87,76 @@ static void ct_seq_stop(struct seq_file *s, void *v)
>>
>>  static int ct_seq_show(struct seq_file *s, void *v)
>>  {
>> -	const struct nf_conntrack_tuple_hash *hash = v;
>> -	const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
>> +	struct nf_conntrack_tuple_hash *hash = v;
>> +	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
>>  	const struct nf_conntrack_l3proto *l3proto;
>>  	const struct nf_conntrack_l4proto *l4proto;
>> +	int ret = 0;
>>
>>  	NF_CT_ASSERT(ct);
>> +	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
>> +		return 0;
>
> Can we assume the next pointer still points to the next entry
> in the same chain after the refcount dropped to zero?
>

We are walking chain N.
If we cannot atomic_inc() the refcount, we have hit a deleted entry.
If we could atomic_inc() it, we may still have met an entry that just moved
to another chain X.

When we hit the end of that chain, we continue the search at chain N+1, so we
only skip the rest of the previous chain (N). We can 'forget' some entries,
and we can print a given entry several times.

We could solve this by:

1) Checking the hash value: if it is not the expected one, go back to the
   head of chain N (potentially re-printing already handled entries).
   So it is not a *perfect* solution.

2) Using a lock to forbid writers (as done in UDP/TCP), but that is expensive
   and won't solve the other problem:

We won't avoid emitting the same entry several times anyway (this is a flaw of
the current seq_file handling, since we 'count' entries to be skipped, and
this is wrong if some entries were deleted or inserted meanwhile).

We have the same problem on /proc/net/udp and /proc/net/tcp; I am not sure we
should care...

Also, the current resizing code can cause a problem for a
/proc/net/ip_conntrack reader, since the hash table can be switched while the
reader is doing its dump: many entries might be lost or given again...
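
To make option 1 concrete, here is a minimal userspace sketch, not the kernel
code. Every name in it (struct entry, make_nulls(), dump_chain(),
MAX_RESTARTS, ...) is made up for illustration; the kernel counterparts of the
marker helpers are is_a_nulls() and get_nulls_value(). It assumes a single
thread, plays the concurrent writer from inside the visit callback, and uses a
plain int test where the patch uses atomic_inc_not_zero(). The walker restarts
chain N when it lands on an entry whose hash is not N, or when it ends on
another chain's marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS     4
#define MAX_RESTARTS 8	/* hypothetical bound so the sketch terminates */

struct entry {
	struct entry *next;	/* real pointer or tagged end-of-chain marker */
	unsigned int hash;	/* bucket the entry currently lives in */
	int refcnt;		/* 0 stands for "being freed" */
};
struct table { struct entry *first[NBUCKETS]; };
struct rec { struct entry *seen[16]; int n; struct entry *victim; };
typedef void (*visit_fn)(struct table *t, struct entry *e, void *arg);

/* End marker: low bit set, owning bucket number in the remaining bits. */
static struct entry *make_nulls(uintptr_t b) { return (struct entry *)((b << 1) | 1); }
static int is_nulls(const struct entry *e) { return (int)((uintptr_t)e & 1); }
static unsigned int nulls_value(const struct entry *e) { return (unsigned int)((uintptr_t)e >> 1); }

static void table_init(struct table *t)
{
	for (unsigned int b = 0; b < NBUCKETS; b++)
		t->first[b] = make_nulls(b);
}

static void table_add(struct table *t, struct entry *e, unsigned int b)
{
	e->hash = b;
	e->next = t->first[b];
	t->first[b] = e;
}

/* Unlink e; e->next is left alone, as a reader may still be standing on e. */
static void table_del(struct table *t, struct entry *e)
{
	struct entry **pp = &t->first[e->hash];

	while (*pp != e)
		pp = &(*pp)->next;
	*pp = e->next;
}

/* Walk chain n; returns the number of restarts, or -1 if we gave up. */
static int dump_chain(struct table *t, unsigned int n, visit_fn visit, void *arg)
{
	int restarts = 0;
	struct entry *e;

restart:
	for (e = t->first[n]; !is_nulls(e); e = e->next) {
		if (e->refcnt == 0)
			continue;	/* stand-in for a failed atomic_inc_not_zero() */
		if (e->hash != n)
			goto moved;	/* we followed a moved entry into chain X */
		visit(t, e, arg);
	}
	if (nulls_value(e) == n)
		return restarts;	/* ended on our own chain's marker */
moved:
	if (++restarts > MAX_RESTARTS)
		return -1;
	goto restart;
}

/* Records e, then plays the "concurrent writer": moves victim to bucket 2. */
static void log_and_move(struct table *t, struct entry *e, void *arg)
{
	struct rec *r = arg;

	assert(r->n < 16);
	r->seen[r->n++] = e;
	if (e == r->victim && e->hash == 1) {
		table_del(t, e);
		table_add(t, e, 2);
	}
}

/* Chain 1 is a -> b -> c, chain 2 is d; b moves to chain 2 while printed. */
int demo_move_during_dump(int *emitted)
{
	struct table t;
	struct entry a = { NULL, 0, 1 }, b = { NULL, 0, 1 };
	struct entry c = { NULL, 0, 1 }, d = { NULL, 0, 1 };
	struct rec r = { .n = 0, .victim = &b };

	table_init(&t);
	table_add(&t, &c, 1);
	table_add(&t, &b, 1);
	table_add(&t, &a, 1);
	table_add(&t, &d, 2);
	int restarts = dump_chain(&t, 1, log_and_move, &r);
	*emitted = r.n;
	return restarts;
}
```

In demo_move_during_dump(), b is moved from chain 1 to chain 2 while it is
being printed. The walker follows b->next to d, notices d's hash is not 1,
restarts once, and ends up emitting a, b, a, c: nothing is lost, but a is
printed twice, which is the imperfection noted in option 1.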