netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Patrick McHardy <kaber@trash.net>
Cc: mbizon@freebox.fr,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Joakim Tjernlund <Joakim.Tjernlund@transmode.se>,
	avorontsov@ru.mvista.com, netdev@vger.kernel.org,
	Netfilter Developers <netfilter-devel@vger.kernel.org>
Subject: Re: [PATCH] conntrack: use SLAB_DESTROY_BY_RCU for nf_conn structs
Date: Wed, 25 Mar 2009 20:17:36 +0100	[thread overview]
Message-ID: <49CA8350.5040407@cosmosbay.com> (raw)
In-Reply-To: <49CA7F45.5020800@trash.net>

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> Here is take 2 of the patch with proper ref counting on dumping.
> 
> Thanks, one final question about the seq-file handling:
> 
>> diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> index 6ba5c55..0b870b9 100644
>> --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> @@ -25,30 +25,30 @@ struct ct_iter_state {
>>      unsigned int bucket;
>>  };
>>  
>> -static struct hlist_node *ct_get_first(struct seq_file *seq)
>> +static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
>>  {
>>      struct net *net = seq_file_net(seq);
>>      struct ct_iter_state *st = seq->private;
>> -    struct hlist_node *n;
>> +    struct hlist_nulls_node *n;
>>  
>>      for (st->bucket = 0;
>>           st->bucket < nf_conntrack_htable_size;
>>           st->bucket++) {
>>          n = rcu_dereference(net->ct.hash[st->bucket].first);
>> -        if (n)
>> +        if (!is_a_nulls(n))
>>              return n;
>>      }
>>      return NULL;
>>  }
>>  
>> -static struct hlist_node *ct_get_next(struct seq_file *seq,
>> -                      struct hlist_node *head)
>> +static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
>> +                      struct hlist_nulls_node *head)
>>  {
>>      struct net *net = seq_file_net(seq);
>>      struct ct_iter_state *st = seq->private;
>>  
>>      head = rcu_dereference(head->next);
>> -    while (head == NULL) {
>> +    while (is_a_nulls(head)) {
>>          if (++st->bucket >= nf_conntrack_htable_size)
>>              return NULL;
>>          head = rcu_dereference(net->ct.hash[st->bucket].first);
>> @@ -56,9 +56,9 @@ static struct hlist_node *ct_get_next(struct
>> seq_file *seq,
>>      return head;
>>  }
>>  
>> -static struct hlist_node *ct_get_idx(struct seq_file *seq, loff_t pos)
>> +static struct hlist_nulls_node *ct_get_idx(struct seq_file *seq,
>> loff_t pos)
>>  {
>> -    struct hlist_node *head = ct_get_first(seq);
>> +    struct hlist_nulls_node *head = ct_get_first(seq);
>>  
>>      if (head)
>>          while (pos && (head = ct_get_next(seq, head)))
>> @@ -87,69 +87,76 @@ static void ct_seq_stop(struct seq_file *s, void *v)
>>  
>>  static int ct_seq_show(struct seq_file *s, void *v)
>>  {
>> -    const struct nf_conntrack_tuple_hash *hash = v;
>> -    const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
>> +    struct nf_conntrack_tuple_hash *hash = v;
>> +    struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
>>      const struct nf_conntrack_l3proto *l3proto;
>>      const struct nf_conntrack_l4proto *l4proto;
>> +    int ret = 0;
>>  
>>      NF_CT_ASSERT(ct);
>> +    if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
>> +        return 0;
> 
> Can we assume the next pointer still points to the next entry
> in the same chain after the refcount dropped to zero?
> 
> 
> 

We are looking chain N.
If we cannot atomic_inc() refcount, we got some deleted entry.
If we could atomic_inc, we can meet an entry that just moved to another chain X

When hitting its end, we continue the search to the N+1 chain so we only 
skip the end of previous chain (N). We can 'forget' some entries, we can print
several time one given entry.


We could solve this by :

1) Checking hash value : if not one expected -> 
   Going back to head of chain N, (potentially re-printing already handled entries)
   So it is not a *perfect* solution.

2) Use a locking to forbid writers (as done in UDP/TCP), but it is expensive and
wont solve other problem :

We wont avoid emitting same entry several time anyway (this is a flaw of 
current seq_file handling, since we 'count' entries to be skiped, and this is
wrong if some entries were deleted or inserted meanwhile)

We have same problem on /proc/net/udp & /proc/net/tcp, I am not sure we should care...

Also, current resizing code can give to a /proc/net/ip_conntrack reader a problem, since
hash table can switch while its doing its dumping : many entries might be lost or regiven...



  reply	other threads:[~2009-03-25 19:18 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-23 10:42 ucc_geth: nf_conntrack: table full, dropping packet Joakim Tjernlund
2009-03-23 12:15 ` Patrick McHardy
2009-03-23 12:25   ` Joakim Tjernlund
2009-03-23 12:29     ` Patrick McHardy
2009-03-23 12:59       ` Joakim Tjernlund
     [not found]       ` <OF387EC803.F810F72A-ONC1257582.00468C6E-C1257582.00475783@LocalDomain>
2009-03-23 13:09         ` Joakim Tjernlund
2009-03-23 17:42       ` Joakim Tjernlund
2009-03-23 17:49         ` Patrick McHardy
2009-03-24  8:22           ` Joakim Tjernlund
2009-03-24  9:12             ` Eric Dumazet
2009-03-24 10:55               ` Joakim Tjernlund
2009-03-24 12:07                 ` [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free() Eric Dumazet
2009-03-24 12:25                   ` Eric Dumazet
2009-03-24 12:43                     ` Patrick McHardy
2009-03-24 13:32                       ` Eric Dumazet
2009-03-24 13:38                         ` Patrick McHardy
2009-03-24 13:47                           ` Eric Dumazet
     [not found]                             ` <49C8F871.9070600@cosmosbay.com>
     [not found]                               ` <49C8F8E0.9050502@trash.net>
2009-03-25  3:53                                 ` Eric Dumazet
2009-03-25 13:39                                   ` Patrick McHardy
2009-03-25 13:44                                     ` Eric Dumazet
2009-03-24 13:20                   ` Joakim Tjernlund
2009-03-24 13:28                     ` Patrick McHardy
2009-03-24 13:29                     ` Eric Dumazet
2009-03-24 13:41                       ` Joakim Tjernlund
2009-03-24 15:17                   ` Maxime Bizon
2009-03-24 15:21                     ` Patrick McHardy
2009-03-24 15:27                     ` Eric Dumazet
2009-03-24 19:54                       ` [PATCH] netfilter: Use hlist_add_head_rcu() in nf_conntrack_set_hashsize() Eric Dumazet
2009-03-25 16:26                         ` Patrick McHardy
2009-03-25 17:53                       ` [PATCH] conntrack: use SLAB_DESTROY_BY_RCU for nf_conn structs Eric Dumazet
2009-03-25 18:05                         ` Patrick McHardy
2009-03-25 18:06                           ` Patrick McHardy
2009-03-25 18:15                           ` Eric Dumazet
2009-03-25 18:24                             ` Patrick McHardy
2009-03-25 18:53                               ` Eric Dumazet
2009-03-25 19:00                                 ` Patrick McHardy
2009-03-25 19:17                                   ` Eric Dumazet [this message]
2009-03-25 19:41                                     ` Patrick McHardy
2009-03-25 19:58                                       ` Eric Dumazet
2009-03-25 20:10                                         ` Patrick McHardy
2009-03-24 18:29                     ` [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free() Joakim Tjernlund
2009-03-23 17:49         ` ucc_geth: nf_conntrack: table full, dropping packet Eric Dumazet
2009-03-23 18:04           ` Joakim Tjernlund
2009-03-23 18:08             ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49CA8350.5040407@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=Joakim.Tjernlund@transmode.se \
    --cc=avorontsov@ru.mvista.com \
    --cc=kaber@trash.net \
    --cc=mbizon@freebox.fr \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).