All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Patrick McHardy <kaber@trash.net>
Cc: mbizon@freebox.fr,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Joakim Tjernlund <Joakim.Tjernlund@transmode.se>,
	avorontsov@ru.mvista.com, netdev@vger.kernel.org,
	Netfilter Developers <netfilter-devel@vger.kernel.org>
Subject: Re: [PATCH] conntrack: use SLAB_DESTROY_BY_RCU for nf_conn structs
Date: Wed, 25 Mar 2009 20:17:36 +0100	[thread overview]
Message-ID: <49CA8350.5040407@cosmosbay.com> (raw)
In-Reply-To: <49CA7F45.5020800@trash.net>

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> Here is take 2 of the patch with proper ref counting on dumping.
> 
> Thanks, one final question about the seq-file handling:
> 
>> diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> index 6ba5c55..0b870b9 100644
>> --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
>> @@ -25,30 +25,30 @@ struct ct_iter_state {
>>      unsigned int bucket;
>>  };
>>  
>> -static struct hlist_node *ct_get_first(struct seq_file *seq)
>> +static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
>>  {
>>      struct net *net = seq_file_net(seq);
>>      struct ct_iter_state *st = seq->private;
>> -    struct hlist_node *n;
>> +    struct hlist_nulls_node *n;
>>  
>>      for (st->bucket = 0;
>>           st->bucket < nf_conntrack_htable_size;
>>           st->bucket++) {
>>          n = rcu_dereference(net->ct.hash[st->bucket].first);
>> -        if (n)
>> +        if (!is_a_nulls(n))
>>              return n;
>>      }
>>      return NULL;
>>  }
>>  
>> -static struct hlist_node *ct_get_next(struct seq_file *seq,
>> -                      struct hlist_node *head)
>> +static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
>> +                      struct hlist_nulls_node *head)
>>  {
>>      struct net *net = seq_file_net(seq);
>>      struct ct_iter_state *st = seq->private;
>>  
>>      head = rcu_dereference(head->next);
>> -    while (head == NULL) {
>> +    while (is_a_nulls(head)) {
>>          if (++st->bucket >= nf_conntrack_htable_size)
>>              return NULL;
>>          head = rcu_dereference(net->ct.hash[st->bucket].first);
>> @@ -56,9 +56,9 @@ static struct hlist_node *ct_get_next(struct
>> seq_file *seq,
>>      return head;
>>  }
>>  
>> -static struct hlist_node *ct_get_idx(struct seq_file *seq, loff_t pos)
>> +static struct hlist_nulls_node *ct_get_idx(struct seq_file *seq,
>> loff_t pos)
>>  {
>> -    struct hlist_node *head = ct_get_first(seq);
>> +    struct hlist_nulls_node *head = ct_get_first(seq);
>>  
>>      if (head)
>>          while (pos && (head = ct_get_next(seq, head)))
>> @@ -87,69 +87,76 @@ static void ct_seq_stop(struct seq_file *s, void *v)
>>  
>>  static int ct_seq_show(struct seq_file *s, void *v)
>>  {
>> -    const struct nf_conntrack_tuple_hash *hash = v;
>> -    const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
>> +    struct nf_conntrack_tuple_hash *hash = v;
>> +    struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
>>      const struct nf_conntrack_l3proto *l3proto;
>>      const struct nf_conntrack_l4proto *l4proto;
>> +    int ret = 0;
>>  
>>      NF_CT_ASSERT(ct);
>> +    if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
>> +        return 0;
> 
> Can we assume the next pointer still points to the next entry
> in the same chain after the refcount dropped to zero?
> 
> 
> 

We are looking chain N.
If we cannot atomic_inc() refcount, we got some deleted entry.
If we could atomic_inc, we can meet an entry that just moved to another chain X

When hitting its end, we continue the search to the N+1 chain so we only 
skip the end of previous chain (N). We can 'forget' some entries, we can print
several time one given entry.


We could solve this by :

1) Checking hash value : if not one expected -> 
   Going back to head of chain N, (potentially re-printing already handled entries)
   So it is not a *perfect* solution.

2) Use a locking to forbid writers (as done in UDP/TCP), but it is expensive and
wont solve other problem :

We wont avoid emitting same entry several time anyway (this is a flaw of 
current seq_file handling, since we 'count' entries to be skiped, and this is
wrong if some entries were deleted or inserted meanwhile)

We have same problem on /proc/net/udp & /proc/net/tcp, I am not sure we should care...

Also, current resizing code can give to a /proc/net/ip_conntrack reader a problem, since
hash table can switch while its doing its dumping : many entries might be lost or regiven...



  reply	other threads:[~2009-03-25 19:17 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-23 10:42 ucc_geth: nf_conntrack: table full, dropping packet Joakim Tjernlund
2009-03-23 12:15 ` Patrick McHardy
2009-03-23 12:25   ` Joakim Tjernlund
2009-03-23 12:29     ` Patrick McHardy
2009-03-23 12:59       ` Joakim Tjernlund
     [not found]       ` <OF387EC803.F810F72A-ONC1257582.00468C6E-C1257582.00475783@LocalDomain>
2009-03-23 13:09         ` Joakim Tjernlund
2009-03-23 17:42       ` Joakim Tjernlund
2009-03-23 17:49         ` Patrick McHardy
2009-03-24  8:22           ` Joakim Tjernlund
2009-03-24  9:12             ` Eric Dumazet
2009-03-24 10:55               ` Joakim Tjernlund
2009-03-24 12:07                 ` [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free() Eric Dumazet
2009-03-24 12:25                   ` Eric Dumazet
2009-03-24 12:43                     ` Patrick McHardy
2009-03-24 13:32                       ` Eric Dumazet
2009-03-24 13:38                         ` Patrick McHardy
2009-03-24 13:47                           ` Eric Dumazet
     [not found]                             ` <49C8F871.9070600@cosmosbay.com>
     [not found]                               ` <49C8F8E0.9050502@trash.net>
2009-03-25  3:53                                 ` Eric Dumazet
2009-03-25 13:39                                   ` Patrick McHardy
2009-03-25 13:44                                     ` Eric Dumazet
2009-03-24 13:20                   ` Joakim Tjernlund
2009-03-24 13:28                     ` Patrick McHardy
2009-03-24 13:29                     ` Eric Dumazet
2009-03-24 13:41                       ` Joakim Tjernlund
2009-03-24 15:17                   ` Maxime Bizon
2009-03-24 15:21                     ` Patrick McHardy
2009-03-24 15:27                     ` Eric Dumazet
2009-03-24 19:54                       ` [PATCH] netfilter: Use hlist_add_head_rcu() in nf_conntrack_set_hashsize() Eric Dumazet
2009-03-25 16:26                         ` Patrick McHardy
2009-03-25 17:53                       ` [PATCH] conntrack: use SLAB_DESTROY_BY_RCU for nf_conn structs Eric Dumazet
2009-03-25 18:05                         ` Patrick McHardy
2009-03-25 18:06                           ` Patrick McHardy
2009-03-25 18:15                           ` Eric Dumazet
2009-03-25 18:24                             ` Patrick McHardy
2009-03-25 18:53                               ` Eric Dumazet
2009-03-25 19:00                                 ` Patrick McHardy
2009-03-25 19:17                                   ` Eric Dumazet [this message]
2009-03-25 19:41                                     ` Patrick McHardy
2009-03-25 19:58                                       ` Eric Dumazet
2009-03-25 20:10                                         ` Patrick McHardy
2009-03-24 18:29                     ` [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free() Joakim Tjernlund
2009-03-23 17:49         ` ucc_geth: nf_conntrack: table full, dropping packet Eric Dumazet
2009-03-23 18:04           ` Joakim Tjernlund
2009-03-23 18:08             ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49CA8350.5040407@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=Joakim.Tjernlund@transmode.se \
    --cc=avorontsov@ru.mvista.com \
    --cc=kaber@trash.net \
    --cc=mbizon@freebox.fr \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.