All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org, Zambo Marcell <marcell.zambo@gmail.com>
Subject: Re: [PATCH 1/1] netfilter: fix soft lockup when netlink adds new entries
Date: Fri, 24 Feb 2012 12:18:00 +0100	[thread overview]
Message-ID: <20120224111800.GA9669@1984> (raw)
In-Reply-To: <alpine.DEB.2.00.1202241143001.14577@blackhole.kfki.hu>

On Fri, Feb 24, 2012 at 11:45:49AM +0100, Jozsef Kadlecsik wrote:
> On Fri, 24 Feb 2012, Jozsef Kadlecsik wrote:
> 
> > OK, and I remove nf_conntrack_hash_insert then, and use a single ct 
> > argument because as you noted net and zone can be extracted from the ct.
> 
> Here comes the second version of the patch, which assumes that the 
> previous one is reverted.
> 
> Best regards,
> Jozsef
> 
> Marcell Zambo and Janos Farago noticed and reported that when
> new conntrack entries are added via netlink and the conntrack table
> gets full, soft lockup happens. This is because the nf_conntrack_lock
> is held while nf_conntrack_alloc is called, which is in turn wants
> to lock nf_conntrack_lock while evicting entries from the full table.
> 
> The patch fixes the soft lockup with limiting the holding of the
> nf_conntrack_lock to the minimum, where it's absolutely required.
> It required to extend (and thus change) nf_conntrack_hash_insert
> so that it makes sure conntrack and ctnetlink do not add the same entry
> twice to the conntrack table.
> 
> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> ---
>  include/net/netfilter/nf_conntrack.h |    2 +-
>  net/netfilter/nf_conntrack_core.c    |   38 ++++++++++++++++++++--
>  net/netfilter/nf_conntrack_netlink.c |   58 +++++++++++++--------------------
>  3 files changed, 58 insertions(+), 40 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> index 8a2b0ae..ab86036 100644
> --- a/include/net/netfilter/nf_conntrack.h
> +++ b/include/net/netfilter/nf_conntrack.h
> @@ -209,7 +209,7 @@ extern struct nf_conntrack_tuple_hash *
>  __nf_conntrack_find(struct net *net, u16 zone,
>  		    const struct nf_conntrack_tuple *tuple);
>  
> -extern void nf_conntrack_hash_insert(struct nf_conn *ct);
> +extern int nf_conntrack_hash_check_insert(struct nf_conn *ct);
>  extern void nf_ct_delete_from_lists(struct nf_conn *ct);
>  extern void nf_ct_insert_dying_list(struct nf_conn *ct);
>  
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 76613f5..ed86a3b 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -404,19 +404,49 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
>  			   &net->ct.hash[repl_hash]);
>  }
>  
> -void nf_conntrack_hash_insert(struct nf_conn *ct)
> +int
> +nf_conntrack_hash_check_insert(struct nf_conn *ct)
>  {
>  	struct net *net = nf_ct_net(ct);
>  	unsigned int hash, repl_hash;
> +	struct nf_conntrack_tuple_hash *h;
> +	struct hlist_nulls_node *n;
>  	u16 zone;
>  
>  	zone = nf_ct_zone(ct);
> -	hash = hash_conntrack(net, zone, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> -	repl_hash = hash_conntrack(net, zone, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
> +	hash = hash_conntrack(net, zone,
> +			      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> +	repl_hash = hash_conntrack(net, zone,
> +				   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
> +
> +	spin_lock_bh(&nf_conntrack_lock);
>  
> +	/* See if there's one in the list already, including reverse */
> +	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
> +		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
> +				      &h->tuple) &&
> +		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
> +			goto out;
> +	hlist_nulls_for_each_entry(h, n, &net->ct.hash[repl_hash], hnnode)
> +		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> +				      &h->tuple) &&
> +		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
> +			goto out;
> +
> +	add_timer(&ct->timeout);
> +	nf_conntrack_get(&ct->ct_general);
>  	__nf_conntrack_hash_insert(ct, hash, repl_hash);
> +	NF_CT_STAT_INC(net, insert);
> +	spin_unlock_bh(&nf_conntrack_lock);
> +
> +	return 0;
> +
> +out:
> +	NF_CT_STAT_INC(net, insert_failed);
> +	spin_unlock_bh(&nf_conntrack_lock);
> +	return -EEXIST;
>  }
> -EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
> +EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert);
>  
>  /* Confirm a connection given skb; places it in hash table */
>  int
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 9307b03..6d73501 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1345,7 +1345,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
>  	struct nf_conntrack_helper *helper;
>  	struct nf_conn_tstamp *tstamp;
>  
> -	ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
> +	ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_KERNEL);
>  	if (IS_ERR(ct))
>  		return ERR_PTR(-ENOMEM);
>  

Sorry, I was wrong. I didn't notice that this is called holding
rcu_read_lock() inside nfnetlink.c and we cannot use GFP_KERNEL
inside read-lock area.

I'm going to take the patch and will fix this myself.

Thanks Jozsef.

  reply	other threads:[~2012-02-24 11:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-21 13:53 [PATCH 0/1] netfilter: fix soft lockup when netlink adds new entries Jozsef Kadlecsik
2012-02-21 13:53 ` [PATCH 1/1] " Jozsef Kadlecsik
2012-02-21 14:52   ` Pablo Neira Ayuso
2012-02-21 15:06     ` Jozsef Kadlecsik
2012-02-21 15:20       ` Pablo Neira Ayuso
2012-02-23 10:12       ` Pablo Neira Ayuso
2012-02-23 12:43         ` Jozsef Kadlecsik
2012-02-23 15:46           ` Pablo Neira Ayuso
2012-02-23 20:44             ` Jozsef Kadlecsik
2012-02-24  1:24               ` Pablo Neira Ayuso
2012-02-24  6:44                 ` Eric Dumazet
2012-02-24 10:01                   ` Pablo Neira Ayuso
2012-02-24 10:11                     ` Jozsef Kadlecsik
2012-02-24 11:36                     ` Pablo Neira Ayuso
2012-02-24  8:06                 ` Jozsef Kadlecsik
2012-02-24  9:59                   ` Pablo Neira Ayuso
2012-02-24 10:09                     ` Jozsef Kadlecsik
2012-02-24 10:45                       ` Jozsef Kadlecsik
2012-02-24 11:18                         ` Pablo Neira Ayuso [this message]
2012-02-24 11:33                         ` Pablo Neira Ayuso
2012-02-24 17:05                           ` Jozsef Kadlecsik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120224111800.GA9669@1984 \
    --to=pablo@netfilter.org \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=marcell.zambo@gmail.com \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.