Netdev List
* Re: [PATCH v3 10/11] net: sched: atomically check-allocate action
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-11-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:28AM +0300, Vlad Buslov wrote:
> Implement a function that atomically checks whether an action exists and
> either takes a reference to it or allocates an idr slot for the action
> index, to prevent concurrent allocation of actions with the same index. Use
> an EBUSY error pointer to indicate that the idr slot is reserved.
> 
> Implement a cleanup helper function that removes the temporary error pointer
> from the idr (in case of an error between idr allocation and insertion of
> the newly created action at the specified index).
> 
> Refactor all action init functions to insert new actions into the idr using
> this API.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
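
For readers following along, the reserve-then-publish idea in the commit message can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: a fixed array stands in for the idr, a pthread mutex for idrinfo->lock, and the address of a static marker object for ERR_PTR(-EBUSY). All names (struct table, table_check_alloc, table_insert, table_cleanup) are hypothetical. One deliberate difference: where the patch retries when it finds a reserved slot, this sketch returns -EBUSY so that it stays deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 8

struct action {
	int refcnt;
};

/* Hypothetical stand-in for ERR_PTR(-EBUSY): slot reserved, not yet filled. */
static struct action reserved_marker;
#define SLOT_RESERVED (&reserved_marker)

struct table {
	pthread_mutex_t lock;
	struct action *slot[TABLE_SIZE];	/* index 0 means "pick one for me" */
};

/* Returns 1 and takes a reference if an action exists at *index,
 * 0 after reserving a slot (choosing a free one when *index == 0),
 * or a negative errno value on failure.
 */
static int table_check_alloc(struct table *t, unsigned int *index,
			     struct action **a)
{
	unsigned int i;
	int ret;

	if (*index >= TABLE_SIZE)
		return -EINVAL;

	pthread_mutex_lock(&t->lock);
	*a = NULL;
	if (*index) {
		struct action *p = t->slot[*index];

		if (p == SLOT_RESERVED) {
			/* Another caller reserved the index (the patch retries here). */
			ret = -EBUSY;
		} else if (p) {
			p->refcnt++;
			*a = p;
			ret = 1;
		} else {
			t->slot[*index] = SLOT_RESERVED;
			ret = 0;
		}
	} else {
		ret = -ENOSPC;
		for (i = 1; i < TABLE_SIZE; i++) {
			if (!t->slot[i]) {
				t->slot[i] = SLOT_RESERVED;
				*index = i;
				ret = 0;
				break;
			}
		}
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Publish a fully initialized action into a previously reserved slot. */
static void table_insert(struct table *t, unsigned int index, struct action *a)
{
	pthread_mutex_lock(&t->lock);
	assert(t->slot[index] == SLOT_RESERVED);
	t->slot[index] = a;
	pthread_mutex_unlock(&t->lock);
}

/* Undo a reservation when initialization failed before publishing. */
static void table_cleanup(struct table *t, unsigned int index)
{
	pthread_mutex_lock(&t->lock);
	assert(t->slot[index] == SLOT_RESERVED);
	t->slot[index] = NULL;
	pthread_mutex_unlock(&t->lock);
}
```

The point of the marker is that lookup and reservation happen under a single lock acquisition, so two callers asking for the same index cannot both see it as free.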

> ---
> Changes from V1 to V2:
> - Remove unique idr insertion function. Change original idr insert to do
>   the same thing.
> - Refactor action check-alloc code into standalone function.
> 
>  include/net/act_api.h      |  3 ++
>  net/sched/act_api.c        | 92 ++++++++++++++++++++++++++++++++++++----------
>  net/sched/act_bpf.c        | 11 ++++--
>  net/sched/act_connmark.c   | 10 +++--
>  net/sched/act_csum.c       | 11 ++++--
>  net/sched/act_gact.c       | 11 ++++--
>  net/sched/act_ife.c        |  6 ++-
>  net/sched/act_ipt.c        | 13 ++++++-
>  net/sched/act_mirred.c     | 16 ++++++--
>  net/sched/act_nat.c        | 11 ++++--
>  net/sched/act_pedit.c      | 15 ++++++--
>  net/sched/act_police.c     |  9 ++++-
>  net/sched/act_sample.c     | 11 ++++--
>  net/sched/act_simple.c     | 11 +++++-
>  net/sched/act_skbedit.c    | 11 +++++-
>  net/sched/act_skbmod.c     | 11 +++++-
>  net/sched/act_tunnel_key.c |  9 ++++-
>  net/sched/act_vlan.c       | 17 ++++++++-
>  18 files changed, 218 insertions(+), 60 deletions(-)
> 
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index d256e20507b9..cd4547476074 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -154,6 +154,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  		   int bind, bool cpustats);
>  void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
>  
> +void tcf_idr_cleanup(struct tc_action_net *tn, u32 index);
> +int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
> +			struct tc_action **a, int bind);
>  int tcf_idr_delete_index(struct tc_action_net *tn, u32 index);
>  int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
>  
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index eefe8c2fe667..9511502e1cbb 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -303,7 +303,9 @@ static bool __tcf_idr_check(struct tc_action_net *tn, u32 index,
>  
>  	spin_lock(&idrinfo->lock);
>  	p = idr_find(&idrinfo->action_idr, index);
> -	if (p) {
> +	if (IS_ERR(p)) {
> +		p = NULL;
> +	} else if (p) {
>  		refcount_inc(&p->tcfa_refcnt);
>  		if (bind)
>  			atomic_inc(&p->tcfa_bindcnt);
> @@ -371,7 +373,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  {
>  	struct tc_action *p = kzalloc(ops->size, GFP_KERNEL);
>  	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> -	struct idr *idr = &idrinfo->action_idr;
>  	int err = -ENOMEM;
>  
>  	if (unlikely(!p))
> @@ -389,20 +390,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  			goto err2;
>  	}
>  	spin_lock_init(&p->tcfa_lock);
> -	idr_preload(GFP_KERNEL);
> -	spin_lock(&idrinfo->lock);
> -	/* user doesn't specify an index */
> -	if (!index) {
> -		index = 1;
> -		err = idr_alloc_u32(idr, NULL, &index, UINT_MAX, GFP_ATOMIC);
> -	} else {
> -		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
> -	}
> -	spin_unlock(&idrinfo->lock);
> -	idr_preload_end();
> -	if (err)
> -		goto err3;
> -
>  	p->tcfa_index = index;
>  	p->tcfa_tm.install = jiffies;
>  	p->tcfa_tm.lastuse = jiffies;
> @@ -412,7 +399,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  					&p->tcfa_rate_est,
>  					&p->tcfa_lock, NULL, est);
>  		if (err)
> -			goto err4;
> +			goto err3;
>  	}
>  
>  	p->idrinfo = idrinfo;
> @@ -420,8 +407,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  	INIT_LIST_HEAD(&p->list);
>  	*a = p;
>  	return 0;
> -err4:
> -	idr_remove(idr, index);
>  err3:
>  	free_percpu(p->cpu_qstats);
>  err2:
> @@ -437,11 +422,78 @@ void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
>  	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>  
>  	spin_lock(&idrinfo->lock);
> -	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
> +	/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
> +	WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index)));
>  	spin_unlock(&idrinfo->lock);
>  }
>  EXPORT_SYMBOL(tcf_idr_insert);
>  
> +/* Cleanup idr index that was allocated but not initialized. */
> +
> +void tcf_idr_cleanup(struct tc_action_net *tn, u32 index)
> +{
> +	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> +
> +	spin_lock(&idrinfo->lock);
> +	/* Remove ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
> +	WARN_ON(!IS_ERR(idr_remove(&idrinfo->action_idr, index)));
> +	spin_unlock(&idrinfo->lock);
> +}
> +EXPORT_SYMBOL(tcf_idr_cleanup);
> +
> +/* Check if action with specified index exists. If actions is found, increments
> + * its reference and bind counters, and return 1. Otherwise insert temporary
> + * error pointer (to prevent concurrent users from inserting actions with same
> + * index) and return 0.
> + */
> +
> +int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
> +			struct tc_action **a, int bind)
> +{
> +	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> +	struct tc_action *p;
> +	int ret;
> +
> +again:
> +	spin_lock(&idrinfo->lock);
> +	if (*index) {
> +		p = idr_find(&idrinfo->action_idr, *index);
> +		if (IS_ERR(p)) {
> +			/* This means that another process allocated
> +			 * index but did not assign the pointer yet.
> +			 */
> +			spin_unlock(&idrinfo->lock);
> +			goto again;
> +		}
> +
> +		if (p) {
> +			refcount_inc(&p->tcfa_refcnt);
> +			if (bind)
> +				atomic_inc(&p->tcfa_bindcnt);
> +			*a = p;
> +			ret = 1;
> +		} else {
> +			*a = NULL;
> +			ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
> +					    *index, GFP_ATOMIC);
> +			if (!ret)
> +				idr_replace(&idrinfo->action_idr,
> +					    ERR_PTR(-EBUSY), *index);
> +		}
> +	} else {
> +		*index = 1;
> +		*a = NULL;
> +		ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
> +				    UINT_MAX, GFP_ATOMIC);
> +		if (!ret)
> +			idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
> +				    *index);
> +	}
> +	spin_unlock(&idrinfo->lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(tcf_idr_check_alloc);
> +
>  void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
>  			 struct tcf_idrinfo *idrinfo)
>  {
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index d3f4ac6f2c4b..06f743d8ed41 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -299,14 +299,17 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>  
>  	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
>  
> -	if (!tcf_idr_check(tn, parm->index, act, bind)) {
> +	ret = tcf_idr_check_alloc(tn, &parm->index, act, bind);
> +	if (!ret) {
>  		ret = tcf_idr_create(tn, parm->index, est, act,
>  				     &act_bpf_ops, bind, true);
> -		if (ret < 0)
> +		if (ret < 0) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  
>  		res = ACT_P_CREATED;
> -	} else {
> +	} else if (ret > 0) {
>  		/* Don't override defaults. */
>  		if (bind)
>  			return 0;
> @@ -315,6 +318,8 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>  			tcf_idr_release(*act, bind);
>  			return -EEXIST;
>  		}
> +	} else {
> +		return ret;
>  	}
>  
>  	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index 701e90244eff..1e31f0e448e2 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -118,11 +118,14 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
>  
>  	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
>  
> -	if (!tcf_idr_check(tn, parm->index, a, bind)) {
> +	ret = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (!ret) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_connmark_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  
>  		ci = to_connmark(*a);
>  		ci->tcf_action = parm->action;
> @@ -131,7 +134,7 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
>  
>  		tcf_idr_insert(tn, *a);
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (ret > 0) {
>  		ci = to_connmark(*a);
>  		if (bind)
>  			return 0;
> @@ -142,6 +145,7 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
>  		/* replacing action and zone */
>  		ci->tcf_action = parm->action;
>  		ci->zone = parm->zone;
> +		ret = 0;
>  	}
>  
>  	return ret;
> diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
> index 5dbee136b0a1..bd232d3bd022 100644
> --- a/net/sched/act_csum.c
> +++ b/net/sched/act_csum.c
> @@ -67,19 +67,24 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
>  		return -EINVAL;
>  	parm = nla_data(tb[TCA_CSUM_PARMS]);
>  
> -	if (!tcf_idr_check(tn, parm->index, a, bind)) {
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (!err) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_csum_ops, bind, true);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (err > 0) {
>  		if (bind)/* dont override defaults */
>  			return 0;
>  		if (!ovr) {
>  			tcf_idr_release(*a, bind);
>  			return -EEXIST;
>  		}
> +	} else {
> +		return err;
>  	}
>  
>  	p = to_tcf_csum(*a);
> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
> index 11c4de3f344e..661b72b9147d 100644
> --- a/net/sched/act_gact.c
> +++ b/net/sched/act_gact.c
> @@ -91,19 +91,24 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
>  	}
>  #endif
>  
> -	if (!tcf_idr_check(tn, parm->index, a, bind)) {
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (!err) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_gact_ops, bind, true);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (err > 0) {
>  		if (bind)/* dont override defaults */
>  			return 0;
>  		if (!ovr) {
>  			tcf_idr_release(*a, bind);
>  			return -EEXIST;
>  		}
> +	} else {
> +		return err;
>  	}
>  
>  	gact = to_gact(*a);
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index 3dd3d79c5a4b..5bf0e79796c0 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
> @@ -483,7 +483,10 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>  	if (!p)
>  		return -ENOMEM;
>  
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind) {
>  		kfree(p);
>  		return 0;
> @@ -493,6 +496,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>  		ret = tcf_idr_create(tn, parm->index, est, a, &act_ife_ops,
>  				     bind, true);
>  		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			kfree(p);
>  			return ret;
>  		}
> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
> index 85e85dfba401..0dc787a57798 100644
> --- a/net/sched/act_ipt.c
> +++ b/net/sched/act_ipt.c
> @@ -119,13 +119,18 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
>  	if (tb[TCA_IPT_INDEX] != NULL)
>  		index = nla_get_u32(tb[TCA_IPT_INDEX]);
>  
> -	exists = tcf_idr_check(tn, index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
>  	if (tb[TCA_IPT_HOOK] == NULL || tb[TCA_IPT_TARG] == NULL) {
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, index);
>  		return -EINVAL;
>  	}
>  
> @@ -133,14 +138,18 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
>  	if (nla_len(tb[TCA_IPT_TARG]) < td->u.target_size) {
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, index);
>  		return -EINVAL;
>  	}
>  
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, index, est, a, ops, bind,
>  				     false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
>  	} else {
>  		if (bind)/* dont override defaults */
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index e08aed06d7f8..6afd89a36c69 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -79,7 +79,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>  	struct tcf_mirred *m;
>  	struct net_device *dev;
>  	bool exists = false;
> -	int ret;
> +	int ret, err;
>  
>  	if (!nla) {
>  		NL_SET_ERR_MSG_MOD(extack, "Mirred requires attributes to be passed");
> @@ -94,7 +94,10 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>  	}
>  	parm = nla_data(tb[TCA_MIRRED_PARMS]);
>  
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
> @@ -107,6 +110,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>  	default:
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, parm->index);
>  		NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option");
>  		return -EINVAL;
>  	}
> @@ -115,6 +120,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>  		if (dev == NULL) {
>  			if (exists)
>  				tcf_idr_release(*a, bind);
> +			else
> +				tcf_idr_cleanup(tn, parm->index);
>  			return -ENODEV;
>  		}
>  		mac_header_xmit = dev_is_mac_header_xmit(dev);
> @@ -124,13 +131,16 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>  
>  	if (!exists) {
>  		if (!dev) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist");
>  			return -EINVAL;
>  		}
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_mirred_ops, bind, true);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
>  	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
> index 1f91e8e66c0f..4dd9188a72fd 100644
> --- a/net/sched/act_nat.c
> +++ b/net/sched/act_nat.c
> @@ -57,19 +57,24 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
>  		return -EINVAL;
>  	parm = nla_data(tb[TCA_NAT_PARMS]);
>  
> -	if (!tcf_idr_check(tn, parm->index, a, bind)) {
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (!err) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_nat_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (err > 0) {
>  		if (bind)
>  			return 0;
>  		if (!ovr) {
>  			tcf_idr_release(*a, bind);
>  			return -EEXIST;
>  		}
> +	} else {
> +		return err;
>  	}
>  	p = to_tcf_nat(*a);
>  
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index fbf283f2ac34..2bd1d3f61488 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -167,13 +167,18 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
>  	if (IS_ERR(keys_ex))
>  		return PTR_ERR(keys_ex);
>  
> -	if (!tcf_idr_check(tn, parm->index, a, bind)) {
> -		if (!parm->nkeys)
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (!err) {
> +		if (!parm->nkeys) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return -EINVAL;
> +		}
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_pedit_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		p = to_pedit(*a);
>  		keys = kmalloc(ksize, GFP_KERNEL);
>  		if (keys == NULL) {
> @@ -182,7 +187,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
>  			return -ENOMEM;
>  		}
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (err > 0) {
>  		if (bind)
>  			return 0;
>  		if (!ovr) {
> @@ -197,6 +202,8 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
>  				return -ENOMEM;
>  			}
>  		}
> +	} else {
> +		return err;
>  	}
>  
>  	spin_lock_bh(&p->tcf_lock);
> diff --git a/net/sched/act_police.c b/net/sched/act_police.c
> index 99335cca739e..1f3192ea8df7 100644
> --- a/net/sched/act_police.c
> +++ b/net/sched/act_police.c
> @@ -101,15 +101,20 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
>  		return -EINVAL;
>  
>  	parm = nla_data(tb[TCA_POLICE_TBF]);
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, parm->index, NULL, a,
>  				     &act_police_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
>  	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index a8582e1347db..3079e7be5bde 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -46,7 +46,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
>  	struct tc_sample *parm;
>  	struct tcf_sample *s;
>  	bool exists = false;
> -	int ret;
> +	int ret, err;
>  
>  	if (!nla)
>  		return -EINVAL;
> @@ -59,15 +59,20 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
>  
>  	parm = nla_data(tb[TCA_SAMPLE_PARMS]);
>  
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_sample_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  		ret = ACT_P_CREATED;
>  	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
> index 78fffd329ed9..2cc874f791df 100644
> --- a/net/sched/act_simple.c
> +++ b/net/sched/act_simple.c
> @@ -101,13 +101,18 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
>  		return -EINVAL;
>  
>  	parm = nla_data(tb[TCA_DEF_PARMS]);
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
>  	if (tb[TCA_DEF_DATA] == NULL) {
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, parm->index);
>  		return -EINVAL;
>  	}
>  
> @@ -116,8 +121,10 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_simp_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  
>  		d = to_defact(*a);
>  		ret = alloc_defdata(d, defdata);
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index c0607d1319eb..29a15172a99d 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -117,21 +117,28 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  
>  	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
>  
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
>  	if (!flags) {
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, parm->index);
>  		return -EINVAL;
>  	}
>  
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_skbedit_ops, bind, false);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  
>  		d = to_skbedit(*a);
>  		ret = ACT_P_CREATED;
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index e844381af066..cdc6bacfb190 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -128,21 +128,28 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
>  	if (parm->flags & SKBMOD_F_SWAPMAC)
>  		lflags = SKBMOD_F_SWAPMAC;
>  
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
>  	if (!lflags) {
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, parm->index);
>  		return -EINVAL;
>  	}
>  
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_skbmod_ops, bind, true);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  
>  		ret = ACT_P_CREATED;
>  	} else if (!ovr) {
> diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
> index bd53f39a345b..4679b620af12 100644
> --- a/net/sched/act_tunnel_key.c
> +++ b/net/sched/act_tunnel_key.c
> @@ -99,7 +99,10 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
>  		return -EINVAL;
>  
>  	parm = nla_data(tb[TCA_TUNNEL_KEY_PARMS]);
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
> @@ -162,7 +165,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_tunnel_key_ops, bind, true);
>  		if (ret)
> -			return ret;
> +			goto err_out;
>  
>  		ret = ACT_P_CREATED;
>  	} else if (!ovr) {
> @@ -198,6 +201,8 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
>  err_out:
>  	if (exists)
>  		tcf_idr_release(*a, bind);
> +	else
> +		tcf_idr_cleanup(tn, parm->index);
>  	return ret;
>  }
>  
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
> index 4ac0d565e437..bae9822837d7 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -134,7 +134,10 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  	if (!tb[TCA_VLAN_PARMS])
>  		return -EINVAL;
>  	parm = nla_data(tb[TCA_VLAN_PARMS]);
> -	exists = tcf_idr_check(tn, parm->index, a, bind);
> +	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
> +	if (err < 0)
> +		return err;
> +	exists = err;
>  	if (exists && bind)
>  		return 0;
>  
> @@ -146,12 +149,16 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  		if (!tb[TCA_VLAN_PUSH_VLAN_ID]) {
>  			if (exists)
>  				tcf_idr_release(*a, bind);
> +			else
> +				tcf_idr_cleanup(tn, parm->index);
>  			return -EINVAL;
>  		}
>  		push_vid = nla_get_u16(tb[TCA_VLAN_PUSH_VLAN_ID]);
>  		if (push_vid >= VLAN_VID_MASK) {
>  			if (exists)
>  				tcf_idr_release(*a, bind);
> +			else
> +				tcf_idr_cleanup(tn, parm->index);
>  			return -ERANGE;
>  		}
>  
> @@ -162,6 +169,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  			case htons(ETH_P_8021AD):
>  				break;
>  			default:
> +				if (!exists)
> +					tcf_idr_cleanup(tn, parm->index);
>  				return -EPROTONOSUPPORT;
>  			}
>  		} else {
> @@ -174,6 +183,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  	default:
>  		if (exists)
>  			tcf_idr_release(*a, bind);
> +		else
> +			tcf_idr_cleanup(tn, parm->index);
>  		return -EINVAL;
>  	}
>  	action = parm->v_action;
> @@ -181,8 +192,10 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  	if (!exists) {
>  		ret = tcf_idr_create(tn, parm->index, est, a,
>  				     &act_vlan_ops, bind, true);
> -		if (ret)
> +		if (ret) {
> +			tcf_idr_cleanup(tn, parm->index);
>  			return ret;
> +		}
>  
>  		ret = ACT_P_CREATED;
>  	} else if (!ovr) {
> -- 
> 2.7.5
> 


* Re: [PATCH v3 09/11] net: sched: use reference counting action init
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-10-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:27AM +0300, Vlad Buslov wrote:
> Change the action API to assume that the action init function always takes
> a reference to the action, even when overwriting an existing action. This is
> necessary because the action API continues to use the action pointer after
> the init function is done. At this point the action becomes accessible for
> concurrent modification, so the user must always hold a reference to it.
> 
> Implement a helper put-list function to atomically release a list of actions
> after the action API init code is done using them.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
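
The put-list helper described above can be illustrated with a small userspace C sketch. It is a simplification under stated assumptions, not the kernel implementation: a plain int replaces refcount_t, a singly linked list replaces list_head, there is no locking, and the names (struct action, action_new, action_put, action_put_list) are hypothetical. What it does keep is the detail that the iterator must read the next pointer before dropping the reference, for the same reason the patch uses list_for_each_entry_safe() and saves a->ops before calling tcf_action_put().

```c
#include <assert.h>
#include <stdlib.h>

struct action {
	int refcnt;
	struct action *next;
};

/* Allocate an action with an initial reference count; NULL on failure. */
static struct action *action_new(int refcnt, struct action *next)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->refcnt = refcnt;
	a->next = next;
	return a;
}

/* Drop one reference; free the action and return 1 if it was the last. */
static int action_put(struct action *a)
{
	assert(a->refcnt > 0);
	if (--a->refcnt > 0)
		return 0;
	free(a);
	return 1;
}

/* Drop one reference on every action in the list; return how many were freed. */
static int action_put_list(struct action *head)
{
	int freed = 0;

	while (head) {
		/* Read the link first: action_put() may free this node. */
		struct action *next = head->next;

		freed += action_put(head);
		head = next;
	}
	return freed;
}
```

An action that is still referenced elsewhere survives the walk; only its count drops by one.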

> ---
> Changes from V1 to V2:
> - Resplit action lookup/release code to prevent memory leaks in
>   individual patches.
> 
>  net/sched/act_api.c | 35 +++++++++++++++++------------------
>  1 file changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index f019f0464cec..eefe8c2fe667 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -627,6 +627,18 @@ static int tcf_action_put(struct tc_action *p)
>  	return __tcf_action_put(p, false);
>  }
>  
> +static void tcf_action_put_lst(struct list_head *actions)
> +{
> +	struct tc_action *a, *tmp;
> +
> +	list_for_each_entry_safe(a, tmp, actions, list) {
> +		const struct tc_action_ops *ops = a->ops;
> +
> +		if (tcf_action_put(a))
> +			module_put(ops->owner);
> +	}
> +}
> +
>  int
>  tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
>  {
> @@ -835,17 +847,6 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	return ERR_PTR(err);
>  }
>  
> -static void cleanup_a(struct list_head *actions, int ovr)
> -{
> -	struct tc_action *a;
> -
> -	if (!ovr)
> -		return;
> -
> -	list_for_each_entry(a, actions, list)
> -		refcount_dec(&a->tcfa_refcnt);
> -}
> -
>  int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  		    struct nlattr *est, char *name, int ovr, int bind,
>  		    struct list_head *actions, size_t *attr_size,
> @@ -874,11 +875,6 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  	}
>  
>  	*attr_size = tcf_action_full_attrs_size(sz);
> -
> -	/* Remove the temp refcnt which was necessary to protect against
> -	 * destroying an existing action which was being replaced
> -	 */
> -	cleanup_a(actions, ovr);
>  	return 0;
>  
>  err:
> @@ -1209,7 +1205,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>  		return ret;
>  	}
>  err:
> -	tcf_action_destroy(&actions, 0);
> +	tcf_action_put_lst(&actions);
>  	return ret;
>  }
>  
> @@ -1251,8 +1247,11 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
>  			      &attr_size, true, extack);
>  	if (ret)
>  		return ret;
> +	ret = tcf_add_notify(net, n, &actions, portid, attr_size, extack);
> +	if (ovr)
> +		tcf_action_put_lst(&actions);
>  
> -	return tcf_add_notify(net, n, &actions, portid, attr_size, extack);
> +	return ret;
>  }
>  
>  static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON;
> -- 
> 2.7.5
> 


* Re: [PATCH v3 08/11] net: sched: don't release reference on action overwrite
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-9-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:26AM +0300, Vlad Buslov wrote:
> Return from the action init function with a reference to the action taken,
> even when overwriting an existing action.
> 
> The action init API initializes its fourth argument (pointer to pointer to
> tc action) to either an existing action with the same index or a newly
> created action. If the index already exists (and the bind argument is zero),
> the init function returns without incrementing the action reference counter.
> The caller of action init then proceeds to work with the action without
> actually holding a reference to it. This means that the action could be
> deleted concurrently.
> 
> Change action init behavior to always take a reference to the action before
> returning successfully, in order to protect against concurrent deletion.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
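
The changed contract can be illustrated with a minimal userspace C sketch. It rests on several assumptions and is not the kernel code: a plain int counter instead of refcount_t, no locking, creation of new actions left out, and hypothetical names throughout (lookup_get, action_release, action_init). It shows only the ordering the commit message describes: the lookup takes a reference, the reference is dropped only on the -EEXIST path, and a successful overwrite returns with the reference still held for the caller to drop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct action {
	int refcnt;
	int value;
};

/* Lookup that takes a reference when the action exists. */
static struct action *lookup_get(struct action *existing)
{
	if (existing)
		existing->refcnt++;
	return existing;
}

static void action_release(struct action *a)
{
	assert(a->refcnt > 0);
	a->refcnt--;
}

/* On success (return 0), *out holds a reference the caller must release. */
static int action_init(struct action *existing, int value, int ovr,
		       struct action **out)
{
	struct action *a = lookup_get(existing);

	*out = NULL;
	if (!a)
		return -ENOENT;	/* creation path omitted in this sketch */

	if (!ovr) {
		/* Not allowed to overwrite: drop the lookup reference. */
		action_release(a);
		return -EEXIST;
	}

	/* Overwrite while still holding the reference taken by the lookup. */
	a->value = value;
	*out = a;
	return 0;
}
```

Because the reference outlives the init call, a concurrent delete cannot free the action while the caller is still using the returned pointer.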

> ---
> Changes from V1 to V2:
> - Resplit action lookup/release code to prevent memory leaks in
>   individual patches.
> - Change convoluted commit message.
> 
>  net/sched/act_api.c        |  2 --
>  net/sched/act_bpf.c        |  8 ++++----
>  net/sched/act_connmark.c   |  5 +++--
>  net/sched/act_csum.c       |  8 ++++----
>  net/sched/act_gact.c       |  5 +++--
>  net/sched/act_ife.c        | 12 +++++-------
>  net/sched/act_ipt.c        |  5 +++--
>  net/sched/act_mirred.c     |  5 ++---
>  net/sched/act_nat.c        |  5 +++--
>  net/sched/act_pedit.c      |  5 +++--
>  net/sched/act_police.c     |  8 +++-----
>  net/sched/act_sample.c     |  8 +++-----
>  net/sched/act_simple.c     |  5 +++--
>  net/sched/act_skbedit.c    |  5 +++--
>  net/sched/act_skbmod.c     |  8 +++-----
>  net/sched/act_tunnel_key.c |  8 +++-----
>  net/sched/act_vlan.c       |  8 +++-----
>  17 files changed, 51 insertions(+), 59 deletions(-)
> 
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index a023873db713..f019f0464cec 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -870,8 +870,6 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  		}
>  		act->order = i;
>  		sz += tcf_action_fill_size(act);
> -		if (ovr)
> -			refcount_inc(&act->tcfa_refcnt);
>  		list_add_tail(&act->list, actions);
>  	}
>  
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index 7941dd66ff83..d3f4ac6f2c4b 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -311,9 +311,10 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>  		if (bind)
>  			return 0;
>  
> -		tcf_idr_release(*act, bind);
> -		if (!replace)
> +		if (!replace) {
> +			tcf_idr_release(*act, bind);
>  			return -EEXIST;
> +		}
>  	}
>  
>  	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
> @@ -356,8 +357,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>  
>  	return res;
>  out:
> -	if (res == ACT_P_CREATED)
> -		tcf_idr_release(*act, bind);
> +	tcf_idr_release(*act, bind);
>  
>  	return ret;
>  }
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index 143c2d3de723..701e90244eff 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -135,9 +135,10 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
>  		ci = to_connmark(*a);
>  		if (bind)
>  			return 0;
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  		/* replacing action and zone */
>  		ci->tcf_action = parm->action;
>  		ci->zone = parm->zone;
> diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
> index 3768539340e0..5dbee136b0a1 100644
> --- a/net/sched/act_csum.c
> +++ b/net/sched/act_csum.c
> @@ -76,9 +76,10 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
>  	} else {
>  		if (bind)/* dont override defaults */
>  			return 0;
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  	}
>  
>  	p = to_tcf_csum(*a);
> @@ -86,8 +87,7 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
>  
>  	params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
>  	if (unlikely(!params_new)) {
> -		if (ret == ACT_P_CREATED)
> -			tcf_idr_release(*a, bind);
> +		tcf_idr_release(*a, bind);
>  		return -ENOMEM;
>  	}
>  	params_old = rtnl_dereference(p->params);
> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
> index a431a711f0dd..11c4de3f344e 100644
> --- a/net/sched/act_gact.c
> +++ b/net/sched/act_gact.c
> @@ -100,9 +100,10 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
>  	} else {
>  		if (bind)/* dont override defaults */
>  			return 0;
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  	}
>  
>  	gact = to_gact(*a);
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index 027c305dcb37..3dd3d79c5a4b 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
> @@ -497,12 +497,10 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>  			return ret;
>  		}
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr) {
> -			kfree(p);
> -			return -EEXIST;
> -		}
> +		kfree(p);
> +		return -EEXIST;
>  	}
>  
>  	ife = to_ife(*a);
> @@ -544,13 +542,13 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>  				       NULL, NULL);
>  		if (err) {
>  metadata_parse_err:
> -			if (exists)
> -				tcf_idr_release(*a, bind);
>  			if (ret == ACT_P_CREATED)
>  				_tcf_ife_cleanup(*a);
>  
>  			if (exists)
>  				spin_unlock_bh(&ife->tcf_lock);
> +			tcf_idr_release(*a, bind);
> +
>  			kfree(p);
>  			return err;
>  		}
> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
> index 6c234411c771..85e85dfba401 100644
> --- a/net/sched/act_ipt.c
> +++ b/net/sched/act_ipt.c
> @@ -145,10 +145,11 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
>  	} else {
>  		if (bind)/* dont override defaults */
>  			return 0;
> -		tcf_idr_release(*a, bind);
>  
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  	}
>  	hook = nla_get_u32(tb[TCA_IPT_HOOK]);
>  
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 3d8300bce7e4..e08aed06d7f8 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -132,10 +132,9 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>  		if (ret)
>  			return ret;
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr)
> -			return -EEXIST;
> +		return -EEXIST;
>  	}
>  	m = to_mirred(*a);
>  
> diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
> index 9eb27c89dc46..1f91e8e66c0f 100644
> --- a/net/sched/act_nat.c
> +++ b/net/sched/act_nat.c
> @@ -66,9 +66,10 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
>  	} else {
>  		if (bind)
>  			return 0;
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  	}
>  	p = to_tcf_nat(*a);
>  
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index b8857035e3f8..fbf283f2ac34 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -185,9 +185,10 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
>  	} else {
>  		if (bind)
>  			return 0;
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  		p = to_pedit(*a);
>  		if (p->tcfp_nkeys && p->tcfp_nkeys != parm->nkeys) {
>  			keys = kmalloc(ksize, GFP_KERNEL);
> diff --git a/net/sched/act_police.c b/net/sched/act_police.c
> index c955fb0d4f3f..99335cca739e 100644
> --- a/net/sched/act_police.c
> +++ b/net/sched/act_police.c
> @@ -111,10 +111,9 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
>  		if (ret)
>  			return ret;
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr)
> -			return -EEXIST;
> +		return -EEXIST;
>  	}
>  
>  	police = to_police(*a);
> @@ -195,8 +194,7 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
>  failure:
>  	qdisc_put_rtab(P_tab);
>  	qdisc_put_rtab(R_tab);
> -	if (ret == ACT_P_CREATED)
> -		tcf_idr_release(*a, bind);
> +	tcf_idr_release(*a, bind);
>  	return err;
>  }
>  
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index 6f79d2afcba2..a8582e1347db 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -69,10 +69,9 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
>  		if (ret)
>  			return ret;
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr)
> -			return -EEXIST;
> +		return -EEXIST;
>  	}
>  	s = to_sample(*a);
>  
> @@ -81,8 +80,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
>  	s->psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]);
>  	psample_group = psample_group_get(net, s->psample_group_num);
>  	if (!psample_group) {
> -		if (ret == ACT_P_CREATED)
> -			tcf_idr_release(*a, bind);
> +		tcf_idr_release(*a, bind);
>  		return -ENOMEM;
>  	}
>  	RCU_INIT_POINTER(s->psample_group, psample_group);
> diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
> index b570e7ca7e33..78fffd329ed9 100644
> --- a/net/sched/act_simple.c
> +++ b/net/sched/act_simple.c
> @@ -130,9 +130,10 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
>  	} else {
>  		d = to_defact(*a);
>  
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  
>  		reset_policy(d, defdata, parm);
>  	}
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index dc0cb350aa45..c0607d1319eb 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -137,9 +137,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  		ret = ACT_P_CREATED;
>  	} else {
>  		d = to_skbedit(*a);
> -		tcf_idr_release(*a, bind);
> -		if (!ovr)
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
>  			return -EEXIST;
> +		}
>  	}
>  
>  	spin_lock_bh(&d->tcf_lock);
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index 30be3f767495..e844381af066 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -145,10 +145,9 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
>  			return ret;
>  
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr)
> -			return -EEXIST;
> +		return -EEXIST;
>  	}
>  
>  	d = to_skbmod(*a);
> @@ -156,8 +155,7 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
>  	ASSERT_RTNL();
>  	p = kzalloc(sizeof(struct tcf_skbmod_params), GFP_KERNEL);
>  	if (unlikely(!p)) {
> -		if (ret == ACT_P_CREATED)
> -			tcf_idr_release(*a, bind);
> +		tcf_idr_release(*a, bind);
>  		return -ENOMEM;
>  	}
>  
> diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
> index 4b7f9a3b47d7..bd53f39a345b 100644
> --- a/net/sched/act_tunnel_key.c
> +++ b/net/sched/act_tunnel_key.c
> @@ -165,10 +165,9 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
>  			return ret;
>  
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr)
> -			return -EEXIST;
> +		return -EEXIST;
>  	}
>  
>  	t = to_tunnel_key(*a);
> @@ -176,8 +175,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
>  	ASSERT_RTNL();
>  	params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
>  	if (unlikely(!params_new)) {
> -		if (ret == ACT_P_CREATED)
> -			tcf_idr_release(*a, bind);
> +		tcf_idr_release(*a, bind);
>  		return -ENOMEM;
>  	}
>  
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
> index b44377c951b6..4ac0d565e437 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -185,10 +185,9 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  			return ret;
>  
>  		ret = ACT_P_CREATED;
> -	} else {
> +	} else if (!ovr) {
>  		tcf_idr_release(*a, bind);
> -		if (!ovr)
> -			return -EEXIST;
> +		return -EEXIST;
>  	}
>  
>  	v = to_vlan(*a);
> @@ -196,8 +195,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  	ASSERT_RTNL();
>  	p = kzalloc(sizeof(*p), GFP_KERNEL);
>  	if (!p) {
> -		if (ret == ACT_P_CREATED)
> -			tcf_idr_release(*a, bind);
> +		tcf_idr_release(*a, bind);
>  		return -ENOMEM;
>  	}
>  
> -- 
> 2.7.5
> 

* Re: [PATCH v3 07/11] net: sched: implement reference counted action release
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-8-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:25AM +0300, Vlad Buslov wrote:
> Implement helper delete function that uses new action ops 'delete', instead
> of destroying action directly. This is required so act API could delete
> actions by index, without holding any references to action that is being
> deleted.
> 
> Implement function __tcf_action_put() that releases reference to action and
> frees it, if necessary. Refactor action deletion code to use new put
> function and not to rely on rtnl lock. Remove rtnl lock assertions that are
> no longer needed.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
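
For anyone reading along: the put logic quoted below hinges on
refcount_dec_and_lock(), which drops a reference and returns with the lock
held only when the count reaches zero. Here is a minimal user-space sketch of
that pattern, assuming C11 atomics and an atomic_flag spinlock as stand-ins
for the kernel primitives. struct action_sketch, dec_and_lock() and
action_put() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tc_action. */
struct action_sketch {
	atomic_int refcnt;
	atomic_int bindcnt;
	bool in_table;		/* stands in for the idr slot */
};

/* Hypothetical stand-in for idrinfo->lock. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void lock_table(void)
{
	while (atomic_flag_test_and_set(&table_lock))
		;	/* spin */
}

static void unlock_table(void)
{
	atomic_flag_clear(&table_lock);
}

/* Returns true, with the lock held, only if the count dropped to zero. */
static bool dec_and_lock(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	/* Fast path: not the last reference, so no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}

	/* Slow path: possibly the last reference, decide under the lock. */
	lock_table();
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	unlock_table();
	return false;
}

/* Mirrors the shape of the quoted __tcf_action_put(): 1 if freed, else 0. */
static int action_put(struct action_sketch *p, bool bind)
{
	assert(p != NULL);

	if (dec_and_lock(&p->refcnt)) {
		if (bind)
			atomic_fetch_sub(&p->bindcnt, 1);
		p->in_table = false;	/* idr_remove() in the patch */
		unlock_table();
		/* cleanup and free would happen here, outside the lock */
		return 1;
	}

	if (bind)
		atomic_fetch_sub(&p->bindcnt, 1);
	return 0;
}
```

The point of taking the lock before the final decrement is that a concurrent
lookup holding the same lock can never find an entry whose count has already
hit zero.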

> ---
> Changes from V1 to V2:
> - Removed redundant actions ops lookup during delete.
> - Assume all actions have delete implemented and don't check for it
>   explicitly.
> - Rearrange variable definitions in tcf_action_delete.
> 
>  net/sched/act_api.c | 84 +++++++++++++++++++++++++++++++++++++++--------------
>  net/sched/cls_api.c |  1 -
>  2 files changed, 62 insertions(+), 23 deletions(-)
> 
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 0f31f09946ab..a023873db713 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -90,21 +90,39 @@ static void free_tcf(struct tc_action *p)
>  	kfree(p);
>  }
>  
> -static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
> +static void tcf_action_cleanup(struct tc_action *p)
>  {
> -	spin_lock(&idrinfo->lock);
> -	idr_remove(&idrinfo->action_idr, p->tcfa_index);
> -	spin_unlock(&idrinfo->lock);
> +	if (p->ops->cleanup)
> +		p->ops->cleanup(p);
> +
>  	gen_kill_estimator(&p->tcfa_rate_est);
>  	free_tcf(p);
>  }
>  
> +static int __tcf_action_put(struct tc_action *p, bool bind)
> +{
> +	struct tcf_idrinfo *idrinfo = p->idrinfo;
> +
> +	if (refcount_dec_and_lock(&p->tcfa_refcnt, &idrinfo->lock)) {
> +		if (bind)
> +			atomic_dec(&p->tcfa_bindcnt);
> +		idr_remove(&idrinfo->action_idr, p->tcfa_index);
> +		spin_unlock(&idrinfo->lock);
> +
> +		tcf_action_cleanup(p);
> +		return 1;
> +	}
> +
> +	if (bind)
> +		atomic_dec(&p->tcfa_bindcnt);
> +
> +	return 0;
> +}
> +
>  int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
>  {
>  	int ret = 0;
>  
> -	ASSERT_RTNL();
> -
>  	/* Release with strict==1 and bind==0 is only called through act API
>  	 * interface (classifiers always bind). Only case when action with
>  	 * positive reference count and zero bind count can exist is when it was
> @@ -118,18 +136,11 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
>  	 * are acceptable.
>  	 */
>  	if (p) {
> -		if (bind)
> -			atomic_dec(&p->tcfa_bindcnt);
> -		else if (strict && atomic_read(&p->tcfa_bindcnt) > 0)
> +		if (!bind && strict && atomic_read(&p->tcfa_bindcnt) > 0)
>  			return -EPERM;
>  
> -		if (atomic_read(&p->tcfa_bindcnt) <= 0 &&
> -		    refcount_dec_and_test(&p->tcfa_refcnt)) {
> -			if (p->ops->cleanup)
> -				p->ops->cleanup(p);
> -			tcf_idr_remove(p->idrinfo, p);
> +		if (__tcf_action_put(p, bind))
>  			ret = ACT_P_DELETED;
> -		}
>  	}
>  
>  	return ret;
> @@ -340,11 +351,7 @@ int tcf_idr_delete_index(struct tc_action_net *tn, u32 index)
>  						p->tcfa_index));
>  			spin_unlock(&idrinfo->lock);
>  
> -			if (p->ops->cleanup)
> -				p->ops->cleanup(p);
> -
> -			gen_kill_estimator(&p->tcfa_rate_est);
> -			free_tcf(p);
> +			tcf_action_cleanup(p);
>  			module_put(owner);
>  			return 0;
>  		}
> @@ -615,6 +622,11 @@ int tcf_action_destroy(struct list_head *actions, int bind)
>  	return ret;
>  }
>  
> +static int tcf_action_put(struct tc_action *p)
> +{
> +	return __tcf_action_put(p, false);
> +}
> +
>  int
>  tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
>  {
> @@ -1092,6 +1104,35 @@ static int tca_action_flush(struct net *net, struct nlattr *nla,
>  	return err;
>  }
>  
> +static int tcf_action_delete(struct net *net, struct list_head *actions,
> +			     struct netlink_ext_ack *extack)
> +{
> +	struct tc_action *a, *tmp;
> +	u32 act_index;
> +	int ret;
> +
> +	list_for_each_entry_safe(a, tmp, actions, list) {
> +		const struct tc_action_ops *ops = a->ops;
> +
> +		/* Actions can be deleted concurrently so we must save their
> +		 * type and id to search again after reference is released.
> +		 */
> +		act_index = a->tcfa_index;
> +
> +		list_del(&a->list);
> +		if (tcf_action_put(a)) {
> +			/* last reference, action was deleted concurrently */
> +			module_put(ops->owner);
> +		} else  {
> +			/* now do the delete */
> +			ret = ops->delete(net, act_index);
> +			if (ret < 0)
> +				return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
>  static int
>  tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
>  	       u32 portid, size_t attr_size, struct netlink_ext_ack *extack)
> @@ -1112,7 +1153,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
>  	}
>  
>  	/* now do the delete */
> -	ret = tcf_action_destroy(actions, 0);
> +	ret = tcf_action_delete(net, actions, extack);
>  	if (ret < 0) {
>  		NL_SET_ERR_MSG(extack, "Failed to delete TC action");
>  		kfree_skb(skb);
> @@ -1164,7 +1205,6 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>  	if (event == RTM_GETACTION)
>  		ret = tcf_get_notify(net, portid, n, &actions, event, extack);
>  	else { /* delete */
> -		cleanup_a(&actions, 1); /* lookup took reference */
>  		ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
>  		if (ret)
>  			goto err;
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index efbf01ce14c2..dd37d0ae3fce 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -1417,7 +1417,6 @@ void tcf_exts_destroy(struct tcf_exts *exts)
>  #ifdef CONFIG_NET_CLS_ACT
>  	LIST_HEAD(actions);
>  
> -	ASSERT_RTNL();
>  	tcf_exts_to_list(exts, &actions);
>  	tcf_action_destroy(&actions, TCA_ACT_UNBIND);
>  	kfree(exts->actions);
> -- 
> 2.7.5
> 

* Re: [PATCH v3 06/11] net: sched: add 'delete' function to action ops
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-7-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:24AM +0300, Vlad Buslov wrote:
> Extend action ops with 'delete' function. Each action type implements
> its own delete function that doesn't depend on rtnl lock.
> 
> Implement delete function that is required to delete actions without
> holding rtnl lock. Use action API function that atomically deletes action
> only if it is still in action idr. This implementation prevents concurrent
> threads from deleting same action twice.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
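
To make the dispatch concrete, here is a small user-space sketch of an ops
table with a 'delete' callback that forwards to a shared delete-by-index
helper, in the same shape as the per-action wrappers quoted below. Everything
in it is hypothetical (struct action_table, table_delete_index(), demo_ops),
and returning -ENOENT for a missing index is an assumption of the sketch, not
something the patch description states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INDEX 8

/* Hypothetical per-type table standing in for one action type's idr. */
struct action_table {
	int present[MAX_INDEX];
};

/* Hypothetical, trimmed-down analogue of struct tc_action_ops. */
struct action_ops {
	const char *kind;
	int (*delete)(struct action_table *tbl, uint32_t index);
};

/* Shared helper: remove the entry only if it is still in the table. */
static int table_delete_index(struct action_table *tbl, uint32_t index)
{
	assert(tbl != NULL);

	if (index >= MAX_INDEX || !tbl->present[index])
		return -ENOENT;	/* assumed error code for "already gone" */

	tbl->present[index] = 0;
	return 0;
}

/* Per-type wrapper, analogous in shape to tcf_bpf_delete() and friends. */
static int demo_delete(struct action_table *tbl, uint32_t index)
{
	return table_delete_index(tbl, index);
}

static const struct action_ops demo_ops = {
	.kind	= "demo",
	.delete	= demo_delete,
};
```

Calling demo_ops.delete() twice with the same index succeeds once and then
reports the entry as missing, which is the "no double delete" property the
commit message is after. The sketch leaves out locking entirely.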

> ---
> Changes from V1 to V2:
> - Merge action ops delete definition and implementation.
> 
>  include/net/act_api.h      |  1 +
>  net/sched/act_bpf.c        |  8 ++++++++
>  net/sched/act_connmark.c   |  8 ++++++++
>  net/sched/act_csum.c       |  8 ++++++++
>  net/sched/act_gact.c       |  8 ++++++++
>  net/sched/act_ife.c        |  8 ++++++++
>  net/sched/act_ipt.c        | 16 ++++++++++++++++
>  net/sched/act_mirred.c     |  8 ++++++++
>  net/sched/act_nat.c        |  8 ++++++++
>  net/sched/act_pedit.c      |  8 ++++++++
>  net/sched/act_police.c     |  8 ++++++++
>  net/sched/act_sample.c     |  8 ++++++++
>  net/sched/act_simple.c     |  8 ++++++++
>  net/sched/act_skbedit.c    |  8 ++++++++
>  net/sched/act_skbmod.c     |  8 ++++++++
>  net/sched/act_tunnel_key.c |  8 ++++++++
>  net/sched/act_vlan.c       |  8 ++++++++
>  17 files changed, 137 insertions(+)
> 
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index d94ec6400673..d256e20507b9 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -101,6 +101,7 @@ struct tc_action_ops {
>  	void	(*stats_update)(struct tc_action *, u64, u32, u64);
>  	size_t  (*get_fill_size)(const struct tc_action *act);
>  	struct net_device *(*get_dev)(const struct tc_action *a);
> +	int     (*delete)(struct net *net, u32 index);
>  };
>  
>  struct tc_action_net {
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index 8ebf40a3506c..7941dd66ff83 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -388,6 +388,13 @@ static int tcf_bpf_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_bpf_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, bpf_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_bpf_ops __read_mostly = {
>  	.kind		=	"bpf",
>  	.type		=	TCA_ACT_BPF,
> @@ -398,6 +405,7 @@ static struct tc_action_ops act_bpf_ops __read_mostly = {
>  	.init		=	tcf_bpf_init,
>  	.walk		=	tcf_bpf_walker,
>  	.lookup		=	tcf_bpf_search,
> +	.delete		=	tcf_bpf_delete,
>  	.size		=	sizeof(struct tcf_bpf),
>  };
>  
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index e3787aa0025a..143c2d3de723 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -193,6 +193,13 @@ static int tcf_connmark_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_connmark_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, connmark_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_connmark_ops = {
>  	.kind		=	"connmark",
>  	.type		=	TCA_ACT_CONNMARK,
> @@ -202,6 +209,7 @@ static struct tc_action_ops act_connmark_ops = {
>  	.init		=	tcf_connmark_init,
>  	.walk		=	tcf_connmark_walker,
>  	.lookup		=	tcf_connmark_search,
> +	.delete		=	tcf_connmark_delete,
>  	.size		=	sizeof(struct tcf_connmark_info),
>  };
>  
> diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
> index 334261943f9f..3768539340e0 100644
> --- a/net/sched/act_csum.c
> +++ b/net/sched/act_csum.c
> @@ -654,6 +654,13 @@ static size_t tcf_csum_get_fill_size(const struct tc_action *act)
>  	return nla_total_size(sizeof(struct tc_csum));
>  }
>  
> +static int tcf_csum_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, csum_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_csum_ops = {
>  	.kind		= "csum",
>  	.type		= TCA_ACT_CSUM,
> @@ -665,6 +672,7 @@ static struct tc_action_ops act_csum_ops = {
>  	.walk		= tcf_csum_walker,
>  	.lookup		= tcf_csum_search,
>  	.get_fill_size  = tcf_csum_get_fill_size,
> +	.delete		= tcf_csum_delete,
>  	.size		= sizeof(struct tcf_csum),
>  };
>  
> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
> index b4dfb2b4addc..a431a711f0dd 100644
> --- a/net/sched/act_gact.c
> +++ b/net/sched/act_gact.c
> @@ -231,6 +231,13 @@ static size_t tcf_gact_get_fill_size(const struct tc_action *act)
>  	return sz;
>  }
>  
> +static int tcf_gact_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, gact_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_gact_ops = {
>  	.kind		=	"gact",
>  	.type		=	TCA_ACT_GACT,
> @@ -242,6 +249,7 @@ static struct tc_action_ops act_gact_ops = {
>  	.walk		=	tcf_gact_walker,
>  	.lookup		=	tcf_gact_search,
>  	.get_fill_size	=	tcf_gact_get_fill_size,
> +	.delete		=	tcf_gact_delete,
>  	.size		=	sizeof(struct tcf_gact),
>  };
>  
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index 3dccc4e1d378..027c305dcb37 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
> @@ -846,6 +846,13 @@ static int tcf_ife_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_ife_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, ife_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_ife_ops = {
>  	.kind = "ife",
>  	.type = TCA_ACT_IFE,
> @@ -856,6 +863,7 @@ static struct tc_action_ops act_ife_ops = {
>  	.init = tcf_ife_init,
>  	.walk = tcf_ife_walker,
>  	.lookup = tcf_ife_search,
> +	.delete = tcf_ife_delete,
>  	.size =	sizeof(struct tcf_ife_info),
>  };
>  
> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
> index 9c21663a86a6..6c234411c771 100644
> --- a/net/sched/act_ipt.c
> +++ b/net/sched/act_ipt.c
> @@ -324,6 +324,13 @@ static int tcf_ipt_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_ipt_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, ipt_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_ipt_ops = {
>  	.kind		=	"ipt",
>  	.type		=	TCA_ACT_IPT,
> @@ -334,6 +341,7 @@ static struct tc_action_ops act_ipt_ops = {
>  	.init		=	tcf_ipt_init,
>  	.walk		=	tcf_ipt_walker,
>  	.lookup		=	tcf_ipt_search,
> +	.delete		=	tcf_ipt_delete,
>  	.size		=	sizeof(struct tcf_ipt),
>  };
>  
> @@ -374,6 +382,13 @@ static int tcf_xt_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_xt_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, xt_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_xt_ops = {
>  	.kind		=	"xt",
>  	.type		=	TCA_ACT_XT,
> @@ -384,6 +399,7 @@ static struct tc_action_ops act_xt_ops = {
>  	.init		=	tcf_xt_init,
>  	.walk		=	tcf_xt_walker,
>  	.lookup		=	tcf_xt_search,
> +	.delete		=	tcf_xt_delete,
>  	.size		=	sizeof(struct tcf_ipt),
>  };
>  
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 5434f08f2eb7..3d8300bce7e4 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -322,6 +322,13 @@ static struct net_device *tcf_mirred_get_dev(const struct tc_action *a)
>  	return rtnl_dereference(m->tcfm_dev);
>  }
>  
> +static int tcf_mirred_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, mirred_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_mirred_ops = {
>  	.kind		=	"mirred",
>  	.type		=	TCA_ACT_MIRRED,
> @@ -335,6 +342,7 @@ static struct tc_action_ops act_mirred_ops = {
>  	.lookup		=	tcf_mirred_search,
>  	.size		=	sizeof(struct tcf_mirred),
>  	.get_dev	=	tcf_mirred_get_dev,
> +	.delete		=	tcf_mirred_delete,
>  };
>  
>  static __net_init int mirred_init_net(struct net *net)
> diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
> index e6487ad1e4a8..9eb27c89dc46 100644
> --- a/net/sched/act_nat.c
> +++ b/net/sched/act_nat.c
> @@ -294,6 +294,13 @@ static int tcf_nat_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_nat_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, nat_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_nat_ops = {
>  	.kind		=	"nat",
>  	.type		=	TCA_ACT_NAT,
> @@ -303,6 +310,7 @@ static struct tc_action_ops act_nat_ops = {
>  	.init		=	tcf_nat_init,
>  	.walk		=	tcf_nat_walker,
>  	.lookup		=	tcf_nat_search,
> +	.delete		=	tcf_nat_delete,
>  	.size		=	sizeof(struct tcf_nat),
>  };
>  
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index 7c9a3f24edba..b8857035e3f8 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -436,6 +436,13 @@ static int tcf_pedit_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_pedit_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, pedit_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_pedit_ops = {
>  	.kind		=	"pedit",
>  	.type		=	TCA_ACT_PEDIT,
> @@ -446,6 +453,7 @@ static struct tc_action_ops act_pedit_ops = {
>  	.init		=	tcf_pedit_init,
>  	.walk		=	tcf_pedit_walker,
>  	.lookup		=	tcf_pedit_search,
> +	.delete		=	tcf_pedit_delete,
>  	.size		=	sizeof(struct tcf_pedit),
>  };
>  
> diff --git a/net/sched/act_police.c b/net/sched/act_police.c
> index 0e1c2fb0ebea..c955fb0d4f3f 100644
> --- a/net/sched/act_police.c
> +++ b/net/sched/act_police.c
> @@ -314,6 +314,13 @@ static int tcf_police_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_police_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, police_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  MODULE_AUTHOR("Alexey Kuznetsov");
>  MODULE_DESCRIPTION("Policing actions");
>  MODULE_LICENSE("GPL");
> @@ -327,6 +334,7 @@ static struct tc_action_ops act_police_ops = {
>  	.init		=	tcf_act_police_init,
>  	.walk		=	tcf_act_police_walker,
>  	.lookup		=	tcf_police_search,
> +	.delete		=	tcf_police_delete,
>  	.size		=	sizeof(struct tcf_police),
>  };
>  
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index 316fc645595d..6f79d2afcba2 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -220,6 +220,13 @@ static int tcf_sample_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_sample_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, sample_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_sample_ops = {
>  	.kind	  = "sample",
>  	.type	  = TCA_ACT_SAMPLE,
> @@ -230,6 +237,7 @@ static struct tc_action_ops act_sample_ops = {
>  	.cleanup  = tcf_sample_cleanup,
>  	.walk	  = tcf_sample_walker,
>  	.lookup	  = tcf_sample_search,
> +	.delete	  = tcf_sample_delete,
>  	.size	  = sizeof(struct tcf_sample),
>  };
>  
> diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
> index 23fa893ea092..b570e7ca7e33 100644
> --- a/net/sched/act_simple.c
> +++ b/net/sched/act_simple.c
> @@ -187,6 +187,13 @@ static int tcf_simp_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_simp_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, simp_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_simp_ops = {
>  	.kind		=	"simple",
>  	.type		=	TCA_ACT_SIMP,
> @@ -197,6 +204,7 @@ static struct tc_action_ops act_simp_ops = {
>  	.init		=	tcf_simp_init,
>  	.walk		=	tcf_simp_walker,
>  	.lookup		=	tcf_simp_search,
> +	.delete		=	tcf_simp_delete,
>  	.size		=	sizeof(struct tcf_defact),
>  };
>  
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 85ed9d603dc1..dc0cb350aa45 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -226,6 +226,13 @@ static int tcf_skbedit_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_skbedit_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, skbedit_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_skbedit_ops = {
>  	.kind		=	"skbedit",
>  	.type		=	TCA_ACT_SKBEDIT,
> @@ -235,6 +242,7 @@ static struct tc_action_ops act_skbedit_ops = {
>  	.init		=	tcf_skbedit_init,
>  	.walk		=	tcf_skbedit_walker,
>  	.lookup		=	tcf_skbedit_search,
> +	.delete		=	tcf_skbedit_delete,
>  	.size		=	sizeof(struct tcf_skbedit),
>  };
>  
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index 026d6f58eda1..30be3f767495 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -253,6 +253,13 @@ static int tcf_skbmod_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_skbmod_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, skbmod_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_skbmod_ops = {
>  	.kind		=	"skbmod",
>  	.type		=	TCA_ACT_SKBMOD,
> @@ -263,6 +270,7 @@ static struct tc_action_ops act_skbmod_ops = {
>  	.cleanup	=	tcf_skbmod_cleanup,
>  	.walk		=	tcf_skbmod_walker,
>  	.lookup		=	tcf_skbmod_search,
> +	.delete		=	tcf_skbmod_delete,
>  	.size		=	sizeof(struct tcf_skbmod),
>  };
>  
> diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
> index ed698fcb0e5a..4b7f9a3b47d7 100644
> --- a/net/sched/act_tunnel_key.c
> +++ b/net/sched/act_tunnel_key.c
> @@ -310,6 +310,13 @@ static int tunnel_key_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tunnel_key_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_tunnel_key_ops = {
>  	.kind		=	"tunnel_key",
>  	.type		=	TCA_ACT_TUNNEL_KEY,
> @@ -320,6 +327,7 @@ static struct tc_action_ops act_tunnel_key_ops = {
>  	.cleanup	=	tunnel_key_release,
>  	.walk		=	tunnel_key_walker,
>  	.lookup		=	tunnel_key_search,
> +	.delete		=	tunnel_key_delete,
>  	.size		=	sizeof(struct tcf_tunnel_key),
>  };
>  
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
> index 72d2d78fb942..b44377c951b6 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -285,6 +285,13 @@ static int tcf_vlan_search(struct net *net, struct tc_action **a, u32 index,
>  	return tcf_idr_search(tn, a, index);
>  }
>  
> +static int tcf_vlan_delete(struct net *net, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, vlan_net_id);
> +
> +	return tcf_idr_delete_index(tn, index);
> +}
> +
>  static struct tc_action_ops act_vlan_ops = {
>  	.kind		=	"vlan",
>  	.type		=	TCA_ACT_VLAN,
> @@ -295,6 +302,7 @@ static struct tc_action_ops act_vlan_ops = {
>  	.cleanup	=	tcf_vlan_cleanup,
>  	.walk		=	tcf_vlan_walker,
>  	.lookup		=	tcf_vlan_search,
> +	.delete		=	tcf_vlan_delete,
>  	.size		=	sizeof(struct tcf_vlan),
>  };
>  
> -- 
> 2.7.5
> 


* Re: [PATCH v3 05/11] net: sched: implement action API that deletes action by index
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-6-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:23AM +0300, Vlad Buslov wrote:
> Implement new action API function that atomically finds and deletes action
> from idr by index. Intended to be used by lockless actions that do not rely
> on rtnl lock.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
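
For readers following along outside the kernel tree, the find-and-delete
step described in the commit message can be sketched in userspace. This is
only an illustration under stated assumptions: the fixed-size table, the toy
spinlock and every name below (action_table, table_delete_index, ...) are
hypothetical stand-ins for the idr and idrinfo->lock, and both counters are
plain ints guarded by the lock rather than refcount_t/atomic_t.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TABLE_SIZE 16	/* hypothetical fixed-size stand-in for the idr */

struct action {
	int refcnt;	/* both counters are guarded by the table lock here */
	int bindcnt;
};

struct action_table {
	atomic_flag lock;	/* toy spinlock standing in for idrinfo->lock */
	struct action *slots[TABLE_SIZE];
};

static void table_init(struct action_table *t)
{
	atomic_flag_clear(&t->lock);
	for (int i = 0; i < TABLE_SIZE; i++)
		t->slots[i] = NULL;
}

static void table_lock(struct action_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void table_unlock(struct action_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Look up and delete inside one critical section, so no other thread can
 * take a new reference between the lookup and the removal. */
static int table_delete_index(struct action_table *t, unsigned int index)
{
	struct action *victim = NULL;
	int ret = 0;

	if (index >= TABLE_SIZE)
		return -ENOENT;

	table_lock(t);
	if (!t->slots[index]) {
		ret = -ENOENT;		/* nothing stored at this index */
	} else if (t->slots[index]->bindcnt > 0) {
		ret = -EPERM;		/* still bound to a classifier */
	} else if (--t->slots[index]->refcnt == 0) {
		victim = t->slots[index];
		t->slots[index] = NULL;	/* unpublish before freeing */
	}
	table_unlock(t);

	free(victim);	/* free(NULL) is a no-op */
	return ret;
}
```

As in the patch, the object is unpublished while the lock is held and freed
only after the lock is dropped, so the critical section stays short.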

> ---
> Changes from V1 to V2:
> - Rename tcf_idr_find_delete to tcf_idr_delete_index.
> 
>  include/net/act_api.h |  1 +
>  net/sched/act_api.c   | 39 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index 888ff471bbf6..d94ec6400673 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -153,6 +153,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  		   int bind, bool cpustats);
>  void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
>  
> +int tcf_idr_delete_index(struct tc_action_net *tn, u32 index);
>  int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
>  
>  static inline int tcf_idr_release(struct tc_action *a, bool bind)
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index aa304d36fee0..0f31f09946ab 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -319,6 +319,45 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
>  }
>  EXPORT_SYMBOL(tcf_idr_check);
>  
> +int tcf_idr_delete_index(struct tc_action_net *tn, u32 index)
> +{
> +	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> +	struct tc_action *p;
> +	int ret = 0;
> +
> +	spin_lock(&idrinfo->lock);
> +	p = idr_find(&idrinfo->action_idr, index);
> +	if (!p) {
> +		spin_unlock(&idrinfo->lock);
> +		return -ENOENT;
> +	}
> +
> +	if (!atomic_read(&p->tcfa_bindcnt)) {
> +		if (refcount_dec_and_test(&p->tcfa_refcnt)) {
> +			struct module *owner = p->ops->owner;
> +
> +			WARN_ON(p != idr_remove(&idrinfo->action_idr,
> +						p->tcfa_index));
> +			spin_unlock(&idrinfo->lock);
> +
> +			if (p->ops->cleanup)
> +				p->ops->cleanup(p);
> +
> +			gen_kill_estimator(&p->tcfa_rate_est);
> +			free_tcf(p);
> +			module_put(owner);
> +			return 0;
> +		}
> +		ret = 0;
> +	} else {
> +		ret = -EPERM;
> +	}
> +
> +	spin_unlock(&idrinfo->lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(tcf_idr_delete_index);
> +
>  int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  		   struct tc_action **a, const struct tc_action_ops *ops,
>  		   int bind, bool cpustats)
> -- 
> 2.7.5
> 


* Re: [PATCH v3 04/11] net: sched: always take reference to action
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-5-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:22AM +0300, Vlad Buslov wrote:
> Without rtnl lock protection it is no longer safe to use a pointer to a tc
> action without holding a reference to it, because it can be destroyed
> concurrently.
> 
> Remove the unsafe action idr lookup function. In its place, implement a safe
> tcf idr check function that atomically looks up the action in the idr and
> increments its reference and bind counters. Implement both action search and
> check using the new safe function.
> 
> The reference taken by idr check is temporary and should not be reported to
> userspace clients (both logically and to preserve current API behavior).
> Subtract the temporary reference when dumping the action to userspace using
> the existing tca_get_fill function arguments.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
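
The two ideas in this commit message (take the reference inside the lookup's
critical section, then hide that extra reference when reporting to
userspace) can be sketched in plain C. This is a minimal userspace model,
not the kernel code: the table, the toy spinlock and the names table_check
and reported_refcnt are assumptions made for illustration, and the counters
are plain ints guarded by the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 16	/* hypothetical stand-in for the idr */

struct action {
	int refcnt;	/* guarded by the table lock in this model */
	int bindcnt;
};

struct action_table {
	atomic_flag lock;	/* toy spinlock standing in for idrinfo->lock */
	struct action *slots[TABLE_SIZE];
};

static void table_init(struct action_table *t)
{
	atomic_flag_clear(&t->lock);
	for (int i = 0; i < TABLE_SIZE; i++)
		t->slots[i] = NULL;
}

/* The reference is taken before the lock is dropped, so the caller never
 * holds a pointer that a concurrent delete could already have freed. */
static bool table_check(struct action_table *t, unsigned int index,
			struct action **out, bool bind)
{
	struct action *p = NULL;

	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
	if (index < TABLE_SIZE)
		p = t->slots[index];
	if (p) {
		p->refcnt++;
		if (bind)
			p->bindcnt++;
	}
	atomic_flag_clear_explicit(&t->lock, memory_order_release);

	if (!p)
		return false;
	*out = p;
	return true;
}

/* Value shown to userspace: the caller says how many of the current
 * references are temporary ones taken by lookups like the one above. */
static int reported_refcnt(const struct action *a, int temp_refs)
{
	return a->refcnt - temp_refs;
}
```

With one long-lived reference and one lookup in flight, the stored count is
2 while reported_refcnt(a, 1) is 1, which is the value userspace would have
seen before the lookup started taking references.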

> ---
> Changes from V1 to V2:
> - Make __tcf_idr_check function static
> - Merge changes that take reference to action when performing lookup and
>   changes that account for this additional reference when dumping action
>   to user space into single patch.
> 
>  net/sched/act_api.c | 46 ++++++++++++++++++++--------------------------
>  1 file changed, 20 insertions(+), 26 deletions(-)
> 
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 256b0c93916c..aa304d36fee0 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -284,44 +284,38 @@ int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
>  }
>  EXPORT_SYMBOL(tcf_generic_walker);
>  
> -static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
> +static bool __tcf_idr_check(struct tc_action_net *tn, u32 index,
> +			    struct tc_action **a, int bind)
>  {
> -	struct tc_action *p = NULL;
> +	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> +	struct tc_action *p;
>  
>  	spin_lock(&idrinfo->lock);
>  	p = idr_find(&idrinfo->action_idr, index);
> +	if (p) {
> +		refcount_inc(&p->tcfa_refcnt);
> +		if (bind)
> +			atomic_inc(&p->tcfa_bindcnt);
> +	}
>  	spin_unlock(&idrinfo->lock);
>  
> -	return p;
> +	if (p) {
> +		*a = p;
> +		return true;
> +	}
> +	return false;
>  }
>  
>  int tcf_idr_search(struct tc_action_net *tn, struct tc_action **a, u32 index)
>  {
> -	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> -	struct tc_action *p = tcf_idr_lookup(index, idrinfo);
> -
> -	if (p) {
> -		*a = p;
> -		return 1;
> -	}
> -	return 0;
> +	return __tcf_idr_check(tn, index, a, 0);
>  }
>  EXPORT_SYMBOL(tcf_idr_search);
>  
>  bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
>  		   int bind)
>  {
> -	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> -	struct tc_action *p = tcf_idr_lookup(index, idrinfo);
> -
> -	if (index && p) {
> -		if (bind)
> -			atomic_inc(&p->tcfa_bindcnt);
> -		refcount_inc(&p->tcfa_refcnt);
> -		*a = p;
> -		return true;
> -	}
> -	return false;
> +	return __tcf_idr_check(tn, index, a, bind);
>  }
>  EXPORT_SYMBOL(tcf_idr_check);
>  
> @@ -932,7 +926,7 @@ tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n,
>  	if (!skb)
>  		return -ENOBUFS;
>  	if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, event,
> -			 0, 0) <= 0) {
> +			 0, 1) <= 0) {
>  		NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while adding TC action");
>  		kfree_skb(skb);
>  		return -EINVAL;
> @@ -1072,7 +1066,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
>  		return -ENOBUFS;
>  
>  	if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, RTM_DELACTION,
> -			 0, 1) <= 0) {
> +			 0, 2) <= 0) {
>  		NL_SET_ERR_MSG(extack, "Failed to fill netlink TC action attributes");
>  		kfree_skb(skb);
>  		return -EINVAL;
> @@ -1131,14 +1125,14 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>  	if (event == RTM_GETACTION)
>  		ret = tcf_get_notify(net, portid, n, &actions, event, extack);
>  	else { /* delete */
> +		cleanup_a(&actions, 1); /* lookup took reference */
>  		ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
>  		if (ret)
>  			goto err;
>  		return ret;
>  	}
>  err:
> -	if (event != RTM_GETACTION)
> -		tcf_action_destroy(&actions, 0);
> +	tcf_action_destroy(&actions, 0);
>  	return ret;
>  }
>  
> -- 
> 2.7.5
> 


* Re: [PATCH v3 03/11] net: sched: implement unlocked action init API
From: Marcelo Ricardo Leitner @ 2018-05-28 21:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-4-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:21AM +0300, Vlad Buslov wrote:
> Add an 'rtnl_held' argument to the act API init functions. It is required
> to implement actions that need to release the rtnl lock before loading a
> kernel module and reacquire it afterwards.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
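
The pattern the flag enables (drop the lock around a slow operation only if
the caller actually holds it) can be shown with a small single-threaded
model. Everything here is hypothetical: big_lock merely records whether the
lock is held, and slow_module_load stands in for request_module(). It is a
sketch of the idea, not of the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for the rtnl mutex: it only tracks
 * whether the lock is held and how often it was dropped. */
struct big_lock {
	bool held;
	int drops;
};

static void big_lock_acquire(struct big_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void big_lock_release(struct big_lock *l)
{
	assert(l->held);	/* releasing a lock we do not hold is a bug */
	l->held = false;
	l->drops++;
}

/* Stand-in for a module load: it may sleep, so it must run unlocked. */
static int slow_module_load(const struct big_lock *l)
{
	assert(!l->held);
	return 0;
}

/* The caller states whether it holds the lock; the lock is dropped and
 * retaken only in that case, so unlocked callers are left untouched. */
static int action_init(struct big_lock *l, bool lock_held)
{
	int err;

	if (lock_held)
		big_lock_release(l);
	err = slow_module_load(l);
	if (lock_held)
		big_lock_acquire(l);
	return err;
}
```

Without the flag, an unlocked caller would trip the assertion in
big_lock_release(); with it, both kinds of caller get back the lock state
they started with.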

> ---
> Changes from V1 to V2:
> - Rename "unlocked" to "rtnl_held" for clarity.
> 
>  include/net/act_api.h      |  6 ++++--
>  net/sched/act_api.c        | 18 +++++++++++-------
>  net/sched/act_bpf.c        |  3 ++-
>  net/sched/act_connmark.c   |  2 +-
>  net/sched/act_csum.c       |  3 ++-
>  net/sched/act_gact.c       |  3 ++-
>  net/sched/act_ife.c        |  3 ++-
>  net/sched/act_ipt.c        |  6 ++++--
>  net/sched/act_mirred.c     |  5 +++--
>  net/sched/act_nat.c        |  2 +-
>  net/sched/act_pedit.c      |  3 ++-
>  net/sched/act_police.c     |  2 +-
>  net/sched/act_sample.c     |  3 ++-
>  net/sched/act_simple.c     |  3 ++-
>  net/sched/act_skbedit.c    |  3 ++-
>  net/sched/act_skbmod.c     |  3 ++-
>  net/sched/act_tunnel_key.c |  3 ++-
>  net/sched/act_vlan.c       |  3 ++-
>  net/sched/cls_api.c        |  5 +++--
>  19 files changed, 50 insertions(+), 29 deletions(-)
> 
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index e634014605cb..888ff471bbf6 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -92,7 +92,8 @@ struct tc_action_ops {
>  			  struct netlink_ext_ack *extack);
>  	int     (*init)(struct net *net, struct nlattr *nla,
>  			struct nlattr *est, struct tc_action **act, int ovr,
> -			int bind, struct netlink_ext_ack *extack);
> +			int bind, bool rtnl_held,
> +			struct netlink_ext_ack *extack);
>  	int     (*walk)(struct net *, struct sk_buff *,
>  			struct netlink_callback *, int,
>  			const struct tc_action_ops *,
> @@ -168,10 +169,11 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
>  int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  		    struct nlattr *est, char *name, int ovr, int bind,
>  		    struct list_head *actions, size_t *attr_size,
> -		    struct netlink_ext_ack *extack);
> +		    bool rtnl_held, struct netlink_ext_ack *extack);
>  struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  				    struct nlattr *nla, struct nlattr *est,
>  				    char *name, int ovr, int bind,
> +				    bool rtnl_held,
>  				    struct netlink_ext_ack *extack);
>  int tcf_action_dump(struct sk_buff *skb, struct list_head *, int, int);
>  int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int);
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 4f064ecab882..256b0c93916c 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -671,6 +671,7 @@ static struct tc_cookie *nla_memdup_cookie(struct nlattr **tb)
>  struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  				    struct nlattr *nla, struct nlattr *est,
>  				    char *name, int ovr, int bind,
> +				    bool rtnl_held,
>  				    struct netlink_ext_ack *extack)
>  {
>  	struct tc_action *a;
> @@ -721,9 +722,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	a_o = tc_lookup_action_n(act_name);
>  	if (a_o == NULL) {
>  #ifdef CONFIG_MODULES
> -		rtnl_unlock();
> +		if (rtnl_held)
> +			rtnl_unlock();
>  		request_module("act_%s", act_name);
> -		rtnl_lock();
> +		if (rtnl_held)
> +			rtnl_lock();
>  
>  		a_o = tc_lookup_action_n(act_name);
>  
> @@ -746,9 +749,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	/* backward compatibility for policer */
>  	if (name == NULL)
>  		err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind,
> -				extack);
> +				rtnl_held, extack);
>  	else
> -		err = a_o->init(net, nla, est, &a, ovr, bind, extack);
> +		err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held,
> +				extack);
>  	if (err < 0)
>  		goto err_mod;
>  
> @@ -800,7 +804,7 @@ static void cleanup_a(struct list_head *actions, int ovr)
>  int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  		    struct nlattr *est, char *name, int ovr, int bind,
>  		    struct list_head *actions, size_t *attr_size,
> -		    struct netlink_ext_ack *extack)
> +		    bool rtnl_held, struct netlink_ext_ack *extack)
>  {
>  	struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
>  	struct tc_action *act;
> @@ -814,7 +818,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  
>  	for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
>  		act = tcf_action_init_1(net, tp, tb[i], est, name, ovr, bind,
> -					extack);
> +					rtnl_held, extack);
>  		if (IS_ERR(act)) {
>  			err = PTR_ERR(act);
>  			goto err;
> @@ -1173,7 +1177,7 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
>  	LIST_HEAD(actions);
>  
>  	ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0, &actions,
> -			      &attr_size, extack);
> +			      &attr_size, true, extack);
>  	if (ret)
>  		return ret;
>  
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index 15a2a53cbde1..8ebf40a3506c 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -276,7 +276,8 @@ static void tcf_bpf_prog_fill_cfg(const struct tcf_bpf *prog,
>  
>  static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>  			struct nlattr *est, struct tc_action **act,
> -			int replace, int bind, struct netlink_ext_ack *extack)
> +			int replace, int bind, bool rtnl_held,
> +			struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, bpf_net_id);
>  	struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index 188865034f9a..e3787aa0025a 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -96,7 +96,7 @@ static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = {
>  
>  static int tcf_connmark_init(struct net *net, struct nlattr *nla,
>  			     struct nlattr *est, struct tc_action **a,
> -			     int ovr, int bind,
> +			     int ovr, int bind, bool rtnl_held,
>  			     struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, connmark_net_id);
> diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
> index da865f7b390a..334261943f9f 100644
> --- a/net/sched/act_csum.c
> +++ b/net/sched/act_csum.c
> @@ -46,7 +46,8 @@ static struct tc_action_ops act_csum_ops;
>  
>  static int tcf_csum_init(struct net *net, struct nlattr *nla,
>  			 struct nlattr *est, struct tc_action **a, int ovr,
> -			 int bind, struct netlink_ext_ack *extack)
> +			 int bind, bool rtnl_held,
> +			 struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, csum_net_id);
>  	struct tcf_csum_params *params_old, *params_new;
> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
> index ca83debd5a70..b4dfb2b4addc 100644
> --- a/net/sched/act_gact.c
> +++ b/net/sched/act_gact.c
> @@ -56,7 +56,8 @@ static const struct nla_policy gact_policy[TCA_GACT_MAX + 1] = {
>  
>  static int tcf_gact_init(struct net *net, struct nlattr *nla,
>  			 struct nlattr *est, struct tc_action **a,
> -			 int ovr, int bind, struct netlink_ext_ack *extack)
> +			 int ovr, int bind, bool rtnl_held,
> +			 struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, gact_net_id);
>  	struct nlattr *tb[TCA_GACT_MAX + 1];
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index 706e84d6f912..3dccc4e1d378 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
> @@ -447,7 +447,8 @@ static int populate_metalist(struct tcf_ife_info *ife, struct nlattr **tb,
>  
>  static int tcf_ife_init(struct net *net, struct nlattr *nla,
>  			struct nlattr *est, struct tc_action **a,
> -			int ovr, int bind, struct netlink_ext_ack *extack)
> +			int ovr, int bind, bool rtnl_held,
> +			struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, ife_net_id);
>  	struct nlattr *tb[TCA_IFE_MAX + 1];
> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
> index 7bce88dc11c9..9c21663a86a6 100644
> --- a/net/sched/act_ipt.c
> +++ b/net/sched/act_ipt.c
> @@ -196,7 +196,8 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
>  
>  static int tcf_ipt_init(struct net *net, struct nlattr *nla,
>  			struct nlattr *est, struct tc_action **a, int ovr,
> -			int bind, struct netlink_ext_ack *extack)
> +			int bind, bool rtnl_held,
> +			struct netlink_ext_ack *extack)
>  {
>  	return __tcf_ipt_init(net, ipt_net_id, nla, est, a, &act_ipt_ops, ovr,
>  			      bind);
> @@ -204,7 +205,8 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla,
>  
>  static int tcf_xt_init(struct net *net, struct nlattr *nla,
>  		       struct nlattr *est, struct tc_action **a, int ovr,
> -		       int bind, struct netlink_ext_ack *extack)
> +		       int bind, bool rtnl_held,
> +		       struct netlink_ext_ack *extack)
>  {
>  	return __tcf_ipt_init(net, xt_net_id, nla, est, a, &act_xt_ops, ovr,
>  			      bind);
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 82a8bdd67c47..5434f08f2eb7 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -68,8 +68,9 @@ static unsigned int mirred_net_id;
>  static struct tc_action_ops act_mirred_ops;
>  
>  static int tcf_mirred_init(struct net *net, struct nlattr *nla,
> -			   struct nlattr *est, struct tc_action **a, int ovr,
> -			   int bind, struct netlink_ext_ack *extack)
> +			   struct nlattr *est, struct tc_action **a,
> +			   int ovr, int bind, bool rtnl_held,
> +			   struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, mirred_net_id);
>  	struct nlattr *tb[TCA_MIRRED_MAX + 1];
> diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
> index 457c2ae3de46..e6487ad1e4a8 100644
> --- a/net/sched/act_nat.c
> +++ b/net/sched/act_nat.c
> @@ -38,7 +38,7 @@ static const struct nla_policy nat_policy[TCA_NAT_MAX + 1] = {
>  
>  static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
>  			struct tc_action **a, int ovr, int bind,
> -			struct netlink_ext_ack *extack)
> +			bool rtnl_held, struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, nat_net_id);
>  	struct nlattr *tb[TCA_NAT_MAX + 1];
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index 0102b2935fdb..7c9a3f24edba 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -132,7 +132,8 @@ static int tcf_pedit_key_ex_dump(struct sk_buff *skb,
>  
>  static int tcf_pedit_init(struct net *net, struct nlattr *nla,
>  			  struct nlattr *est, struct tc_action **a,
> -			  int ovr, int bind, struct netlink_ext_ack *extack)
> +			  int ovr, int bind, bool rtnl_held,
> +			  struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, pedit_net_id);
>  	struct nlattr *tb[TCA_PEDIT_MAX + 1];
> diff --git a/net/sched/act_police.c b/net/sched/act_police.c
> index a789b8060968..0e1c2fb0ebea 100644
> --- a/net/sched/act_police.c
> +++ b/net/sched/act_police.c
> @@ -75,7 +75,7 @@ static const struct nla_policy police_policy[TCA_POLICE_MAX + 1] = {
>  
>  static int tcf_act_police_init(struct net *net, struct nlattr *nla,
>  			       struct nlattr *est, struct tc_action **a,
> -			       int ovr, int bind,
> +			       int ovr, int bind, bool rtnl_held,
>  			       struct netlink_ext_ack *extack)
>  {
>  	int ret = 0, err;
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index 4a46978db092..316fc645595d 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -37,7 +37,8 @@ static const struct nla_policy sample_policy[TCA_SAMPLE_MAX + 1] = {
>  
>  static int tcf_sample_init(struct net *net, struct nlattr *nla,
>  			   struct nlattr *est, struct tc_action **a, int ovr,
> -			   int bind, struct netlink_ext_ack *extack)
> +			   int bind, bool rtnl_held,
> +			   struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, sample_net_id);
>  	struct nlattr *tb[TCA_SAMPLE_MAX + 1];
> diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
> index 95d5985b8d67..23fa893ea092 100644
> --- a/net/sched/act_simple.c
> +++ b/net/sched/act_simple.c
> @@ -79,7 +79,8 @@ static const struct nla_policy simple_policy[TCA_DEF_MAX + 1] = {
>  
>  static int tcf_simp_init(struct net *net, struct nlattr *nla,
>  			 struct nlattr *est, struct tc_action **a,
> -			 int ovr, int bind, struct netlink_ext_ack *extack)
> +			 int ovr, int bind, bool rtnl_held,
> +			 struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, simp_net_id);
>  	struct nlattr *tb[TCA_DEF_MAX + 1];
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index d418ec3b0ab9..85ed9d603dc1 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -66,7 +66,8 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>  
>  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  			    struct nlattr *est, struct tc_action **a,
> -			    int ovr, int bind, struct netlink_ext_ack *extack)
> +			    int ovr, int bind, bool rtnl_held,
> +			    struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, skbedit_net_id);
>  	struct nlattr *tb[TCA_SKBEDIT_MAX + 1];
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index ff90d720eda3..026d6f58eda1 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -84,7 +84,8 @@ static const struct nla_policy skbmod_policy[TCA_SKBMOD_MAX + 1] = {
>  
>  static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
>  			   struct nlattr *est, struct tc_action **a,
> -			   int ovr, int bind, struct netlink_ext_ack *extack)
> +			   int ovr, int bind, bool rtnl_held,
> +			   struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, skbmod_net_id);
>  	struct nlattr *tb[TCA_SKBMOD_MAX + 1];
> diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
> index c6e50695414b..ed698fcb0e5a 100644
> --- a/net/sched/act_tunnel_key.c
> +++ b/net/sched/act_tunnel_key.c
> @@ -70,7 +70,8 @@ static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
>  
>  static int tunnel_key_init(struct net *net, struct nlattr *nla,
>  			   struct nlattr *est, struct tc_action **a,
> -			   int ovr, int bind, struct netlink_ext_ack *extack)
> +			   int ovr, int bind, bool rtnl_held,
> +			   struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
>  	struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1];
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
> index 8dda78473004..72d2d78fb942 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -109,7 +109,8 @@ static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = {
>  
>  static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>  			 struct nlattr *est, struct tc_action **a,
> -			 int ovr, int bind, struct netlink_ext_ack *extack)
> +			 int ovr, int bind, bool rtnl_held,
> +			 struct netlink_ext_ack *extack)
>  {
>  	struct tc_action_net *tn = net_generic(net, vlan_net_id);
>  	struct nlattr *tb[TCA_VLAN_MAX + 1];
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 963e4bf0aab8..efbf01ce14c2 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -1438,7 +1438,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
>  		if (exts->police && tb[exts->police]) {
>  			act = tcf_action_init_1(net, tp, tb[exts->police],
>  						rate_tlv, "police", ovr,
> -						TCA_ACT_BIND, extack);
> +						TCA_ACT_BIND, true, extack);
>  			if (IS_ERR(act))
>  				return PTR_ERR(act);
>  
> @@ -1451,7 +1451,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
>  
>  			err = tcf_action_init(net, tp, tb[exts->action],
>  					      rate_tlv, NULL, ovr, TCA_ACT_BIND,
> -					      &actions, &attr_size, extack);
> +					      &actions, &attr_size, true,
> +					      extack);
>  			if (err)
>  				return err;
>  			list_for_each_entry(act, &actions, list)
> -- 
> 2.7.5
> 


* Re: [PATCH v3 02/11] net: sched: change type of reference and bind counters
From: Marcelo Ricardo Leitner @ 2018-05-28 21:37 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn,
	Jiri Pirko
In-Reply-To: <1527455849-22327-3-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:20AM +0300, Vlad Buslov wrote:
> Change type of action reference counter to refcount_t.
> 
> Change type of action bind counter to atomic_t.
> This type is used to allow decrementing bind counter without testing
> for 0 result.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
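
The split between the two counter types can be illustrated with C11 atomics.
This is a rough userspace analogue under stated assumptions: the struct and
helper names are made up, atomic_uint stands in for refcount_t and
atomic_int for atomic_t, and the saturation and misuse warnings that the
kernel's refcount_t provides are not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical pair of counters: refcnt decides the object's lifetime,
 * bindcnt only records how many classifiers are bound to it. */
struct action_counters {
	atomic_uint refcnt;
	atomic_int bindcnt;
};

static void counters_init(struct action_counters *c, bool bind)
{
	atomic_init(&c->refcnt, 1u);
	atomic_init(&c->bindcnt, bind ? 1 : 0);
}

static void counters_get(struct action_counters *c, bool bind)
{
	atomic_fetch_add(&c->refcnt, 1u);
	if (bind)
		atomic_fetch_add(&c->bindcnt, 1);
}

/* Returns true when the caller dropped the last reference. The bind
 * counter is decremented without looking at the result; only the
 * reference counter uses the decrement-and-test form. */
static bool counters_put(struct action_counters *c, bool bind)
{
	if (bind)
		atomic_fetch_sub(&c->bindcnt, 1);
	/* fetch_sub returns the previous value, so 1 means we hit zero */
	return atomic_fetch_sub(&c->refcnt, 1u) == 1u;
}
```

Only the thread that sees counters_put() return true may free the object,
which is the guarantee a decrement-and-test primitive is meant to give.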

> ---
>  include/net/act_api.h      |  5 +++--
>  net/sched/act_api.c        | 32 ++++++++++++++++++++++----------
>  net/sched/act_bpf.c        |  4 ++--
>  net/sched/act_connmark.c   |  4 ++--
>  net/sched/act_csum.c       |  4 ++--
>  net/sched/act_gact.c       |  4 ++--
>  net/sched/act_ife.c        |  4 ++--
>  net/sched/act_ipt.c        |  4 ++--
>  net/sched/act_mirred.c     |  4 ++--
>  net/sched/act_nat.c        |  4 ++--
>  net/sched/act_pedit.c      |  4 ++--
>  net/sched/act_police.c     |  4 ++--
>  net/sched/act_sample.c     |  4 ++--
>  net/sched/act_simple.c     |  4 ++--
>  net/sched/act_skbedit.c    |  4 ++--
>  net/sched/act_skbmod.c     |  4 ++--
>  net/sched/act_tunnel_key.c |  4 ++--
>  net/sched/act_vlan.c       |  4 ++--
>  18 files changed, 57 insertions(+), 44 deletions(-)
> 
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index f7b59ef7303d..e634014605cb 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -6,6 +6,7 @@
>   * Public action API for classifiers/qdiscs
>  */
>  
> +#include <linux/refcount.h>
>  #include <net/sch_generic.h>
>  #include <net/pkt_sched.h>
>  #include <net/net_namespace.h>
> @@ -26,8 +27,8 @@ struct tc_action {
>  	struct tcf_idrinfo		*idrinfo;
>  
>  	u32				tcfa_index;
> -	int				tcfa_refcnt;
> -	int				tcfa_bindcnt;
> +	refcount_t			tcfa_refcnt;
> +	atomic_t			tcfa_bindcnt;
>  	u32				tcfa_capab;
>  	int				tcfa_action;
>  	struct tcf_t			tcfa_tm;
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 02670c7489e3..4f064ecab882 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -105,14 +105,26 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
>  
>  	ASSERT_RTNL();
>  
> +	/* Release with strict==1 and bind==0 is only called through act API
> +	 * interface (classifiers always bind). Only case when action with
> +	 * positive reference count and zero bind count can exist is when it was
> +	 * also created with act API (unbinding last classifier will destroy the
> +	 * action if it was created by classifier). So only case when bind count
> +	 * can be changed after initial check is when unbound action is
> +	 * destroyed by act API while classifier binds to action with same id
> +	 * concurrently. This results in either creation of a new action (same
> +	 * behavior as before), or reuse of the existing action if a concurrent process
> +	 * increments reference count before action is deleted. Both scenarios
> +	 * are acceptable.
> +	 */
>  	if (p) {
>  		if (bind)
> -			p->tcfa_bindcnt--;
> -		else if (strict && p->tcfa_bindcnt > 0)
> +			atomic_dec(&p->tcfa_bindcnt);
> +		else if (strict && atomic_read(&p->tcfa_bindcnt) > 0)
>  			return -EPERM;
>  
> -		p->tcfa_refcnt--;
> -		if (p->tcfa_bindcnt <= 0 && p->tcfa_refcnt <= 0) {
> +		if (atomic_read(&p->tcfa_bindcnt) <= 0 &&
> +		    refcount_dec_and_test(&p->tcfa_refcnt)) {
>  			if (p->ops->cleanup)
>  				p->ops->cleanup(p);
>  			tcf_idr_remove(p->idrinfo, p);
> @@ -304,8 +316,8 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
>  
>  	if (index && p) {
>  		if (bind)
> -			p->tcfa_bindcnt++;
> -		p->tcfa_refcnt++;
> +			atomic_inc(&p->tcfa_bindcnt);
> +		refcount_inc(&p->tcfa_refcnt);
>  		*a = p;
>  		return true;
>  	}
> @@ -324,9 +336,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  
>  	if (unlikely(!p))
>  		return -ENOMEM;
> -	p->tcfa_refcnt = 1;
> +	refcount_set(&p->tcfa_refcnt, 1);
>  	if (bind)
> -		p->tcfa_bindcnt = 1;
> +		atomic_set(&p->tcfa_bindcnt, 1);
>  
>  	if (cpustats) {
>  		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
> @@ -782,7 +794,7 @@ static void cleanup_a(struct list_head *actions, int ovr)
>  		return;
>  
>  	list_for_each_entry(a, actions, list)
> -		a->tcfa_refcnt--;
> +		refcount_dec(&a->tcfa_refcnt);
>  }
>  
>  int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
> @@ -810,7 +822,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
>  		act->order = i;
>  		sz += tcf_action_fill_size(act);
>  		if (ovr)
> -			act->tcfa_refcnt++;
> +			refcount_inc(&act->tcfa_refcnt);
>  		list_add_tail(&act->list, actions);
>  	}
>  
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index 18089c02e557..15a2a53cbde1 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -141,8 +141,8 @@ static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *act,
>  	struct tcf_bpf *prog = to_bpf(act);
>  	struct tc_act_bpf opt = {
>  		.index   = prog->tcf_index,
> -		.refcnt  = prog->tcf_refcnt - ref,
> -		.bindcnt = prog->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&prog->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&prog->tcf_bindcnt) - bind,
>  		.action  = prog->tcf_action,
>  	};
>  	struct tcf_t tm;
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index e4b880fa51fe..188865034f9a 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -154,8 +154,8 @@ static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
>  
>  	struct tc_connmark opt = {
>  		.index   = ci->tcf_index,
> -		.refcnt  = ci->tcf_refcnt - ref,
> -		.bindcnt = ci->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
>  		.action  = ci->tcf_action,
>  		.zone   = ci->zone,
>  	};
> diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
> index 526a8e491626..da865f7b390a 100644
> --- a/net/sched/act_csum.c
> +++ b/net/sched/act_csum.c
> @@ -597,8 +597,8 @@ static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind,
>  	struct tcf_csum_params *params;
>  	struct tc_csum opt = {
>  		.index   = p->tcf_index,
> -		.refcnt  = p->tcf_refcnt - ref,
> -		.bindcnt = p->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&p->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&p->tcf_bindcnt) - bind,
>  	};
>  	struct tcf_t t;
>  
> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
> index 4dc4f153cad8..ca83debd5a70 100644
> --- a/net/sched/act_gact.c
> +++ b/net/sched/act_gact.c
> @@ -169,8 +169,8 @@ static int tcf_gact_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tcf_gact *gact = to_gact(a);
>  	struct tc_gact opt = {
>  		.index   = gact->tcf_index,
> -		.refcnt  = gact->tcf_refcnt - ref,
> -		.bindcnt = gact->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&gact->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&gact->tcf_bindcnt) - bind,
>  		.action  = gact->tcf_action,
>  	};
>  	struct tcf_t t;
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index 8527cfdc446d..706e84d6f912 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
> @@ -598,8 +598,8 @@ static int tcf_ife_dump(struct sk_buff *skb, struct tc_action *a, int bind,
>  	struct tcf_ife_params *p = rtnl_dereference(ife->params);
>  	struct tc_ife opt = {
>  		.index = ife->tcf_index,
> -		.refcnt = ife->tcf_refcnt - ref,
> -		.bindcnt = ife->tcf_bindcnt - bind,
> +		.refcnt = refcount_read(&ife->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&ife->tcf_bindcnt) - bind,
>  		.action = ife->tcf_action,
>  		.flags = p->flags,
>  	};
> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
> index 14c312d7908f..7bce88dc11c9 100644
> --- a/net/sched/act_ipt.c
> +++ b/net/sched/act_ipt.c
> @@ -280,8 +280,8 @@ static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
>  	if (unlikely(!t))
>  		goto nla_put_failure;
>  
> -	c.bindcnt = ipt->tcf_bindcnt - bind;
> -	c.refcnt = ipt->tcf_refcnt - ref;
> +	c.bindcnt = atomic_read(&ipt->tcf_bindcnt) - bind;
> +	c.refcnt = refcount_read(&ipt->tcf_refcnt) - ref;
>  	strcpy(t->u.user.name, ipt->tcfi_t->u.kernel.target->name);
>  
>  	if (nla_put(skb, TCA_IPT_TARG, ipt->tcfi_t->u.user.target_size, t) ||
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index fd34015331ab..82a8bdd67c47 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -250,8 +250,8 @@ static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind,
>  	struct tc_mirred opt = {
>  		.index   = m->tcf_index,
>  		.action  = m->tcf_action,
> -		.refcnt  = m->tcf_refcnt - ref,
> -		.bindcnt = m->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&m->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&m->tcf_bindcnt) - bind,
>  		.eaction = m->tcfm_eaction,
>  		.ifindex = dev ? dev->ifindex : 0,
>  	};
> diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
> index 4b5848b6c252..457c2ae3de46 100644
> --- a/net/sched/act_nat.c
> +++ b/net/sched/act_nat.c
> @@ -257,8 +257,8 @@ static int tcf_nat_dump(struct sk_buff *skb, struct tc_action *a,
>  
>  		.index    = p->tcf_index,
>  		.action   = p->tcf_action,
> -		.refcnt   = p->tcf_refcnt - ref,
> -		.bindcnt  = p->tcf_bindcnt - bind,
> +		.refcnt   = refcount_read(&p->tcf_refcnt) - ref,
> +		.bindcnt  = atomic_read(&p->tcf_bindcnt) - bind,
>  	};
>  	struct tcf_t t;
>  
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index 8a925c72db5f..0102b2935fdb 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -391,8 +391,8 @@ static int tcf_pedit_dump(struct sk_buff *skb, struct tc_action *a,
>  	opt->nkeys = p->tcfp_nkeys;
>  	opt->flags = p->tcfp_flags;
>  	opt->action = p->tcf_action;
> -	opt->refcnt = p->tcf_refcnt - ref;
> -	opt->bindcnt = p->tcf_bindcnt - bind;
> +	opt->refcnt = refcount_read(&p->tcf_refcnt) - ref;
> +	opt->bindcnt = atomic_read(&p->tcf_bindcnt) - bind;
>  
>  	if (p->tcfp_keys_ex) {
>  		tcf_pedit_key_ex_dump(skb, p->tcfp_keys_ex, p->tcfp_nkeys);
> diff --git a/net/sched/act_police.c b/net/sched/act_police.c
> index 4e72bc2a0dfb..a789b8060968 100644
> --- a/net/sched/act_police.c
> +++ b/net/sched/act_police.c
> @@ -274,8 +274,8 @@ static int tcf_act_police_dump(struct sk_buff *skb, struct tc_action *a,
>  		.action = police->tcf_action,
>  		.mtu = police->tcfp_mtu,
>  		.burst = PSCHED_NS2TICKS(police->tcfp_burst),
> -		.refcnt = police->tcf_refcnt - ref,
> -		.bindcnt = police->tcf_bindcnt - bind,
> +		.refcnt = refcount_read(&police->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&police->tcf_bindcnt) - bind,
>  	};
>  	struct tcf_t t;
>  
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index 5db358497c9e..4a46978db092 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -173,8 +173,8 @@ static int tcf_sample_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tc_sample opt = {
>  		.index      = s->tcf_index,
>  		.action     = s->tcf_action,
> -		.refcnt     = s->tcf_refcnt - ref,
> -		.bindcnt    = s->tcf_bindcnt - bind,
> +		.refcnt     = refcount_read(&s->tcf_refcnt) - ref,
> +		.bindcnt    = atomic_read(&s->tcf_bindcnt) - bind,
>  	};
>  	struct tcf_t t;
>  
> diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
> index 9618b4a83cee..95d5985b8d67 100644
> --- a/net/sched/act_simple.c
> +++ b/net/sched/act_simple.c
> @@ -148,8 +148,8 @@ static int tcf_simp_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tcf_defact *d = to_defact(a);
>  	struct tc_defact opt = {
>  		.index   = d->tcf_index,
> -		.refcnt  = d->tcf_refcnt - ref,
> -		.bindcnt = d->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
>  		.action  = d->tcf_action,
>  	};
>  	struct tcf_t t;
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 6138d1d71900..d418ec3b0ab9 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -173,8 +173,8 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tcf_skbedit *d = to_skbedit(a);
>  	struct tc_skbedit opt = {
>  		.index   = d->tcf_index,
> -		.refcnt  = d->tcf_refcnt - ref,
> -		.bindcnt = d->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
>  		.action  = d->tcf_action,
>  	};
>  	struct tcf_t t;
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index ad050d7d4b46..ff90d720eda3 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -205,8 +205,8 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
>  	struct tc_skbmod opt = {
>  		.index   = d->tcf_index,
> -		.refcnt  = d->tcf_refcnt - ref,
> -		.bindcnt = d->tcf_bindcnt - bind,
> +		.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
>  		.action  = d->tcf_action,
>  	};
>  	struct tcf_t t;
> diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
> index 626dac81a48a..c6e50695414b 100644
> --- a/net/sched/act_tunnel_key.c
> +++ b/net/sched/act_tunnel_key.c
> @@ -252,8 +252,8 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tcf_tunnel_key_params *params;
>  	struct tc_tunnel_key opt = {
>  		.index    = t->tcf_index,
> -		.refcnt   = t->tcf_refcnt - ref,
> -		.bindcnt  = t->tcf_bindcnt - bind,
> +		.refcnt   = refcount_read(&t->tcf_refcnt) - ref,
> +		.bindcnt  = atomic_read(&t->tcf_bindcnt) - bind,
>  	};
>  	struct tcf_t tm;
>  
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
> index 853604685965..8dda78473004 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -237,8 +237,8 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct tc_action *a,
>  	struct tcf_vlan_params *p = rtnl_dereference(v->vlan_p);
>  	struct tc_vlan opt = {
>  		.index    = v->tcf_index,
> -		.refcnt   = v->tcf_refcnt - ref,
> -		.bindcnt  = v->tcf_bindcnt - bind,
> +		.refcnt   = refcount_read(&v->tcf_refcnt) - ref,
> +		.bindcnt  = atomic_read(&v->tcf_bindcnt) - bind,
>  		.action   = v->tcf_action,
>  		.v_action = p->tcfv_action,
>  	};
> -- 
> 2.7.5
> 

^ permalink raw reply

* Re: [PATCH v3 01/11] net: sched: use rcu for action cookie update
From: Marcelo Ricardo Leitner @ 2018-05-28 21:37 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn,
	Jiri Pirko
In-Reply-To: <1527455849-22327-2-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:19AM +0300, Vlad Buslov wrote:
> Implement functions to atomically update and free action cookie
> using rcu mechanism.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
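
For readers following the thread, the cookie handling in this patch
comes down to "swap the pointer atomically, then release the old
object once no reader can still see it". Below is a minimal userspace
sketch of that pattern, assuming C11 atomics in place of xchg() and an
immediate free in place of call_rcu(). All model_* names are
hypothetical and none of this is the kernel code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_cookie {
	unsigned char *data;
	size_t len;
};

/* Counts releases so the behaviour can be observed; hypothetical. */
int model_cookies_freed;

/* In the patch this work runs as an RCU callback after a grace period.
 * This userspace model has no concurrent readers, so it frees at once. */
void model_free_cookie(struct model_cookie *cookie)
{
	free(cookie->data);
	free(cookie);
	model_cookies_freed++;
}

/* Publish new_cookie (may be NULL) and release whatever was there. */
void model_set_cookie(struct model_cookie *_Atomic *slot,
		      struct model_cookie *new_cookie)
{
	struct model_cookie *old;

	assert(slot != NULL);
	old = atomic_exchange(slot, new_cookie);
	if (old)
		model_free_cookie(old);	/* kernel: deferred via call_rcu() */
}

/* Build a cookie with a zeroed payload of the given non-zero size. */
struct model_cookie *model_new_cookie(size_t len)
{
	struct model_cookie *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->data = calloc(1, len);
	if (!c->data) {
		free(c);
		return NULL;
	}
	c->len = len;
	return c;
}
```

Passing NULL as the new cookie gives the teardown path, which is how
the patch reuses the same helper from free_tcf().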

> ---
>  include/net/act_api.h |  2 +-
>  include/net/pkt_cls.h |  1 +
>  net/sched/act_api.c   | 44 ++++++++++++++++++++++++++++++--------------
>  3 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index 9e59ebfded62..f7b59ef7303d 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -37,7 +37,7 @@ struct tc_action {
>  	spinlock_t			tcfa_lock;
>  	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
>  	struct gnet_stats_queue __percpu *cpu_qstats;
> -	struct tc_cookie	*act_cookie;
> +	struct tc_cookie	__rcu *act_cookie;
>  	struct tcf_chain	*goto_chain;
>  };
>  #define tcf_index	common.tcfa_index
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index e828d31be5da..3068cc8aa0f1 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -769,6 +769,7 @@ struct tc_mqprio_qopt_offload {
>  struct tc_cookie {
>  	u8  *data;
>  	u32 len;
> +	struct rcu_head rcu;
>  };
>  
>  struct tc_qopt_offload_stats {
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 3f4cf930f809..02670c7489e3 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -55,6 +55,24 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a,
>  	res->goto_tp = rcu_dereference_bh(chain->filter_chain);
>  }
>  
> +static void tcf_free_cookie_rcu(struct rcu_head *p)
> +{
> +	struct tc_cookie *cookie = container_of(p, struct tc_cookie, rcu);
> +
> +	kfree(cookie->data);
> +	kfree(cookie);
> +}
> +
> +static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
> +				  struct tc_cookie *new_cookie)
> +{
> +	struct tc_cookie *old;
> +
> +	old = xchg(old_cookie, new_cookie);
> +	if (old)
> +		call_rcu(&old->rcu, tcf_free_cookie_rcu);
> +}
> +
>  /* XXX: For standalone actions, we don't need a RCU grace period either, because
>   * actions are always connected to filters and filters are already destroyed in
>   * RCU callbacks, so after a RCU grace period actions are already disconnected
> @@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
>  	free_percpu(p->cpu_bstats);
>  	free_percpu(p->cpu_qstats);
>  
> -	if (p->act_cookie) {
> -		kfree(p->act_cookie->data);
> -		kfree(p->act_cookie);
> -	}
> +	tcf_set_action_cookie(&p->act_cookie, NULL);
>  	if (p->goto_chain)
>  		tcf_action_goto_chain_fini(p);
>  
> @@ -567,16 +582,22 @@ tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
>  	int err = -EINVAL;
>  	unsigned char *b = skb_tail_pointer(skb);
>  	struct nlattr *nest;
> +	struct tc_cookie *cookie;
>  
>  	if (nla_put_string(skb, TCA_KIND, a->ops->kind))
>  		goto nla_put_failure;
>  	if (tcf_action_copy_stats(skb, a, 0))
>  		goto nla_put_failure;
> -	if (a->act_cookie) {
> -		if (nla_put(skb, TCA_ACT_COOKIE, a->act_cookie->len,
> -			    a->act_cookie->data))
> +
> +	rcu_read_lock();
> +	cookie = rcu_dereference(a->act_cookie);
> +	if (cookie) {
> +		if (nla_put(skb, TCA_ACT_COOKIE, cookie->len, cookie->data)) {
> +			rcu_read_unlock();
>  			goto nla_put_failure;
> +		}
>  	}
> +	rcu_read_unlock();
>  
>  	nest = nla_nest_start(skb, TCA_OPTIONS);
>  	if (nest == NULL)
> @@ -719,13 +740,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	if (err < 0)
>  		goto err_mod;
>  
> -	if (name == NULL && tb[TCA_ACT_COOKIE]) {
> -		if (a->act_cookie) {
> -			kfree(a->act_cookie->data);
> -			kfree(a->act_cookie);
> -		}
> -		a->act_cookie = cookie;
> -	}
> +	if (!name && tb[TCA_ACT_COOKIE])
> +		tcf_set_action_cookie(&a->act_cookie, cookie);
>  
>  	/* module count goes up only when brand new policy is created
>  	 * if it exists and is only bound to in a_o->init() then
> -- 
> 2.7.5
> 

^ permalink raw reply

* [PATCH net-next] net: remove bypassed check in sch_direct_xmit()
From: Song Liu @ 2018-05-28 21:36 UTC (permalink / raw)
  To: netdev
  Cc: Song Liu, kernel-team, John Fastabend, Alexei Starovoitov,
	David S . Miller

The netif_xmit_frozen_or_stopped() check at the end of sch_direct_xmit()
is always bypassed. This is because "ret" in sch_direct_xmit() will be
either NETDEV_TX_OK or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0
will reach the condition:

    if (ret && netif_xmit_frozen_or_stopped(txq))
        return false;

This patch cleans up the code by removing the whole condition.
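
The argument can be checked with a small userspace model. This is only
a sketch under the assumption stated in the changelog, namely that ret
is either NETDEV_TX_OK or NETDEV_TX_BUSY and that the busy case returns
early. The model_* names are hypothetical stand-ins and do not exist in
the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the kernel's netdev_tx enum; everything else here is a
 * hypothetical userspace stand-in, not the real sch_direct_xmit(). */
enum model_tx { MODEL_TX_OK = 0x00, MODEL_TX_BUSY = 0x10 };

/* Old tail of the function: the early return, then the extra check. */
bool model_xmit_old(enum model_tx ret, bool txq_frozen_or_stopped)
{
	if (ret != MODEL_TX_OK)		/* busy: requeue path returns early */
		return false;

	/* ret is 0 here, so the && short-circuits and this never fires. */
	if (ret && txq_frozen_or_stopped)
		return false;

	return true;
}

/* New tail of the function: the unreachable check is simply gone. */
bool model_xmit_new(enum model_tx ret, bool txq_frozen_or_stopped)
{
	(void)txq_frozen_or_stopped;

	if (ret != MODEL_TX_OK)
		return false;

	return true;
}

/* Compare both versions over every modelled input combination. */
void model_check_equivalence(void)
{
	const enum model_tx rets[] = { MODEL_TX_OK, MODEL_TX_BUSY };
	int i, frozen;

	for (i = 0; i < 2; i++)
		for (frozen = 0; frozen < 2; frozen++)
			assert(model_xmit_old(rets[i], frozen) ==
			       model_xmit_new(rets[i], frozen));
}
```

Both versions agree on all four input combinations, so under that
assumption dropping the condition does not change behaviour.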

For more discussion about this, please refer to
   https://marc.info/?t=152727195700008

Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
---
 net/sched/sch_generic.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 760ab1b..69078c8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -346,9 +346,6 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		return false;
 	}
 
-	if (ret && netif_xmit_frozen_or_stopped(txq))
-		return false;
-
 	return true;
 }
 
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH v3 11/11] net: sched: change action API to use array of pointers to actions
From: Marcelo Ricardo Leitner @ 2018-05-28 21:31 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: jiri, netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-12-git-send-email-vladbu@mellanox.com>

On Mon, May 28, 2018 at 12:17:29AM +0300, Vlad Buslov wrote:
...
> -int tcf_action_destroy(struct list_head *actions, int bind)
> +int tcf_action_destroy(struct tc_action *actions[], int bind)
>  {
>  	const struct tc_action_ops *ops;
> -	struct tc_action *a, *tmp;
> -	int ret = 0;
> +	struct tc_action *a;
> +	int ret = 0, i;
>  
> -	list_for_each_entry_safe(a, tmp, actions, list) {
> +	for (i = 0; i < TCA_ACT_MAX_PRIO && actions[i]; i++) {
...
> @@ -878,10 +881,9 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) {
>  		err = tcf_action_goto_chain_init(a, tp);
>  		if (err) {
> -			LIST_HEAD(actions);
> +			struct tc_action *actions[TCA_ACT_MAX_PRIO] = { a };

Somewhat nit.. Considering tcf_action_destroy will stop at the first
NULL, you need only 2 slots here.
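
To illustrate the point, here is a hedged userspace sketch of the same
loop shape. The model_* names are hypothetical, and MODEL_MAX_PRIO only
stands in for TCA_ACT_MAX_PRIO.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PRIO 32	/* stands in for TCA_ACT_MAX_PRIO */

struct model_action {
	int index;
};

/* Same loop shape as the quoted tcf_action_destroy(): stop at the bound
 * or at the first NULL slot, whichever comes first. */
int model_count_actions(struct model_action *actions[])
{
	int i;

	assert(actions != NULL);
	for (i = 0; i < MODEL_MAX_PRIO && actions[i]; i++)
		;
	return i;
}
```

With a two-slot array { a, NULL } the loop reads slot 0, sees NULL in
slot 1 and stops, so nothing past the second slot is ever touched.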

>  
> -			list_add_tail(&a->list, &actions);
> -			tcf_action_destroy(&actions, bind);
> +			tcf_action_destroy(actions, bind);
>  			NL_SET_ERR_MSG(extack, "Failed to init TC action chain");
>  			return ERR_PTR(err);
>  		}

^ permalink raw reply

* Re: [PATCH net-next v16 4/8] netfilter: Add nf_ct_get_tuple_skb callback
From: Toke Høiland-Jørgensen @ 2018-05-28 21:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <20180528194902.kooqoao3agt32nls@salvia>

Pablo Neira Ayuso <pablo@netfilter.org> writes:

> On Mon, May 28, 2018 at 04:27:46PM +0200, Toke Høiland-Jørgensen wrote:
> [...]
>> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
>> index 0f6b8172fb9a..520565198f0e 100644
>> --- a/net/netfilter/core.c
>> +++ b/net/netfilter/core.c
>> @@ -572,6 +572,27 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
>>  }
>>  EXPORT_SYMBOL(nf_conntrack_destroy);
>>  
>> +bool (*skb_ct_get_tuple)(struct nf_conntrack_tuple *,
>> +			 const struct sk_buff *) __rcu __read_mostly;
>> +EXPORT_SYMBOL(skb_ct_get_tuple);
>
> Now we have struct nf_ct_hook in net-next, please add ->get_tuple to
> that new object.

Ah, right, will do :)

-Toke

^ permalink raw reply

* Re: Is it possible to get device information via CMSG?
From: Eric S. Raymond @ 2018-05-28 21:02 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20180528065701.dqdjbrqb34zfsfvo@unicorn.suse.cz>

Michal Kubecek <mkubecek@suse.cz>:
> > 1. Is there a cmsg_level/cmsg_type combination that will return the
> > name of the device the packet arrived through?
> 
> Not name directly, AFAIK, but you can set SOL_IP / IP_PKTINFO (or
> SOL_IPV6 / IPV6_RECVPKTINFO) socket option and get IP_PKTINFO
> (IPV6_PKTINFO) message with recvmsg(). This will tell you incoming
> interface index so that you can look the name up. See ip(7) or ipv6(7)
> for format of the message (struct ip_pktinfo, struct in6_pktinfo).

Thanks, that confirms something I found on Stack Overflow after I
queried your list.
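
For anyone finding this thread later, the approach described above can
be sketched roughly as follows for IPv4 on Linux. The two function
names are made up for illustration, and IPPROTO_IP is used as the level
(it has the same value as SOL_IP). Treat it as a starting point rather
than a reference implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to attach an IP_PKTINFO control message to every
 * datagram received on this IPv4 socket. Returns 0 on success. */
int enable_pktinfo(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Walk the control messages filled in by recvmsg() and return the
 * incoming interface index, or -1 if no IP_PKTINFO message is present. */
int pktinfo_ifindex(struct msghdr *msg)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL);
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_PKTINFO) {
			struct in_pktinfo info;

			/* Copy out: CMSG_DATA() is not guaranteed aligned. */
			memcpy(&info, CMSG_DATA(cmsg), sizeof(info));
			return info.ipi_ifindex;
		}
	}
	return -1;
}
```

The returned index can then be passed to if_indextoname() from
<net/if.h> to get the interface name for filtering.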

> However, I suspect that userspace application is not really interested
> in incoming interface name but rather in destination address of the
> incoming packet which is also provided in IP_PKTINFO / IPV6_PKTINFO
> message. 

NTP is weird that way.  My group, NTPsec, inherited the reference
Mills codebase (what we now call "NTP Classic") which really does have
a filter-by-interface-name feature *in addition to* local address
filtering.

We want to simplify the way it works without discarding that feature,
because we've made promises about backward compatibility that we mean
to keep.  We don't throw away features unless either they're security
holes or we are *dead certain* they are obsolete.

In case it's of interest to you, NTPsec is a drop-in replacement for
NTP Classic that solves its chronic security problems by stripping out
unused features and legacy code. We've actually shrunk the codebase
size by a factor of 4x.  We have better monitoring and admin tools, too.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.

^ permalink raw reply

* Re: [RFC net-next 0/4] net: sched: support replay of filter offload when binding to block
From: Jakub Kicinski @ 2018-05-28 20:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: John Hurley, Linux Netdev List, Jiri Pirko, Samudrala, Sridhar,
	oss-drivers, Rabie Loulou
In-Reply-To: <CAJ3xEMjfSc=yg+o+Pv4AQw0P-1C4pDjKa=ctCbeS2hWWMR+RPw@mail.gmail.com>

On Mon, 28 May 2018 13:48:28 +0300, Or Gerlitz wrote:
> On Fri, May 25, 2018 at 5:25 AM, Jakub Kicinski wrote:
> > This series from John adds the ability to replay filter offload requests
> > when new offload callback is being registered on a TC block.  This is most
> > likely to take place for shared blocks today, when a block which already
> > has rules is bound to another interface.  Prior to this patch set if any
> > of the rules were offloaded the block bind would fail.  
> 
> Can you elaborate a little further here? is this something that you are planning
> to use for the uplink LAG use-case? AFAIU if we apply share-block to nfp as
> things are prior to this patch, it would work, so there's a case where
> it doesn't and this is now handled with the series?

Just looking at things as they stand today, no bond/forward looking
plans - nfp "supports" shared blocks by registering multiple callbacks
to the block.  There are two problems:

(a) one can't install a second callback if some rules are already
    offloaded because of:

	/* At this point, playback of previous block cb calls is not supported,
	 * so forbid to register to block which already has some offloaded
	 * filters present.
	 */
	if (tcf_block_offload_in_use(block))
		return ERR_PTR(-EOPNOTSUPP);

    in __tcf_block_cb_register(), so block sharing has to be set up
    before any rules are added.

(b) when block is unshared filters are not removed today and driver
    would have to sweep its rule table, as John notes.  It's not a big
    deal but this series fixes it nicely in the core, too.


Looking forward there are two things we can use shared blocks for: we
can try to teach user space to share ingress blocks on all legs of bonds
instead of trying to propagate the rules from the bond down in the
kernel, which is more tricky to get right.  We will need reliable
replay for that, because we want new links to be able to join and leave
the bond when rules are already present.

Second use case, which is more far-fetched, is trying to discover and
register callbacks for blocks of tunnel devices directly, and avoid the
egdev infrastructure...

We should discuss the above further, but regardless, I think this
patchset is quite a nice addition on its own.  Would you agree?

^ permalink raw reply

* Re: [PATCH net-next v16 5/8] sch_cake: Add NAT awareness to packet classifier
From: Pablo Neira Ayuso @ 2018-05-28 19:51 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <152751766690.30935.18178441475189968162.stgit@alrua-kau>

On Mon, May 28, 2018 at 04:27:46PM +0200, Toke Høiland-Jørgensen wrote:
> When CAKE is deployed on a gateway that also performs NAT (which is a
> common deployment mode), the host fairness mechanism cannot distinguish
> internal hosts from each other, and so fails to work correctly.
> 
> To fix this, we add an optional NAT awareness mode, which will query the
> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
> and use that in the flow and host hashing.
> 
> When the shaper is enabled and the host is already performing NAT, the cost
> of this lookup is negligible. However, in unlimited mode with no NAT being
> performed, there is a significant CPU cost at higher bandwidths. For this
> reason, the feature is turned off by default.
> 
> Cc: netfilter-devel@vger.kernel.org
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
>  net/sched/sch_cake.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> index 68ac908470f1..fecd9caac0cc 100644
> --- a/net/sched/sch_cake.c
> +++ b/net/sched/sch_cake.c
> @@ -71,6 +71,10 @@
>  #include <net/tcp.h>
>  #include <net/flow_dissector.h>
>  
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +#include <net/netfilter/nf_conntrack_core.h>
> +#endif
> +
>  #define CAKE_SET_WAYS (8)
>  #define CAKE_MAX_TINS (8)
>  #define CAKE_QUEUES (1024)
> @@ -516,6 +520,29 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
>  	return drop;
>  }
>  
> +static void cake_update_flowkeys(struct flow_keys *keys,
> +				 const struct sk_buff *skb)
> +{
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)

I would remove the ifdef, not really needed, it will simplify things.

But I leave it to you to decide, this is not a deal breaker.

^ permalink raw reply

* Re: [PATCH net-next v16 4/8] netfilter: Add nf_ct_get_tuple_skb callback
From: Pablo Neira Ayuso @ 2018-05-28 19:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <152751766686.30935.14644567905547700823.stgit@alrua-kau>

On Mon, May 28, 2018 at 04:27:46PM +0200, Toke Høiland-Jørgensen wrote:
[...]
> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> index 0f6b8172fb9a..520565198f0e 100644
> --- a/net/netfilter/core.c
> +++ b/net/netfilter/core.c
> @@ -572,6 +572,27 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
>  }
>  EXPORT_SYMBOL(nf_conntrack_destroy);
>  
> +bool (*skb_ct_get_tuple)(struct nf_conntrack_tuple *,
> +			 const struct sk_buff *) __rcu __read_mostly;
> +EXPORT_SYMBOL(skb_ct_get_tuple);

Now we have struct nf_ct_hook in net-next, please add ->get_tuple to
that new object.

Thanks.

^ permalink raw reply

* Re: [PATCH 3/4 RFCv2] net: dsa: realtek-smi: Add Realtek SMI driver
From: Andrew Lunn @ 2018-05-28 19:45 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Vivien Didelot, Florian Fainelli, netdev, openwrt-devel,
	LEDE Development List, Antti Seppälä, Roman Yeryomin,
	Colin Leitner, Gabor Juhos
In-Reply-To: <20180528174752.6806-4-linus.walleij@linaro.org>

> +struct rtl8366_mib_counter {
> +	unsigned	base;
> +	unsigned	offset;
> +	unsigned	length;
> +	const char	*name;
> +};


> +void rtl8366_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
> +{
> +	struct realtek_smi *smi = ds->priv;
> +	struct rtl8366_mib_counter *mib;
> +	int i;
> +
> +	if (port >= smi->num_ports)
> +		return;
> +
> +	for (i = 0; i < smi->num_mib_counters; i++) {
> +		mib = &smi->mib_counters[i];
> +		memcpy(data + i * ETH_GSTRING_LEN,
> +		       mib->name, ETH_GSTRING_LEN);
> +	}
> +}

Hi Linus

name is a char *. Its length is determined by its content. But you
perform a memcpy of ETH_GSTRING_LEN. This can take you off the end of
the string, causing an out-of-bounds access. Either make name
ETH_GSTRING_LEN long, or use strncpy().
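
A minimal userspace sketch of the strncpy() variant, with a made-up
helper name and MODEL_GSTRING_LEN standing in for ETH_GSTRING_LEN:

```c
#include <assert.h>
#include <string.h>

#define MODEL_GSTRING_LEN 32	/* same value as the kernel's ETH_GSTRING_LEN */

/* Copy a NUL-terminated name into one fixed-size slot. strncpy() stops
 * reading at the terminator and zero-fills the rest of the slot, so the
 * source is never read past its end. A name of MODEL_GSTRING_LEN bytes
 * or more would fill the slot without a terminator. */
void model_copy_gstring(char *slot, const char *name)
{
	assert(slot != NULL && name != NULL);
	strncpy(slot, name, MODEL_GSTRING_LEN);
}
```

In the driver loop this would replace the memcpy() call, one slot per
counter at data + i * ETH_GSTRING_LEN.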

	Andrew

^ permalink raw reply

* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
From: Marcelo Ricardo Leitner @ 2018-05-28 19:43 UTC (permalink / raw)
  To: Neil Horman
  Cc: Dmitry Vyukov, Michael Tuexen, Xin Long, network dev, linux-sctp,
	David Miller, David Ahern, Eric Dumazet, syzkaller
In-Reply-To: <20180527010059.GA15798@neilslaptop.think-freely.org>

On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote:
> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote:
> > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen
> > <michael.tuexen@lurchi.franken.de> wrote:
> > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote:
> > >>
> > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
> > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
> > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
> > >>> value, hb_timer will get stuck there, as in its timer handler it starts
> > >>> this timer again with this value, then goes to the timer handler again.
> > >>>
> > >>> This problem is there since very beginning, and thanks to Eric for the
> > >>> reproducer shared from a syzbot mail.
> > >>>
> > >>> This patch fixes it by not allowing to set rto_min with a value below
> > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
> > >>>
> > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
> > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > >>> ---
> > >>> include/net/sctp/constants.h |  1 +
> > >>> net/sctp/socket.c            | 10 +++++++---
> > >>> net/sctp/sysctl.c            |  3 ++-
> > >>> 3 files changed, 10 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
> > >>> index 20ff237..2ee7a7b 100644
> > >>> --- a/include/net/sctp/constants.h
> > >>> +++ b/include/net/sctp/constants.h
> > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
> > >>> #define SCTP_RTO_INITIAL     (3 * 1000)
> > >>> #define SCTP_RTO_MIN         (1 * 1000)
> > >>> #define SCTP_RTO_MAX         (60 * 1000)
> > >>> +#define SCTP_RTO_HARD_MIN   200
> > >>>
> > >>> #define SCTP_RTO_ALPHA          3   /* 1/8 when converted to right shifts. */
> > >>> #define SCTP_RTO_BETA           2   /* 1/4 when converted to right shifts. */
> > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > >>> index ae7e7c6..6ef12c7 100644
> > >>> --- a/net/sctp/socket.c
> > >>> +++ b/net/sctp/socket.c
> > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval,
> > >>>  * be changed.
> > >>>  *
> > >>>  */
> > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen)
> > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
> > >>> +                               unsigned int optlen)
> > >>> {
> > >>>      struct sctp_rtoinfo rtoinfo;
> > >>>      struct sctp_association *asoc;
> > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne
> > >>>      else
> > >>>              rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
> > >>>
> > >>> -    if (rto_min)
> > >>> +    if (rto_min) {
> > >>> +            if (rto_min < SCTP_RTO_HARD_MIN)
> > >>> +                    return -EINVAL;
> > >>>              rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
> > >>> -    else
> > >>> +    } else {
> > >>>              rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
> > >>> +    }
> > >>>
> > >>>      if (rto_min > rto_max)
> > >>>              return -EINVAL;
> > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> > >>> index 33ca5b7..7ec854a 100644
> > >>> --- a/net/sctp/sysctl.c
> > >>> +++ b/net/sctp/sysctl.c
> > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
> > >>> static int rto_beta_min = 0;
> > >>> static int rto_alpha_max = 1000;
> > >>> static int rto_beta_max = 1000;
> > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN;
> > >>>
> > >>> static unsigned long max_autoclose_min = 0;
> > >>> static unsigned long max_autoclose_max =
> > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
> > >>>              .maxlen         = sizeof(unsigned int),
> > >>>              .mode           = 0644,
> > >>>              .proc_handler   = proc_sctp_do_rto_min,
> > >>> -            .extra1         = &one,
> > >>> +            .extra1         = &rto_hard_min,
> > >>>              .extra2         = &init_net.sctp.rto_max
> > >>>      },
> > >>>      {
> > >>> --
> > >>> 2.1.0
> > >>>
> > >>> --
> > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> > >>> the body of a message to majordomo@vger.kernel.org
> > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >>>
> > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as
> > >> well
> > >>
> > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms.
> > > So could this be reduced?
> > 
> > Hi Michael,
> > 
> > What value do they use?
> > 
> > Xin, Neil, is there more principled way of ensuring that a timer won't
> > cause a hard CPU stall? There are slow machines and there are slow
> > kernels (in particular syzbot kernel has tons of debug configs
> > enabled). 200ms _should_ not cause problems because we did not see
> > them with tcp. But it's hard to say what's the low limit as we are
> > trying to put a hard upper bound on execution time of a complex
> > section of code. Is there something like cond_resched for timers?
> Unfortunately, there's not really a way to do conditional rescheduling of timers.
> Additionally, we have a problem because the timer is reset as a side effect of
> the SCTP state machine, and so the execution time between timer updates has a
> significant amount of jitter (meaning it's a pretty hard value to calibrate,
> unless you just select a 'safe' large value for the floor).
> 
> What we could do (though this might impact the protocol function) is change
> the timer update side effects to simply set a flag, and consistently update the
> timers on exit from sctp_do_sm, so they don't re-arm until all state machine
> processing is complete.  Anyone have any thoughts on that?

I was reviewing all this again and I'm thinking that we are missing
the real point. With the parameters that reproducer [1] has, setting
those very low RTO parameters, it causes the timer to actually
busyloop on the heartbeats, as Xin had explained.

But the thing is, it busy-loops not just because RTO is too low, but
because hbinterval is not accounted for.

/* What is the next timeout value for this transport? */
unsigned long sctp_transport_timeout(struct sctp_transport *trans)
{
        /* RTO + timer slack +/- 50% of RTO */
        unsigned long timeout = trans->rto >> 1;  <-- [a]

        if (trans->state != SCTP_UNCONFIRMED &&
            trans->state != SCTP_PF)             <--- [2]
                timeout += trans->hbinterval;

        return timeout;
}

The if() in [2] is to speed up path verification before using them, as
per the commit changelog. Secondary paths added on processing the
cookie are created with status SCTP_UNCONFIRMED, and HB timers are
started in the sequence:
 sctp_sf_do_5_1D_ce
   -> sctp_process_init
     |> sctp_process_param
     | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED)
     '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL());

which starts the timer using only the small RTO for secondary paths:
static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds,
                                     struct sctp_association *asoc)
{
        struct sctp_transport *t;

        /* Start a heartbeat timer for each transport on the association.
         * hold a reference on the transport to make sure none of
         * the needed data structures go away.
         */
        list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
                sctp_transport_reset_hb_timer(t);
}

But if the system is too busy generating HBs, it likely won't process
incoming HB ACKs, which would stop the loop as it would mark the
transport as Active.

I'm now thinking a better fix would be to have a specific way to
kickstart these initial heartbeats, and then always use hbinterval on
subsequent ones.

This would not only fix the issue, but also improve the time we need
to identify the transports as Active upon association start, which is
currently RTO/2 (equal to 500ms by default).
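
To make the idea a bit more concrete, here is a rough standalone sketch
(untested, not a patch). The struct, the hb_kickstarted flag and the
1-jiffy kickstart delay are all made up for illustration; none of them
exist in the kernel today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not the kernel's struct sctp_transport. */
struct model_transport {
	unsigned long rto;        /* current RTO, in jiffies */
	unsigned long hbinterval; /* configured HB interval, in jiffies */
	bool hb_kickstarted;      /* hypothetical: first HB already armed */
};

/* Assumed kickstart delay; 1 jiffy is a placeholder, not a tested value. */
#define MODEL_HB_KICKSTART 1UL

/* What is the next HB timeout for this transport, under the proposal? */
static unsigned long model_hb_timeout(struct model_transport *t)
{
	assert(t);

	if (!t->hb_kickstarted) {
		/* First HB: fire quickly so the path gets verified early. */
		t->hb_kickstarted = true;
		return MODEL_HB_KICKSTART;
	}

	/* Every later HB: RTO/2 slack plus hbinterval, whatever the state. */
	return (t->rto >> 1) + t->hbinterval;
}
```

With this shape, an unconfirmed transport can no longer re-arm its HB
timer with only RTO/2, no matter how small RTO is set.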

While working on this, I found myself wondering how HZ can affect the
stack with such a small RTO. If we have HZ=250, for example, we probably
should be careful when doing calculations such as the one at mark [a],
so that the result does not tend to 0. This should not be related to the
reported issue, as syzkaller was using HZ=1000.
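
A small standalone model of that concern, only as a sketch: the helper
below is a simplified stand-in for msecs_to_jiffies() that assumes HZ
divides 1000 evenly, and the 4ms RTO is an arbitrary example value, not
one taken from the reproducer.

```c
#include <assert.h>

/* Simplified model of msecs_to_jiffies() for HZ values that divide 1000:
 * round the duration up to a whole number of ticks.
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_tick;

	assert(hz > 0 && 1000 % hz == 0);
	ms_per_tick = 1000 / hz;
	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/* Same shift as at mark [a]: half of the RTO, expressed in jiffies. */
static unsigned long model_half_rto(unsigned int rto_ms, unsigned int hz)
{
	return model_msecs_to_jiffies(rto_ms, hz) >> 1;
}
```

In this model a 4ms RTO gives model_half_rto(4, 1000) == 2 jiffies, but
model_half_rto(4, 250) == 0, i.e. the timer would be re-armed with no
delay at all.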

(I didn't do any tests, this is only based on code review so far)

1. https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000
2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.")
b. https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed

* Re: [PATCH 3/4 RFCv2] net: dsa: realtek-smi: Add Realtek SMI driver
From: Andrew Lunn @ 2018-05-28 19:33 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Vivien Didelot, Florian Fainelli, netdev, openwrt-devel,
	LEDE Development List, Antti Seppälä, Roman Yeryomin,
	Colin Leitner, Gabor Juhos
In-Reply-To: <20180528174752.6806-4-linus.walleij@linaro.org>

> +#ifdef CONFIG_RTL8366_SMI_DEBUG_FS
> +	struct dentry           *debugfs_root;
> +	u8			dbg_vlan_4k_page;
> +#endif

Hi Linus

I think you forgot to remove this.

  Andrew

* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
From: Marcelo Ricardo Leitner @ 2018-05-28 18:56 UTC (permalink / raw)
  To: Michael Tuexen
  Cc: Dmitry Vyukov, Neil Horman, Xin Long, network dev, linux-sctp,
	David Miller, David Ahern, Eric Dumazet, syzkaller
In-Reply-To: <D8455431-D0D7-4420-BEBF-E0B5BE5DBC97@lurchi.franken.de>

On Sun, May 27, 2018 at 10:58:35AM +0200, Michael Tuexen wrote:
...
> >>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as
> >>> well
> >>> 
> >> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms.
> >> So could this be reduced?
> > 
> > Hi Michael,
> > 
> > What value do they use?
> I have seen values of
> RTO.Min     =  50ms
> RTO.Max     = 200ms
> RTO.Initial = 100ms

Those look doable to me.

Thanks,
Marcelo

* Re: [PATCH 2/4 RFCv2] net: dsa: Add bindings for Realtek SMI DSAs
From: Andrew Lunn @ 2018-05-28 18:54 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Vivien Didelot, Florian Fainelli, netdev, openwrt-devel,
	LEDE Development List, Antti Seppälä, Roman Yeryomin,
	Colin Leitner, Gabor Juhos, devicetree
In-Reply-To: <20180528174752.6806-3-linus.walleij@linaro.org>

> +Examples:
> +
> +switch {
> +	compatible = "realtek,rtl8366rb";
> +	/* 22 = MDIO (has input reads), 21 = MDC (clock, output only) */
> +	mdc-gpios = <&gpio0 21 GPIO_ACTIVE_HIGH>;
> +	mdio-gpios = <&gpio0 22 GPIO_ACTIVE_HIGH>;
> +	reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>;
> +
> +	switch_intc: interrupt-controller {
> +		/* GPIO 15 provides the interrupt */
> +		interrupt-parent = <&gpio0>;
> +		interrupts = <15 IRQ_TYPE_LEVEL_LOW>;
> +		interrupt-controller;
> +		#address-cells = <0>;
> +		#interrupt-cells = <1>;
> +	};
> +
> +	ports {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +		reg = <0>;
> +		port@0 {
> +			reg = <0>;
> +			label = "lan0";
> +			phy-handle = <&phy0>;
> +		};
> +		port@1 {
> +			reg = <1>;
> +			label = "lan1";
> +			phy-handle = <&phy1>;
> +		};
> +		port@2 {
> +			reg = <2>;
> +			label = "lan2";
> +			phy-handle = <&phy2>;
> +		};
> +		port@3 {
> +			reg = <3>;
> +			label = "lan3";
> +			phy-handle = <&phy3>;
> +		};
> +		port@4 {
> +			reg = <4>;
> +			label = "wan";
> +			phy-handle = <&phy4>;
> +		};
> +		port@5 {
> +			reg = <5>;
> +			label = "cpu";
> +			ethernet = <&gmac0>;
> +			phy-mode = "rgmii";
> +			fixed-link {
> +				speed = <1000>;
> +				full-duplex;
> +			};
> +		};
> +	};
> +
> +	mdio {
> +		compatible = "realtek,smi-mdio", "dsa-mdio";
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		phy0: phy@0 {
> +			reg = <0>;
> +			interrupt-parent = <&switch_intc>;
> +			interrupts = <0>;
> +		};

Hi Linus

This all looks correct, and is what I suggested a while ago.

But since then, I discovered something which allows this to be
simplified.

The mdio bus structure has an irq array. If you put the interrupt
numbers into this array, when the phy is connected, it will use that
interrupt. You then don't need a lot of this in your device tree.

Take a look at:

commit 6f88284f3bd77a0e51de22d4956f07557bcc0dbf
Author: Andrew Lunn <andrew@lunn.ch>
Date:   Sat Mar 17 20:32:05 2018 +0100

    net: dsa: mv88e6xxx: Add MDIO interrupts for internal PHYs
    
    When registering an MDIO bus, it is possible to pass an array of
    interrupts, one per address on the bus. phylib will then associate the
    interrupt to the PHY device, if no other interrupt is provided.
    
    Some of the global2 interrupts are PHY interrupts. Place them into the
    MDIO bus structure.
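
Roughly, and only as a standalone model: the struct and function below
are made-up stand-ins for the irq array in struct mii_bus, and the
interrupt numbers are arbitrary. In a real driver they would presumably
come from the switch's own interrupt controller.

```c
#include <assert.h>

#define MODEL_PHY_MAX_ADDR 32   /* mirrors PHY_MAX_ADDR */
#define MODEL_PHY_POLL     (-1) /* mirrors PHY_POLL: no interrupt, poll */

/* Hypothetical stand-in for the irq array in struct mii_bus. */
struct model_mii_bus {
	int irq[MODEL_PHY_MAX_ADDR];
};

/* Mark every address as polled, then hand one interrupt to each internal
 * PHY at addresses 0..nphys-1. first_irq is a made-up base number.
 */
static void model_fill_phy_irqs(struct model_mii_bus *bus, int nphys,
				int first_irq)
{
	int addr;

	assert(bus && nphys >= 0 && nphys <= MODEL_PHY_MAX_ADDR);

	for (addr = 0; addr < MODEL_PHY_MAX_ADDR; addr++)
		bus->irq[addr] = MODEL_PHY_POLL;

	for (addr = 0; addr < nphys; addr++)
		bus->irq[addr] = first_irq + addr;
}
```

Filling the array before the bus is registered is what lets the
per-PHY interrupt properties drop out of the device tree.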


    Andrew

* Re: [PATCH net] net: sched: check netif_xmit_frozen_or_stopped() in sch_direct_xmit()
From: Song Liu @ 2018-05-28 18:30 UTC (permalink / raw)
  To: John Fastabend
  Cc: Song Liu, Alexei Starovoitov, netdev@vger.kernel.org, Kernel Team,
	David S . Miller
In-Reply-To: <c419633d-f1d6-9c60-1c89-d8fefd24d4ff@gmail.com>



> On May 26, 2018, at 12:43 PM, John Fastabend <john.fastabend@gmail.com> wrote:
> 
> On 05/25/2018 12:46 PM, Song Liu wrote:
>> On Fri, May 25, 2018 at 11:11 AM, Song Liu <songliubraving@fb.com> wrote:
>>> Summary:
>>> 
>>> At the end of sch_direct_xmit(), we are in the else path of
>>> !dev_xmit_complete(ret), which means ret == NETDEV_TX_OK. The following
>>> condition will always fail and netif_xmit_frozen_or_stopped() is not
>>> checked at all.
>>> 
>>>    if (ret && netif_xmit_frozen_or_stopped(txq))
>>>         return false;
>>> 
>>> In this patch, this condition is fixed as:
>>> 
>>>    if (netif_xmit_frozen_or_stopped(txq))
>>>         return false;
>>> 
>>> and further simplifies the code as:
>>> 
>>>    return !netif_xmit_frozen_or_stopped(txq);
>>> 
>>> Fixes: 29b86cdac00a ("net: sched: remove remaining uses for qdisc_qlen in xmit path")
>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>> Cc: David S. Miller <davem@davemloft.net>
>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>>> ---
>>> net/sched/sch_generic.c | 5 +----
>>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>> 
>>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>>> index 39c144b..8261d48 100644
>>> --- a/net/sched/sch_generic.c
>>> +++ b/net/sched/sch_generic.c
>>> @@ -346,10 +346,7 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>>>                return false;
>>>        }
>>> 
>>> -       if (ret && netif_xmit_frozen_or_stopped(txq))
>>> -               return false;
>>> -
>>> -       return true;
>>> +       return !netif_xmit_frozen_or_stopped(txq);
>>> }
>>> 
>>> /*
>>> --
>>> 2.9.5
>>> 
>> 
>> Alexei and I discussed this offline. We would like to share our
>> discussion here to clarify the motivation.
>> 
>> Before 29b86cdac00a, ret in the condition "if (ret &&
>> netif_xmit_frozen_or_stopped())" is not the value from
>> dev_hard_start_xmit(), because ret is overwritten by either
>> qdisc_qlen() or dev_requeue_skb(). Therefore, 29b86cdac00a changed
>> the behavior of this condition.
>> 
>> For ret from dev_hard_start_xmit(), I dug into the function and found
>> that it is the return value of ndo_start_xmit(). Per netdevice.h,
>> ndo_start_xmit() should only return NETDEV_TX_OK or NETDEV_TX_BUSY.
>> I surveyed many drivers, and they all follow the rule. The only
>> exception is vlan.
>> 
>> Given ret could only be NETDEV_TX_OK or NETDEV_TX_BUSY (ignore vlan
>> for now), if it fails the condition "if (!dev_xmit_complete(ret))",
>> ret must be NETDEV_TX_OK == 0. So netif_xmit_frozen_or_stopped() will
>> always be bypassed.
>> 
>> It is probably OK to ignore netif_xmit_frozen_or_stopped() and return
>> true from sch_direct_xmit(), as I didn't see that break any
>> functionality. But it is more like "correct by accident" to me. This
>> is the motivation of my original patch.
>> 
>> Alexei pointed out that the following condition is closer to the
>> original logic:
>> 
>>      if (qdisc_qlen(q) && netif_xmit_frozen_or_stopped(txq))
>>            return false;
>> 
>> However, I think John would like to remove qdisc_qlen() from the tx
>> path. I didn't see
> 
> Yep qdisc_qlen() is not very friendly for lockless users. At
> some point we will get around to writing a distributed rate
> limiter qdisc and it will be nice to not have to work-around
> qdisc_qlen().
> 
>> any issue without the extra qdisc_qlen() check, so the patch is
>> probably good AS-IS.
>> 
>> Please share your comments and feedback on this.
>> 
> 
> Thanks for the detailed analysis. The above patch looks OK
> to me. Actually I'm debating if we should just drop the check.
> But, there looks to be a case where drivers return NETDEV_TX_OK
> and then stop the queue because it is nearly overrun. By putting
> the check there we stop early instead of doing some extra work
> before realizing the driver ring is full.
> 
> Still, this overrun case should be rare, so removing the check
> should be OK. Plus, as you note, it's not been running anyway. My
> current recommendation is to just remove the check altogether.
> 
> Thanks,
> John 

Thanks John! I will resend a cleanup patch to net-next.
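
For the record, a tiny standalone model of the two tails discussed above
(a sketch only; the enum and function names are made up, with values
mirroring NETDEV_TX_OK and NETDEV_TX_BUSY):

```c
#include <assert.h>
#include <stdbool.h>

/* Model values mirroring NETDEV_TX_OK and NETDEV_TX_BUSY. */
enum model_tx { MODEL_TX_OK = 0x00, MODEL_TX_BUSY = 0x10 };

/* Tail of sch_direct_xmit() before the posted patch. It is only reached
 * with ret == MODEL_TX_OK, so the && short-circuits and txq_stopped is
 * never consulted.
 */
static bool model_old_tail(int ret, bool txq_stopped)
{
	if (ret && txq_stopped)
		return false;
	return true;
}

/* Tail as in the posted patch: the queue state decides the result. */
static bool model_new_tail(bool txq_stopped)
{
	return !txq_stopped;
}
```

model_old_tail(MODEL_TX_OK, true) returns true even though the queue is
stopped, which is the same as having no check at all.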

Song

* Re: INFO: rcu detected stall in is_bpf_text_address
From: Marcelo Ricardo Leitner @ 2018-05-28 18:22 UTC (permalink / raw)
  To: Xin Long
  Cc: Eric Dumazet, syzbot, ast, Daniel Borkmann, LKML, network dev,
	syzkaller-bugs
In-Reply-To: <CADvbK_e7LXc7minSmaPw6iZvjWo215wG-pPAS0gr58+-VxUO7Q@mail.gmail.com>

On Sun, May 20, 2018 at 04:26:03PM +0800, Xin Long wrote:
> On Sat, May 19, 2018 at 11:57 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > SCTP experts, please take a look.
> >
> > On 05/19/2018 08:55 AM, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit:    73fcb1a370c7 Merge branch 'akpm' (patches from Andrew)
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1462ec0f800000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f
> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000

The reproducer has:
r0 = socket$inet6(0xa, 0x1, 0x8010000000000084)

Considering that the socket() prototype uses an int for the 3rd
argument, how should I interpret this 64-bit value?

0x84 is IPPROTO_SCTP, but I don't know about the 0x8010000000000000
part in there.
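
One possible reading, assuming the syscall entry simply truncates the
argument to a 32-bit int (I have not checked how syzkaller treats it):
only the low 32 bits would survive. A sketch, with a made-up helper name:

```c
#include <stdint.h>

/* Keep only the low 32 bits, as a 32-bit int argument would. */
static uint32_t model_low32(uint64_t arg)
{
	return (uint32_t)arg;
}
```

Under that assumption model_low32(0x8010000000000084ULL) is 0x84, and
the upper bits would just be noise from the fuzzer.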
