Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 02/17] netfilter: add namespace support for l4proto
From: Pablo Neira Ayuso @ 2012-05-23 10:25 UTC (permalink / raw)
  To: Gao feng
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <1336985547-31960-3-git-send-email-gaofeng@cn.fujitsu.com>

On Mon, May 14, 2012 at 04:52:12PM +0800, Gao feng wrote:
> From: Gao feng <gaofeng@cn.fujitus.com>
> 
> -nf_ct_(un)register_sysctl are changed to support net namespace,
>  use (un)register_net_sysctl_table replaces (un)register_sysctl_paths.
>  and in nf_ct_unregister_sysctl,kfree table only when users is 0.
> 
> -Add the struct net as param of nf_conntrack_l4proto_(un)register.
>  register or unregister the l4proto only when the net is init_net.
> 
> -nf_conntrack_l4proto_register call init_net to initial the pernet
>  data of l4proto.
> 
> -nf_ct_l4proto_net is used to get the pernet data of l4proto.
> 
> -use init_net as a param of nf_conntrack_l4proto_(un)register.
> 
> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitus.com>
> ---
>  include/net/netfilter/nf_conntrack_l4proto.h   |   13 +-
>  net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   18 +-
>  net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   18 +-
>  net/netfilter/nf_conntrack_proto.c             |  245 ++++++++++++++----------
>  net/netfilter/nf_conntrack_proto_dccp.c        |   10 +-
>  net/netfilter/nf_conntrack_proto_gre.c         |    6 +-
>  net/netfilter/nf_conntrack_proto_sctp.c        |   10 +-
>  net/netfilter/nf_conntrack_proto_udplite.c     |   10 +-
>  8 files changed, 191 insertions(+), 139 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> index a90eab5..a93dcd5 100644
> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> @@ -12,7 +12,7 @@
>  #include <linux/netlink.h>
>  #include <net/netlink.h>
>  #include <net/netfilter/nf_conntrack.h>
> -
> +#include <net/netns/generic.h>

Minor nitpick: make sure there's still one line between this structure
below and the include headers.

>  struct seq_file;
>  
>  struct nf_conntrack_l4proto {
> @@ -129,8 +129,15 @@ nf_ct_l4proto_find_get(u_int16_t l3proto, u_int8_t l4proto);
>  extern void nf_ct_l4proto_put(struct nf_conntrack_l4proto *p);
>  
>  /* Protocol registration. */
> -extern int nf_conntrack_l4proto_register(struct nf_conntrack_l4proto *proto);
> -extern void nf_conntrack_l4proto_unregister(struct nf_conntrack_l4proto *proto);
> +extern int nf_conntrack_l4proto_register(struct net *net,
> +					 struct nf_conntrack_l4proto *proto);
> +extern void nf_conntrack_l4proto_unregister(struct net *net,
> +					    struct nf_conntrack_l4proto *proto);
> +
> +extern int nf_ct_l4proto_register_sysctl(struct net *net,
> +					 struct nf_conntrack_l4proto *l4proto);
> +extern void nf_ct_l4proto_unregister_sysctl(struct net *net,
> +					    struct nf_conntrack_l4proto *l4proto);
>  
>  /* Generic netlink helpers */
>  extern int nf_ct_port_tuple_to_nlattr(struct sk_buff *skb,
> diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
> index 91747d4..46ec515 100644
> --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
> @@ -391,19 +391,19 @@ static int __init nf_conntrack_l3proto_ipv4_init(void)
>  		return ret;
>  	}
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_tcp4);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_tcp4);
>  	if (ret < 0) {
>  		pr_err("nf_conntrack_ipv4: can't register tcp.\n");
>  		goto cleanup_sockopt;
>  	}
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_udp4);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_udp4);
>  	if (ret < 0) {
>  		pr_err("nf_conntrack_ipv4: can't register udp.\n");
>  		goto cleanup_tcp;
>  	}
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_icmp);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_icmp);
>  	if (ret < 0) {
>  		pr_err("nf_conntrack_ipv4: can't register icmp.\n");
>  		goto cleanup_udp;
> @@ -434,11 +434,11 @@ static int __init nf_conntrack_l3proto_ipv4_init(void)
>   cleanup_ipv4:
>  	nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv4);
>   cleanup_icmp:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmp);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_icmp);
>   cleanup_udp:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udp4);
>   cleanup_tcp:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_tcp4);
>   cleanup_sockopt:
>  	nf_unregister_sockopt(&so_getorigdst);
>  	return ret;
> @@ -452,9 +452,9 @@ static void __exit nf_conntrack_l3proto_ipv4_fini(void)
>  #endif
>  	nf_unregister_hooks(ipv4_conntrack_ops, ARRAY_SIZE(ipv4_conntrack_ops));
>  	nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv4);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmp);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp4);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_icmp);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udp4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_tcp4);
>  	nf_unregister_sockopt(&so_getorigdst);
>  }
>  
> diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> index fe925e4..55f379f 100644
> --- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> @@ -341,19 +341,19 @@ static int __init nf_conntrack_l3proto_ipv6_init(void)
>  	need_conntrack();
>  	nf_defrag_ipv6_enable();
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_tcp6);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_tcp6);
>  	if (ret < 0) {
>  		pr_err("nf_conntrack_ipv6: can't register tcp.\n");
>  		return ret;
>  	}
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_udp6);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_udp6);
>  	if (ret < 0) {
>  		pr_err("nf_conntrack_ipv6: can't register udp.\n");
>  		goto cleanup_tcp;
>  	}
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_icmpv6);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_icmpv6);
>  	if (ret < 0) {
>  		pr_err("nf_conntrack_ipv6: can't register icmpv6.\n");
>  		goto cleanup_udp;
> @@ -377,11 +377,11 @@ static int __init nf_conntrack_l3proto_ipv6_init(void)
>   cleanup_ipv6:
>  	nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv6);
>   cleanup_icmpv6:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmpv6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_icmpv6);
>   cleanup_udp:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udp6);
>   cleanup_tcp:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_tcp6);
>  	return ret;
>  }
>  
> @@ -390,9 +390,9 @@ static void __exit nf_conntrack_l3proto_ipv6_fini(void)
>  	synchronize_net();
>  	nf_unregister_hooks(ipv6_conntrack_ops, ARRAY_SIZE(ipv6_conntrack_ops));
>  	nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv6);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmpv6);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp6);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_icmpv6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udp6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_tcp6);
>  }
>  
>  module_init(nf_conntrack_l3proto_ipv6_init);
> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> index 8b631b0..6d68727 100644
> --- a/net/netfilter/nf_conntrack_proto.c
> +++ b/net/netfilter/nf_conntrack_proto.c
> @@ -35,30 +35,39 @@ EXPORT_SYMBOL_GPL(nf_ct_l3protos);
>  static DEFINE_MUTEX(nf_ct_proto_mutex);
>  
>  #ifdef CONFIG_SYSCTL
> -static int
> -nf_ct_register_sysctl(struct ctl_table_header **header, const char *path,
> -		      struct ctl_table *table, unsigned int *users)
> +int
> +nf_ct_register_sysctl(struct net *net,
> +		      struct ctl_table_header **header,
> +		      const char *path,
> +		      struct ctl_table *table,
> +		      unsigned int *users)
>  {
>  	if (*header == NULL) {
> -		*header = register_net_sysctl(&init_net, path, table);
> +		*header = register_net_sysctl(net, path, table);
>  		if (*header == NULL)
>  			return -ENOMEM;
>  	}
>  	if (users != NULL)
>  		(*users)++;
> +
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(nf_ct_register_sysctl);

I don't see why we need to export nf_ct_register_sysctl. I think this
is a left-over from the previous patchset.

> -static void
> +void
>  nf_ct_unregister_sysctl(struct ctl_table_header **header,
> -			struct ctl_table *table, unsigned int *users)
> +			struct ctl_table **table,
> +			unsigned int *users)
>  {
>  	if (users != NULL && --*users > 0)
>  		return;
>  
>  	unregister_net_sysctl_table(*header);
> +	kfree(*table);
>  	*header = NULL;
> +	*table = NULL;
>  }
> +EXPORT_SYMBOL_GPL(nf_ct_unregister_sysctl);

Same thing. I don't find any external user of this new exported
function in your entire patchset.

You have to fix this.

>  #endif
>  
>  struct nf_conntrack_l4proto *
> @@ -167,7 +176,8 @@ static int nf_ct_l3proto_register_sysctl(struct nf_conntrack_l3proto *l3proto)
>  
>  #ifdef CONFIG_SYSCTL
>  	if (l3proto->ctl_table != NULL) {
> -		err = nf_ct_register_sysctl(&l3proto->ctl_table_header,
> +		err = nf_ct_register_sysctl(&init_net,
> +					    &l3proto->ctl_table_header,
>  					    l3proto->ctl_table_path,
>  					    l3proto->ctl_table, NULL);
>  	}
> @@ -180,7 +190,7 @@ static void nf_ct_l3proto_unregister_sysctl(struct nf_conntrack_l3proto *l3proto
>  #ifdef CONFIG_SYSCTL
>  	if (l3proto->ctl_table_header != NULL)
>  		nf_ct_unregister_sysctl(&l3proto->ctl_table_header,
> -					l3proto->ctl_table, NULL);
> +					&l3proto->ctl_table, NULL);
>  #endif
>  }
>  
> @@ -243,137 +253,172 @@ void nf_conntrack_l3proto_unregister(struct nf_conntrack_l3proto *proto)
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_l3proto_unregister);
>  
> -static int nf_ct_l4proto_register_sysctl(struct nf_conntrack_l4proto *l4proto)
> +static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
> +					      struct nf_conntrack_l4proto *l4proto)
>  {
> -	int err = 0;
> +	if (l4proto->net_id)
> +		return net_generic(net, *l4proto->net_id);
> +	else
> +		return NULL;
> +}
>  
> +int nf_ct_l4proto_register_sysctl(struct net *net,
> +				  struct nf_conntrack_l4proto *l4proto)
> +{
> +	int err = 0;
> +	struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
> +	if (pn == NULL)
> +		return 0;
>  #ifdef CONFIG_SYSCTL
> -	if (l4proto->ctl_table != NULL) {
> -		err = nf_ct_register_sysctl(l4proto->ctl_table_header,
> +	if (pn->ctl_table != NULL) {
> +		err = nf_ct_register_sysctl(net,
> +					    &pn->ctl_table_header,
>  					    "net/netfilter",
> -					    l4proto->ctl_table,
> -					    l4proto->ctl_table_users);
> -		if (err < 0)
> +					    pn->ctl_table,
> +					    &pn->users);
> +		if (err < 0) {
> +			kfree(pn->ctl_table);
> +			pn->ctl_table = NULL;
                               ^^^^^^^^^^^
Do you really need to set this above to NULL? Is there any existing
bug trap? If not, it's superfluous, please, remove it.

>  			goto out;
> +		}
>  	}
>  #ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> -	if (l4proto->ctl_compat_table != NULL) {
> -		err = nf_ct_register_sysctl(&l4proto->ctl_compat_table_header,
> +	if (l4proto->compat && pn->ctl_compat_table != NULL) {
> +		err = nf_ct_register_sysctl(net,
> +					    &pn->ctl_compat_header,
>  					    "net/ipv4/netfilter",
> -					    l4proto->ctl_compat_table, NULL);
> +					    pn->ctl_compat_table,
> +					    NULL);
>  		if (err == 0)
>  			goto out;
> -		nf_ct_unregister_sysctl(l4proto->ctl_table_header,
> -					l4proto->ctl_table,
> -					l4proto->ctl_table_users);
> +
> +		kfree(pn->ctl_compat_table);
> +		pn->ctl_compat_table = NULL;
> +		nf_ct_unregister_sysctl(&pn->ctl_table_header,
> +					&pn->ctl_table,
> +					&pn->users);
>  	}
>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
>  out:
>  #endif /* CONFIG_SYSCTL */
>  	return err;
>  }
> +EXPORT_SYMBOL_GPL(nf_ct_l4proto_register_sysctl);
>  
> -static void nf_ct_l4proto_unregister_sysctl(struct nf_conntrack_l4proto *l4proto)
> +void nf_ct_l4proto_unregister_sysctl(struct net *net,
> +				     struct nf_conntrack_l4proto *l4proto)
>  {
> +	struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
> +	if (pn == NULL)
> +		return;
>  #ifdef CONFIG_SYSCTL
> -	if (l4proto->ctl_table_header != NULL &&
> -	    *l4proto->ctl_table_header != NULL)
> -		nf_ct_unregister_sysctl(l4proto->ctl_table_header,
> -					l4proto->ctl_table,
> -					l4proto->ctl_table_users);
> +	if (pn->ctl_table_header != NULL)
> +		nf_ct_unregister_sysctl(&pn->ctl_table_header,
> +					&pn->ctl_table,
> +					&pn->users);
> +
>  #ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> -	if (l4proto->ctl_compat_table_header != NULL)
> -		nf_ct_unregister_sysctl(&l4proto->ctl_compat_table_header,
> -					l4proto->ctl_compat_table, NULL);
> +	if (l4proto->compat && pn->ctl_compat_header != NULL)
> +		nf_ct_unregister_sysctl(&pn->ctl_compat_header,
> +					&pn->ctl_compat_table,
> +					NULL);
>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
> +#else
> +	pn->users--;
>  #endif /* CONFIG_SYSCTL */
>  }
> +EXPORT_SYMBOL_GPL(nf_ct_l4proto_unregister_sysctl);
>  
>  /* FIXME: Allow NULL functions and sub in pointers to generic for
>     them. --RR */
> -int nf_conntrack_l4proto_register(struct nf_conntrack_l4proto *l4proto)
> +int nf_conntrack_l4proto_register(struct net *net,
> +				  struct nf_conntrack_l4proto *l4proto)
>  {
>  	int ret = 0;

Minor nitpick: you save this amount of edits in this function that
result from the extra tabbing by moving all ...

if (net == &init_net) {
    ... this code ...
}

into some new static int nf_conntrack_l4proto_register_net(...) that
will be called by nf_conntrack_l4proto_register.

It will result more maintainable code. We still stick to 80-chars
columns, saving that extra tabbing makes the code more readable.

>  
> -	if (l4proto->l3proto >= PF_MAX)
> -		return -EBUSY;
> -
> -	if ((l4proto->to_nlattr && !l4proto->nlattr_size)
> -		|| (l4proto->tuple_to_nlattr && !l4proto->nlattr_tuple_size))
> -		return -EINVAL;
> -
> -	mutex_lock(&nf_ct_proto_mutex);
> -	if (!nf_ct_protos[l4proto->l3proto]) {
> -		/* l3proto may be loaded latter. */
> -		struct nf_conntrack_l4proto __rcu **proto_array;
> -		int i;
> -
> -		proto_array = kmalloc(MAX_NF_CT_PROTO *
> -				      sizeof(struct nf_conntrack_l4proto *),
> -				      GFP_KERNEL);
> -		if (proto_array == NULL) {
> -			ret = -ENOMEM;
> +	if (net == &init_net) {
> +		if (l4proto->l3proto >= PF_MAX)
> +			return -EBUSY;
> +
> +		if ((l4proto->to_nlattr && !l4proto->nlattr_size)
> +			|| (l4proto->tuple_to_nlattr && !l4proto->nlattr_tuple_size))
> +			return -EINVAL;
> +
> +		mutex_lock(&nf_ct_proto_mutex);
> +		if (!nf_ct_protos[l4proto->l3proto]) {
> +			/* l3proto may be loaded latter. */
> +			struct nf_conntrack_l4proto __rcu **proto_array;
> +			int i;
> +
> +			proto_array = kmalloc(MAX_NF_CT_PROTO *
> +					      sizeof(struct nf_conntrack_l4proto *),
> +					      GFP_KERNEL);
> +			if (proto_array == NULL) {
> +				ret = -ENOMEM;
> +				goto out_unlock;
> +			}
> +
> +			for (i = 0; i < MAX_NF_CT_PROTO; i++)
> +				RCU_INIT_POINTER(proto_array[i], &nf_conntrack_l4proto_generic);
> +
> +			/* Before making proto_array visible to lockless readers,
> +			 * we must make sure its content is committed to memory.
> +			 */
> +			smp_wmb();
> +
> +			nf_ct_protos[l4proto->l3proto] = proto_array;
> +		} else if (rcu_dereference_protected(
> +				nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> +				lockdep_is_held(&nf_ct_proto_mutex)
> +				) != &nf_conntrack_l4proto_generic) {
> +			ret = -EBUSY;
>  			goto out_unlock;
>  		}
>  
> -		for (i = 0; i < MAX_NF_CT_PROTO; i++)
> -			RCU_INIT_POINTER(proto_array[i], &nf_conntrack_l4proto_generic);
> -
> -		/* Before making proto_array visible to lockless readers,
> -		 * we must make sure its content is committed to memory.
> -		 */
> -		smp_wmb();
> -
> -		nf_ct_protos[l4proto->l3proto] = proto_array;
> -	} else if (rcu_dereference_protected(
> -			nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> -			lockdep_is_held(&nf_ct_proto_mutex)
> -			) != &nf_conntrack_l4proto_generic) {
> -		ret = -EBUSY;
> -		goto out_unlock;
> -	}
> -
> -	ret = nf_ct_l4proto_register_sysctl(l4proto);
> -	if (ret < 0)
> -		goto out_unlock;
> -
> -	l4proto->nla_size = 0;
> -	if (l4proto->nlattr_size)
> -		l4proto->nla_size += l4proto->nlattr_size();
> -	if (l4proto->nlattr_tuple_size)
> -		l4proto->nla_size += 3 * l4proto->nlattr_tuple_size();
> -
> -	rcu_assign_pointer(nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> -			   l4proto);
> +		l4proto->nla_size = 0;
> +		if (l4proto->nlattr_size)
> +			l4proto->nla_size += l4proto->nlattr_size();
> +		if (l4proto->nlattr_tuple_size)
> +			l4proto->nla_size += 3 * l4proto->nlattr_tuple_size();
>  
> +		rcu_assign_pointer(nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> +				   l4proto);
>  out_unlock:
> -	mutex_unlock(&nf_ct_proto_mutex);
> -	return ret;
> +		mutex_unlock(&nf_ct_proto_mutex);
> +		if (ret < 0)
> +			return ret;
> +	}
> +	if (l4proto->init_net) {
> +		ret = l4proto->init_net(net, l4proto->compat);
> +		if (ret < 0)
> +			return ret;
> +	}
> +	return nf_ct_l4proto_register_sysctl(net, l4proto);
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_register);
>  
> -void nf_conntrack_l4proto_unregister(struct nf_conntrack_l4proto *l4proto)
> +void nf_conntrack_l4proto_unregister(struct net *net,
> +				     struct nf_conntrack_l4proto *l4proto)
>  {
> -	struct net *net;
> -
> -	BUG_ON(l4proto->l3proto >= PF_MAX);
> -
> -	mutex_lock(&nf_ct_proto_mutex);
> -	BUG_ON(rcu_dereference_protected(
> -			nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> -			lockdep_is_held(&nf_ct_proto_mutex)
> -			) != l4proto);
> -	rcu_assign_pointer(nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> -			   &nf_conntrack_l4proto_generic);
> -	nf_ct_l4proto_unregister_sysctl(l4proto);
> -	mutex_unlock(&nf_ct_proto_mutex);
> -
> -	synchronize_rcu();
> +	if (net == &init_net) {

Same thing as above here.

> +		BUG_ON(l4proto->l3proto >= PF_MAX);
> +		mutex_lock(&nf_ct_proto_mutex);
> +
> +		BUG_ON(rcu_dereference_protected(
> +				nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> +				lockdep_is_held(&nf_ct_proto_mutex)
> +				) != l4proto);
> +		rcu_assign_pointer(nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
> +				   &nf_conntrack_l4proto_generic);
> +		mutex_unlock(&nf_ct_proto_mutex);
> +
> +		synchronize_rcu();
> +	}
> +	nf_ct_l4proto_unregister_sysctl(net, l4proto);
>  
>  	/* Remove all contrack entries for this protocol */
>  	rtnl_lock();
> -	for_each_net(net)
> -		nf_ct_iterate_cleanup(net, kill_l4proto, l4proto);
> +	nf_ct_iterate_cleanup(net, kill_l4proto, l4proto);
>  	rtnl_unlock();
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_unregister);
> @@ -383,7 +428,7 @@ int nf_conntrack_proto_init(void)
>  	unsigned int i;
>  	int err;
>  
> -	err = nf_ct_l4proto_register_sysctl(&nf_conntrack_l4proto_generic);
> +	err = nf_ct_l4proto_register_sysctl(&init_net, &nf_conntrack_l4proto_generic);
>  	if (err < 0)
>  		return err;
>  
> @@ -397,7 +442,7 @@ void nf_conntrack_proto_fini(void)
>  {
>  	unsigned int i;
>  
> -	nf_ct_l4proto_unregister_sysctl(&nf_conntrack_l4proto_generic);
> +	nf_ct_l4proto_unregister_sysctl(&init_net, &nf_conntrack_l4proto_generic);
>  
>  	/* free l3proto protocol tables */
>  	for (i = 0; i < PF_MAX; i++)
> diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
> index ef706a4..5a8e037 100644
> --- a/net/netfilter/nf_conntrack_proto_dccp.c
> +++ b/net/netfilter/nf_conntrack_proto_dccp.c
> @@ -945,17 +945,17 @@ static int __init nf_conntrack_proto_dccp_init(void)
>  	if (err < 0)
>  		goto err1;
>  
> -	err = nf_conntrack_l4proto_register(&dccp_proto4);
> +	err = nf_conntrack_l4proto_register(&init_net, &dccp_proto4);
>  	if (err < 0)
>  		goto err2;
>  
> -	err = nf_conntrack_l4proto_register(&dccp_proto6);
> +	err = nf_conntrack_l4proto_register(&init_net, &dccp_proto6);
>  	if (err < 0)
>  		goto err3;
>  	return 0;
>  
>  err3:
> -	nf_conntrack_l4proto_unregister(&dccp_proto4);
> +	nf_conntrack_l4proto_unregister(&init_net, &dccp_proto4);
>  err2:
>  	unregister_pernet_subsys(&dccp_net_ops);
>  err1:
> @@ -965,8 +965,8 @@ err1:
>  static void __exit nf_conntrack_proto_dccp_fini(void)
>  {
>  	unregister_pernet_subsys(&dccp_net_ops);
> -	nf_conntrack_l4proto_unregister(&dccp_proto6);
> -	nf_conntrack_l4proto_unregister(&dccp_proto4);
> +	nf_conntrack_l4proto_unregister(&init_net, &dccp_proto6);
> +	nf_conntrack_l4proto_unregister(&init_net, &dccp_proto4);
>  }
>  
>  module_init(nf_conntrack_proto_dccp_init);
> diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c
> index 4bf6b4e..132f0d2 100644
> --- a/net/netfilter/nf_conntrack_proto_gre.c
> +++ b/net/netfilter/nf_conntrack_proto_gre.c
> @@ -396,18 +396,18 @@ static int __init nf_ct_proto_gre_init(void)
>  {
>  	int rv;
>  
> -	rv = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_gre4);
> +	rv = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_gre4);
>  	if (rv < 0)
>  		return rv;
>  	rv = register_pernet_subsys(&proto_gre_net_ops);
>  	if (rv < 0)
> -		nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_gre4);
> +		nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_gre4);
>  	return rv;
>  }
>  
>  static void __exit nf_ct_proto_gre_fini(void)
>  {
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_gre4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_gre4);
>  	unregister_pernet_subsys(&proto_gre_net_ops);
>  }
>  
> diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
> index 996db2f..97bbc20 100644
> --- a/net/netfilter/nf_conntrack_proto_sctp.c
> +++ b/net/netfilter/nf_conntrack_proto_sctp.c
> @@ -791,12 +791,12 @@ static int __init nf_conntrack_proto_sctp_init(void)
>  {
>  	int ret;
>  
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_sctp4);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_sctp4);
>  	if (ret) {
>  		pr_err("nf_conntrack_l4proto_sctp4: protocol register failed\n");
>  		goto out;
>  	}
> -	ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_sctp6);
> +	ret = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_sctp6);
>  	if (ret) {
>  		pr_err("nf_conntrack_l4proto_sctp6: protocol register failed\n");
>  		goto cleanup_sctp4;
> @@ -805,15 +805,15 @@ static int __init nf_conntrack_proto_sctp_init(void)
>  	return ret;
>  
>   cleanup_sctp4:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_sctp4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_sctp4);
>   out:
>  	return ret;
>  }
>  
>  static void __exit nf_conntrack_proto_sctp_fini(void)
>  {
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_sctp6);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_sctp4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_sctp6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_sctp4);
>  }
>  
>  module_init(nf_conntrack_proto_sctp_init);
> diff --git a/net/netfilter/nf_conntrack_proto_udplite.c b/net/netfilter/nf_conntrack_proto_udplite.c
> index 4d60a53..fa142a8 100644
> --- a/net/netfilter/nf_conntrack_proto_udplite.c
> +++ b/net/netfilter/nf_conntrack_proto_udplite.c
> @@ -299,23 +299,23 @@ static int __init nf_conntrack_proto_udplite_init(void)
>  {
>  	int err;
>  
> -	err = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_udplite4);
> +	err = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_udplite4);
>  	if (err < 0)
>  		goto err1;
> -	err = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_udplite6);
> +	err = nf_conntrack_l4proto_register(&init_net, &nf_conntrack_l4proto_udplite6);
>  	if (err < 0)
>  		goto err2;
>  	return 0;
>  err2:
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udplite4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udplite4);
>  err1:
>  	return err;
>  }
>  
>  static void __exit nf_conntrack_proto_udplite_exit(void)
>  {
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udplite6);
> -	nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udplite4);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udplite6);
> +	nf_conntrack_l4proto_unregister(&init_net, &nf_conntrack_l4proto_udplite4);
>  }
>  
>  module_init(nf_conntrack_proto_udplite_init);
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Pablo Neira Ayuso @ 2012-05-23 10:12 UTC (permalink / raw)
  To: Gao feng
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <1336985547-31960-2-git-send-email-gaofeng@cn.fujitsu.com>

On Mon, May 14, 2012 at 04:52:11PM +0800, Gao feng wrote:
> From: Gao feng <gaofeng@cn.fujitus.com>
> 
> the struct nf_proto_net stroes proto's ctl_table_header and ctl_table,
> nf_ct_l4proto_(un)register_sysctl use it to register sysctl.
> 
> there are some changes for struct nf_conntrack_l4proto:
> - add field compat to identify if this proto should do compat.
> - the net_id field is used to store the pernet_operations id
>   that belones to l4proto.
> - init_net will be used to initial the proto's pernet data
> 
> and add init_net for struct nf_conntrack_l3proto too.

This patchset looks bette but there are still things that we have to
resolve.

The first one (regarding this patch 1/17) changes in:
* include/net/netfilter/nf_conntrack_l4proto.h
* include/net/netns/conntrack.h

should be included in:
[PATCH] netfilter: add namespace support for l4proto

And changes in:
* include/net/netfilter/nf_conntrack_l3proto.h

should be included in:
[PATCH] netfilter: add namespace support for l3proto

I already told you. A patch that adds a structure without using it,
is not good. The structure has to go together with the code uses it.

More comments below.

> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitus.com>
> ---
>  include/net/netfilter/nf_conntrack_l3proto.h |    3 +++
>  include/net/netfilter/nf_conntrack_l4proto.h |    6 ++++++
>  include/net/netns/conntrack.h                |   12 ++++++++++++
>  3 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
> index 9699c02..9766005 100644
> --- a/include/net/netfilter/nf_conntrack_l3proto.h
> +++ b/include/net/netfilter/nf_conntrack_l3proto.h
> @@ -69,6 +69,9 @@ struct nf_conntrack_l3proto {
>  	struct ctl_table	*ctl_table;
>  #endif /* CONFIG_SYSCTL */
>  
> +	/* Init l3proto pernet data */
> +	int (*init_net)(struct net *net);
> +
>  	/* Module (if any) which this is connected to. */
>  	struct module *me;
>  };
> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> index 3b572bb..a90eab5 100644
> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> @@ -22,6 +22,8 @@ struct nf_conntrack_l4proto {
>  	/* L4 Protocol number. */
>  	u_int8_t l4proto;
>  
> +	u_int8_t compat;

I don't see why we need this new field.

It seems to be set to 1 in each structure that has set:

.ctl_compat_table

to non-NULL. So, it's redundant.

Moreover, you already know from the protocol tracker itself if you
have to allocate the compat ctl table or not.

In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
Then, you pass that compat value to generic_init_net via ->inet_net
again, but this information (that determines if the compat has to be
done or not) is already in the scope of the protocol tracker.

You have to fix this.

> +
>  	/* Try to fill in the third arg: dataoff is offset past network protocol
>             hdr.  Return true if possible. */
>  	bool (*pkt_to_tuple)(const struct sk_buff *skb, unsigned int dataoff,
> @@ -103,6 +105,10 @@ struct nf_conntrack_l4proto {
>  	struct ctl_table	*ctl_compat_table;
>  #endif
>  #endif
> +	int	*net_id;
> +	/* Init l4proto pernet data */
> +	int (*init_net)(struct net *net, u_int8_t compat);
> +
>  	/* Protocol name */
>  	const char *name;
>  
> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> index a053a19..1f53038 100644
> --- a/include/net/netns/conntrack.h
> +++ b/include/net/netns/conntrack.h
> @@ -8,6 +8,18 @@
>  struct ctl_table_header;
>  struct nf_conntrack_ecache;
>  
> +struct nf_proto_net {
> +#ifdef CONFIG_SYSCTL
> +	struct ctl_table_header *ctl_table_header;
> +	struct ctl_table        *ctl_table;
> +#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> +	struct ctl_table_header *ctl_compat_header;
> +	struct ctl_table        *ctl_compat_table;
> +#endif
> +#endif
> +	unsigned int		users;
> +};
> +
>  struct netns_ct {
>  	atomic_t		count;
>  	unsigned int		expect_count;

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-23  9:44 UTC (permalink / raw)
  To: Kieran Mansley; +Cc: Ben Hutchings, netdev
In-Reply-To: <1337720076.3361.667.camel@edumazet-glaptop>

Update :

My tests show that sk_backlog.len can reach 1.5MB easily on a netperf,
even with a fast machine, receiving a 10Gb flow (ixgbe adapter), if
LRO/GRO are off.

424 backlogdrop for 193.182.546 incoming packets (with my tcp_space()
patch applied)

I believe that as soon as ixgbe can use build_skb() and avoid the 1024
bytes overhead per skb, it should go away.

Of course, another way to solve the problem would be to change
tcp_recvmsg() to use lock_sock_fast(), so that no frame is backlogged at
all.

Locking the socket for the whole operation (including copyout to user)
is not very good. It was good enough years ago with small receive
window.

With a potentially huge backlog, it means user process has to process
it, regardless of its latency constraints. CPU caches are also
completely destroyed because of huge amount of data included in thousand
of skbs.

^ permalink raw reply

* Re: [PATCH v6 2/2] decrement static keys on real destroy time
From: Glauber Costa @ 2012-05-23  9:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
	Johannes Weiner, Michal Hocko, David Miller
In-Reply-To: <20120522154610.f2f9b78e.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

On 05/23/2012 02:46 AM, Andrew Morton wrote:
> Here, we're open-coding kinda-test_bit().  Why do that?  These flags are
> modified with set_bit() and friends, so we should read them with the
> matching test_bit()?

My reasoning was to be as cheap as possible, as you noted yourself two
paragraphs below.

> Also, these bool-returning functions will return values other than 0
> and 1.  That probably works OK and I don't know what the C standards
> and implementations do about this.  But it seems unclean and slightly
> risky to have a "bool" value of 32!  Converting these functions to use
> test_bit() fixes this - test_bit() returns only 0 or 1.
>
> test_bit() is slightly more expensive than the above.  If this is
> considered to be an issue then I guess we could continue to use this
> approach.  But I do think a code comment is needed, explaining and
> justifying the unusual decision to bypass the bitops API.  Also these
> functions should tell the truth and return an "int" type.
>
>> >
>> >  +static void disarm_static_keys(struct mem_cgroup *memcg)
>> >  +{
>> >  +	disarm_sock_keys(memcg);
>> >  +}
> Why does this function exist?  Its single caller could call
> disarm_sock_keys() directly.

It exists to make it clear that this is the point in which static keys
should be disabled. I already have a patchset that introduces other 
static keys, that should, of course, also be disabled here.

I am totally fine with calling directly disarm_sock_keys(), and then in 
that series wrap it in disarm_static_keys, IOW, defer its introduction,
if that's how you prefer.


>
>> >    static void drain_all_stock_async(struct mem_cgroup *memcg);
>> >
>> >    static struct mem_cgroup_per_zone *
>> >  @@ -4836,6 +4854,13 @@ static void free_work(struct work_struct *work)
>> >    	int size = sizeof(struct mem_cgroup);
>> >
>> >    	memcg = container_of(work, struct mem_cgroup, work_freeing);
>> >  +	/*
>> >  +	 * We need to make sure that (at least for now), the jump label
>> >  +	 * destruction code runs outside of the cgroup lock.
> This is a poor comment - it failed to tell the reader*why*  that code
> must run outside the cgroup lock.

Ok, will update.

>> >							schedule_work()
>> >  +	 * will guarantee this happens. Be careful if you need to move this
>> >  +	 * disarm_static_keys around
> It's a bit difficult for the reader to be careful when he isn't told
> what the risks are.

Ok, will update.

>> >  +	 */
>> >  +	disarm_static_keys(memcg);
>> >    	if (size<  PAGE_SIZE)
>> >    		kfree(memcg);
>> >    	else
>> >  diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
>> >  index 1517037..3b8fa25 100644
>> >  --- a/net/ipv4/tcp_memcontrol.c
>> >  +++ b/net/ipv4/tcp_memcontrol.c
>> >  @@ -74,9 +74,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
>> >    	percpu_counter_destroy(&tcp->tcp_sockets_allocated);
>> >
>> >    	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
>> >  -
>> >  -	if (val != RESOURCE_MAX)
>> >  -		static_key_slow_dec(&memcg_socket_limit_enabled);
>> >    }
>> >    EXPORT_SYMBOL(tcp_destroy_cgroup);
>> >
>> >  @@ -107,10 +104,33 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
>> >    		tcp->tcp_prot_mem[i] = min_t(long, val>>  PAGE_SHIFT,
>> >    					net->ipv4.sysctl_tcp_mem[i]);
>> >
>> >  -	if (val == RESOURCE_MAX&&  old_lim != RESOURCE_MAX)
>> >  -		static_key_slow_dec(&memcg_socket_limit_enabled);
>> >  -	else if (old_lim == RESOURCE_MAX&&  val != RESOURCE_MAX)
>> >  -		static_key_slow_inc(&memcg_socket_limit_enabled);
>> >  +	if (val == RESOURCE_MAX)
>> >  +		clear_bit(MEMCG_SOCK_ACTIVE,&cg_proto->flags);
>> >  +	else if (val != RESOURCE_MAX) {
>> >  +		/*
>> >  +		 *  The active bit needs to be written after the static_key update.
>> >  +		 *  This is what guarantees that the socket activation function
>> >  +		 *  is the last one to run. See sock_update_memcg() for details,
>> >  +		 *  and note that we don't mark any socket as belonging to this
>> >  +		 *  memcg until that flag is up.
>> >  +		 *
>> >  +		 *  We need to do this, because static_keys will span multiple
>> >  +		 *  sites, but we can't control their order. If we mark a socket
>> >  +		 *  as accounted, but the accounting functions are not patched in
>> >  +		 *  yet, we'll lose accounting.
>> >  +		 *
>> >  +		 *  We never race with the readers in sock_update_memcg(), because
>> >  +		 *  when this value change, the code to process it is not patched in
>> >  +		 *  yet.
>> >  +		 *
>> >  +		 *  The activated bit is used to guarantee that no two writers will
>> >  +		 *  do the update in the same memcg. Without that, we can't properly
>> >  +		 *  shutdown the static key.
>> >  +		 */
> This comment needlessly overflows 80 cols and has a pointless and
> unconventional double-space indenting.  I already provided a patch
> which fixes this and a few other things, but that was ignored when you
> did the v6.

Sorry, I missed it.
>
>> >  +		if (!test_and_set_bit(MEMCG_SOCK_ACTIVATED,&cg_proto->flags))
>> >  +			static_key_slow_inc(&memcg_socket_limit_enabled);
>> >  +		set_bit(MEMCG_SOCK_ACTIVE,&cg_proto->flags);
>> >  +	}
> So here are suggested changes from*some*  of the above discussion.
> Please consider, incorporate, retest and send us a v7?

How do you want me to do it? Should I add your patch ontop of mine,
and then another one that tweaks whatever else is left, or should I just
merge those changes into the patches I have?

^ permalink raw reply

* Re: [PATCH 1/3] drivers: net: stmmac: add blackfin support
From: Giuseppe CAVALLARO @ 2012-05-23  8:27 UTC (permalink / raw)
  To: Giuseppe CAVALLARO
  Cc: Bob Liu, davem, francesco.virlinzi, rayagond, sr, netdev,
	uclinux-dist-devel
In-Reply-To: <4FBC9E09.6010300@st.com>

On 5/23/2012 10:21 AM, Giuseppe CAVALLARO wrote:
> Hello Bob Liu
> 
> On 5/23/2012 9:58 AM, Bob Liu wrote:
>> Hi Peppe,
>>
>> On 5/22/12, Giuseppe CAVALLARO <peppe.cavallaro@st.com> wrote:
>>> Hello Bob Liu
>>>
>>> On 5/22/2012 9:38 AM, Bob Liu wrote:
>>>> Blackfin arch use stmmac on its reference board bf609-ezkit, the stmmac
>>>> ip
>>>> version is 3.61a.
>>>>
>>>> But the spec seems a little different, some register addr and define are
>>>> not
>>>> the same with current code.
>>>>
>>>> This patch add the support for blackfin arch following the spec.
>>>
>>> The 3.61a is supported and you have to point to the dw1000.h header file.
>>>
>>> To support this GMAC generation you only need to pass from the platform
>>> the field has_gmac (see stmmac.txt).
>>> Also the 3.61a has the HW cap registers so many internal fields (e.g. rx
>>> coe, enhanced descr ...) will be fixed at run-time (although you can
>>> pass them from the platform).
>>>
>>> Your patch adds the GMAC SPEC in the old MAC 10/100.
>>>
>>> Also I am reluctant to have specific ifdef <ARCH> within the code.
>>> I do think the driver already has all the platform fields to run on your
>>> board. If you need extra conf pls feel free to enhance the
>>> plat_stmmacenet_data.
>>>
>>
>> Thank you for your reply.
>> I tried to use driver dwmac1000 by setting .has_gmac = 1 today.
>> Ping can finish with no error but when rcp a file or telnet it will hang.
> 
> Hmm this should be debugged ... maybe you can verify the tx / rx
> checksum. I mean if your IP has these modules or if, for somereason, the
> HW cap register is not present and there are not properly fixed
> 
>>
>> Using below patch without setting .has_gmac, everything works fine.
> 
> With your patch (that added the dwmac1000 into the dwmac100) you are
> indeed using the MAC100 setting where by default there is no HW
> checksumming ;-)

 platform fields to use are: tx_coe and rx_coe.

Peppe

>> Any ideas?  Thank you.
> 
> you are welcome
> 
> Peppe
> 
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
>> index 7c6d857..00499b8 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
>> @@ -29,6 +29,18 @@
>>   *                             MAC BLOCK defines
>>   *---------------------------------------------------------------------------*/
>>  /* MAC CSR offset */
>> +#if defined(CONFIG_BLACKFIN)
>> +#define MAC_CONTROL    0x00000000      /* MAC Control */
>> +#define MAC_FRAME_FILTER       0x0000004       /* Frame filter */
>> +#define MAC_HASH_HIGH  0x00000008      /* Multicast Hash Table High */
>> +#define MAC_HASH_LOW   0x0000000c      /* Multicast Hash Table Low */
>> +#define MAC_MII_ADDR   0x00000010      /* MII Address */
>> +#define MAC_MII_DATA   0x00000014      /* MII Data */
>> +#define MAC_FLOW_CTRL  0x00000018      /* Flow Control */
>> +#define MAC_VLAN1      0x0000001c      /* VLAN1 Tag */
>> +#define MAC_ADDR_HIGH  0x00000040      /* MAC Address High */
>> +#define MAC_ADDR_LOW   0x00000044      /* MAC Address Low */
>> +#else
>>  #define MAC_CONTROL    0x00000000      /* MAC Control */
>>  #define MAC_ADDR_HIGH  0x00000004      /* MAC Address High */
>>  #define MAC_ADDR_LOW   0x00000008      /* MAC Address Low */
>> @@ -39,6 +51,7 @@
>>  #define MAC_FLOW_CTRL  0x0000001c      /* Flow Control */
>>  #define MAC_VLAN1      0x00000020      /* VLAN1 Tag */
>>  #define MAC_VLAN2      0x00000024      /* VLAN2 Tag */
>> +#endif
>>
>>  /* MAC CTRL defines */
>>  #define MAC_CONTROL_RA 0x80000000      /* Receive All Mode */
>> @@ -67,7 +80,11 @@
>>  #define MAC_CONTROL_TE         0x00000008      /* Transmitter Enable */
>>  #define MAC_CONTROL_RE         0x00000004      /* Receiver Enable */
>>
>> +#ifdef CONFIG_BLACKFIN
>> +#define MAC_CORE_INIT ((1 << 14) | MAC_CONTROL_DBF)
>> +#else
>>  #define MAC_CORE_INIT (MAC_CONTROL_HBD | MAC_CONTROL_ASTP)
>> +#endif
>>
>>  /* MAC FLOW CTRL defines */
>>  #define MAC_FLOW_CTRL_PT_MASK  0xffff0000      /* Pause Time Mask */
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply

* Re: [PATCH 1/3] drivers: net: stmmac: add blackfin support
From: Giuseppe CAVALLARO @ 2012-05-23  8:21 UTC (permalink / raw)
  To: Bob Liu; +Cc: davem, francesco.virlinzi, rayagond, sr, netdev,
	uclinux-dist-devel
In-Reply-To: <CAA_GA1dWh=8rh9TQ4D81X22v-Cy1ZRUbBcZnVegqfFnKn_hVSw@mail.gmail.com>

Hello Bob Liu

On 5/23/2012 9:58 AM, Bob Liu wrote:
> Hi Peppe,
> 
> On 5/22/12, Giuseppe CAVALLARO <peppe.cavallaro@st.com> wrote:
>> Hello Bob Liu
>>
>> On 5/22/2012 9:38 AM, Bob Liu wrote:
>>> Blackfin arch use stmmac on its reference board bf609-ezkit, the stmmac
>>> ip
>>> version is 3.61a.
>>>
>>> But the spec seems a little different, some register addr and define are
>>> not
>>> the same with current code.
>>>
>>> This patch add the support for blackfin arch following the spec.
>>
>> The 3.61a is supported and you have to point to the dw1000.h header file.
>>
>> To support this GMAC generation you only need to pass from the platform
>> the field has_gmac (see stmmac.txt).
>> Also the 3.61a has the HW cap registers so many internal fields (e.g. rx
>> coe, enhanced descr ...) will be fixed at run-time (although you can
>> pass them from the platform).
>>
>> Your patch adds the GMAC SPEC in the old MAC 10/100.
>>
>> Also I am reluctant to have specific ifdef <ARCH> within the code.
>> I do think the driver already has all the platform fields to run on your
>> board. If you need extra conf pls feel free to enhance the
>> plat_stmmacenet_data.
>>
> 
> Thank you for your reply.
> I tried to use driver dwmac1000 by setting .has_gmac = 1 today.
> Ping can finish with no error but when rcp a file or telnet it will hang.

Hmm this should be debugged ... maybe you can verify the tx / rx
checksum. I mean if your IP has these modules or if, for somereason, the
HW cap register is not present and there are not properly fixed

> 
> Using below patch without setting .has_gmac, everything works fine.

With your patch (that added the dwmac1000 into the dwmac100) you are
indeed using the MAC100 setting where by default there is no HW
checksumming ;-)

> Any ideas?  Thank you.

you are welcome

Peppe

> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
> b/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
> index 7c6d857..00499b8 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
> @@ -29,6 +29,18 @@
>   *                             MAC BLOCK defines
>   *---------------------------------------------------------------------------*/
>  /* MAC CSR offset */
> +#if defined(CONFIG_BLACKFIN)
> +#define MAC_CONTROL    0x00000000      /* MAC Control */
> +#define MAC_FRAME_FILTER       0x0000004       /* Frame filter */
> +#define MAC_HASH_HIGH  0x00000008      /* Multicast Hash Table High */
> +#define MAC_HASH_LOW   0x0000000c      /* Multicast Hash Table Low */
> +#define MAC_MII_ADDR   0x00000010      /* MII Address */
> +#define MAC_MII_DATA   0x00000014      /* MII Data */
> +#define MAC_FLOW_CTRL  0x00000018      /* Flow Control */
> +#define MAC_VLAN1      0x0000001c      /* VLAN1 Tag */
> +#define MAC_ADDR_HIGH  0x00000040      /* MAC Address High */
> +#define MAC_ADDR_LOW   0x00000044      /* MAC Address Low */
> +#else
>  #define MAC_CONTROL    0x00000000      /* MAC Control */
>  #define MAC_ADDR_HIGH  0x00000004      /* MAC Address High */
>  #define MAC_ADDR_LOW   0x00000008      /* MAC Address Low */
> @@ -39,6 +51,7 @@
>  #define MAC_FLOW_CTRL  0x0000001c      /* Flow Control */
>  #define MAC_VLAN1      0x00000020      /* VLAN1 Tag */
>  #define MAC_VLAN2      0x00000024      /* VLAN2 Tag */
> +#endif
> 
>  /* MAC CTRL defines */
>  #define MAC_CONTROL_RA 0x80000000      /* Receive All Mode */
> @@ -67,7 +80,11 @@
>  #define MAC_CONTROL_TE         0x00000008      /* Transmitter Enable */
>  #define MAC_CONTROL_RE         0x00000004      /* Receiver Enable */
> 
> +#ifdef CONFIG_BLACKFIN
> +#define MAC_CORE_INIT ((1 << 14) | MAC_CONTROL_DBF)
> +#else
>  #define MAC_CORE_INIT (MAC_CONTROL_HBD | MAC_CONTROL_ASTP)
> +#endif
> 
>  /* MAC FLOW CTRL defines */
>  #define MAC_FLOW_CTRL_PT_MASK  0xffff0000      /* Pause Time Mask */
> 

^ permalink raw reply

* Kernel doesn't propagate DNSSL to userspace (e.g. NetworkManager)
From: Pavel Simerda @ 2012-05-23  8:16 UTC (permalink / raw)
  To: netdev; +Cc: Dan Williams, danw
In-Reply-To: <0263d6e6-50ba-4101-b958-260e2948c4ab@zmail19.collab.prod.int.phx2.redhat.com>

Hi,

I filed a bugreport to about lack of DNSSL support in kernel:

https://bugzilla.redhat.com/show_bug.cgi?id=824121

While NetworkManager recieves RDNSS neighbor discovery user option from kernel, it doesn't recieve DNSSL at all. This can be debugged with NetworkManager
(or hopefully some better testing tool) and radvdump (to check if DNSSL is present).

radvdump reports DNSSL is there, NetworkManager gets no netlink message from kernel.

kernel-3.3.4-4.fc17.x86_64
NetworkManager-0.9.4.0-7.git20120403.fc17.x86_64

Dave Jones asked me to contact this ML directly. I'm offlist.

Cheers,

Pavel Šimerda

^ permalink raw reply

* Re: [PATCH 1/3] drivers: net: stmmac: add blackfin support
From: Bob Liu @ 2012-05-23  7:58 UTC (permalink / raw)
  To: Giuseppe CAVALLARO
  Cc: davem, francesco.virlinzi, rayagond, sr, netdev,
	uclinux-dist-devel
In-Reply-To: <4FBB8852.7060403@st.com>

Hi Peppe,

On 5/22/12, Giuseppe CAVALLARO <peppe.cavallaro@st.com> wrote:
> Hello Bob Liu
>
> On 5/22/2012 9:38 AM, Bob Liu wrote:
>> Blackfin arch use stmmac on its reference board bf609-ezkit, the stmmac
>> ip
>> version is 3.61a.
>>
>> But the spec seems a little different, some register addr and define are
>> not
>> the same with current code.
>>
>> This patch add the support for blackfin arch following the spec.
>
> The 3.61a is supported and you have to point to the dw1000.h header file.
>
> To support this GMAC generation you only need to pass from the platform
> the field has_gmac (see stmmac.txt).
> Also the 3.61a has the HW cap registers so many internal fields (e.g. rx
> coe, enhanced descr ...) will be fixed at run-time (although you can
> pass them from the platform).
>
> Your patch adds the GMAC SPEC in the old MAC 10/100.
>
> Also I am reluctant to have specific ifdef <ARCH> within the code.
> I do think the driver already has all the platform fields to run on your
> board. If you need extra conf pls feel free to enhance the
> plat_stmmacenet_data.
>

Thank you for your reply.
I tried to use driver dwmac1000 by setting .has_gmac = 1 today.
Ping can finish with no error but when rcp a file or telnet it will hang.

Using below patch without setting .has_gmac, everything works fine.
Any ideas?  Thank you.

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
b/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
index 7c6d857..00499b8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100.h
@@ -29,6 +29,18 @@
  *                             MAC BLOCK defines
  *---------------------------------------------------------------------------*/
 /* MAC CSR offset */
+#if defined(CONFIG_BLACKFIN)
+#define MAC_CONTROL    0x00000000      /* MAC Control */
+#define MAC_FRAME_FILTER       0x0000004       /* Frame filter */
+#define MAC_HASH_HIGH  0x00000008      /* Multicast Hash Table High */
+#define MAC_HASH_LOW   0x0000000c      /* Multicast Hash Table Low */
+#define MAC_MII_ADDR   0x00000010      /* MII Address */
+#define MAC_MII_DATA   0x00000014      /* MII Data */
+#define MAC_FLOW_CTRL  0x00000018      /* Flow Control */
+#define MAC_VLAN1      0x0000001c      /* VLAN1 Tag */
+#define MAC_ADDR_HIGH  0x00000040      /* MAC Address High */
+#define MAC_ADDR_LOW   0x00000044      /* MAC Address Low */
+#else
 #define MAC_CONTROL    0x00000000      /* MAC Control */
 #define MAC_ADDR_HIGH  0x00000004      /* MAC Address High */
 #define MAC_ADDR_LOW   0x00000008      /* MAC Address Low */
@@ -39,6 +51,7 @@
 #define MAC_FLOW_CTRL  0x0000001c      /* Flow Control */
 #define MAC_VLAN1      0x00000020      /* VLAN1 Tag */
 #define MAC_VLAN2      0x00000024      /* VLAN2 Tag */
+#endif

 /* MAC CTRL defines */
 #define MAC_CONTROL_RA 0x80000000      /* Receive All Mode */
@@ -67,7 +80,11 @@
 #define MAC_CONTROL_TE         0x00000008      /* Transmitter Enable */
 #define MAC_CONTROL_RE         0x00000004      /* Receiver Enable */

+#ifdef CONFIG_BLACKFIN
+#define MAC_CORE_INIT ((1 << 14) | MAC_CONTROL_DBF)
+#else
 #define MAC_CORE_INIT (MAC_CONTROL_HBD | MAC_CONTROL_ASTP)
+#endif

 /* MAC FLOW CTRL defines */
 #define MAC_FLOW_CTRL_PT_MASK  0xffff0000      /* Pause Time Mask */

-- 
Regards,
--Bob

^ permalink raw reply related

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Yanmin Zhang @ 2012-05-23  7:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337758775.3361.2056.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 09:39 +0200, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 15:24 +0800, Yanmin Zhang wrote:
> > On Wed, 2012-05-23 at 09:13 +0200, Eric Dumazet wrote:
> > > On Wed, 2012-05-23 at 08:55 +0200, Eric Dumazet wrote:
> > > 
> > > > Please hold on, I'll send a v2
> > > 
> > > I believe your patch should be fine, if you move back the
> > > fib_info_cnt--;
> > > 
> > > So only do the dev_put() in free_fib_info_rcu().
> > We would do so in a new patch.
> > 
> > > 
> > > No need to clear nh_dev to NULL since we are freeing fi at the end of
> > > function.
> > David suggests to reset it to NULL to detect other potential
> > race conditions.
> > 
> 
> Yes but no.
> 
> Users are in a RCU read lock and we should not change nh_dev to NULL
> while they are running.
> 
> Thats what I tried to do (David message gave me this wrong hint) but it
> came to a dead issue.
> 
> Only after a grace period we can :
> 	dev_put() all involved net_devices
> 	kfree(fi)
> 
> > Besides above suggestions, how do you think about:
> > 
> > fib_create_info=>fib_find_info, but fib_find_info is not protected by
> > fib_info_lock. See the codes:
> > 
> > fib_create_info()
> > {
> > 	...
> > link_it:
> >         ofi = fib_find_info(fi);
> >         if (ofi) {
> >                 fi->fib_dead = 1;
> >                 free_fib_info(fi);
> >                 ofi->fib_treeref++;
> >                 return ofi;
> >         }
> >         fi->fib_treeref++;
> >         atomic_inc(&fi->fib_clntref);
> >         spin_lock_bh(&fib_info_lock);
> > 
> > 	...
> > }
> > 
> > I plan to change it to hold fib_info_lock before calling fib_find_info. Is
> > it ok for you?
> 
> Its not needed we hold RTNL mutex.
Thanks. We would work out a new patch and post it here after running MTBF
testing.

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  7:41 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337758775.3361.2056.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 09:39 +0200, Eric Dumazet wrote:

> spinlock is needed mainly because of ip_fib_check_default(), but this
> could be changed to use RCU as well. 
> 
> (readers use RCU, writers already hold RTNL -> no more fib_info_lock )

Since its not a fast path and hash table can be resized, its probably
not worth the pain.

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  7:39 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337757862.14538.199.camel@ymzhang.sh.intel.com>

On Wed, 2012-05-23 at 15:24 +0800, Yanmin Zhang wrote:
> On Wed, 2012-05-23 at 09:13 +0200, Eric Dumazet wrote:
> > On Wed, 2012-05-23 at 08:55 +0200, Eric Dumazet wrote:
> > 
> > > Please hold on, I'll send a v2
> > 
> > I believe your patch should be fine, if you move back the
> > fib_info_cnt--;
> > 
> > So only do the dev_put() in free_fib_info_rcu().
> We would do so in a new patch.
> 
> > 
> > No need to clear nh_dev to NULL since we are freeing fi at the end of
> > function.
> David suggests to reset it to NULL to detect other potential
> race conditions.
> 

Yes but no.

Users are in a RCU read lock and we should not change nh_dev to NULL
while they are running.

Thats what I tried to do (David message gave me this wrong hint) but it
came to a dead issue.

Only after a grace period we can :
	dev_put() all involved net_devices
	kfree(fi)

> Besides above suggestions, how do you think about:
> 
> fib_create_info=>fib_find_info, but fib_find_info is not protected by
> fib_info_lock. See the codes:
> 
> fib_create_info()
> {
> 	...
> link_it:
>         ofi = fib_find_info(fi);
>         if (ofi) {
>                 fi->fib_dead = 1;
>                 free_fib_info(fi);
>                 ofi->fib_treeref++;
>                 return ofi;
>         }
>         fi->fib_treeref++;
>         atomic_inc(&fi->fib_clntref);
>         spin_lock_bh(&fib_info_lock);
> 
> 	...
> }
> 
> I plan to change it to hold fib_info_lock before calling fib_find_info. Is
> it ok for you?

Its not needed we hold RTNL mutex.

spinlock is needed mainly because of ip_fib_check_default(), but this
could be changed to use RCU as well. 

(readers use RCU, writers already hold RTNL -> no more fib_info_lock )

^ permalink raw reply

* Re: [PATCH 1/4] stmmac: remove two useless initialisation
From: Giuseppe CAVALLARO @ 2012-05-23  7:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120523.024211.816676351593259308.davem@davemloft.net>

On 5/23/2012 8:42 AM, David Miller wrote:
> 
> You really need to post your patches properly.
> 
> In particular, you need to indicate, either in an initial "[PATCH 0/N]"
> posting, or in the subject lines of that patch, exactly what tree
> you are targetting.
> 
> And if this is meant for net-next, I told you specifically that net-next
> is not open right now and you need to submit this later when I make
> the announcement here that it is open.
> 
> We are in the middle of the merge window, and therefore you should only
> be submitting bug fixes against the 'net' tree.

Previous ones were for net-next and in fact in the subject I indeed used:

  [net-next X/N] stmmac: ....

After your comment I re-sent them for net.git (pls see my previous email
where I clarified that).

I have always posted all my patches for net-next adding its suffix in
the subject.

For all the patches for net.git I've never added the suffix but I'll do
that.

I agree with you that the suffix should be added and I had understood
(and always used and fixed in my subjectprefix) that these were the
format of the patches in this mailing list

[PATCH] ...        =>>> for net.git
[PATCH net-next]   =>>> for net-next.git

So I'll use starting from now:

[PATCH (net.git)]    =>>> for net.git
[PATCH (net-next)]   =>>> for net-next.git

> It is extremely irritating that, when I go out of my way to make
> announcements here on this list, multiple times, about what kind of
> patches are appropriate and what kinds are not, and yet people still
> do not listen and they still submit any old crap they feel like
> submitting.
> 
> People need to stop this, now.
> 
> Pay attention to the development state, and follow the rules,
> otherwise you'll royally piss me off and I'll ignore your patches
> completely.
> 
> Thanks.
> 
> 
> 

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Yanmin Zhang @ 2012-05-23  7:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337757238.3361.1965.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 09:13 +0200, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 08:55 +0200, Eric Dumazet wrote:
> 
> > Please hold on, I'll send a v2
> 
> I believe your patch should be fine, if you move back the
> fib_info_cnt--;
> 
> So only do the dev_put() in free_fib_info_rcu().
We would do so in a new patch.

> 
> No need to clear nh_dev to NULL since we are freeing fi at the end of
> function.
David suggests to reset it to NULL to detect other potential
race conditions.

Besides above suggestions, how do you think about:

fib_create_info=>fib_find_info, but fib_find_info is not protected by
fib_info_lock. See the codes:

fib_create_info()
{
	...
link_it:
        ofi = fib_find_info(fi);
        if (ofi) {
                fi->fib_dead = 1;
                free_fib_info(fi);
                ofi->fib_treeref++;
                return ofi;
        }
        fi->fib_treeref++;
        atomic_inc(&fi->fib_clntref);
        spin_lock_bh(&fib_info_lock);

	...
}

I plan to change it to hold fib_info_lock before calling fib_find_info. Is
it ok for you?

Thanks for the direct speaking.

Yanmin

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  7:13 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337756138.3361.1922.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 08:55 +0200, Eric Dumazet wrote:

> Please hold on, I'll send a v2

I believe your patch should be fine, if you move back the
fib_info_cnt--;

So only do the dev_put() in free_fib_info_rcu().

No need to clear nh_dev to NULL since we are freeing fi at the end of
function.

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  6:55 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337755623.14538.194.camel@ymzhang.sh.intel.com>

On Wed, 2012-05-23 at 14:47 +0800, Yanmin Zhang wrote:
> On Wed, 2012-05-23 at 08:27 +0200, Eric Dumazet wrote:
> > On Wed, 2012-05-23 at 14:15 +0800, Yanmin Zhang wrote:
> > 
> > > 
> > > Besides fi->fib_dev here, we need check fi->fib_hash, fi->fib_lhash,
> > > and nexthop_nh->nh_hash.
> > 
> > No, its not needed.
> I am checking it now.
> fib_create_info=>fib_find_info, but fib_find_info is not protected by
> fib_info_lock. See the codes:
> 
> 
> link_it:
>         ofi = fib_find_info(fi);
>         if (ofi) {
>                 fi->fib_dead = 1;
>                 free_fib_info(fi);
>                 ofi->fib_treeref++;
>                 return ofi;
>         }
> 
> > 
> > I am sending a patch, because I feel this area is too complex for non
> > netdev guys.
> Indeed, it's complicated. I will test your patches.

Please hold on, I'll send a v2

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Yanmin Zhang @ 2012-05-23  6:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337754459.3361.1850.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 08:27 +0200, Eric Dumazet wrote:
> On Wed, 2012-05-23 at 14:15 +0800, Yanmin Zhang wrote:
> 
> > 
> > Besides fi->fib_dev here, we need check fi->fib_hash, fi->fib_lhash,
> > and nexthop_nh->nh_hash.
> 
> No, its not needed.
I am checking it now.
fib_create_info=>fib_find_info, but fib_find_info is not protected by
fib_info_lock. See the codes:


link_it:
        ofi = fib_find_info(fi);
        if (ofi) {
                fi->fib_dead = 1;
                free_fib_info(fi);
                ofi->fib_treeref++;
                return ofi;
        }

> 
> I am sending a patch, because I feel this area is too complex for non
> netdev guys.
Indeed, it's complicated. I will test your patches.

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  6:43 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337755038.3361.1878.camel@edumazet-glaptop>

On Wed, 2012-05-23 at 08:37 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index ffcb3b0..b56b91e 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2127,7 +2127,7 @@ static int __mkroute_input(struct sk_buff *skb,
>  	u32 itag;
>  
>  	/* get a working reference to the output device */
> -	out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res));
> +	out_dev = __in_dev_get_rcu(rcu_dereference(FIB_RES_DEV(*res)));

This part might need additional fix (if FIB_RES_DEV(*res) is NULL),
because __in_dev_get_rcu() could crash dereferencing NULL pointer.

^ permalink raw reply

* Re: [PATCH 1/4] stmmac: remove two useless initialisation
From: David Miller @ 2012-05-23  6:42 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1337754964-21754-1-git-send-email-peppe.cavallaro@st.com>


You really need to post your patches properly.

In particular, you need to indicate, either in an initial "[PATCH 0/N]"
posting, or in the subject lines of that patch, exactly what tree
you are targetting.

And if this is meant for net-next, I told you specifically that net-next
is not open right now and you need to submit this later when I make
the announcement here that it is open.

We are in the middle of the merge window, and therefore you should only
be submitting bug fixes against the 'net' tree.

It is extremely irritating that, when I go out of my way to make
announcements here on this list, multiple times, about what kind of
patches are appropriate and what kinds are not, and yet people still
do not listen and they still submit any old crap they feel like
submitting.

People need to stop this, now.

Pay attention to the development state, and follow the rules,
otherwise you'll royally piss me off and I'll ignore your patches
completely.

Thanks.

^ permalink raw reply

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  6:37 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337749339.3361.1655.camel@edumazet-glaptop>

From: Eric Dumazet <edumazet@google.com>

I am testing following patch, could you test it too ?

Thanks

[PATCH] ipv4: nh_dev should be rcu protected

Yanmin Zhang reported an OOPS and provided a patch.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial
prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G        W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial
prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.000416]  [<c1238c2a>] __might_sleep+0x10a/0x110
<4>[23900.007357]  [<c1228021>] do_page_fault+0xd1/0x3c0
<4>[23900.013764]  [<c18e9ba9>] ? restore_all+0xf/0xf
<4>[23900.024024]  [<c17c007b>] ? napi_complete+0x8b/0x690
<4>[23900.029297]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.123739]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.128955]  [<c18ea0c3>] error_code+0x5f/0x64
<4>[23900.133466]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.138450]  [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312]  [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730]  [<c17f63df>] ip_route_output_flow+0x1f/0x60
<4>[23900.156261]  [<c181de58>] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960]  [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834]  [<c18298d6>] inet_dgram_connect+0x36/0x80
<4>[23900.173224]  [<c14f9e88>] ? _copy_from_user+0x48/0x140
<4>[23900.178817]  [<c17ab9da>] sys_connect+0x9a/0xd0
<4>[23900.183538]  [<c132e93c>] ? alloc_file+0xdc/0x240
<4>[23900.189111]  [<c123925d>] ? sub_preempt_count+0x3d/0x50

But real fix is to provide appropriate RCU protection to nh_dev/fib_dev,
with appropriate __rcu annotation.

tested with CONFIG_PROVE_RCU and CONFIG_SPARSE_RCU_POINTER

Reported-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Reported-by: Kun Jiang <kunx.jiang@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/inetdevice.h        |    8 ++-
 include/net/ip_fib.h              |    2 
 net/ipv4/devinet.c                |   11 ++--
 net/ipv4/fib_frontend.c           |    6 +-
 net/ipv4/fib_semantics.c          |   66 ++++++++++++++++++----------
 net/ipv4/fib_trie.c               |   11 ++--
 net/ipv4/netfilter/ipt_rpfilter.c |    4 -
 net/ipv4/route.c                  |    4 -
 8 files changed, 71 insertions(+), 41 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 597f4a9..8cfa569 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -172,7 +172,13 @@ extern int		inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b);
 extern int		devinet_ioctl(struct net *net, unsigned int cmd, void __user *);
 extern void		devinet_init(void);
 extern struct in_device	*inetdev_by_index(struct net *, int);
-extern __be32		inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
+
+extern __be32	__inet_select_addr(struct net *net,
+				   const struct net_device *dev,
+				   __be32 dst, int scope);
+#define inet_select_addr(dev, dst, scope) \
+	__inet_select_addr(dev_net(dev), (dev), (dst), (scope))
+
 extern __be32		inet_confirm_addr(struct in_device *in_dev, __be32 dst, __be32 local, int scope);
 extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix, __be32 mask);
 
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 78df0866..9d49426 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -46,7 +46,7 @@ struct fib_config {
 struct fib_info;
 
 struct fib_nh {
-	struct net_device	*nh_dev;
+	struct net_device __rcu	*nh_dev;
 	struct hlist_node	nh_hash;
 	struct fib_info		*nh_parent;
 	unsigned int		nh_flags;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 10e15a1..be1e75c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -164,7 +164,7 @@ struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 		if (local &&
 		    !fib_table_lookup(local, &fl4, &res, FIB_LOOKUP_NOREF) &&
 		    res.type == RTN_LOCAL)
-			result = FIB_RES_DEV(res);
+			result = rcu_dereference(FIB_RES_DEV(res));
 	}
 	if (result && devref)
 		dev_hold(result);
@@ -960,14 +960,15 @@ out:
 	return done;
 }
 
-__be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
+__be32 __inet_select_addr(struct net *net,
+			  const struct net_device *dev,
+			  __be32 dst, int scope)
 {
 	__be32 addr = 0;
 	struct in_device *in_dev;
-	struct net *net = dev_net(dev);
 
 	rcu_read_lock();
-	in_dev = __in_dev_get_rcu(dev);
+	in_dev = dev ? __in_dev_get_rcu(dev) : NULL;
 	if (!in_dev)
 		goto no_in_dev;
 
@@ -1007,7 +1008,7 @@ out_unlock:
 	rcu_read_unlock();
 	return addr;
 }
-EXPORT_SYMBOL(inet_select_addr);
+EXPORT_SYMBOL(__inet_select_addr);
 
 static __be32 confirm_addr_indev(struct in_device *in_dev, __be32 dst,
 			      __be32 local, int scope)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 3854411..2479884 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -159,7 +159,7 @@ static inline unsigned int __inet_dev_addr_type(struct net *net,
 		ret = RTN_UNICAST;
 		rcu_read_lock();
 		if (!fib_table_lookup(local_table, &fl4, &res, FIB_LOOKUP_NOREF)) {
-			if (!dev || dev == res.fi->fib_dev)
+			if (!dev || dev == rcu_dereference(res.fi->fib_dev))
 				ret = res.type;
 		}
 		rcu_read_unlock();
@@ -237,13 +237,13 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, u8 tos,
 	for (ret = 0; ret < res.fi->fib_nhs; ret++) {
 		struct fib_nh *nh = &res.fi->fib_nh[ret];
 
-		if (nh->nh_dev == dev) {
+		if (rcu_dereference(nh->nh_dev) == dev) {
 			dev_match = true;
 			break;
 		}
 	}
 #else
-	if (FIB_RES_DEV(res) == dev)
+	if (rcu_dereference(FIB_RES_DEV(res)) == dev)
 		dev_match = true;
 #endif
 	if (dev_match) {
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index a8bdf74..a09f055 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -157,9 +157,13 @@ void free_fib_info(struct fib_info *fi)
 		return;
 	}
 	change_nexthops(fi) {
-		if (nexthop_nh->nh_dev)
-			dev_put(nexthop_nh->nh_dev);
-		nexthop_nh->nh_dev = NULL;
+		struct net_device *ndev;
+
+		ndev = rtnl_dereference(nexthop_nh->nh_dev);
+		if (ndev) {
+			dev_put(ndev);
+			RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL);
+		}
 	} endfor_nexthops(fi);
 	fib_info_cnt--;
 	release_net(fi->fib_net);
@@ -273,7 +277,7 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev)
 	hash = fib_devindex_hashfn(dev->ifindex);
 	head = &fib_info_devhash[hash];
 	hlist_for_each_entry(nh, node, head, nh_hash) {
-		if (nh->nh_dev == dev &&
+		if (rcu_access_pointer(nh->nh_dev) == dev &&
 		    nh->nh_gw == gw &&
 		    !(nh->nh_flags & RTNH_F_DEAD)) {
 			spin_unlock(&fib_info_lock);
@@ -360,13 +364,17 @@ struct fib_alias *fib_find_alias(struct list_head *fah, u8 tos, u32 prio)
 	return NULL;
 }
 
+/* called in rcu_read_lock() section */
 int fib_detect_death(struct fib_info *fi, int order,
 		     struct fib_info **last_resort, int *last_idx, int dflt)
 {
-	struct neighbour *n;
+	struct neighbour *n = NULL;
 	int state = NUD_NONE;
+	struct net_device *ndev = rcu_dereference(fi->fib_dev);
+
+	if (ndev)
+		n = neigh_lookup(&arp_tbl, &fi->fib_nh[0].nh_gw, ndev);
 
-	n = neigh_lookup(&arp_tbl, &fi->fib_nh[0].nh_gw, fi->fib_dev);
 	if (n) {
 		state = n->nud_state;
 		neigh_release(n);
@@ -551,7 +559,7 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 				return -ENODEV;
 			if (!(dev->flags & IFF_UP))
 				return -ENETDOWN;
-			nh->nh_dev = dev;
+			rcu_assign_pointer(nh->nh_dev, dev);
 			dev_hold(dev);
 			nh->nh_scope = RT_SCOPE_LINK;
 			return 0;
@@ -578,7 +586,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 			goto out;
 		nh->nh_scope = res.scope;
 		nh->nh_oif = FIB_RES_OIF(res);
-		nh->nh_dev = dev = FIB_RES_DEV(res);
+		dev = rcu_dereference(FIB_RES_DEV(res));
+		rcu_assign_pointer(nh->nh_dev, dev);
 		if (!dev)
 			goto out;
 		dev_hold(dev);
@@ -597,8 +606,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 		err = -ENETDOWN;
 		if (!(in_dev->dev->flags & IFF_UP))
 			goto out;
-		nh->nh_dev = in_dev->dev;
-		dev_hold(nh->nh_dev);
+		rcu_assign_pointer(nh->nh_dev, in_dev->dev);
+		dev_hold(in_dev->dev);
 		nh->nh_scope = RT_SCOPE_HOST;
 		err = 0;
 	}
@@ -695,9 +704,14 @@ static void fib_info_hash_move(struct hlist_head *new_info_hash,
 
 __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh)
 {
-	nh->nh_saddr = inet_select_addr(nh->nh_dev,
-					nh->nh_gw,
-					nh->nh_parent->fib_scope);
+	struct net_device *ndev;
+
+	rcu_read_lock();
+	ndev = rcu_dereference(nh->nh_dev);
+	nh->nh_saddr = __inet_select_addr(net, ndev,
+					  nh->nh_gw,
+					  nh->nh_parent->fib_scope);
+	rcu_read_unlock();
 	nh->nh_saddr_genid = atomic_read(&net->ipv4.dev_addr_genid);
 
 	return nh->nh_saddr;
@@ -843,7 +857,8 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 		if (nhs != 1 || nh->nh_gw)
 			goto err_inval;
 		nh->nh_scope = RT_SCOPE_NOWHERE;
-		nh->nh_dev = dev_get_by_index(net, fi->fib_nh->nh_oif);
+		RCU_INIT_POINTER(nh->nh_dev,
+				 dev_get_by_index(net, fi->fib_nh->nh_oif));
 		err = -ENODEV;
 		if (nh->nh_dev == NULL)
 			goto failure;
@@ -889,10 +904,11 @@ link_it:
 	change_nexthops(fi) {
 		struct hlist_head *head;
 		unsigned int hash;
+		struct net_device *ndev = rtnl_dereference(nexthop_nh->nh_dev);
 
-		if (!nexthop_nh->nh_dev)
+		if (!ndev)
 			continue;
-		hash = fib_devindex_hashfn(nexthop_nh->nh_dev->ifindex);
+		hash = fib_devindex_hashfn(ndev->ifindex);
 		head = &fib_info_devhash[hash];
 		hlist_add_head(&nexthop_nh->nh_hash, head);
 	} endfor_nexthops(fi)
@@ -1049,14 +1065,14 @@ int fib_sync_down_dev(struct net_device *dev, int force)
 		int dead;
 
 		BUG_ON(!fi->fib_nhs);
-		if (nh->nh_dev != dev || fi == prev_fi)
+		if (rtnl_dereference(nh->nh_dev) != dev || fi == prev_fi)
 			continue;
 		prev_fi = fi;
 		dead = 0;
 		change_nexthops(fi) {
 			if (nexthop_nh->nh_flags & RTNH_F_DEAD)
 				dead++;
-			else if (nexthop_nh->nh_dev == dev &&
+			else if (rtnl_dereference(nexthop_nh->nh_dev) == dev &&
 				 nexthop_nh->nh_scope != scope) {
 				nexthop_nh->nh_flags |= RTNH_F_DEAD;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
@@ -1068,7 +1084,8 @@ int fib_sync_down_dev(struct net_device *dev, int force)
 				dead++;
 			}
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-			if (force > 1 && nexthop_nh->nh_dev == dev) {
+			if (force > 1 &&
+			    rtnl_dereference(nexthop_nh->nh_dev) == dev) {
 				dead = fi->fib_nhs;
 				break;
 			}
@@ -1167,20 +1184,23 @@ int fib_sync_up(struct net_device *dev)
 		int alive;
 
 		BUG_ON(!fi->fib_nhs);
-		if (nh->nh_dev != dev || fi == prev_fi)
+		if (rtnl_dereference(nh->nh_dev) != dev || fi == prev_fi)
 			continue;
 
 		prev_fi = fi;
 		alive = 0;
 		change_nexthops(fi) {
+			struct net_device *ndev;
+
 			if (!(nexthop_nh->nh_flags & RTNH_F_DEAD)) {
 				alive++;
 				continue;
 			}
-			if (nexthop_nh->nh_dev == NULL ||
-			    !(nexthop_nh->nh_dev->flags & IFF_UP))
+			ndev = rtnl_dereference(nexthop_nh->nh_dev);
+			if (ndev == NULL ||
+			    !(ndev->flags & IFF_UP))
 				continue;
-			if (nexthop_nh->nh_dev != dev ||
+			if (ndev != dev ||
 			    !__in_dev_get_rtnl(dev))
 				continue;
 			alive++;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 30b88d7..3861ba0 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2556,11 +2556,14 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 			    || fa->fa_type == RTN_MULTICAST)
 				continue;
 
-			if (fi)
+			if (fi) {
+				struct net_device *ndev;
+
+				ndev = rcu_dereference(fi->fib_dev);
 				seq_printf(seq,
 					 "%s\t%08X\t%08X\t%04X\t%d\t%u\t"
 					 "%d\t%08X\t%d\t%u\t%u%n",
-					 fi->fib_dev ? fi->fib_dev->name : "*",
+					 ndev ? ndev->name : "*",
 					 prefix,
 					 fi->fib_nh->nh_gw, flags, 0, 0,
 					 fi->fib_priority,
@@ -2569,13 +2572,13 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 					  fi->fib_advmss + 40 : 0),
 					 fi->fib_window,
 					 fi->fib_rtt >> 3, &len);
-			else
+			} else {
 				seq_printf(seq,
 					 "*\t%08X\t%08X\t%04X\t%d\t%u\t"
 					 "%d\t%08X\t%d\t%u\t%u%n",
 					 prefix, 0, flags, 0, 0, 0,
 					 mask, 0, 0, 0, &len);
-
+			}
 			seq_printf(seq, "%*s\n", 127 - len, "");
 		}
 	}
diff --git a/net/ipv4/netfilter/ipt_rpfilter.c b/net/ipv4/netfilter/ipt_rpfilter.c
index 31371be..bdc9393 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -52,13 +52,13 @@ static bool rpfilter_lookup_reverse(struct flowi4 *fl4,
 	for (ret = 0; ret < res.fi->fib_nhs; ret++) {
 		struct fib_nh *nh = &res.fi->fib_nh[ret];
 
-		if (nh->nh_dev == dev) {
+		if (rcu_dereference(nh->nh_dev) == dev) {
 			dev_match = true;
 			break;
 		}
 	}
 #else
-	if (FIB_RES_DEV(res) == dev)
+	if (rcu_dereference(FIB_RES_DEV(res)) == dev)
 		dev_match = true;
 #endif
 	if (dev_match || flags & XT_RPFILTER_LOOSE)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ffcb3b0..b56b91e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2127,7 +2127,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	u32 itag;
 
 	/* get a working reference to the output device */
-	out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res));
+	out_dev = __in_dev_get_rcu(rcu_dereference(FIB_RES_DEV(*res)));
 	if (out_dev == NULL) {
 		net_crit_ratelimited("Bug in ip_route_input_slow(). Please report.\n");
 		return -EINVAL;
@@ -2786,7 +2786,7 @@ static struct rtable *ip_route_output_slow(struct net *net, struct flowi4 *fl4)
 	if (!fl4->saddr)
 		fl4->saddr = FIB_RES_PREFSRC(net, res);
 
-	dev_out = FIB_RES_DEV(res);
+	dev_out = rcu_dereference(FIB_RES_DEV(res));
 	fl4->flowi4_oif = dev_out->ifindex;
 
 

^ permalink raw reply related

* [PATCH 4/4] stmmac: update driver's doc
From: Giuseppe CAVALLARO @ 2012-05-23  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1337754964-21754-1-git-send-email-peppe.cavallaro@st.com>

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 Documentation/networking/stmmac.txt |   44 +++++++++++++++++++---------------
 1 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index ab1e8d7..5cb9a19 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -10,8 +10,8 @@ Currently this network device driver is for all STM embedded MAC/GMAC
 (i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000
 FF1152AMT0221 D1215994A VIRTEX FPGA board.
 
-DWC Ether MAC 10/100/1000 Universal version 3.60a (and older) and DWC Ether MAC 10/100
-Universal version 4.0 have been used for developing this driver.
+DWC Ether MAC 10/100/1000 Universal version 3.60a (and older) and DWC Ether
+MAC 10/100 Universal version 4.0 have been used for developing this driver.
 
 This driver supports both the platform bus and PCI.
 
@@ -54,27 +54,27 @@ net_device structure enabling the scatter/gather feature.
 When one or more packets are received, an interrupt happens. The interrupts
 are not queued so the driver has to scan all the descriptors in the ring during
 the receive process.
-This is based on NAPI so the interrupt handler signals only if there is work to be
-done, and it exits.
+This is based on NAPI so the interrupt handler signals only if there is work
+to be done, and it exits.
 Then the poll method will be scheduled at some future point.
 The incoming packets are stored, by the DMA, in a list of pre-allocated socket
 buffers in order to avoid the memcpy (Zero-copy).
 
 4.3) Timer-Driver Interrupt
-Instead of having the device that asynchronously notifies the frame receptions, the
-driver configures a timer to generate an interrupt at regular intervals.
-Based on the granularity of the timer, the frames that are received by the device
-will experience different levels of latency. Some NICs have dedicated timer
-device to perform this task. STMMAC can use either the RTC device or the TMU
-channel 2  on STLinux platforms.
+Instead of having the device that asynchronously notifies the frame receptions,
+the driver configures a timer to generate an interrupt at regular intervals.
+Based on the granularity of the timer, the frames that are received by the
+device will experience different levels of latency. Some NICs have dedicated
+timer device to perform this task. STMMAC can use either the RTC device or the
+TMU channel 2  on STLinux platforms.
 The timers frequency can be passed to the driver as parameter; when change it,
 take care of both hardware capability and network stability/performance impact.
-Several performance tests on STM platforms showed this optimisation allows to spare
-the CPU while having the maximum throughput.
+Several performance tests on STM platforms showed this optimisation allows to
+spare the CPU while having the maximum throughput.
 
 4.4) WOL
-Wake up on Lan feature through Magic and Unicast frames are supported for the GMAC
-core.
+Wake up on Lan feature through Magic and Unicast frames are supported for the
+GMAC core.
 
 4.5) DMA descriptors
 Driver handles both normal and enhanced descriptors. The latter has been only
@@ -106,7 +106,8 @@ Several driver's information can be passed through the platform
 These are included in the include/linux/stmmac.h header file
 and detailed below as well:
 
- struct plat_stmmacenet_data {
+struct plat_stmmacenet_data {
+	char *phy_bus_name;
 	int bus_id;
 	int phy_addr;
 	int interface;
@@ -124,19 +125,24 @@ and detailed below as well:
 	void (*bus_setup)(void __iomem *ioaddr);
 	int (*init)(struct platform_device *pdev);
 	void (*exit)(struct platform_device *pdev);
+	void *custom_cfg;
+	void *custom_data;
 	void *bsp_priv;
  };
 
 Where:
+ o phy_bus_name: phy bus name to attach to the stmmac.
  o bus_id: bus identifier.
  o phy_addr: the physical address can be passed from the platform.
 	    If it is set to -1 the driver will automatically
 	    detect it at run-time by probing all the 32 addresses.
  o interface: PHY device's interface.
  o mdio_bus_data: specific platform fields for the MDIO bus.
- o pbl: the Programmable Burst Length is maximum number of beats to
+ o dma_cfg: internal DMA parameters
+   o pbl: the Programmable Burst Length is maximum number of beats to
        be transferred in one DMA transaction.
        GMAC also enables the 4xPBL by default.
+   o fixed_burst/mixed_burst/burst_len
  o clk_csr: fixed CSR Clock range selection.
  o has_gmac: uses the GMAC core.
  o enh_desc: if sets the MAC will use the enhanced descriptor structure.
@@ -160,8 +166,9 @@ Where:
 	     this is sometime necessary on some platforms (e.g. ST boxes)
 	     where the HW needs to have set some PIO lines or system cfg
 	     registers.
- o custom_cfg: this is a custom configuration that can be passed while
-	      initialising the resources.
+ o custom_cfg/custom_data: this is a custom configuration that can be passed
+			   while initialising the resources.
+ o bsp_priv: another private poiter.
 
 For MDIO bus The we have:
 
@@ -180,7 +187,6 @@ Where:
  o irqs: list of IRQs, one per PHY.
  o probed_phy_irq: if irqs is NULL, use this for probed PHY.
 
-
 For DMA engine we have the following internal fields that should be
 tuned according to the HW capabilities.
 
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 3/4] stmmac: fix driver Kconfig when built as module
From: Giuseppe CAVALLARO @ 2012-05-23  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro, Rayagond Kokatanur
In-Reply-To: <1337754964-21754-1-git-send-email-peppe.cavallaro@st.com>

This patches fixes the driver when built as dyn module.
In fact the platform part cannot be built and the probe fails
(thanks to Bob Liu that reported this bug).
The patch also makes the selection of Platform and PCI parts
mutually exclusive.

Reported-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
cc: Rayagond Kokatanur <rayagond@vayavyalabs.com>
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 0364283..3318b32 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -12,10 +12,12 @@ config STMMAC_ETH
 
 if STMMAC_ETH
 
+choice
+        prompt "STMMAC bus support"
+
 config STMMAC_PLATFORM
-	tristate "STMMAC platform bus support"
+	bool "Platform bus support"
 	depends on STMMAC_ETH
-	default y
 	---help---
 	  This selects the platform specific bus support for
 	  the stmmac device driver. This is the driver used
@@ -26,7 +28,7 @@ config STMMAC_PLATFORM
 	  If unsure, say N.
 
 config STMMAC_PCI
-	tristate "STMMAC support on PCI bus (EXPERIMENTAL)"
+	bool "PCI bus support (EXPERIMENTAL)"
 	depends on STMMAC_ETH && PCI && EXPERIMENTAL
 	---help---
 	  This is to select the Synopsys DWMAC available on PCI devices,
@@ -36,6 +38,7 @@ config STMMAC_PCI
 	  D1215994A VIRTEX FPGA board.
 
 	  If unsure, say N.
+endchoice
 
 config STMMAC_DEBUG_FS
 	bool "Enable monitoring via sysFS "
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 2/4] stmmac: fix driver's doc when run kernel-doc script
From: Giuseppe CAVALLARO @ 2012-05-23  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1337754964-21754-1-git-send-email-peppe.cavallaro@st.com>

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 0caae72..18ed878 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -833,8 +833,9 @@ static u32 stmmac_get_synopsys_id(struct stmmac_priv *priv)
 
 /**
  * stmmac_selec_desc_mode
- * @dev : device pointer
- * Description: select the Enhanced/Alternate or Normal descriptors */
+ * @priv : private structure
+ * Description: select the Enhanced/Alternate or Normal descriptors
+ */
 static void stmmac_selec_desc_mode(struct stmmac_priv *priv)
 {
 	if (priv->plat->enh_desc) {
@@ -1860,6 +1861,8 @@ static int stmmac_hw_init(struct stmmac_priv *priv)
 /**
  * stmmac_dvr_probe
  * @device: device pointer
+ * @plat_dat: platform data pointer
+ * @addr: iobase memory address
  * Description: this is the main probe function used to
  * call the alloc_etherdev, allocate the priv structure.
  */
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 1/4] stmmac: remove two useless initialisation
From: Giuseppe CAVALLARO @ 2012-05-23  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro

This patch removes two useful initialisation in the
stmmac_rx and stmmac_tx function.
In the former, count var was already reset and in the
stmmac_tx we only need to increment the dirty pointer
w/o setting the entry var.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7096633..0caae72 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -677,7 +677,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 
 		priv->hw->desc->release_tx_desc(p);
 
-		entry = (++priv->dirty_tx) % txsize;
+		priv->dirty_tx++;
 	}
 	if (unlikely(netif_queue_stopped(priv->dev) &&
 		     stmmac_tx_avail(priv) > STMMAC_TX_THRESH(priv))) {
@@ -1307,7 +1307,6 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit)
 		display_ring(priv->dma_rx, rxsize);
 	}
 #endif
-	count = 0;
 	while (!priv->hw->desc->get_rx_owner(p)) {
 		int status;
 
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH] ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
From: Eric Dumazet @ 2012-05-23  6:27 UTC (permalink / raw)
  To: Yanmin Zhang; +Cc: David Miller, kunx.jiang, netdev, linux-kernel
In-Reply-To: <1337753757.14538.190.camel@ymzhang.sh.intel.com>

On Wed, 2012-05-23 at 14:15 +0800, Yanmin Zhang wrote:

> 
> Besides fi->fib_dev here, we need check fi->fib_hash, fi->fib_lhash,
> and nexthop_nh->nh_hash.

No, its not needed.

I am sending a patch, because I feel this area is too complex for non
netdev guys.

^ permalink raw reply

* Re: [net-next 1/4] stmmac: modify and improve the bus_setup platform callback
From: Giuseppe CAVALLARO @ 2012-05-23  6:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120523.014359.1912436981409831773.davem@davemloft.net>

Hello David

On 5/23/2012 7:43 AM, David Miller wrote:
> 
> As I stated the other day, the net-next tree is closed, therefore only
> bug fixes are appropriate at this time.

Ok, so only these patches shouldbe considered:

  stmmac: update driver's doc
  stmmac: remove two useless initialisation
  stmmac: remove two useless initialisation

and

  stmmac: fix driver Kconfig when built as module
     This is attached to another thread and under review.


> If there are bug fixes you want integrated in this series, you need
> to seperate them out and submit them only at this time.

Thanks, I'm resending them again for net.git.

> When net-next opens back up you can submit the rest.

I'll only resend the following patch (that should be stay in next repo)
later:
     stmmac: modify and improve the bus_setup platform callback

Regards
Peppe

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox