Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: David Ahern @ 2017-10-10 14:47 UTC (permalink / raw)
  To: Florian Westphal, netdev
In-Reply-To: <20171010144427.8341-1-fw@strlen.de>

On 10/10/17 8:44 AM, Florian Westphal wrote:
> @@ -3666,7 +3666,7 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  	dev = __dev_get_by_index(net, ifm->ifi_index);
>  	if (!dev) {
> -		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
> +		NL_SET_ERR_MSG(extack, "RTM_SETLINK with unknown ifindex");
>  		return -ENODEV;
>  	}
>  
> @@ -3741,7 +3741,7 @@ static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  	dev = __dev_get_by_index(net, ifm->ifi_index);
>  	if (!dev) {
> -		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
> +		NL_SET_ERR_MSG(extack, "RTM_SETLINK with unknown ifindex");
>  		return -ENODEV;
>  	}

missed a couple of 'RTM_* with' strings

^ permalink raw reply

* Re: [PATCH net v2] net: enable interface alias removal via rtnl
From: David Ahern @ 2017-10-10 14:50 UTC (permalink / raw)
  To: Nicolas Dichtel, davem; +Cc: netdev, oliver, Stephen Hemminger
In-Reply-To: <20171010124138.27342-1-nicolas.dichtel@6wind.com>

On 10/10/17 6:41 AM, Nicolas Dichtel wrote:
> IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of
> the attribute is 1 ("\0"). However, to remove an alias, the attribute
> length must be 0 (see dev_set_alias()).
> 
> Let's define the type to NLA_BINARY, so that the alias can be removed.

not to be pedantic, but we need to be clear that the type is changed
only for policy validation.

> 
> Example:
> $ ip l s dummy0 alias foo
> $ ip l l dev dummy0
> 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff
>     alias foo
> 
> Before the patch:
> $ ip l s dummy0 alias ""
> RTNETLINK answers: Numerical result out of range
> 
> After the patch:
> $ ip l s dummy0 alias ""
> $ ip l l dev dummy0
> 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff
> 
> CC: Oliver Hartkopp <oliver@hartkopp.net>
> CC: Stephen Hemminger <stephen@networkplumber.org>
> Fixes: 96ca4a2cc145 ("net: remove ifalias on empty given alias")
> Reported-by: Julien FLoret <julien.floret@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> v1 -> v2: add the comment
> 
>  net/core/rtnetlink.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index d4bcdcc68e92..5343565d88b7 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1483,7 +1483,10 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  	[IFLA_LINKINFO]		= { .type = NLA_NESTED },
>  	[IFLA_NET_NS_PID]	= { .type = NLA_U32 },
>  	[IFLA_NET_NS_FD]	= { .type = NLA_U32 },
> -	[IFLA_IFALIAS]	        = { .type = NLA_STRING, .len = IFALIASZ-1 },
> +	/* IFLA_IFALIAS is a string, but policy is set to NLA_BINARY to
> +	 * allow 0-length string (needed to remove an alias).
> +	 */
> +	[IFLA_IFALIAS]	        = { .type = NLA_BINARY, .len = IFALIASZ - 1 },
>  	[IFLA_VFINFO_LIST]	= {. type = NLA_NESTED },
>  	[IFLA_VF_PORTS]		= { .type = NLA_NESTED },
>  	[IFLA_PORT_SELF]	= { .type = NLA_NESTED },
> 

Seems like a reasonable solution.

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
From: Stephen Smalley @ 2017-10-10 14:52 UTC (permalink / raw)
  To: Chenbo Feng, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, SELinux
  Cc: Chenbo Feng, Alexei Starovoitov, Daniel Borkmann,
	lorenzo-hpIqsD4AKlfQT0dZR+AlfA
In-Reply-To: <1507645097.30616.6.camel-+05T5uksL2qpZYMLLGbcSA@public.gmane.org>

On Tue, 2017-10-10 at 10:18 -0400, Stephen Smalley wrote:
> On Mon, 2017-10-09 at 15:20 -0700, Chenbo Feng wrote:
> > From: Chenbo Feng <fengc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > 
> > Implement the actual checks introduced to eBPF related syscalls.
> > This
> > implementation use the security field inside bpf object to store a
> > sid that
> > identify the bpf object. And when processes try to access the
> > object,
> > selinux will check if processes have the right privileges. The
> > creation
> > of eBPF object are also checked at the general bpf check hook and
> > new
> > cmd introduced to eBPF domain can also be checked there.
> > 
> > Signed-off-by: Chenbo Feng <fengc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > ---
> >  security/selinux/hooks.c            | 111
> > ++++++++++++++++++++++++++++++++++++
> >  security/selinux/include/classmap.h |   2 +
> >  security/selinux/include/objsec.h   |   4 ++
> >  3 files changed, 117 insertions(+)
> > 
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index f5d304736852..41aba4e3d57c 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -85,6 +85,7 @@
> >  #include <linux/export.h>
> >  #include <linux/msg.h>
> >  #include <linux/shm.h>
> > +#include <linux/bpf.h>
> >  
> >  #include "avc.h"
> >  #include "objsec.h"
> > @@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void
> > *ib_sec)
> >  }
> >  #endif
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int selinux_bpf(int cmd, union bpf_attr *attr,
> > +				     unsigned int size)
> > +{
> > +	u32 sid = current_sid();
> > +	int ret;
> > +
> > +	switch (cmd) {
> > +	case BPF_MAP_CREATE:
> > +		ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP,
> > BPF_MAP__CREATE,
> > +				   NULL);
> > +		break;
> > +	case BPF_PROG_LOAD:
> > +		ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG,
> > BPF_PROG__LOAD,
> > +				   NULL);
> > +		break;
> > +	default:
> > +		ret = 0;
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static u32 bpf_map_fmode_to_av(fmode_t fmode)
> > +{
> > +	u32 av = 0;
> > +
> > +	if (f_mode & FMODE_READ)
> > +		av |= BPF_MAP__READ;
> > +	if (f_mode & FMODE_WRITE)
> > +		av |= BPF_MAP__WRITE;
> > +	return av;
> > +}
> > +
> > +static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
> > +{
> > +	u32 sid = current_sid();
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = map->security;
> > +	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
> > +			    bpf_map_fmode_to_av(fmode), NULL);
> > +}
> > +
> > +static int selinux_bpf_prog(struct bpf_prog *prog)
> > +{
> > +	u32 sid = current_sid();
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = prog->aux->security;
> > +	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
> > +			    BPF_PROG__USE, NULL);
> > +}
> > +
> > +static int selinux_bpf_map_alloc(struct bpf_map *map)
> > +{
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > +	if (!bpfsec)
> > +		return -ENOMEM;
> > +
> > +	bpfsec->sid = current_sid();
> > +	map->security = bpfsec;
> > +
> > +	return 0;
> > +}
> > +
> > +static void selinux_bpf_map_free(struct bpf_map *map)
> > +{
> > +	struct bpf_security_struct *bpfsec = map->security;
> > +
> > +	map->security = NULL;
> > +	kfree(bpfsec);
> > +}
> > +
> > +static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
> > +{
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > +	if (!bpfsec)
> > +		return -ENOMEM;
> > +
> > +	bpfsec->sid = current_sid();
> > +	aux->security = bpfsec;
> > +
> > +	return 0;
> > +}
> > +
> > +static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> > +{
> > +	struct bpf_security_struct *bpfsec = aux->security;
> > +
> > +	aux->security = NULL;
> > +	kfree(bpfsec);
> > +}
> > +#endif
> > +
> >  static struct security_hook_list selinux_hooks[]
> > __lsm_ro_after_init
> > = {
> >  	LSM_HOOK_INIT(binder_set_context_mgr,
> > selinux_binder_set_context_mgr),
> >  	LSM_HOOK_INIT(binder_transaction,
> > selinux_binder_transaction),
> > @@ -6471,6 +6572,16 @@ static struct security_hook_list
> > selinux_hooks[] __lsm_ro_after_init = {
> >  	LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
> >  	LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
> >  #endif
> > +
> > +#ifdef CONFIG_BPF_SYSCALL
> > +	LSM_HOOK_INIT(bpf, selinux_bpf),
> > +	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
> > +	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
> > +	LSM_HOOK_INIT(bpf_map_alloc_security,
> > selinux_bpf_map_alloc),
> > +	LSM_HOOK_INIT(bpf_prog_alloc_security,
> > selinux_bpf_prog_alloc),
> > +	LSM_HOOK_INIT(bpf_map_free_security,
> > selinux_bpf_map_free),
> > +	LSM_HOOK_INIT(bpf_prog_free_security,
> > selinux_bpf_prog_free),
> > +#endif
> >  };
> >  
> >  static __init int selinux_init(void)
> > diff --git a/security/selinux/include/classmap.h
> > b/security/selinux/include/classmap.h
> > index 35ffb29a69cb..7253c5eea59c 100644
> > --- a/security/selinux/include/classmap.h
> > +++ b/security/selinux/include/classmap.h
> > @@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] =
> > {
> >  	  { "access", NULL } },
> >  	{ "infiniband_endport",
> >  	  { "manage_subnet", NULL } },
> > +	{ "bpf_map", {"create", "read", "write"} },
> > +	{ "bpf_prog", {"load", "use"} },
> 
> Again I have to ask: do you truly need/want two separate classes, or
> would a single class with distinct permissions suffice, ala:
>         { "bpf", { "create_map", "read_map", "write_map",
> "prog_load",
> "prog_use" } },
> 
> and then allow A self:bpf { create_map read_map write_map prog_load
> prog_use }; would be stored in a single policy avtab rule, and be
> cached in a single AVC entry.

I guess for consistency in naming it should be either:
        { "bpf", { "map_create", "map_read", "map_write", "prog_load",
"prog_use" } },
 
or:

        { "bpf", { "create_map", "read_map", "write_map", "load_prog",
"use_prog" } },
 

> >  	{ NULL }
> >    };
> >  
> > diff --git a/security/selinux/include/objsec.h
> > b/security/selinux/include/objsec.h
> > index 1649cd18eb0b..3d54468ce334 100644
> > --- a/security/selinux/include/objsec.h
> > +++ b/security/selinux/include/objsec.h
> > @@ -150,6 +150,10 @@ struct pkey_security_struct {
> >  	u32	sid;	/* SID of pkey */
> >  };
> >  
> > +struct bpf_security_struct {
> > +	u32 sid;  /*SID of bpf obj creater*/
> > +};
> > +
> >  extern unsigned int selinux_checkreqprot;
> >  
> >  #endif /* _SELINUX_OBJSEC_H_ */

^ permalink raw reply

* [PATCH net] macsec: fix memory leaks when skb_to_sgvec fails
From: Sabrina Dubroca @ 2017-10-10 15:07 UTC (permalink / raw)
  To: netdev; +Cc: Sabrina Dubroca, Jason A . Donenfeld

Fixes: cda7ea690350 ("macsec: check return value of skb_to_sgvec always")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 drivers/net/macsec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 98e4deaa3a6a..5ab1b8849c30 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -742,6 +742,7 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb,
 	sg_init_table(sg, ret);
 	ret = skb_to_sgvec(skb, sg, 0, skb->len);
 	if (unlikely(ret < 0)) {
+		aead_request_free(req);
 		macsec_txsa_put(tx_sa);
 		kfree_skb(skb);
 		return ERR_PTR(ret);
@@ -954,6 +955,7 @@ static struct sk_buff *macsec_decrypt(struct sk_buff *skb,
 	sg_init_table(sg, ret);
 	ret = skb_to_sgvec(skb, sg, 0, skb->len);
 	if (unlikely(ret < 0)) {
+		aead_request_free(req);
 		kfree_skb(skb);
 		return ERR_PTR(ret);
 	}
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH net-next] openvswitch: add ct_clear action
From: Eric Garver @ 2017-10-10 15:09 UTC (permalink / raw)
  To: Joe Stringer; +Cc: Pravin Shelar, Linux Kernel Network Developers, ovs dev
In-Reply-To: <CAPWQB7FgKYe4Ax08NzW97-WGmriC7j9YhxQF9QtuQZwMjA00bQ@mail.gmail.com>

On Tue, Oct 10, 2017 at 05:33:48AM -0700, Joe Stringer wrote:
> On 9 October 2017 at 21:41, Pravin Shelar <pshelar@ovn.org> wrote:
> > On Fri, Oct 6, 2017 at 9:44 AM, Eric Garver <e@erig.me> wrote:
> >> This adds a ct_clear action for clearing conntrack state. ct_clear is
> >> currently implemented in OVS userspace, but is not backed by an action
> >> in the kernel datapath. This is useful for flows that may modify a
> >> packet tuple after a ct lookup has already occurred.
> >>
> >> Signed-off-by: Eric Garver <e@erig.me>
> > Patch mostly looks good. I have following comments.
> >
> >> ---
> >>  include/uapi/linux/openvswitch.h |  2 ++
> >>  net/openvswitch/actions.c        |  5 +++++
> >>  net/openvswitch/conntrack.c      | 12 ++++++++++++
> >>  net/openvswitch/conntrack.h      |  7 +++++++
> >>  net/openvswitch/flow_netlink.c   |  5 +++++
> >>  5 files changed, 31 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> >> index 156ee4cab82e..1b6e510e2cc6 100644
> >> --- a/include/uapi/linux/openvswitch.h
> >> +++ b/include/uapi/linux/openvswitch.h
> >> @@ -806,6 +806,7 @@ struct ovs_action_push_eth {
> >>   * packet.
> >>   * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the
> >>   * packet.
> >> + * @OVS_ACTION_ATTR_CT_CLEAR: Clear conntrack state from the packet.
> >>   *
> >>   * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
> >>   * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
> >> @@ -835,6 +836,7 @@ enum ovs_action_attr {
> >>         OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
> >>         OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
> >>         OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
> >> +       OVS_ACTION_ATTR_CT_CLEAR,     /* No argument. */
> >>
> >>         __OVS_ACTION_ATTR_MAX,        /* Nothing past this will be accepted
> >>                                        * from userspace. */
> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >> index a54a556fcdb5..db9c7f2e662b 100644
> >> --- a/net/openvswitch/actions.c
> >> +++ b/net/openvswitch/actions.c
> >> @@ -1203,6 +1203,10 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >>                                 return err == -EINPROGRESS ? 0 : err;
> >>                         break;
> >>
> >> +               case OVS_ACTION_ATTR_CT_CLEAR:
> >> +                       err = ovs_ct_clear(skb, key);
> >> +                       break;
> >> +
> >>                 case OVS_ACTION_ATTR_PUSH_ETH:
> >>                         err = push_eth(skb, key, nla_data(a));
> >>                         break;
> >> @@ -1210,6 +1214,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >>                 case OVS_ACTION_ATTR_POP_ETH:
> >>                         err = pop_eth(skb, key);
> >>                         break;
> >> +
> >>                 }
> > Unrelated change.
> >
> >>
> >>                 if (unlikely(err)) {
> >> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> >> index d558e882ca0c..f9b73c726ad7 100644
> >> --- a/net/openvswitch/conntrack.c
> >> +++ b/net/openvswitch/conntrack.c
> >> @@ -1129,6 +1129,18 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
> >>         return err;
> >>  }
> >>
> >> +int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key)
> >> +{
> >> +       if (skb_nfct(skb)) {
> >> +               nf_conntrack_put(skb_nfct(skb));
> >> +               nf_ct_set(skb, NULL, 0);
> > Can the new conntract state be appropriate? may be IP_CT_UNTRACKED?
> >
> >> +       }
> >> +
> >> +       ovs_ct_fill_key(skb, key);
> >> +
> > I do not see need to refill the key if there is no skb-nf-ct.
> 
> Really this is trying to just zero the CT key fields, but reuses
> existing functions, right? This means that subsequent upcalls, for

Right.

> instance, won't have the outdated view of the CT state from the
> previous lookup (that was prior to the ct_clear). I'd expect these key
> fields to be cleared.

I assumed Pravin was saying that we don't need to clear them if there is
no conntrack state. They should already be zero.

^ permalink raw reply

* [PATCH v4 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: Florian Westphal @ 2017-10-10 15:10 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, David Ahern

We can now piggyback error strings to userspace via extended acks
rather than using printk.

Before:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
RTNETLINK answers: Invalid argument

After:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
Error: invalid vlan id.

v3: drop 'RTM_' prefixes, suggested by David Ahern, they
are not useful, the add/del in bridge command line is enough.

Also reword error in response to malformed/bad vlan id attribute
size.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 change since v3: forgot to remove "RTM_SETLINK:" prefix in error message.

 net/core/rtnetlink.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e84d108cfee4..6a09f3d575af 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3066,21 +3066,21 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
 }
 EXPORT_SYMBOL(ndo_dflt_fdb_add);
 
-static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid)
+static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid,
+			 struct netlink_ext_ack *extack)
 {
 	u16 vid = 0;
 
 	if (vlan_attr) {
 		if (nla_len(vlan_attr) != sizeof(u16)) {
-			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan\n");
+			NL_SET_ERR_MSG(extack, "invalid vlan attribute size");
 			return -EINVAL;
 		}
 
 		vid = nla_get_u16(vlan_attr);
 
 		if (!vid || vid >= VLAN_VID_MASK) {
-			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan id %d\n",
-				vid);
+			NL_SET_ERR_MSG(extack, "invalid vlan id");
 			return -EINVAL;
 		}
 	}
@@ -3105,24 +3105,24 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	ndm = nlmsg_data(nlh);
 	if (ndm->ndm_ifindex == 0) {
-		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid ifindex\n");
+		NL_SET_ERR_MSG(extack, "invalid ifindex");
 		return -EINVAL;
 	}
 
 	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
 	if (dev == NULL) {
-		pr_info("PF_BRIDGE: RTM_NEWNEIGH with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
 		return -ENODEV;
 	}
 
 	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
-		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid address\n");
+		NL_SET_ERR_MSG(extack, "invalid address");
 		return -EINVAL;
 	}
 
 	addr = nla_data(tb[NDA_LLADDR]);
 
-	err = fdb_vid_parse(tb[NDA_VLAN], &vid);
+	err = fdb_vid_parse(tb[NDA_VLAN], &vid, extack);
 	if (err)
 		return err;
 
@@ -3209,24 +3209,24 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	ndm = nlmsg_data(nlh);
 	if (ndm->ndm_ifindex == 0) {
-		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid ifindex\n");
+		NL_SET_ERR_MSG(extack, "invalid ifindex");
 		return -EINVAL;
 	}
 
 	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
 	if (dev == NULL) {
-		pr_info("PF_BRIDGE: RTM_DELNEIGH with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
 		return -ENODEV;
 	}
 
 	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
-		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid address\n");
+		NL_SET_ERR_MSG(extack, "invalid address");
 		return -EINVAL;
 	}
 
 	addr = nla_data(tb[NDA_LLADDR]);
 
-	err = fdb_vid_parse(tb[NDA_VLAN], &vid);
+	err = fdb_vid_parse(tb[NDA_VLAN], &vid, extack);
 	if (err)
 		return err;
 
@@ -3666,7 +3666,7 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	dev = __dev_get_by_index(net, ifm->ifi_index);
 	if (!dev) {
-		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
 		return -ENODEV;
 	}
 
@@ -3741,7 +3741,7 @@ static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	dev = __dev_get_by_index(net, ifm->ifi_index);
 	if (!dev) {
-		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
 		return -ENODEV;
 	}
 
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH v4 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: David Ahern @ 2017-10-10 15:11 UTC (permalink / raw)
  To: Florian Westphal, netdev
In-Reply-To: <20171010151004.20056-1-fw@strlen.de>

On 10/10/17 9:10 AM, Florian Westphal wrote:
> We can now piggyback error strings to userspace via extended acks
> rather than using printk.
> 
> Before:
> bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
> RTNETLINK answers: Invalid argument
> 
> After:
> bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
> Error: invalid vlan id.
> 
> v3: drop 'RTM_' prefixes, suggested by David Ahern, they
> are not useful, the add/del in bridge command line is enough.
> 
> Also reword error in response to malformed/bad vlan id attribute
> size.
> 
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  change since v3: forgot to remove "RTM_SETLINK:" prefix in error message.
> 
>  net/core/rtnetlink.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* RE: [patch net-next 2/4] net: sched: introduce per-egress action device callbacks
From: David Laight @ 2017-10-10 15:12 UTC (permalink / raw)
  To: 'Jiri Pirko'
  Cc: netdev@vger.kernel.org, davem@davemloft.net, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, saeedm@mellanox.com,
	matanb@mellanox.com, leonro@mellanox.com, mlxsw@mellanox.com
In-Reply-To: <20171010143152.GG2033@nanopsycho>

From: Jiri Pirko
> Sent: 10 October 2017 15:32
> To: David Laight
> Cc: netdev@vger.kernel.org; davem@davemloft.net; jhs@mojatatu.com; xiyou.wangcong@gmail.com;
> saeedm@mellanox.com; matanb@mellanox.com; leonro@mellanox.com; mlxsw@mellanox.com
> Subject: Re: [patch net-next 2/4] net: sched: introduce per-egress action device callbacks
> 
> Tue, Oct 10, 2017 at 03:31:59PM CEST, David.Laight@ACULAB.COM wrote:
> >From: Jiri Pirko
> >> Sent: 10 October 2017 08:30
> >> Introduce infrastructure that allows drivers to register callbacks that
> >> are called whenever tc would offload inserted rule and specified device
> >> acts as tc action egress device.
> >
> >How does a driver safely unregister a callback?
> >(to avoid a race with the callback being called.)
> >
> >Usually this requires a callback in the context that makes the
> >notification callbacks indicating that no more such callbacks
> >will be made.
> 
> rtnl is your answer. It is being held during register/unregister/cb

Do you mean 'acquired during register/unregister' and 'held across the
callback' ?

So the unregister sleeps (or spins?) until any callbacks complete?
So the driver mustn't hold any locks (etc) across the unregister that
it acquires in the callback.
That ought to be noted somewhere.

	David

^ permalink raw reply

* RE: [PATCH v2 net-next 1/2] net: dsa: lan9303: Move tag setup to new lan9303_setup_tagging
From: Woojung.Huh @ 2017-10-10 15:14 UTC (permalink / raw)
  To: privat, andrew, vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <20171010124953.386-2-privat@egil-hjelmeland.no>

> +/* forward special tagged packets from port 0 to port 1 *or* port 2 */
> +static int lan9303_setup_tagging(struct lan9303 *chip)
> +{
> +	int ret;
> +	u32 val;
> +	/* enable defining the destination port via special VLAN tagging
> +	 * for port 0
> +	 */
> +	ret = lan9303_write_switch_reg(chip,
> LAN9303_SWE_INGRESS_PORT_TYPE,
> +
> LAN9303_SWE_INGRESS_PORT_TYPE_VLAN);
> +	if (ret)
> +		return ret;
> +
> +	/* tag incoming packets at port 1 and 2 on their way to port 0 to be
> +	 * able to discover their source port
> +	 */
> +	val = LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0;
> +	return lan9303_write_switch_reg(chip,
> LAN9303_BM_EGRSS_PORT_TYPE, val);
Specific reason to use val then using LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0
like previous line?

> @@ -644,6 +648,10 @@ static int lan9303_setup(struct dsa_switch *ds)
>  		return -EINVAL;
>  	}
> 
> +	ret = lan9303_setup_tagging(chip);
> +	if (ret)
> +		dev_err(chip->dev, "failed to setup port tagging %d\n", ret);
> +
Still move on when error happens?

>  	ret = lan9303_separate_ports(chip);
>  	if (ret)
>  		dev_err(chip->dev, "failed to separate ports %d\n", ret);
> --
> 2.11.0

- Woojung

^ permalink raw reply

* [net 0/2][pull request] Intel Wired LAN Driver Updates 2017-10-10
From: Jeff Kirsher @ 2017-10-10 15:14 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e only.

Stefano Brivio fixes the grammar in a function header comment.

Alex fixes a memory leak where we were not correctly placing the pages
from buffers that had been used to return a filter programming status
back on the ring.

The following are changes since commit 529a86e063e9ff625c4ff247d8aa17d8072444fb:
  Merge branch 'ppc-bundle' (bundle from Michael Ellerman)
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 40GbE

Alexander Duyck (1):
  i40e: Fix memory leak related filter programming status

Stefano Brivio (1):
  i40e: Fix comment about locking for __i40e_read_nvm_word()

 drivers/net/ethernet/intel/i40e/i40e_nvm.c  |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 63 ++++++++++++++++-------------
 2 files changed, 37 insertions(+), 28 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net 1/2] i40e: Fix comment about locking for __i40e_read_nvm_word()
From: Jeff Kirsher @ 2017-10-10 15:14 UTC (permalink / raw)
  To: davem; +Cc: Stefano Brivio, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171010151416.43149-1-jeffrey.t.kirsher@intel.com>

From: Stefano Brivio <sbrivio@redhat.com>

Caller needs to acquire the lock. Called functions will not.

Fixes: 09f79fd49d94 ("i40e: avoid NVM acquire deadlock during NVM update")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 57505b1df98d..d591b3e6bd7c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -298,7 +298,7 @@ static i40e_status i40e_read_nvm_word_aq(struct i40e_hw *hw, u16 offset,
 }
 
 /**
- * __i40e_read_nvm_word - Reads nvm word, assumes called does the locking
+ * __i40e_read_nvm_word - Reads nvm word, assumes caller does the locking
  * @hw: pointer to the HW structure
  * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
  * @data: word read from the Shadow RAM
-- 
2.14.2

^ permalink raw reply related

* [net 2/2] i40e: Fix memory leak related filter programming status
From: Jeff Kirsher @ 2017-10-10 15:14 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20171010151416.43149-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

It looks like we weren't correctly placing the pages from buffers that had
been used to return a filter programming status back on the ring. As a
result they were being overwritten and tracking of the pages was lost.

This change works to correct that by incorporating part of
i40e_put_rx_buffer into the programming status handler code. As a result we
should now be correctly placing the pages for those buffers on the
re-allocation list instead of letting them stay in place.

Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status")
Reported-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Anders K Pedersen <akp@cohaesio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 63 ++++++++++++++++-------------
 1 file changed, 36 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 1519dfb851d0..2756131495f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1037,6 +1037,32 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	return false;
 }
 
+/**
+ * i40e_reuse_rx_page - page flip buffer and store it back on the ring
+ * @rx_ring: rx descriptor ring to store buffers on
+ * @old_buff: donor buffer to have page reused
+ *
+ * Synchronizes page for reuse by the adapter
+ **/
+static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
+			       struct i40e_rx_buffer *old_buff)
+{
+	struct i40e_rx_buffer *new_buff;
+	u16 nta = rx_ring->next_to_alloc;
+
+	new_buff = &rx_ring->rx_bi[nta];
+
+	/* update, and store next to alloc */
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	/* transfer page from old buffer to new buffer */
+	new_buff->dma		= old_buff->dma;
+	new_buff->page		= old_buff->page;
+	new_buff->page_offset	= old_buff->page_offset;
+	new_buff->pagecnt_bias	= old_buff->pagecnt_bias;
+}
+
 /**
  * i40e_rx_is_programming_status - check for programming status descriptor
  * @qw: qword representing status_error_len in CPU ordering
@@ -1071,15 +1097,24 @@ static void i40e_clean_programming_status(struct i40e_ring *rx_ring,
 					  union i40e_rx_desc *rx_desc,
 					  u64 qw)
 {
-	u32 ntc = rx_ring->next_to_clean + 1;
+	struct i40e_rx_buffer *rx_buffer;
+	u32 ntc = rx_ring->next_to_clean;
 	u8 id;
 
 	/* fetch, update, and store next to clean */
+	rx_buffer = &rx_ring->rx_bi[ntc++];
 	ntc = (ntc < rx_ring->count) ? ntc : 0;
 	rx_ring->next_to_clean = ntc;
 
 	prefetch(I40E_RX_DESC(rx_ring, ntc));
 
+	/* place unused page back on the ring */
+	i40e_reuse_rx_page(rx_ring, rx_buffer);
+	rx_ring->rx_stats.page_reuse_count++;
+
+	/* clear contents of buffer_info */
+	rx_buffer->page = NULL;
+
 	id = (qw & I40E_RX_PROG_STATUS_DESC_QW1_PROGID_MASK) >>
 		  I40E_RX_PROG_STATUS_DESC_QW1_PROGID_SHIFT;
 
@@ -1638,32 +1673,6 @@ static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb,
 	return false;
 }
 
-/**
- * i40e_reuse_rx_page - page flip buffer and store it back on the ring
- * @rx_ring: rx descriptor ring to store buffers on
- * @old_buff: donor buffer to have page reused
- *
- * Synchronizes page for reuse by the adapter
- **/
-static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
-			       struct i40e_rx_buffer *old_buff)
-{
-	struct i40e_rx_buffer *new_buff;
-	u16 nta = rx_ring->next_to_alloc;
-
-	new_buff = &rx_ring->rx_bi[nta];
-
-	/* update, and store next to alloc */
-	nta++;
-	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
-
-	/* transfer page from old buffer to new buffer */
-	new_buff->dma		= old_buff->dma;
-	new_buff->page		= old_buff->page;
-	new_buff->page_offset	= old_buff->page_offset;
-	new_buff->pagecnt_bias	= old_buff->pagecnt_bias;
-}
-
 /**
  * i40e_page_is_reusable - check if any reuse is possible
  * @page: page struct to check
-- 
2.14.2

^ permalink raw reply related

* Re: RIF/VRF overflow in spectrum and reporting errors back to user
From: David Ahern @ 2017-10-10 15:23 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Ido Schimmel, Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <20171009093110.GA5193@shredder.mtl.com>

On 10/9/17 3:31 AM, Ido Schimmel wrote:
> Hi David,
> 
> On Sun, Oct 08, 2017 at 02:10:33PM -0600, David Ahern wrote:
>> Jiri / Ido:
>>
>> I am looking at adding user messages for spectrum failures related to
>> RIF and VRF overflow coming from the inetaddr and inet6addr notifier
>> paths. The key is that if the notifiers fail the address add needs to
>> fail and an error reported to the user as to what happened.
> 
> Thanks for working on this. Very nice idea!
> 
>> Earlier this year 3ad7d2468f79f added in_validator_info and
>> in6_validator_info as a way for the notifiers to fail adding an address.
>> Adding support to spectrum for that notifier is complicated by the fact
>> that the validator notifier and address notifiers will come in back to
>> back for the NETDEV_UP case. Ignoring NETDEV_UP in
>> mlxsw_sp_inetaddr_event seems ok for IPv6 but not clear for IPv4 since
>> the NETDEV_UP case is emitted on an address delete that involves a
>> promotion. Handling the back to back NETDEV_UP is complicated since
>> functions invoked by __mlxsw_sp_inetaddr_event can take multiple
>> references. Specifically, in mlxsw_sp_port_vlan_router_join():
>>     fid = rif->ops->fid_get(rif);
>>
>> Can NETDEV_UP be ignored for the inetaddr notifier if it is handled by
>> the validator notitifer?
> 
> Yes. The case where we get a NETDEV_DOWN for an address delete and then
> a NETDEV_UP for a promotion is basically a NOP from the driver's
> perspective. When the NETDEV_DOWN is received, the RIF isn't destroyed
> because the address list isn't empty (there's an address to be
> promoted). When the NETDEV_UP is received, it's ignored because we
> already have a RIF.

You lost me on the RIF. Looking at the chain:

mlxsw_sp_inet6addr_event_work or mlxsw_sp_inetaddr_event
- __mlxsw_sp_inetaddr_event
  + mlxsw_sp_inetaddr_vlan_event
    * mlxsw_sp_inetaddr_port_vlan_event
      - NETDEV_UP: mlxsw_sp_port_vlan_router_join

mlxsw_sp_port_vlan_router_join does the rif lookup and if it exists
calls fid_get() which takes a reference. I read that to mean
back-to-back NETDEV_UP notifiers (the address validator and then the
address notifier) would lead to a reference count leak.

Based on your address delete comment, I take the IPv4 solution to be
adding the validator notifier to spectrum and then ignoring NETDEV_UP in
mlxsw_sp_inetaddr_event. That means IPv4 inetaddr work is done for the
validator notifier while NETDEV_DOWN is done through the inetaddr notifier.

> 
> Regarding IPv6, it's a bit more complicated actually, since we do the
> actual work in a workqueue, as the notification chain is atomic. I
> believe this is because the notifier can be called from softirq in
> response to RA packets.
> 
> However, this case isn't interesting for mlxsw, as the fact that you
> process an RA packet suggests you already have a link-local address and
> thus a RIF. Plus, the kernel won't even process such packets in our case
> as you most likely have forwarding enabled (unless you tweaked accept_ra
> for some reason).
> 
> Looking at ipvlan (the only user of inet6addr_validator_chain), I see
> that it ignores this specific case and returns NOTIFY_DONE. Maybe we can
> move this notification chain to be blocking and not call it in response
> to RA packets seeing that all its users ignore it?

Seems reasonable to me.

I have it coded. Let me test and send an rfc.

^ permalink raw reply

* Re: [PATCH RFC 0/3] tun zerocopy stats
From: Willem de Bruijn @ 2017-10-10 15:29 UTC (permalink / raw)
  To: David Miller
  Cc: Network Development, Michael S. Tsirkin, Jason Wang,
	Willem de Bruijn
In-Reply-To: <20171009.205228.714368596112967819.davem@davemloft.net>

On Mon, Oct 9, 2017 at 11:52 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Fri,  6 Oct 2017 18:25:13 -0400
>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Add zerocopy transfer statistics to the vhost_net/tun zerocopy path.
>>
>> I've been using this to verify recent changes to zerocopy tuning [1].
>> Sharing more widely, as it may be useful in similar future work.
>>
>> Use ethtool stats as interface, as these are defined per device
>> driver and can easily be extended.
>>
>> Make the zerocopy release callback take an extra hop through the tun
>> driver to allow the driver to increment its counters.
>>
>> Care must be taken to avoid adding an alloc/free to this hot path.
>> Since the caller already must allocate a ubuf_info, make it allocate
>> two at a time and grant one to the tun device.
>>
>>  1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices
>>  2/3: add zerocopy tx and tx_err counters
>>  3/3: convert vhost_net to pass a pair of ubuf_info to tun
>>
>> [1] http://patchwork.ozlabs.org/patch/822613/
>
> This looks mostly fine to me, but I don't know enough about how vhost
> and tap interact to tell whether this makes sense to upstream.

Thanks for taking a look. The need for monitoring these stats has
come up in a couple of patch evaluation discussions, so I wanted
to share at least one implementation to get the data.

Because the choice to use zerocopy is based on heuristics and
there is a cost if it mispredicts, I think we even want to being able
to continuously monitor this in production.

The implementation is probably not ready for that as is.

> What are the runtime costs for these new statistics?

It currently doubles the size of the ubuf_info memory pool. That can be
fixed, as the current size is UIO_MAXIOV (1024), but the number of
zerocopy packets in flight is bound by VHOST_MAX_PEND (128).

It also adds an indirect function call to call to each zerocopy skb free
path, though.

If there is a way to expose these stats through vhost_net directly,
instead of through tun, that may be better. But I did not see a
suitable interface. Perhaps debugfs.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] net: dsa: lan9303: Move tag setup to new lan9303_setup_tagging
From: Egil Hjelmeland @ 2017-10-10 15:30 UTC (permalink / raw)
  To: Woojung.Huh, andrew, vivien.didelot, f.fainelli, netdev,
	linux-kernel
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D40B48C3D@CHN-SV-EXMX02.mchp-main.com>

On 10. okt. 2017 17:14, Woojung.Huh@microchip.com wrote:
>> +/* forward special tagged packets from port 0 to port 1 *or* port 2 */
>> +static int lan9303_setup_tagging(struct lan9303 *chip)
>> +{
>> +	int ret;
>> +	u32 val;
>> +	/* enable defining the destination port via special VLAN tagging
>> +	 * for port 0
>> +	 */
>> +	ret = lan9303_write_switch_reg(chip,
>> LAN9303_SWE_INGRESS_PORT_TYPE,
>> +
>> LAN9303_SWE_INGRESS_PORT_TYPE_VLAN);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* tag incoming packets at port 1 and 2 on their way to port 0 to be
>> +	 * able to discover their source port
>> +	 */
>> +	val = LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0;
>> +	return lan9303_write_switch_reg(chip,
>> LAN9303_BM_EGRSS_PORT_TYPE, val);
> Specific reason to use val then using LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0
> like previous line?
> 
Specific reason was to please a reviewer that did not like my
indenting in first version. I did not agree with him, but since
nobody else spoke up, I changed the code.

>> @@ -644,6 +648,10 @@ static int lan9303_setup(struct dsa_switch *ds)
>>   		return -EINVAL;
>>   	}
>>
>> +	ret = lan9303_setup_tagging(chip);
>> +	if (ret)
>> +		dev_err(chip->dev, "failed to setup port tagging %d\n", ret);
>> +
> Still move on when error happens?
> 
Good question. I just followed the pattern from the original function,
which was not made by me. Actually I did once reflect on whether this 
was the correct way. Perhaps it could be argued that it is better to 
allow the device to come up, so the problem can be investigated?

>>   	ret = lan9303_separate_ports(chip);
>>   	if (ret)
>>   		dev_err(chip->dev, "failed to separate ports %d\n", ret);
>> --
>> 2.11.0
> 
> - Woojung
> 

^ permalink raw reply

* Re: [patch net-next 2/4] net: sched: introduce per-egress action device callbacks
From: Jiri Pirko @ 2017-10-10 15:39 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, davem@davemloft.net, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, saeedm@mellanox.com,
	matanb@mellanox.com, leonro@mellanox.com, mlxsw@mellanox.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD008F06E@AcuExch.aculab.com>

Tue, Oct 10, 2017 at 05:12:34PM CEST, David.Laight@ACULAB.COM wrote:
>From: Jiri Pirko
>> Sent: 10 October 2017 15:32
>> To: David Laight
>> Cc: netdev@vger.kernel.org; davem@davemloft.net; jhs@mojatatu.com; xiyou.wangcong@gmail.com;
>> saeedm@mellanox.com; matanb@mellanox.com; leonro@mellanox.com; mlxsw@mellanox.com
>> Subject: Re: [patch net-next 2/4] net: sched: introduce per-egress action device callbacks
>> 
>> Tue, Oct 10, 2017 at 03:31:59PM CEST, David.Laight@ACULAB.COM wrote:
>> >From: Jiri Pirko
>> >> Sent: 10 October 2017 08:30
>> >> Introduce infrastructure that allows drivers to register callbacks that
>> >> are called whenever tc would offload inserted rule and specified device
>> >> acts as tc action egress device.
>> >
>> >How does a driver safely unregister a callback?
>> >(to avoid a race with the callback being called.)
>> >
>> >Usually this requires a callback in the context that makes the
>> >notification callbacks indicating that no more such callbacks
>> >will be made.
>> 
>> rtnl is your answer. It is being held during register/unregister/cb
>
>Do you mean 'acquired during register/unregister' and 'held across the
>callback' ?
>
>So the unregister sleeps (or spins?) until any callbacks complete?
>So the driver mustn't hold any locks (etc) across the unregister that
>it acquires in the callback.
>That ought to be noted somewhere.

You actually have a point. I don't take rtnl for reg/unreg as I suppose
to. Will fix.


>
>	David
>

^ permalink raw reply

* Re: RIF/VRF overflow in spectrum and reporting errors back to user
From: Ido Schimmel @ 2017-10-10 15:47 UTC (permalink / raw)
  To: David Ahern; +Cc: Ido Schimmel, Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <779c2ba6-fd29-b260-f008-b41a607acd41@gmail.com>

On Tue, Oct 10, 2017 at 09:23:48AM -0600, David Ahern wrote:
> On 10/9/17 3:31 AM, Ido Schimmel wrote:
> >> Can NETDEV_UP be ignored for the inetaddr notifier if it is handled by
> >> the validator notitifer?
> > 
> > Yes. The case where we get a NETDEV_DOWN for an address delete and then
> > a NETDEV_UP for a promotion is basically a NOP from the driver's
> > perspective. When the NETDEV_DOWN is received, the RIF isn't destroyed
> > because the address list isn't empty (there's an address to be
> > promoted). When the NETDEV_UP is received, it's ignored because we
> > already have a RIF.
> 
> You lost me on the RIF. Looking at the chain:
> 
> mlxsw_sp_inet6addr_event_work or mlxsw_sp_inetaddr_event
> - __mlxsw_sp_inetaddr_event
>   + mlxsw_sp_inetaddr_vlan_event
>     * mlxsw_sp_inetaddr_port_vlan_event
>       - NETDEV_UP: mlxsw_sp_port_vlan_router_join
> 
> mlxsw_sp_port_vlan_router_join does the rif lookup and if it exists
> calls fid_get() which takes a reference. I read that to mean
> back-to-back NETDEV_UP notifiers (the address validator and then the
> address notifier) would lead to a reference count leak.
> 
> Based on your address delete comment, I take the IPv4 solution to be
> adding the validator notifier to spectrum and then ignoring NETDEV_UP in
> mlxsw_sp_inetaddr_event. That means IPv4 inetaddr work is done for the
> validator notifier while NETDEV_DOWN is done through the inetaddr notifier.

Exactly. The only NETDEV_UP we "miss" is the one sent for the promoted
address in the inetaddr chain, but it's irrelevant because when we got
the preceding NETDEV_DOWN for the deleted primary address we didn't
destroy the RIF as the address list wasn't empty (see
mlxsw_sp_rif_should_config() which is called by both top functions in
your call chain).

> > Regarding IPv6, it's a bit more complicated actually, since we do the
> > actual work in a workqueue, as the notification chain is atomic. I
> > believe this is because the notifier can be called from softirq in
> > response to RA packets.
> > 
> > However, this case isn't interesting for mlxsw, as the fact that you
> > process an RA packet suggests you already have a link-local address and
> > thus a RIF. Plus, the kernel won't even process such packets in our case
> > as you most likely have forwarding enabled (unless you tweaked accept_ra
> > for some reason).
> > 
> > Looking at ipvlan (the only user of inet6addr_validator_chain), I see
> > that it ignores this specific case and returns NOTIFY_DONE. Maybe we can
> > move this notification chain to be blocking and not call it in response
> > to RA packets seeing that all its users ignore it?
> 
> Seems reasonable to me.
> 
> I have it coded. Let me test and send an rfc.

Great. Looking forward to it.

^ permalink raw reply

* RE: [PATCH v2 net-next 1/2] net: dsa: lan9303: Move tag setup to new lan9303_setup_tagging
From: Woojung.Huh @ 2017-10-10 15:51 UTC (permalink / raw)
  To: privat, andrew, vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <2d03fb99-fd1c-9a0b-3274-eda605c3eb2d@egil-hjelmeland.no>

> > Specific reason to use val then using
> LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0
> > like previous line?
> >
> Specific reason was to please a reviewer that did not like my
> indenting in first version. I did not agree with him, but since
> nobody else spoke up, I changed the code.
Got it. Missed previous patch/comment.

> >> @@ -644,6 +648,10 @@ static int lan9303_setup(struct dsa_switch *ds)
> >>   		return -EINVAL;
> >>   	}
> >>
> >> +	ret = lan9303_setup_tagging(chip);
> >> +	if (ret)
> >> +		dev_err(chip->dev, "failed to setup port tagging %d\n", ret);
> >> +
> > Still move on when error happens?
> >
> Good question. I just followed the pattern from the original function,
> which was not made by me. Actually I did once reflect on whether this
> was the correct way. Perhaps it could be argued that it is better to
> allow the device to come up, so the problem can be investigated?
Maybe depends on severity of setting?
BTW, lan9303_setup() still returns ZERO at the end?

Thanks.
Woojung

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] net: dsa: lan9303: Move tag setup to new lan9303_setup_tagging
From: Vivien Didelot @ 2017-10-10 15:51 UTC (permalink / raw)
  To: Egil Hjelmeland, Woojung.Huh, andrew, f.fainelli, netdev,
	linux-kernel
In-Reply-To: <2d03fb99-fd1c-9a0b-3274-eda605c3eb2d@egil-hjelmeland.no>

Hi Egil,

Egil Hjelmeland <privat@egil-hjelmeland.no> writes:

>>> +	val = LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0;
>>> +	return lan9303_write_switch_reg(chip,
>>> LAN9303_BM_EGRSS_PORT_TYPE, val);
>> Specific reason to use val then using LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0
>> like previous line?
>> 
> Specific reason was to please a reviewer that did not like my
> indenting in first version. I did not agree with him, but since
> nobody else spoke up, I changed the code.

Your indentation was broken and did not respect the Kernel Coding
Style. Using a temporary variable here ensures you respect both the
80-char limit and the vertical alignment on opening parenthesis.

You'd like to use ./scripts/checkpatch.pl before submitting patches.


Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH RFC tip/core/rcu 14/15] netfilter: Remove now-redundant smp_read_barrier_depends()
From: Paul E. McKenney @ 2017-10-10 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, torvalds, mark.rutland, dhowells, linux-arch,
	will.deacon, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, David S. Miller, netfilter-devel, coreteam,
	netdev
In-Reply-To: <20171010084334.nbyhryiwyrl6km4u@hirez.programming.kicks-ass.net>

On Tue, Oct 10, 2017 at 10:43:34AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 09, 2017 at 05:22:48PM -0700, Paul E. McKenney wrote:
> > READ_ONCE() now implies smp_read_barrier_depends(), which means that
> > the instances in arpt_do_table(), ipt_do_table(), and ip6t_do_table()
> > are now redundant.  This commit removes them and adjusts the comments.
> 
> Similar to the previous patch, the lack of READ_ONCE() in the original
> code is a pre-existing bug. It would allow the compiler to tear the load
> and observe a composite of two difference pointer values, or reload the
> private pointer and result in table_base and jumpstacl being part of
> different objects.
> 
> It would be good to point out this actually fixes a bug in the code.

Assuming that these changes actually fixed something, agreed.  ;-)

							Thanx, Paul

^ permalink raw reply

* [PATCH net 0/2] nfp: fix ethtool stats and page allocation
From: Jakub Kicinski @ 2017-10-10 16:16 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

Hi!

Two fixes for net.  First one makes sure we handle gather of stats on
32bit machines correctly (ouch).  The second fix solves a potential
NULL-deref if we fail to allocate a page with XDP running.

I used Fixes: tags pointing to where the bug was introduced, but for
patch 1 it has been in the driver "for ever" and fix won't backport
cleanly beyond commit 325945ede6d4 ("nfp: split software and hardware 
vNIC statistics") which is in net.

Jakub Kicinski (2):
  nfp: fix ethtool stats gather retry
  nfp: handle page allocation failures

 drivers/net/ethernet/netronome/nfp/nfp_net_common.c  | 20 ++++++++++++++------
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c |  8 +++++---
 2 files changed, 19 insertions(+), 9 deletions(-)

-- 
2.14.1

^ permalink raw reply

* [PATCH net 1/2] nfp: fix ethtool stats gather retry
From: Jakub Kicinski @ 2017-10-10 16:16 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171010161623.23838-1-jakub.kicinski@netronome.com>

The while loop fetching 64 bit ethtool statistics may have
to retry multiple times, it shouldn't modify the outside state.

Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 07969f06df10..dc016dfec64d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -464,7 +464,7 @@ static u64 *nfp_vnic_get_sw_stats(struct net_device *netdev, u64 *data)
 
 		do {
 			start = u64_stats_fetch_begin(&nn->r_vecs[i].rx_sync);
-			*data++ = nn->r_vecs[i].rx_pkts;
+			data[0] = nn->r_vecs[i].rx_pkts;
 			tmp[0] = nn->r_vecs[i].hw_csum_rx_ok;
 			tmp[1] = nn->r_vecs[i].hw_csum_rx_inner_ok;
 			tmp[2] = nn->r_vecs[i].hw_csum_rx_error;
@@ -472,14 +472,16 @@ static u64 *nfp_vnic_get_sw_stats(struct net_device *netdev, u64 *data)
 
 		do {
 			start = u64_stats_fetch_begin(&nn->r_vecs[i].tx_sync);
-			*data++ = nn->r_vecs[i].tx_pkts;
-			*data++ = nn->r_vecs[i].tx_busy;
+			data[1] = nn->r_vecs[i].tx_pkts;
+			data[2] = nn->r_vecs[i].tx_busy;
 			tmp[3] = nn->r_vecs[i].hw_csum_tx;
 			tmp[4] = nn->r_vecs[i].hw_csum_tx_inner;
 			tmp[5] = nn->r_vecs[i].tx_gather;
 			tmp[6] = nn->r_vecs[i].tx_lso;
 		} while (u64_stats_fetch_retry(&nn->r_vecs[i].tx_sync, start));
 
+		data += 3;
+
 		for (j = 0; j < NN_ET_RVEC_GATHER_STATS; j++)
 			gathered_stats[j] += tmp[j];
 	}
-- 
2.14.1

^ permalink raw reply related

* [PATCH net 2/2] nfp: handle page allocation failures
From: Jakub Kicinski @ 2017-10-10 16:16 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20171010161623.23838-1-jakub.kicinski@netronome.com>

page_address() does not handle NULL argument gracefully,
make sure we NULL-check the page pointer before passing it
to page_address().

Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1c0187f0af51..e118b5f23996 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1180,10 +1180,14 @@ static void *nfp_net_rx_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr)
 {
 	void *frag;
 
-	if (!dp->xdp_prog)
+	if (!dp->xdp_prog) {
 		frag = netdev_alloc_frag(dp->fl_bufsz);
-	else
-		frag = page_address(alloc_page(GFP_KERNEL | __GFP_COLD));
+	} else {
+		struct page *page;
+
+		page = alloc_page(GFP_KERNEL | __GFP_COLD);
+		frag = page ? page_address(page) : NULL;
+	}
 	if (!frag) {
 		nn_dp_warn(dp, "Failed to alloc receive page frag\n");
 		return NULL;
@@ -1203,10 +1207,14 @@ static void *nfp_net_napi_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr)
 {
 	void *frag;
 
-	if (!dp->xdp_prog)
+	if (!dp->xdp_prog) {
 		frag = napi_alloc_frag(dp->fl_bufsz);
-	else
-		frag = page_address(alloc_page(GFP_ATOMIC | __GFP_COLD));
+	} else {
+		struct page *page;
+
+		page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+		frag = page ? page_address(page) : NULL;
+	}
 	if (!frag) {
 		nn_dp_warn(dp, "Failed to alloc receive page frag\n");
 		return NULL;
-- 
2.14.1

^ permalink raw reply related

* Re: [RFC net 1/1] net: sched: act: fix rcu race in dump
From: Cong Wang @ 2017-10-10 16:40 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Jamal Hadi Salim, Jiri Pirko, Linux Kernel Network Developers,
	kurup.manish, Brenda Butler
In-Reply-To: <20171010123218.5251-2-aring@mojatatu.com>

On Tue, Oct 10, 2017 at 5:32 AM, Alexander Aring <aring@mojatatu.com> wrote:
> This patch fixes an issue with kfree_rcu which is not protected by RTNL
> lock. It could be that the current assigned rcu pointer will be freed by
> kfree_rcu while dump callback is running.

Why? kfree_rcu() respects existing readers, so why this could happen?


>
> To prevent this, we call rcu_synchronize at first. Then we are sure all
> latest rcu functions e.g. rcu_assign_pointer and kfree_rcu in init are
> done. After rcu_synchronize we dereference under RTNL lock which is also
> held in init function, which means no other rcu_assign_pointer or
> kfree_rcu will occur.

If you really want to wait for kfree_rcu(), rcu_barrier() is the one
instead of rcu_synchronize(). Just FYI.


>
> To call rcu_synchronize will also prevent weird behaviours by doing over
> netlink:
>
>  - set params A
>  - set params B
>  - dump params
>   \--> will dump params A


What's wrong with this? Existing readers could still read old data,
which is _perfectly_ fine as long as we don't free the old data before
they are gone.

^ permalink raw reply

* [RFC net-next 0/4] mlxsw: spectrum_router: Add extack messages for RIF and VRF overflow
From: David Ahern @ 2017-10-10 16:41 UTC (permalink / raw)
  To: netdev; +Cc: jiri, idosch, kjlx, David Ahern

Currently, exceeding the number of VRF instances or the number of router
interfaces either fails with a non-intuitive EBUSY:
    $ ip li set swp1s1.6 vrf vrf-1s1-6 up
    RTNETLINK answers: Device or resource busy

or fails silently (IPv6) since the checks are done in a work queue. This
set adds support for the address validator notifier to spectrum which
allows ext-ack based messages to be returned on failure.

To make that happen the IPv6 version needs to be converted from atomic
to blocking (patch 1), and then support for extack needs to be added
to the notifier (patch 2). Patches 3 and 4 add the validator notifier
to spectrum and then plumb the extack argument.

With this set, VRF overflows fail with:
   $ ip li set swp1s1.6 vrf vrf-1s1-6 up
   Error: spectrum: Exceeded number of supported VRF.

and RIF overflows fail with:
   $ ip addr add dev swp1s2.191 10.12.191.1/24
   Error: spectrum: Exceeded number of supported router interfaces.

David Ahern (4):
  net: ipv6: Make inet6addr_validator a blocking notifier
  net: Add extack to validator_info structs used for address notifier
  mlxsw: spectrum: router: Add support for address validator notifier
  mlxsw: spectrum_router: Add extack message for RIF and VRF overflow

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  10 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   4 +
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 163 +++++++++++++++------
 drivers/net/ipvlan/ipvlan_main.c                   |  10 +-
 include/linux/inetdevice.h                         |   1 +
 include/net/addrconf.h                             |   1 +
 net/ipv4/devinet.c                                 |   8 +-
 net/ipv6/addrconf.c                                |  51 ++++---
 net/ipv6/addrconf_core.c                           |   9 +-
 9 files changed, 184 insertions(+), 73 deletions(-)

-- 
2.1.4

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox