Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching
From: Eric Dumazet @ 2015-01-12 16:51 UTC (permalink / raw)
  To: Patrick Schaaf
  Cc: Richard Weinberger, davem, coreteam, netfilter-devel,
	linux-kernel, netdev, bhutchings, john.fastabend, herbert,
	vyasevic, jiri, vfalico, therbert, edumazet, yoshfuji, jmorris,
	kuznet, kadlec, kaber, pablo, kay, stephen
In-Reply-To: <1425960.ovH4s7sjue@rofl>

On Mon, 2015-01-12 at 17:39 +0100, Patrick Schaaf wrote:
> > iptables should have used ifindex, its sad we allowed the substring
> 
> > match in first place.
> 
>  
> 
> Not to comment on the ifalias thing, which I think is unneccessary,
> too, but matching on interface names instead of only ifindex, is
> definitely needed, so that one can establish a full ruleset before
> interfaces even exist. That's good practise at boottime, but also
> needed for dynamic interface creation during runtime.
> 
>  
> 
> A pure ifindex-during-packet-inspection approach might still work, but
> the ruleset must IMO keep the interface names. Maybe register them in
> a hash, keyed by name, with values an ifindex or ifindex set (for
> wildcard names), plus a reverse mapping from active ifindices to all
> places in these hash values where an ifindex has been remembered. On
> interface creation / destruction that structure could then be updated,
> and active packet filtering rules would refer to (and keep a refcount
> on) specific hash elements.
> 
Please do not send html messages : Your reply did not reach the lists.

Then, all you mention could have been solved by proper userspace
support.

Every time you add an interface or change device name, you could change
firewalls rules if needed. Nothing shocking here.

The ruleset can indeed mention interface names, but the kernel part
really should not care about names, which are a 'human' convenient way
to represent things.

^ permalink raw reply

* Re: IPsec workshop at netdev01?
From: Nicolas Dichtel @ 2015-01-12 17:19 UTC (permalink / raw)
  To: Florian Westphal, Steffen Klassert
  Cc: netdev, Jamal Hadi Salim, Herbert Xu, David Miller
In-Reply-To: <20150107125554.GF11324@breakpoint.cc>

Le 07/01/2015 13:55, Florian Westphal a écrit :
> Steffen Klassert <steffen.klassert@secunet.com> wrote:
>> On Tue, Jan 06, 2015 at 06:00:26PM +0100, Florian Westphal wrote:
>>> Steffen Klassert <steffen.klassert@secunet.com> wrote:
>>>> - We still lack a 32/64 bit compatibiltiy layer for IPsec, this issue
>>>>    comes up from time to time. Some solutions were proposed in the past
>>>>    but all had problems. The current behaviour is broken if someone tries
>>>>    to configure IPsec with 32 bit tools on a 64 bit machine. Can we get
>>>>    this right somehow or is it better to just return an error in this case?
>>>
>>> FWIW I think
>>> http://patchwork.ozlabs.org/patch/49465/
>>>
>>> came closest to achieving full CONFIG_COMPAT support; since netlink is
>>> no longer async now I'm not sure we'd still need additonal 32-compat syscalls
>>> to make compat work for all cases.
>>>
>>> So "its ugly as hell" is probably the only problem that is hard to avoid ;-)
>>
>> Yeah, and it will be no fun to maintain it...
>
> Not sure, you'd have to make sure that no new attributes introduce need
> to add another compat hack.
>
> The best argument against supporting it is that this problem
> has existed for so long that there arguably isn't much demand
> (else, such patch would have been merged years ago).
In fact, there is regularly some proposals to fix this, but I think that
nobody has taken the time to make a patch that satisfies everybody.
There is certainly a number of "private" patch for this problem, hence it
can be good to have a consensus on this topic.


Regards,
Nicolas

^ permalink raw reply

* Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching
From: Patrick Schaaf @ 2015-01-12 17:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Richard Weinberger, davem, coreteam, netfilter-devel,
	linux-kernel, netdev, bhutchings, john.fastabend, herbert,
	vyasevic, jiri, vfalico, therbert, edumazet, yoshfuji, jmorris,
	kuznet, kadlec, kaber, pablo, kay, stephen
In-Reply-To: <1421081514.4099.14.camel@edumazet-glaptop2.roam.corp.google.com>

On Monday 12 January 2015 08:51:54 Eric Dumazet wrote:
> On Mon, 2015-01-12 at 17:39 +0100, Patrick Schaaf wrote:
> > 
> > Not to comment on the ifalias thing, which I think is unneccessary,
> > too, but matching on interface names instead of only ifindex, is
> > definitely needed, so that one can establish a full ruleset before
> > interfaces even exist. That's good practise at boottime, but also
> > needed for dynamic interface creation during runtime.
> 
> Please do not send html messages : Your reply did not reach the lists.

Sigh. Sorry...

> Then, all you mention could have been solved by proper userspace
> support.
> 
> Every time you add an interface or change device name, you could change
> firewalls rules if needed. Nothing shocking here.

That is totally impractical, IMO.

Interfaces come and go through many different actions. There's the admin 
downing and upping stuff like bridges or bonds. There's stuff like libvirt / 
KVM / qemu creating and destroying interfaces. In all these cases, in my 
practise, I give the interfaces useful names to that I can prefix-match them 
in iptables rules.

Dynamically modifying the ruleset for each such creation and destruction, 
would be a huge burden. The base ruleset would need suitable "hooks" where 
these rules were inserted (ordering matters!). The addition would hardly be 
atomic (with traditional iptables, unless done by generating a whole new 
ruleset and restoring). The programs (e.g. libvirt) would need to be able to 
call out to these specially crafted rule generator scripts. The admin would 
need to add them as pre/post actions to their static (manual) interface 
configuration. Loading and looking at the ruleset before bringing up the 
interface would be impossible.

Note that I do fully agree that it's sad that iptables rules waste all that 
memory for each and every rule! I remember musing about improving that in 
talks with Harald Welte back in the 90ies. A simple match would be perfectly 
fine for me. Only having ifindex support, isn't.

best regards
  Patrick

^ permalink raw reply

* Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching
From: Patrick McHardy @ 2015-01-12 17:22 UTC (permalink / raw)
  To: Patrick Schaaf
  Cc: Eric Dumazet, Richard Weinberger, davem, coreteam,
	netfilter-devel, linux-kernel, netdev, bhutchings, john.fastabend,
	herbert, vyasevic, jiri, vfalico, therbert, edumazet, yoshfuji,
	jmorris, kuznet, kadlec, pablo, kay, stephen
In-Reply-To: <2868544.UBk2Y85taW@rofl>

On 12.01, Patrick Schaaf wrote:
> On Monday 12 January 2015 08:51:54 Eric Dumazet wrote:
> > On Mon, 2015-01-12 at 17:39 +0100, Patrick Schaaf wrote:
> > > 
> > > Not to comment on the ifalias thing, which I think is unneccessary,
> > > too, but matching on interface names instead of only ifindex, is
> > > definitely needed, so that one can establish a full ruleset before
> > > interfaces even exist. That's good practise at boottime, but also
> > > needed for dynamic interface creation during runtime.
> > 
> > Please do not send html messages : Your reply did not reach the lists.
> 
> Sigh. Sorry...
> 
> > Then, all you mention could have been solved by proper userspace
> > support.
> > 
> > Every time you add an interface or change device name, you could change
> > firewalls rules if needed. Nothing shocking here.
> 
> That is totally impractical, IMO.
> 
> Interfaces come and go through many different actions. There's the admin 
> downing and upping stuff like bridges or bonds. There's stuff like libvirt / 
> KVM / qemu creating and destroying interfaces. In all these cases, in my 
> practise, I give the interfaces useful names to that I can prefix-match them 
> in iptables rules.
> 
> Dynamically modifying the ruleset for each such creation and destruction, 
> would be a huge burden. The base ruleset would need suitable "hooks" where 
> these rules were inserted (ordering matters!). The addition would hardly be 
> atomic (with traditional iptables, unless done by generating a whole new 
> ruleset and restoring). The programs (e.g. libvirt) would need to be able to 
> call out to these specially crafted rule generator scripts. The admin would 
> need to add them as pre/post actions to their static (manual) interface 
> configuration. Loading and looking at the ruleset before bringing up the 
> interface would be impossible.

devgroups seem like the best solution for this.

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Nicolas Dichtel @ 2015-01-12 17:37 UTC (permalink / raw)
  To: Thomas Graf, davem, jesse, stephen, pshelar, therbert,
	alexei.starovoitov
  Cc: netdev, dev
In-Reply-To: <7339e3bff124cecaf65cd04ea9bdc973c730ba34.1420756324.git.tgraf@suug.ch>

Le 08/01/2015 23:47, Thomas Graf a écrit :
> Implements supports for the Group Policy VXLAN extension [0] to provide
> a lightweight and simple security label mechanism across network peers
> based on VXLAN. The security context and associated metadata is mapped
> to/from skb->mark. This allows further mapping to a SELinux context
> using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
> tc, etc.
>
> The group membership is defined by the lower 16 bits of skb->mark, the
> upper 16 bits are used for flags.
>
> SELinux allows to manage label to secure local resources. However,
> distributed applications require ACLs to implemented across hosts. This
> is typically achieved by matching on L2-L4 fields to identify the
> original sending host and process on the receiver. On top of that,
> netlabel and specifically CIPSO [1] allow to map security contexts to
> universal labels.  However, netlabel and CIPSO are relatively complex.
> This patch provides a lightweight alternative for overlay network
> environments with a trusted underlay. No additional control protocol
> is required.
>
>             Host 1:                       Host 2:
>
>        Group A        Group B        Group B     Group A
>        +-----+   +-------------+    +-------+   +-----+
>        | lxc |   | SELinux CTX |    | httpd |   | VM  |
>        +--+--+   +--+----------+    +---+---+   +--+--+
> 	  \---+---/                     \----+---/
> 	      |                              |
> 	  +---+---+                      +---+---+
> 	  | vxlan |                      | vxlan |
> 	  +---+---+                      +---+---+
> 	      +------------------------------+
>
> Backwards compatibility:
> A VXLAN-GBP socket can receive standard VXLAN frames and will assign
> the default group 0x0000 to such frames. A Linux VXLAN socket will
> drop VXLAN-GBP  frames. The extension is therefore disabled by default
> and needs to be specifically enabled:
>
>     ip link add [...] type vxlan [...] gbp
>
> In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
> must run on a separate port number.
>
> Examples:
>   iptables:
>    host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
>    host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
>
>   OVS:
>    # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
>    # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
>
> [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> [1] http://lwn.net/Articles/204905/
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2:
>   - split GBP header definition into separate struct vxlanhdr_gbp as requested
>     by Alexei
>
>   drivers/net/vxlan.c           | 161 ++++++++++++++++++++++++++++++------------
>   include/net/vxlan.h           |  73 +++++++++++++++++--
>   include/uapi/linux/if_link.h  |   8 +++
>   net/openvswitch/vport-vxlan.c |   9 ++-
>   4 files changed, 198 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..b148739 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -132,6 +132,7 @@ struct vxlan_dev {
>   	__u8		  tos;		/* TOS override */
>   	__u8		  ttl;
>   	u32		  flags;	/* VXLAN_F_* in vxlan.h */
> +	u32		  exts;		/* Enabled extensions */
>
>   	struct work_struct sock_work;
>   	struct work_struct igmp_join;
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>   			continue;
>
>   		vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -		if (vh->vx_vni != vh2->vx_vni) {
> +		if (vh->vx_flags != vh2->vx_flags ||
> +		    vh->vx_vni != vh2->vx_vni) {
>   			NAPI_GRO_CB(p)->same_flow = 0;
>   			continue;
>   		}
> @@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   {
>   	struct vxlan_sock *vs;
>   	struct vxlanhdr *vxh;
> +	struct vxlan_metadata md = {0};
>
>   	/* Need Vxlan and inner Ethernet header to be present */
>   	if (!pskb_may_pull(skb, VXLAN_HLEN))
> @@ -1113,6 +1116,22 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (vs->exts) {
>   		if (!vxh->vni_present)
>   			goto error_invalid_header;
> +
> +		if (vxh->gbp_present) {
> +			struct vxlanhdr_gbp *gbp;
> +
> +			if (!(vs->exts & VXLAN_EXT_GBP))
> +				goto error_invalid_header;
> +
> +			gbp = (struct vxlanhdr_gbp *)vxh;
> +			md.gbp = ntohs(gbp->policy_id);
> +
> +			if (gbp->dont_learn)
> +				md.gbp |= VXLAN_GBP_DONT_LEARN;
> +
> +			if (gbp->policy_applied)
> +				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
> +		}
>   	} else {
>   		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
>   		    (vxh->vx_vni & htonl(0xff)))
> @@ -1122,7 +1141,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
>   		goto drop;
>
> -	vs->rcv(vs, skb, vxh->vx_vni);
> +	md.vni = vxh->vx_vni;
> +	vs->rcv(vs, skb, &md);
>   	return 0;
>
>   drop:
> @@ -1138,8 +1158,8 @@ error:
>   	return 1;
>   }
>
> -static void vxlan_rcv(struct vxlan_sock *vs,
> -		      struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +		      struct vxlan_metadata *md)
>   {
>   	struct iphdr *oip = NULL;
>   	struct ipv6hdr *oip6 = NULL;
> @@ -1150,7 +1170,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>   	int err = 0;
>   	union vxlan_addr *remote_ip;
>
> -	vni = ntohl(vx_vni) >> 8;
> +	vni = ntohl(md->vni) >> 8;
>   	/* Is this VNI defined? */
>   	vxlan = vxlan_vs_find_vni(vs, vni);
>   	if (!vxlan)
> @@ -1184,6 +1204,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>   		goto drop;
>
>   	skb_reset_network_header(skb);
> +	skb->mark = md->gbp;
>
>   	if (oip6)
>   		err = IP6_ECN_decapsulate(oip6, skb);
> @@ -1533,15 +1554,57 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
>   	return false;
>   }
>
> +static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
> +			   int min_headroom, struct vxlan_metadata *md)
> +{
> +	struct vxlanhdr *vxh;
> +	int err;
> +
> +	/* Need space for new headers (invalidates iph ptr) */
> +	err = skb_cow_head(skb, min_headroom);
> +	if (unlikely(err)) {
> +		kfree_skb(skb);
> +		return err;
> +	}
> +
> +	skb = vlan_hwaccel_push_inside(skb);
> +	if (WARN_ON(!skb))
> +		return -ENOMEM;
> +
> +	vxh = (struct vxlanhdr *)__skb_push(skb, sizeof(*vxh));
> +	vxh->vx_flags = htonl(VXLAN_FLAGS);
> +	vxh->vx_vni = md->vni;
> +
> +	if (vs->exts)  {
> +		if (vs->exts & VXLAN_EXT_GBP) {
> +			struct vxlanhdr_gbp *gbp;
> +
> +			gbp = (struct vxlanhdr_gbp *)vxh;
> +			vxh->gbp_present = 1;
> +
> +			if (md->gbp & VXLAN_GBP_DONT_LEARN)
> +				gbp->dont_learn = 1;
> +
> +			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
> +				gbp->policy_applied = 1;
> +
> +			gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
> +		}
> +	}
> +
> +	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> +
> +	return 0;
> +}
> +
>   #if IS_ENABLED(CONFIG_IPV6)
>   static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>   			   struct dst_entry *dst, struct sk_buff *skb,
>   			   struct net_device *dev, struct in6_addr *saddr,
>   			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
> -			   __be16 src_port, __be16 dst_port, __be32 vni,
> -			   bool xnet)
> +			   __be16 src_port, __be16 dst_port,
> +			   struct vxlan_metadata *md, bool xnet)
>   {
> -	struct vxlanhdr *vxh;
>   	int min_headroom;
>   	int err;
>   	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
> @@ -1558,24 +1621,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>   			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
>   			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -	/* Need space for new headers (invalidates iph ptr) */
> -	err = skb_cow_head(skb, min_headroom);
> -	if (unlikely(err)) {
> -		kfree_skb(skb);
> -		goto err;
> -	}
> -
> -	skb = vlan_hwaccel_push_inside(skb);
> -	if (WARN_ON(!skb)) {
> -		err = -ENOMEM;
> +	err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +	if (err)
>   		goto err;
> -	}
> -
> -	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = htonl(VXLAN_FLAGS);
> -	vxh->vx_vni = vni;
> -
> -	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>   	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
>   			     ttl, src_port, dst_port);
> @@ -1589,9 +1637,9 @@ err:
>   int vxlan_xmit_skb(struct vxlan_sock *vs,
>   		   struct rtable *rt, struct sk_buff *skb,
>   		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
> +		   __be16 src_port, __be16 dst_port,
> +		   struct vxlan_metadata *md, bool xnet)
>   {
> -	struct vxlanhdr *vxh;
>   	int min_headroom;
>   	int err;
>   	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
> @@ -1604,22 +1652,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>   			+ VXLAN_HLEN + sizeof(struct iphdr)
>   			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -	/* Need space for new headers (invalidates iph ptr) */
> -	err = skb_cow_head(skb, min_headroom);
> -	if (unlikely(err)) {
> -		kfree_skb(skb);
> +	err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +	if (err)
>   		return err;
> -	}
> -
> -	skb = vlan_hwaccel_push_inside(skb);
> -	if (WARN_ON(!skb))
> -		return -ENOMEM;
> -
> -	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = htonl(VXLAN_FLAGS);
> -	vxh->vx_vni = vni;
> -
> -	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>   	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
>   				   ttl, df, src_port, dst_port, xnet);
> @@ -1679,6 +1714,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>   	const struct iphdr *old_iph;
>   	struct flowi4 fl4;
>   	union vxlan_addr *dst;
> +	struct vxlan_metadata md;
>   	__be16 src_port = 0, dst_port;
>   	u32 vni;
>   	__be16 df = 0;
> @@ -1749,11 +1785,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>
>   		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
>   		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
> +		md.vni = htonl(vni << 8);
> +		md.gbp = skb->mark;
>
>   		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
>   				     fl4.saddr, dst->sin.sin_addr.s_addr,
> -				     tos, ttl, df, src_port, dst_port,
> -				     htonl(vni << 8),
> +				     tos, ttl, df, src_port, dst_port, &md,
>   				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
>   		if (err < 0) {
>   			/* skb is already freed. */
> @@ -1806,10 +1843,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>   		}
>
>   		ttl = ttl ? : ip6_dst_hoplimit(ndst);
> +		md.vni = htonl(vni << 8);
> +		md.gbp = skb->mark;
>
>   		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
>   				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
> -				      src_port, dst_port, htonl(vni << 8),
> +				      src_port, dst_port, &md,
>   				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
>   #endif
>   	}
> @@ -2210,6 +2249,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
>   	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
>   	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
>   	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
> +	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
> +};
> +
> +static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
> +	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
>   };
>
>   static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
> @@ -2246,6 +2290,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>   		}
>   	}
>
> +	if (data[IFLA_VXLAN_EXTENSION]) {
> +		int err;
> +
> +		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
> +					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
> +		if (err < 0) {
> +			pr_debug("invalid VXLAN extension configuration: %d\n",
> +				 err);
> +			return -EINVAL;
> +		}
> +	}
> +
>   	return 0;
>   }
>
> @@ -2400,6 +2456,18 @@ static void vxlan_sock_work(struct work_struct *work)
>   	dev_put(vxlan->dev);
>   }
>
> +static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
> +{
> +	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
> +
> +	/* Validated in vxlan_validate() */
> +	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
> +		BUG();
> +
> +	if (exts[IFLA_VXLAN_EXT_GBP])
> +		vxlan->exts |= VXLAN_EXT_GBP;
> +}
> +
>   static int vxlan_newlink(struct net *net, struct net_device *dev,
>   			 struct nlattr *tb[], struct nlattr *data[])
>   {
> @@ -2525,6 +2593,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>   	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
>   		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
>
> +	if (data[IFLA_VXLAN_EXTENSION])
> +		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
> +
Can you also update vxlan_fill_info() so that these new attributes can be dumped 
via netlink?


Thank you,
Nicolas

^ permalink raw reply

* Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching
From: Patrick Schaaf @ 2015-01-12 17:41 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric Dumazet, Richard Weinberger, davem, coreteam,
	netfilter-devel, linux-kernel, netdev, bhutchings, john.fastabend,
	herbert, vyasevic, jiri, vfalico, therbert, edumazet, yoshfuji,
	jmorris, kuznet, kadlec, pablo, kay, stephen
In-Reply-To: <20150112172257.GG17329@acer.localdomain>

On Monday 12 January 2015 17:22:57 Patrick McHardy wrote:
> On 12.01, Patrick Schaaf wrote:
> >
> > Interfaces come and go through many different actions. There's the admin
> > downing and upping stuff like bridges or bonds. There's stuff like libvirt
> > / KVM / qemu creating and destroying interfaces. In all these cases, in
> > my practise, I give the interfaces useful names to that I can
> > prefix-match them in iptables rules.
> > 
> > Dynamically modifying the ruleset for each such creation and destruction,
> > would be a huge burden. The base ruleset would need suitable "hooks" where
> > these rules were inserted (ordering matters!). The addition would hardly
> > be
> > atomic (with traditional iptables, unless done by generating a whole new
> > ruleset and restoring). The programs (e.g. libvirt) would need to be able
> > to call out to these specially crafted rule generator scripts. The admin
> > would need to add them as pre/post actions to their static (manual)
> > interface configuration. Loading and looking at the ruleset before
> > bringing up the interface would be impossible.
> 
> devgroups seem like the best solution for this.

Could be, technically.

Is there devgroup support in libvirt, ifcfg, whatever other distros use for 
their static interface configuration? Or, do I again have to write pre/post 
scripts to set devgroups? Wouldn't bother me too much nowadays, I've automated 
that for ifcfg style stuff in my production environment a year ago, but it's 
something an admin must actively manage...

There is other stuff, apart from libvirt, that creates and destroys interfaces 
on the fly. From my production environment, there's at least keepalived, which 
creates macvlan interfaces on the fly for VRRP VMAC support. I can configure 
the name for that, but nothing else, nor can I call a script pre/post for 
that. And my iptables rules on that boxen _do_ match specially on these 
interfaces.

Gooling a bit around does not immediately turn up any good documentation on it 
at all (four year old iproute2 commits, once I give that as a search term 
too?). Looks very sketchy (although the fundamental idea is clear to me. I'm 
looking through the normal admin practise lens....)

best regards
  Patrick

^ permalink raw reply

* Re: [PATCH 1/5] net: atarilance: Remove obsolete IRQ_TYPE_PRIO
From: David Miller @ 2015-01-12 17:56 UTC (permalink / raw)
  To: geert
  Cc: plagnioj, tomi.valkeinen, perex, tiwai, linux-m68k, linux-kernel,
	netdev
In-Reply-To: <1421052021-12560-2-git-send-email-geert@linux-m68k.org>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Mon, 12 Jan 2015 09:40:17 +0100

> IRQ_TYPE_PRIO is no longer used by the Atari platform interrupt code
> since commit 734085651c9b80aa ("[PATCH] m68k: convert atari irq code")
> in v2.6.18-rc1, so drop it.
> 
> Note that its value has been reused for a different purpose
> (IRQ_TYPE_EDGE_FALLING) since commit 6a6de9ef5850d063 ("[PATCH] genirq:
> core") in v2.6.18-rc1.
> 
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: David Miller @ 2015-01-12 17:59 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: tgraf, jesse, stephen, pshelar, therbert, alexei.starovoitov,
	netdev, dev
In-Reply-To: <54B40661.9020408@6wind.com>


Can you PLEASE, PLEASE, not quote and entire full patch just to add two
lines of commentary.

Quote _only_ the _RELEVANT_ portions of the email you are replying to.

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Tom Herbert @ 2015-01-12 18:14 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev@openvswitch.org
In-Reply-To: <7339e3bff124cecaf65cd04ea9bdc973c730ba34.1420756324.git.tgraf@suug.ch>

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..9f07bf5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -370,10 +370,18 @@ enum {
>         IFLA_VXLAN_UDP_CSUM,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
> +       IFLA_VXLAN_EXTENSION,
>         __IFLA_VXLAN_MAX
>  };
>  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
>
> +enum {
> +       IFLA_VXLAN_EXT_UNSPEC,
> +       IFLA_VXLAN_EXT_GBP,
> +       __IFLA_VXLAN_EXT_MAX,
> +};
> +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
> +

Creating a level of indirection for extensions seems overly
complicated to me. Why not just define IFLA_VXLAN_GBP as just another
enum above?

Tom

^ permalink raw reply

* Re: Query regarding sk_filter
From: Alexei Starovoitov @ 2015-01-12 18:17 UTC (permalink / raw)
  To: Kumar Sanghvi; +Cc: netdev@vger.kernel.org

On Sat, Jan 10, 2015 at 4:40 AM, Kumar Sanghvi <kumaras@chelsio.com> wrote:
>
> May be, the question that I should have asked is : do child socket and parent listening socket have the
> same sk_filter applied i.e. do child socket inherit sk_filter from parent listening socket ?

yes. socket returned from accept() inherits filter from parent socket.
if you're curious the path is:
tcp_check_req - tcp_v4_syn_recv_sock - tcp_create_openreq_child - sk_clone_lock

^ permalink raw reply

* Re: [PATCH net-next v11 3/3] net: hisilicon: new hip04 ethernet driver
From: Alexander Graf @ 2015-01-12 18:25 UTC (permalink / raw)
  To: Ding Tianhong, arnd, robh+dt, davem, grant.likely
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421049832-6224-4-git-send-email-dingtianhong@huawei.com>

On 12.01.15 09:03, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
> 
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
> is not supported for this patch, but I think it could work for hip04,
> will support it later after some tests for performance better.
> 
> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
> 
> - Before:
> $ ping 192.168.1.1 ...
> --- 192.168.1.1 ping statistics ---

Writing --- directly into your patch description is usually a pretty bad
idea. Git am cuts off everything that comes after --- so your patch
description ends here without manual intervention ;).

> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
> 
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
> 
> - After:
> $ ping 192.168.1.1 ...
> --- 192.168.1.1 ping statistics ---
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
> 
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
> 
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification

Version history however should go after a --- line, so that it doesn't
show up in the patch description in the tree.

> for v9 version
> - drop the workqueue
> - batch cleanup based on tx_coalesce_frames/usecs for better throughput
> - use a reasonable default tx timeout (200us, could be shorted
>   based on measurements) with a range timer
> - fix napi poll function return value
> - use a lockless queue for cleanup
> 
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---


[...]

> +static int hip04_remove(struct platform_device *pdev)
> +{
> +	struct net_device *ndev = platform_get_drvdata(pdev);
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct device *d = &pdev->dev;
> +
> +	if (priv->phy)
> +		phy_disconnect(priv->phy);
> +
> +	hip04_free_ring(ndev, d);
> +	unregister_netdev(ndev);
> +	free_irq(ndev->irq, ndev);
> +	of_node_put(priv->phy_node);
> +	cancel_work_sync(&priv->tx_timeout_task);
> +	free_netdev(ndev);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id hip04_mac_match[] = {
> +	{ .compatible = "hisilicon,hip04-mac" },
> +	{ }
> +};

This is missing

MODULE_DEVICE_TABLE(of, hip04_mac_match);

to enable automatic module loading, no?


Alex

^ permalink raw reply

* Re: Query regarding sk_filter
From: Kumar Sanghvi @ 2015-01-12 18:00 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAADnVQJx+6Pv0qTSkKgnW8aqCwVtz4oDaVC6tR-Df6XFFrjMDA@mail.gmail.com>

On Monday, January 01/12/15, 2015 at 10:17:16 -0800, Alexei Starovoitov wrote:
> On Sat, Jan 10, 2015 at 4:40 AM, Kumar Sanghvi <kumaras@chelsio.com> wrote:
> >
> > May be, the question that I should have asked is : do child socket and parent listening socket have the
> > same sk_filter applied i.e. do child socket inherit sk_filter from parent listening socket ?
> 
> yes. socket returned from accept() inherits filter from parent socket.
> if you're curious the path is:
> tcp_check_req - tcp_v4_syn_recv_sock - tcp_create_openreq_child - sk_clone_lock
>
Got it. Thanks! 

^ permalink raw reply

* Re: [PATCH RFC net-next] ip_tunnel: create percpu gro_cell
From: Martin Lau @ 2015-01-12 18:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, kernel-team
In-Reply-To: <1420851759.5947.83.camel@edumazet-glaptop2.roam.corp.google.com>

On Fri, Jan 09, 2015 at 05:02:39PM -0800, Eric Dumazet wrote:
> On Fri, 2015-01-09 at 15:17 -0800, Martin Lau wrote:
> > Is the spin_lock() still needed?
> 
> Think about cpu hotplug, we can have interesting things here.
> 
> Basically no cost at all once mem is percpu.
Thanks, will post a v2.

--Martin

^ permalink raw reply

* Re: [PATCH v5] can: Convert to runtime_pm
From: Sören Brinkmann @ 2015-01-12 18:45 UTC (permalink / raw)
  To: Kedareswara rao Appana
  Cc: wg, mkl, michal.simek, grant.likely, robh+dt, linux-can, netdev,
	linux-arm-kernel, linux-kernel, devicetree,
	Kedareswara rao Appana
In-Reply-To: <baf987c11c0242c2bf87b81f9396b09d@BN1BFFO11FD018.protection.gbl>

On Mon, 2015-01-12 at 08:34PM +0530, Kedareswara rao Appana wrote:
> Instead of enabling/disabling clocks at several locations in the driver,
> Use the runtime_pm framework. This consolidates the actions for runtime PM
> In the appropriate callbacks and makes the driver more readable and mantainable.
> 
> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> ---
> Changes for v5:
>  - Updated with the review comments.
>    Updated the remove fuction to use runtime_pm.
> Chnages for v4:
>  - Updated with the review comments.
> Changes for v3:
>   - Converted the driver to use runtime_pm.
> Changes for v2:
>   - Removed the struct platform_device* from suspend/resume
>     as suggest by Lothar.
> 
>  drivers/net/can/xilinx_can.c |  157 ++++++++++++++++++++++++++++-------------
>  1 files changed, 107 insertions(+), 50 deletions(-)
[..]
> +static int __maybe_unused xcan_runtime_resume(struct device *dev)
>  {
> -	struct platform_device *pdev = dev_get_drvdata(dev);
> -	struct net_device *ndev = platform_get_drvdata(pdev);
> +	struct net_device *ndev = dev_get_drvdata(dev);
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	int ret;
> +	u32 isr, status;
>  
>  	ret = clk_enable(priv->bus_clk);
>  	if (ret) {
> @@ -1014,15 +1030,28 @@ static int __maybe_unused xcan_resume(struct device *dev)
>  	ret = clk_enable(priv->can_clk);
>  	if (ret) {
>  		dev_err(dev, "Cannot enable clock.\n");
> -		clk_disable_unprepare(priv->bus_clk);
> +		clk_disable(priv->bus_clk);
[...]
> @@ -1173,12 +1219,23 @@ static int xcan_remove(struct platform_device *pdev)
>  {
>  	struct net_device *ndev = platform_get_drvdata(pdev);
>  	struct xcan_priv *priv = netdev_priv(ndev);
> +	int ret;
> +
> +	ret = pm_runtime_get_sync(&pdev->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> +				__func__, ret);
> +		return ret;
> +	}
>  
>  	if (set_reset_mode(ndev) < 0)
>  		netdev_err(ndev, "mode resetting failed!\n");
>  
>  	unregister_candev(ndev);
> +	pm_runtime_disable(&pdev->dev);
>  	netif_napi_del(&priv->napi);
> +	clk_disable_unprepare(priv->bus_clk);
> +	clk_disable_unprepare(priv->can_clk);

Shouldn't pretty much all these occurrences of clk_disable/enable
disappear? This should all be handled by the runtime_pm framework now.

	Sören

^ permalink raw reply

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2015-01-12 18:48 UTC (permalink / raw)
  To: Fan Du
  Cc: Du, Fan, Thomas Graf, davem@davemloft.net, Michael S. Tsirkin,
	Jason Wang, netdev@vger.kernel.org, fw@strlen.de,
	dev@openvswitch.org, pshelar@nicira.com
In-Reply-To: <54AF6A60.5090804@gmail.com>

On Thu, Jan 8, 2015 at 9:42 PM, Fan Du <fengyuleidian0615@gmail.com> wrote:
> 于 2015年01月09日 03:55, Jesse Gross 写道:
>
>> On Thu, Jan 8, 2015 at 1:39 AM, Fan Du <fengyuleidian0615@gmail.com>
>> wrote:
>>>
>>> 于 2015年01月08日 04:52, Jesse Gross 写道:
>>>>>
>>>>>
>>>>> My understanding is:
>>>>>>
>>>>>> controller sets the forwarding rules into kernel datapath, any flow
>>>>>> not
>>>>>> matching
>>>>>> with the rules are threw to controller by upcall. Once the rule
>>>>>> decision
>>>>>> is
>>>>>> made
>>>>>> by controller, then, this flow packet is pushed down to datapath to be
>>>>>> forwarded
>>>>>> again according to the new rule.
>>>>>>
>>>>>> So I'm not sure whether pushing the over-MTU-sized packet or pushing
>>>>>> the
>>>>>> forged ICMP
>>>>>> without encapsulation to controller is required by current ovs
>>>>>> implementation. By doing
>>>>>> so, such over-MTU-sized packet is treated as a event for the
>>>>>> controller
>>>>>> to
>>>>>> be take
>>>>>> care of.
>>>>
>>>>
>>>> If flows are implementing routing (again, they are doing things like
>>>> decrementing the TTL) then it is necessary for them to also handle
>>>> this situation using some potentially new primitives (like a size
>>>> check). Otherwise you end up with issues like the ones that I
>>>> mentioned above like needing to forge addresses because you don't know
>>>> what the correct ones are.
>>>
>>>
>>>
>>> Thanks for explaining, Jesse!
>>>
>>> btw, I don't get it about "to forge addresses", building ICMP message
>>> with Guest packet doesn't require to forge address when not encapsulating
>>> ICMP message with outer headers.
>>
>>
>> Your patch has things like this (for the inner IP header):
>>
>> +                               new_ip->saddr = orig_ip->daddr;
>> +                               new_ip->daddr = orig_ip->saddr;
>>
>> These addresses are owned by the endpoints, not the host generating
>> generating the ICMP message, so I would consider that to be forging
>> addresses.
>>
>>> If the flows aren't doing things to
>>>>
>>>>
>>>> implement routing, then you really have a flat L2 network and you
>>>> shouldn't be doing this type of behavior at all as I described in the
>>>> original plan.
>>>
>>>
>>>
>>> For flows implementing routing scenario:
>>> First of all, over-MTU-sized packet could only be detected once the flow
>>> as been consulted(each port could implement a 'check' hook to do this),
>>> and just before send to the actual port.
>>>
>>> Then pushing the over-MTU-sized packet back to controller, it's the
>>> controller
>>> who will will decide whether to build ICMP message, or whatever routing
>>> behaviour
>>> it may take. And sent it back with the port information. This ICMP
>>> message
>>> will
>>> travel back to Guest.
>>>
>>> Why does the flow has to use primitive like a "check size"? "check size"
>>> will only take effect after do_output. I'm not very clear with this
>>> approach.
>>
>>
>> Checking the size obviously needs to be an action that would take
>> place before outputting in order for it to have any effect. Attaching
>> a check to a port does not fit in very well with the other primitives
>> of OVS, so I think an action is the obvious place to put it.
>
>
> If flow is defined as:
>
>         CHECK_SIZE -> OUTPUT
>
> Then traversing actions at CHECK_SIZE needs to find the exactly OUTPUT port,
> thus get its underlay encapsulation method as well as valid route for
> physical
> NIC MTU, with those information can calculation whether GSOed packets
> exceeds physical MTU. This is feasible anyway at the first look. After this,
> it's the controller responsibility to handle such event.
>
> If the CHECK_SIZE returns positive(over-MTU-sized packets show up), then
> call
> output_userspace to push this packet upper wards.
>
> I'm not sure this vague idea is the expected behaviour as required by "L3
> processing".

Yes, I think that's the right path.

^ permalink raw reply

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2015-01-12 18:55 UTC (permalink / raw)
  To: Fan Du
  Cc: Michael S. Tsirkin, Du, Fan, Thomas Graf, davem@davemloft.net,
	Jason Wang, netdev@vger.kernel.org, fw@strlen.de,
	dev@openvswitch.org, pshelar@nicira.com
In-Reply-To: <54AF6B9F.6080104@gmail.com>

On Thu, Jan 8, 2015 at 9:48 PM, Fan Du <fengyuleidian0615@gmail.com> wrote:
> 于 2015年01月09日 03:55, Jesse Gross 写道:
>>
>> On Thu, Jan 8, 2015 at 1:39 AM, Fan Du<fengyuleidian0615@gmail.com>
>> wrote:
>>
>>> >于 2015年01月08日 04:52, Jesse Gross 写道:
>>>>>
>>>>> >>>
>>>>> >>>My understanding is:
>>>>>>
>>>>>> >>> >controller sets the forwarding rules into kernel datapath, any
>>>>>> >>> > flow not
>>>>>> >>> >matching
>>>>>> >>> >with the rules are threw to controller by upcall. Once the rule
>>>>>> >>> > decision
>>>>>> >>> >is
>>>>>> >>> >made
>>>>>> >>> >by controller, then, this flow packet is pushed down to datapath
>>>>>> >>> > to be
>>>>>> >>> >forwarded
>>>>>> >>> >again according to the new rule.
>>>>>> >>> >
>>>>>> >>> >So I'm not sure whether pushing the over-MTU-sized packet or
>>>>>> >>> > pushing the
>>>>>> >>> >forged ICMP
>>>>>> >>> >without encapsulation to controller is required by current ovs
>>>>>> >>> >implementation. By doing
>>>>>> >>> >so, such over-MTU-sized packet is treated as a event for the
>>>>>> >>> > controller
>>>>>> >>> >to
>>>>>> >>> >be take
>>>>>> >>> >care of.
>>>>
>>>> >>
>>>> >>If flows are implementing routing (again, they are doing things like
>>>> >>decrementing the TTL) then it is necessary for them to also handle
>>>> >>this situation using some potentially new primitives (like a size
>>>> >>check). Otherwise you end up with issues like the ones that I
>>>> >>mentioned above like needing to forge addresses because you don't know
>>>> >>what the correct ones are.
>>>
>>> >
>>> >
>>> >Thanks for explaining, Jesse!
>>> >
>>> >btw, I don't get it about "to forge addresses", building ICMP message
>>> >with Guest packet doesn't require to forge address when not
>>> > encapsulating
>>> >ICMP message with outer headers.
>>
>> Your patch has things like this (for the inner IP header):
>>
>> +                               new_ip->saddr = orig_ip->daddr;
>> +                               new_ip->daddr = orig_ip->saddr;
>>
>> These addresses are owned by the endpoints, not the host generating
>> generating the ICMP message, so I would consider that to be forging
>> addresses.
>>
>>> >If the flows aren't doing things to
>>>>
>>>> >>
>>>> >>implement routing, then you really have a flat L2 network and you
>>>> >>shouldn't be doing this type of behavior at all as I described in the
>>>> >>original plan.
>>>
>>> >
>>> >
>>> >For flows implementing routing scenario:
>>> >First of all, over-MTU-sized packet could only be detected once the flow
>>> >as been consulted(each port could implement a 'check' hook to do this),
>>> >and just before send to the actual port.
>>> >
>>> >Then pushing the over-MTU-sized packet back to controller, it's the
>>> >controller
>>> >who will will decide whether to build ICMP message, or whatever routing
>>> >behaviour
>>> >it may take. And sent it back with the port information. This ICMP
>>> > message
>>> >will
>>> >travel back to Guest.
>>> >
>>> >Why does the flow has to use primitive like a "check size"? "check size"
>>> >will only take effect after do_output. I'm not very clear with this
>>> >approach.
>>
>> Checking the size obviously needs to be an action that would take
>> place before outputting in order for it to have any effect. Attaching
>> a check to a port does not fit in very well with the other primitives
>> of OVS, so I think an action is the obvious place to put it.
>>
>>> >And not all scenario involving flow with routing behaviour, just set up
>>> > a
>>> >vxlan tunnel, and attach KVM guest or Docker onto it for playing or
>>> >developing.
>>> >This wouldn't necessarily require user to set additional specific flows
>>> > to
>>> >make
>>> >over-MTU-sized packet pass through the tunnel correctly. In such
>>> > scenario, I
>>> >think the original patch in this thread to fragment tunnel packet is
>>> > still
>>> >needed
>>> >OR workout a generic component to build ICMP for all type tunnel in L2
>>> >level.
>>> >Both of those will act as a backup plan as there is no such specific
>>> > flow as
>>> >default.
>>
>> In these cases, we should find a way to adjust the MTU, preferably
>> automatically using virtio.
>
>
> I'm gonna to argue this a bit more here.
>
> virtio_net pose no limit at its simulated net device, actually it can fall
> into
> anywhere between 68 and 65535. Most importantly, virtio_net just simulates
> NIC,
> it just can’t assume/presume there is an encapsulating port at its
> downstream.
> How should virtio automatically adjust its upper guest MTU?

There are at least two parts to this:
 * Calculating the right MTU for the guest device.
 * Transferring the MTU from the host to the guest.

The first would presumably involve exposing some kind of API that the
component that does know the right value could program. In this case,
that component could be OVS using the same type of information that
you just described in the earlier post about L3. The API could simply
to just set the MTU of the device in the host and this gets mirrored
to the guest.

The second part I guess is probably a fairly straightforward extension
to virtio but I don't know the details.

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Jesse Gross @ 2015-01-12 19:23 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, Tom Herbert,
	Alexei Starovoitov, dev@openvswitch.org, netdev
In-Reply-To: <014606bb741d017675f7728a0bccc4cbed42af4b.1421064100.git.tgraf@suug.ch>

On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..b148739 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>                         continue;
>
>                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -               if (vh->vx_vni != vh2->vx_vni) {
> +               if (vh->vx_flags != vh2->vx_flags ||
> +                   vh->vx_vni != vh2->vx_vni) {

It's probably better to do a memcmp over the entire header. There's no
guarantee that new fields will be entirely represented by flags.

> diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> index d7c46b3..dd68c97 100644
> --- a/net/openvswitch/vport-vxlan.c
> +++ b/net/openvswitch/vport-vxlan.c
> @@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         struct vxlan_port *vxlan_port = vxlan_vport(vport);
>         __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
>         struct ovs_key_ipv4_tunnel *tun_key;
> +       struct vxlan_metadata md;

It might be a good idea to zero out 'md', even if not strictly required.

^ permalink raw reply

* [PATCH net-next] ipv6: directly include libc-compat.h in ipv6.h
From: Willem de Bruijn @ 2015-01-12 19:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, xiyou.wangcong, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Patch 3b50d9029809 ("ipv6: fix redefinition of in6_pktinfo ...")
fixed a libc compatibility issue in ipv6 structure definitions
as described in include/uapi/linux/libc-compat.h.

It relies on including linux/in6.h to include libc-compat.h itself.
Include that file directly to clearly communicate the dependency
(libc-compat.h: "This include must be as early as possible").

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

As discussed in http://patchwork.ozlabs.org/patch/427384/
---
 include/uapi/linux/ipv6.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index b9b1b7d..73cb02d 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -1,6 +1,7 @@
 #ifndef _UAPI_IPV6_H
 #define _UAPI_IPV6_H
 
+#include <linux/libc-compat.h>
 #include <linux/types.h>
 #include <linux/in6.h>
 #include <asm/byteorder.h>
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* Re: [PATCH 5/6] openvswitch: Allow for any level of nesting in flow attributes
From: Jesse Gross @ 2015-01-12 19:41 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, Tom Herbert,
	Alexei Starovoitov, dev@openvswitch.org, netdev
In-Reply-To: <793586e5305017d914913df4d3502cada6f1f55f.1421064100.git.tgraf@suug.ch>

On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 8980d32..457ccf3 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> +static const struct ovs_len_tbl ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1] = {
> +       [OVS_TUNNEL_KEY_ATTR_ID]            = { .len = sizeof(u64) },
> +       [OVS_TUNNEL_KEY_ATTR_IPV4_SRC]      = { .len = sizeof(u32) },
> +       [OVS_TUNNEL_KEY_ATTR_IPV4_DST]      = { .len = sizeof(u32) },
> +       [OVS_TUNNEL_KEY_ATTR_TOS]           = { .len = 1 },
> +       [OVS_TUNNEL_KEY_ATTR_TTL]           = { .len = 1 },
> +       [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = { .len = 0 },
> +       [OVS_TUNNEL_KEY_ATTR_CSUM]          = { .len = 0 },
> +       [OVS_TUNNEL_KEY_ATTR_TP_SRC]        = { .len = sizeof(u16) },
> +       [OVS_TUNNEL_KEY_ATTR_TP_DST]        = { .len = sizeof(u16) },
> +       [OVS_TUNNEL_KEY_ATTR_OAM]           = { .len = 0 },
> +       [OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS]   = { .len = OVS_ATTR_NESTED },
> +};

Geneve isn't really nested - maybe we should break it out into a
separate name? OVS_ATTR_VARIABLE? We shouldn't really try to traverse
it as netlink attributes anyways.

> +static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
> +       [OVS_KEY_ATTR_ENCAP]     = { .len = OVS_ATTR_NESTED },

This is not new but I think that encap isn't really handled correctly.
In theory, there could be multiple levels of nesting here (either
another encap or some other element) but there's no 'next' link.
However, I don't believe the situation arises today.

^ permalink raw reply

* Re: [PATCH v5] can: Convert to runtime_pm
From: Marc Kleine-Budde @ 2015-01-12 19:52 UTC (permalink / raw)
  To: Kedareswara rao Appana, wg, michal.simek, soren.brinkmann,
	grant.likely, robh+dt
  Cc: linux-can, netdev, linux-arm-kernel, linux-kernel, devicetree,
	Kedareswara rao Appana
In-Reply-To: <baf987c11c0242c2bf87b81f9396b09d@BN1BFFO11FD018.protection.gbl>

[-- Attachment #1: Type: text/plain, Size: 12223 bytes --]

On 01/12/2015 04:04 PM, Kedareswara rao Appana wrote:
> Instead of enabling/disabling clocks at several locations in the driver,
> Use the runtime_pm framework. This consolidates the actions for runtime PM
> In the appropriate callbacks and makes the driver more readable and mantainable.
> 
> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>

FTBFS:

> drivers/net/can/xilinx_can.c:1064:9: error: undefined identifier 'SET_PM_RUNTIME_PM_OPS'
>   CC [M]  drivers/net/can/xilinx_can.o
> drivers/net/can/xilinx_can.c:1064:2: error: implicit declaration of function ‘SET_PM_RUNTIME_PM_OPS’ [-Werror=implicit-function-declaration]
>   SET_PM_RUNTIME_PM_OPS(xcan_runtime_suspend, xcan_runtime_resume, NULL)
>   ^
> drivers/net/can/xilinx_can.c:1065:1: error: initializer element is not constant
>  };
>  ^
> drivers/net/can/xilinx_can.c:1065:1: error: (near initialization for ‘xcan_dev_pm_ops.prepare’)

Have a look at commit 40bd62c6194bdee1bc6652b3b0aa28e34883f603:

More comments inline. Looks quite good now.

>     PM: Remove the SET_PM_RUNTIME_PM_OPS() macro
> 
>     There're now no users left of the SET_PM_RUNTIME_PM_OPS() macro, since
>     all have converted to use the SET_RUNTIME_PM_OPS() macro instead, so
>     let's remove it.
> ---
> Changes for v5:
>  - Updated with the review comments.
>    Updated the remove fuction to use runtime_pm.
> Chnages for v4:
>  - Updated with the review comments.
> Changes for v3:
>   - Converted the driver to use runtime_pm.
> Changes for v2:
>   - Removed the struct platform_device* from suspend/resume
>     as suggest by Lothar.
> 
>  drivers/net/can/xilinx_can.c |  157 ++++++++++++++++++++++++++++-------------
>  1 files changed, 107 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
> index 6c67643..67aef00 100644
> --- a/drivers/net/can/xilinx_can.c
> +++ b/drivers/net/can/xilinx_can.c
> @@ -32,6 +32,7 @@
>  #include <linux/can/dev.h>
>  #include <linux/can/error.h>
>  #include <linux/can/led.h>
> +#include <linux/pm_runtime.h>
>  
>  #define DRIVER_NAME	"xilinx_can"
>  
> @@ -138,7 +139,7 @@ struct xcan_priv {
>  	u32 (*read_reg)(const struct xcan_priv *priv, enum xcan_reg reg);
>  	void (*write_reg)(const struct xcan_priv *priv, enum xcan_reg reg,
>  			u32 val);
> -	struct net_device *dev;
> +	struct device *dev;
>  	void __iomem *reg_base;
>  	unsigned long irq_flags;
>  	struct clk *bus_clk;
> @@ -842,6 +843,13 @@ static int xcan_open(struct net_device *ndev)
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	int ret;
>  
> +	ret = pm_runtime_get_sync(priv->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> +				__func__, ret);
> +		return ret;
> +	}
> +
>  	ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
>  			ndev->name, ndev);
>  	if (ret < 0) {
> @@ -849,29 +857,17 @@ static int xcan_open(struct net_device *ndev)
>  		goto err;
>  	}
>  
> -	ret = clk_prepare_enable(priv->can_clk);
> -	if (ret) {
> -		netdev_err(ndev, "unable to enable device clock\n");
> -		goto err_irq;
> -	}
> -
> -	ret = clk_prepare_enable(priv->bus_clk);
> -	if (ret) {
> -		netdev_err(ndev, "unable to enable bus clock\n");
> -		goto err_can_clk;
> -	}
> -
>  	/* Set chip into reset mode */
>  	ret = set_reset_mode(ndev);
>  	if (ret < 0) {
>  		netdev_err(ndev, "mode resetting failed!\n");
> -		goto err_bus_clk;
> +		goto err_irq;
>  	}
>  
>  	/* Common open */
>  	ret = open_candev(ndev);
>  	if (ret)
> -		goto err_bus_clk;
> +		goto err_irq;
>  
>  	ret = xcan_chip_start(ndev);
>  	if (ret < 0) {
> @@ -887,13 +883,11 @@ static int xcan_open(struct net_device *ndev)
>  
>  err_candev:
>  	close_candev(ndev);
> -err_bus_clk:
> -	clk_disable_unprepare(priv->bus_clk);
> -err_can_clk:
> -	clk_disable_unprepare(priv->can_clk);
>  err_irq:
>  	free_irq(ndev->irq, ndev);
>  err:
> +	pm_runtime_put(priv->dev);
> +
>  	return ret;
>  }
>  
> @@ -910,12 +904,11 @@ static int xcan_close(struct net_device *ndev)
>  	netif_stop_queue(ndev);
>  	napi_disable(&priv->napi);
>  	xcan_chip_stop(ndev);
> -	clk_disable_unprepare(priv->bus_clk);
> -	clk_disable_unprepare(priv->can_clk);
>  	free_irq(ndev->irq, ndev);
>  	close_candev(ndev);
>  
>  	can_led_event(ndev, CAN_LED_EVENT_STOP);
> +	pm_runtime_put(priv->dev);
>  
>  	return 0;
>  }
> @@ -934,27 +927,20 @@ static int xcan_get_berr_counter(const struct net_device *ndev,
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	int ret;
>  
> -	ret = clk_prepare_enable(priv->can_clk);
> -	if (ret)
> -		goto err;
> -
> -	ret = clk_prepare_enable(priv->bus_clk);
> -	if (ret)
> -		goto err_clk;
> +	ret = pm_runtime_get_sync(priv->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> +				__func__, ret);
> +		return ret;
> +	}
>  
>  	bec->txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK;
>  	bec->rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) &
>  			XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT);
>  
> -	clk_disable_unprepare(priv->bus_clk);
> -	clk_disable_unprepare(priv->can_clk);
> +	pm_runtime_put(priv->dev);
>  
>  	return 0;
> -
> -err_clk:
> -	clk_disable_unprepare(priv->can_clk);
> -err:
> -	return ret;
>  }
>  
>  
> @@ -967,15 +953,45 @@ static const struct net_device_ops xcan_netdev_ops = {
>  
>  /**
>   * xcan_suspend - Suspend method for the driver
> - * @dev:	Address of the platform_device structure
> + * @dev:	Address of the device structure
>   *
>   * Put the driver into low power mode.
> - * Return: 0 always
> + * Return: 0 on success and failure value on error
>   */
>  static int __maybe_unused xcan_suspend(struct device *dev)
>  {
> -	struct platform_device *pdev = dev_get_drvdata(dev);
> -	struct net_device *ndev = platform_get_drvdata(pdev);
> +	if (!device_may_wakeup(dev))
> +		return pm_runtime_force_suspend(dev);
> +
> +	return 0;
> +}
> +
> +/**
> + * xcan_resume - Resume from suspend
> + * @dev:	Address of the device structure
> + *
> + * Resume operation after suspend.
> + * Return: 0 on success and failure value on error
> + */
> +static int __maybe_unused xcan_resume(struct device *dev)
> +{
> +	if (!device_may_wakeup(dev))
> +		return pm_runtime_force_resume(dev);
> +
> +	return 0;
> +
> +}
> +
> +/**
> + * xcan_runtime_suspend - Runtime suspend method for the driver
> + * @dev:	Address of the device structure
> + *
> + * Put the driver into low power mode.
> + * Return: 0 always
> + */
> +static int __maybe_unused xcan_runtime_suspend(struct device *dev)
> +{
> +	struct net_device *ndev = dev_get_drvdata(dev);
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  
>  	if (netif_running(ndev)) {
> @@ -993,18 +1009,18 @@ static int __maybe_unused xcan_suspend(struct device *dev)
>  }
>  
>  /**
> - * xcan_resume - Resume from suspend
> - * @dev:	Address of the platformdevice structure
> + * xcan_runtime_resume - Runtime resume from suspend
> + * @dev:	Address of the device structure
>   *
>   * Resume operation after suspend.
>   * Return: 0 on success and failure value on error
>   */
> -static int __maybe_unused xcan_resume(struct device *dev)
> +static int __maybe_unused xcan_runtime_resume(struct device *dev)
>  {
> -	struct platform_device *pdev = dev_get_drvdata(dev);
> -	struct net_device *ndev = platform_get_drvdata(pdev);
> +	struct net_device *ndev = dev_get_drvdata(dev);
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	int ret;
> +	u32 isr, status;
>  
>  	ret = clk_enable(priv->bus_clk);
>  	if (ret) {
> @@ -1014,15 +1030,28 @@ static int __maybe_unused xcan_resume(struct device *dev)
>  	ret = clk_enable(priv->can_clk);
>  	if (ret) {
>  		dev_err(dev, "Cannot enable clock.\n");
> -		clk_disable_unprepare(priv->bus_clk);
> +		clk_disable(priv->bus_clk);
>  		return ret;
>  	}
>  
>  	priv->write_reg(priv, XCAN_MSR_OFFSET, 0);
>  	priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_CEN_MASK);
> -	priv->can.state = CAN_STATE_ERROR_ACTIVE;
> +	isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
> +	status = priv->read_reg(priv, XCAN_SR_OFFSET);
>  
>  	if (netif_running(ndev)) {
> +		if (isr & XCAN_IXR_BSOFF_MASK) {
> +			priv->can.state = CAN_STATE_BUS_OFF;
> +			priv->write_reg(priv, XCAN_SRR_OFFSET,
> +					XCAN_SRR_RESET_MASK);
> +		} else if ((status & XCAN_SR_ESTAT_MASK) ==
> +					XCAN_SR_ESTAT_MASK) {
> +			priv->can.state = CAN_STATE_ERROR_PASSIVE;
> +		} else if (status & XCAN_SR_ERRWRN_MASK) {
> +			priv->can.state = CAN_STATE_ERROR_WARNING;
> +		} else {
> +			priv->can.state = CAN_STATE_ERROR_ACTIVE;
> +		}
>  		netif_device_attach(ndev);
>  		netif_start_queue(ndev);
>  	}
> @@ -1030,7 +1059,10 @@ static int __maybe_unused xcan_resume(struct device *dev)
>  	return 0;
>  }
>  
> -static SIMPLE_DEV_PM_OPS(xcan_dev_pm_ops, xcan_suspend, xcan_resume);
> +static const struct dev_pm_ops xcan_dev_pm_ops = {
> +	SET_SYSTEM_SLEEP_PM_OPS(xcan_suspend, xcan_resume)
> +	SET_PM_RUNTIME_PM_OPS(xcan_runtime_suspend, xcan_runtime_resume, NULL)
> +};
>  
>  /**
>   * xcan_probe - Platform registration call
> @@ -1071,7 +1103,7 @@ static int xcan_probe(struct platform_device *pdev)
>  		return -ENOMEM;
>  
>  	priv = netdev_priv(ndev);
> -	priv->dev = ndev;
> +	priv->dev = &pdev->dev;
>  	priv->can.bittiming_const = &xcan_bittiming_const;
>  	priv->can.do_set_mode = xcan_do_set_mode;
>  	priv->can.do_get_berr_counter = xcan_get_berr_counter;
> @@ -1137,6 +1169,16 @@ static int xcan_probe(struct platform_device *pdev)
>  
>  	netif_napi_add(ndev, &priv->napi, xcan_rx_poll, rx_max);
>  
> +	pm_runtime_set_active(&pdev->dev);
> +	pm_runtime_irq_safe(&pdev->dev);
> +	pm_runtime_enable(&pdev->dev);
> +	ret = pm_runtime_get_sync(&pdev->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> +			__func__, ret);
> +		goto err_pmdisable;

after err_pmdisable is a runtime_put(), your cleanup is broken.

> +	}
> +
>  	ret = register_candev(ndev);
>  	if (ret) {
>  		dev_err(&pdev->dev, "fail to register failed (err=%d)\n", ret);
> 		goto err_unprepare_disable_busclk;

I think you have to adjust the jump label.

> 	}
> 
> 	devm_can_led_init(ndev);
> 
> 	pm_runtime_put(&pdev->dev);



> @@ -1144,15 +1186,19 @@ static int xcan_probe(struct platform_device *pdev)
>  	}
>  
>  	devm_can_led_init(ndev);
> -	clk_disable_unprepare(priv->bus_clk);
> -	clk_disable_unprepare(priv->can_clk);
> +
> +	pm_runtime_put(&pdev->dev);
> +
>  	netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n",
>  			priv->reg_base, ndev->irq, priv->can.clock.freq,
>  			priv->tx_max);
>  
>  	return 0;
>  
> +err_pmdisable:
> +	pm_runtime_disable(&pdev->dev);
>  err_unprepare_disable_busclk:
> +	pm_runtime_put(priv->dev);

This cleanup is not correct.

>  	clk_disable_unprepare(priv->bus_clk);
>  err_unprepare_disable_dev:
>  	clk_disable_unprepare(priv->can_clk);
> @@ -1173,12 +1219,23 @@ static int xcan_remove(struct platform_device *pdev)
>  {
>  	struct net_device *ndev = platform_get_drvdata(pdev);
>  	struct xcan_priv *priv = netdev_priv(ndev);
> +	int ret;
> +
> +	ret = pm_runtime_get_sync(&pdev->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
> +				__func__, ret);
> +		return ret;
> +	}
>  
>  	if (set_reset_mode(ndev) < 0)
>  		netdev_err(ndev, "mode resetting failed!\n");
>  
>  	unregister_candev(ndev);
> +	pm_runtime_disable(&pdev->dev);
>  	netif_napi_del(&priv->napi);
> +	clk_disable_unprepare(priv->bus_clk);
> +	clk_disable_unprepare(priv->can_clk);
>  	free_candev(ndev);
>  
>  	return 0;
> 

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH RESEND] isdn: fix NUL (\0 or \x00) specification in string
From: David Miller @ 2015-01-12 20:32 UTC (permalink / raw)
  To: me; +Cc: linux-kernel, dsterba, mac, isdn, netdev
In-Reply-To: <1420657812-19445-1-git-send-email-me@mortis.eu>

From: Giel van Schijndel <me@mortis.eu>
Date: Wed,  7 Jan 2015 20:10:12 +0100

> In C one can either use '\0' or '\x00' (or '\000') to add a NUL byte to
> a string. '\0x00' isn't part of these and will in fact result in a
> single NUL followed by "x00". This fixes that.
> 
> Signed-off-by: Giel van Schijndel <me@mortis.eu>
> Reported-at: http://www.viva64.com/en/b/0299/

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] net/at91_ether: prepare and unprepare clock
From: David Miller @ 2015-01-12 20:35 UTC (permalink / raw)
  To: alexandre.belloni
  Cc: nicolas.ferre, boris.brezillon, linux-arm-kernel, linux-kernel,
	netdev
In-Reply-To: <1420671566-30784-1-git-send-email-alexandre.belloni@free-electrons.com>

From: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Date: Wed,  7 Jan 2015 23:59:26 +0100

> The clock is enabled without being prepared, this leads to:
> 
> WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:889 __clk_enable+0x24/0xa8()
> 
> and a non working ethernet interface.
> 
> Use clk_prepare_enable() and clk_disable_unprepare() to handle the clock.
> 
> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v4 2/4] can: kvaser_usb: Update error counters before exiting on OOM
From: Ahmed S. Darwish @ 2015-01-12 20:36 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Olivier Sobrie, Oliver Hartkopp, Wolfgang Grandegger,
	David S. Miller, Paul Gortmaker, Linux-CAN, netdev, LKML
In-Reply-To: <54B3AB6C.8020900@pengutronix.de>

On Mon, Jan 12, 2015 at 12:09:32PM +0100, Marc Kleine-Budde wrote:
> On 01/11/2015 09:15 PM, Ahmed S. Darwish wrote:
> > From: Ahmed S. Darwish <ahmed.darwish@valeo.com>
> > 
> > Let the error counters be more accurate in case of Out of
> > Memory conditions.
> 
> Please have a look at kvaser_usb_rx_error(), the whole state handling is
> omitted in case of OOM.
> 

I see. Regarding kvaser_usb_rx_error(), would something like
below patch be acceptable? 

Kindly note that separating recording interface state from
error frame packet building leads to duplication of a good
number of if-conditions. Meanwhile, it truly saves _all_
of the possible state before any ENOMEM -- the correct thing
to do.

Another solution was to allocate the can frame on the stack,
and thus avoiding any code duplication. But this only leads
to calls of "kvaser_usb_simple_msg_async", which can fail
with -ENOMEM by itself, returning to the very same problem
again. 

If the patch is acceptable, I'll rebase my USBCAN-II driver
above it and re-submit the series (minus the merged patch).

Thanks,

-->

[ Patch is build-tested, but not _fully_ run-time tested.

  It's based on linux-can/testing commit d642b49f6d84b94bd
  "can: kvaser_usb: Don't dereference skb after a netif_rx" ]

Subject: [PATCH] can: kvaser_usb: Update net interface state
 before exiting on OOM

From: Ahmed S. Darwish <ahmed.darwish@valeo.com>

Let the network interface can bus state and error counters be
more accurate in case of Out of Memory conditions.

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
---
 drivers/net/can/usb/kvaser_usb.c | 182 +++++++++++++++++++++++----------------
 1 file changed, 106 insertions(+), 76 deletions(-)

diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
index c32cd61..2d0ef76 100644
--- a/drivers/net/can/usb/kvaser_usb.c
+++ b/drivers/net/can/usb/kvaser_usb.c
@@ -273,6 +273,10 @@ struct kvaser_msg {
 	} u;
 } __packed;
 
+struct kvaser_usb_error_summary {
+	u8 channel, status, txerr, rxerr, error_factor;
+};
+
 struct kvaser_usb_tx_urb_context {
 	struct kvaser_usb_net_priv *priv;
 	u32 echo_index;
@@ -615,6 +619,57 @@ static void kvaser_usb_unlink_tx_urbs(struct kvaser_usb_net_priv *priv)
 		priv->tx_contexts[i].echo_index = MAX_TX_URBS;
 }
 
+static void kvaser_usb_rx_error_update_state(struct kvaser_usb_net_priv *priv,
+					     const struct kvaser_usb_error_summary *es)
+{
+	struct net_device_stats *stats;
+	unsigned int new_state;
+
+	stats = &priv->netdev->stats;
+	new_state = priv->can.state;
+
+	netdev_dbg(priv->netdev, "Error status: 0x%02x\n", es->status);
+
+	if (es->status & M16C_STATE_BUS_OFF) {
+		priv->can.can_stats.bus_off++;
+		new_state = CAN_STATE_BUS_OFF;
+	} else if (es->status & M16C_STATE_BUS_PASSIVE) {
+		if (priv->can.state != CAN_STATE_ERROR_PASSIVE) {
+			priv->can.can_stats.error_passive++;
+		}
+		new_state = CAN_STATE_ERROR_PASSIVE;
+	}
+
+	if (es->status == M16C_STATE_BUS_ERROR) {
+		if ((priv->can.state < CAN_STATE_ERROR_WARNING) &&
+		    ((es->txerr >= 96) || (es->rxerr >= 96))) {
+			priv->can.can_stats.error_warning++;
+			new_state = CAN_STATE_ERROR_WARNING;
+		} else if (priv->can.state > CAN_STATE_ERROR_ACTIVE) {
+			new_state = CAN_STATE_ERROR_ACTIVE;
+		}
+	}
+
+	if (!es->status) {
+		new_state = CAN_STATE_ERROR_ACTIVE;
+	}
+
+	if (priv->can.restart_ms &&
+	    (priv->can.state >= CAN_STATE_BUS_OFF) &&
+	    (new_state < CAN_STATE_BUS_OFF)) {
+		priv->can.can_stats.restarts++;
+	}
+
+	if (es->error_factor) {
+		priv->can.can_stats.bus_error++;
+		stats->rx_errors++;
+	}
+
+	priv->bec.txerr = es->txerr;
+	priv->bec.rxerr = es->rxerr;
+	priv->can.state = new_state;
+}
+
 static void kvaser_usb_rx_error(const struct kvaser_usb *dev,
 				const struct kvaser_msg *msg)
 {
@@ -622,30 +677,30 @@ static void kvaser_usb_rx_error(const struct kvaser_usb *dev,
 	struct sk_buff *skb;
 	struct net_device_stats *stats;
 	struct kvaser_usb_net_priv *priv;
-	unsigned int new_state;
-	u8 channel, status, txerr, rxerr, error_factor;
+	struct kvaser_usb_error_summary es = { };
+	unsigned int new_state, old_state;
 
 	switch (msg->id) {
 	case CMD_CAN_ERROR_EVENT:
-		channel = msg->u.error_event.channel;
-		status =  msg->u.error_event.status;
-		txerr = msg->u.error_event.tx_errors_count;
-		rxerr = msg->u.error_event.rx_errors_count;
-		error_factor = msg->u.error_event.error_factor;
+		es.channel = msg->u.error_event.channel;
+		es.status =  msg->u.error_event.status;
+		es.txerr = msg->u.error_event.tx_errors_count;
+		es.rxerr = msg->u.error_event.rx_errors_count;
+		es.error_factor = msg->u.error_event.error_factor;
 		break;
 	case CMD_LOG_MESSAGE:
-		channel = msg->u.log_message.channel;
-		status = msg->u.log_message.data[0];
-		txerr = msg->u.log_message.data[2];
-		rxerr = msg->u.log_message.data[3];
-		error_factor = msg->u.log_message.data[1];
+		es.channel = msg->u.log_message.channel;
+		es.status = msg->u.log_message.data[0];
+		es.txerr = msg->u.log_message.data[2];
+		es.rxerr = msg->u.log_message.data[3];
+		es.error_factor = msg->u.log_message.data[1];
 		break;
 	case CMD_CHIP_STATE_EVENT:
-		channel = msg->u.chip_state_event.channel;
-		status =  msg->u.chip_state_event.status;
-		txerr = msg->u.chip_state_event.tx_errors_count;
-		rxerr = msg->u.chip_state_event.rx_errors_count;
-		error_factor = 0;
+		es.channel = msg->u.chip_state_event.channel;
+		es.status =  msg->u.chip_state_event.status;
+		es.txerr = msg->u.chip_state_event.tx_errors_count;
+		es.rxerr = msg->u.chip_state_event.rx_errors_count;
+		es.error_factor = 0;
 		break;
 	default:
 		dev_err(dev->udev->dev.parent, "Invalid msg id (%d)\n",
@@ -653,122 +708,97 @@ static void kvaser_usb_rx_error(const struct kvaser_usb *dev,
 		return;
 	}
 
-	if (channel >= dev->nchannels) {
+	if (es.channel >= dev->nchannels) {
 		dev_err(dev->udev->dev.parent,
-			"Invalid channel number (%d)\n", channel);
+			"Invalid channel number (%d)\n", es.channel);
 		return;
 	}
 
-	priv = dev->nets[channel];
+	priv = dev->nets[es.channel];
 	stats = &priv->netdev->stats;
 
-	if (status & M16C_STATE_BUS_RESET) {
+	if (es.status & M16C_STATE_BUS_RESET) {
 		kvaser_usb_unlink_tx_urbs(priv);
 		return;
 	}
 
+	old_state = priv->can.state;
+	kvaser_usb_rx_error_update_state(priv, &es);
+	new_state = priv->can.state;
+
 	skb = alloc_can_err_skb(priv->netdev, &cf);
 	if (!skb) {
 		stats->rx_dropped++;
 		return;
 	}
 
-	new_state = priv->can.state;
-
-	netdev_dbg(priv->netdev, "Error status: 0x%02x\n", status);
-
-	if (status & M16C_STATE_BUS_OFF) {
-		cf->can_id |= CAN_ERR_BUSOFF;
-
-		priv->can.can_stats.bus_off++;
+	if (es.status & M16C_STATE_BUS_OFF) {
 		if (!priv->can.restart_ms)
 			kvaser_usb_simple_msg_async(priv, CMD_STOP_CHIP);
-
 		netif_carrier_off(priv->netdev);
 
-		new_state = CAN_STATE_BUS_OFF;
-	} else if (status & M16C_STATE_BUS_PASSIVE) {
-		if (priv->can.state != CAN_STATE_ERROR_PASSIVE) {
+		cf->can_id |= CAN_ERR_BUSOFF;
+	} else if (es.status & M16C_STATE_BUS_PASSIVE) {
+		if (old_state != CAN_STATE_ERROR_PASSIVE) {
 			cf->can_id |= CAN_ERR_CRTL;
 
-			if (txerr || rxerr)
-				cf->data[1] = (txerr > rxerr)
+			if (es.txerr || es.rxerr)
+				cf->data[1] = (es.txerr > es.rxerr)
 						? CAN_ERR_CRTL_TX_PASSIVE
 						: CAN_ERR_CRTL_RX_PASSIVE;
 			else
 				cf->data[1] = CAN_ERR_CRTL_TX_PASSIVE |
 					      CAN_ERR_CRTL_RX_PASSIVE;
-
-			priv->can.can_stats.error_passive++;
 		}
-
-		new_state = CAN_STATE_ERROR_PASSIVE;
 	}
 
-	if (status == M16C_STATE_BUS_ERROR) {
-		if ((priv->can.state < CAN_STATE_ERROR_WARNING) &&
-		    ((txerr >= 96) || (rxerr >= 96))) {
+	if (es.status == M16C_STATE_BUS_ERROR) {
+		if ((old_state < CAN_STATE_ERROR_WARNING) &&
+		    ((es.txerr >= 96) || (es.rxerr >= 96))) {
 			cf->can_id |= CAN_ERR_CRTL;
-			cf->data[1] = (txerr > rxerr)
+			cf->data[1] = (es.txerr > es.rxerr)
 					? CAN_ERR_CRTL_TX_WARNING
 					: CAN_ERR_CRTL_RX_WARNING;
-
-			priv->can.can_stats.error_warning++;
-			new_state = CAN_STATE_ERROR_WARNING;
-		} else if (priv->can.state > CAN_STATE_ERROR_ACTIVE) {
+		} else if (old_state > CAN_STATE_ERROR_ACTIVE) {
 			cf->can_id |= CAN_ERR_PROT;
 			cf->data[2] = CAN_ERR_PROT_ACTIVE;
-
-			new_state = CAN_STATE_ERROR_ACTIVE;
 		}
 	}
 
-	if (!status) {
+	if (!es.status) {
 		cf->can_id |= CAN_ERR_PROT;
 		cf->data[2] = CAN_ERR_PROT_ACTIVE;
-
-		new_state = CAN_STATE_ERROR_ACTIVE;
 	}
 
 	if (priv->can.restart_ms &&
-	    (priv->can.state >= CAN_STATE_BUS_OFF) &&
+	    (old_state >= CAN_STATE_BUS_OFF) &&
 	    (new_state < CAN_STATE_BUS_OFF)) {
 		cf->can_id |= CAN_ERR_RESTARTED;
 		netif_carrier_on(priv->netdev);
-
-		priv->can.can_stats.restarts++;
 	}
 
-	if (error_factor) {
-		priv->can.can_stats.bus_error++;
-		stats->rx_errors++;
-
+	if (es.error_factor) {
 		cf->can_id |= CAN_ERR_BUSERROR | CAN_ERR_PROT;
 
-		if (error_factor & M16C_EF_ACKE)
+		if (es.error_factor & M16C_EF_ACKE)
 			cf->data[3] |= (CAN_ERR_PROT_LOC_ACK);
-		if (error_factor & M16C_EF_CRCE)
+		if (es.error_factor & M16C_EF_CRCE)
 			cf->data[3] |= (CAN_ERR_PROT_LOC_CRC_SEQ |
 					CAN_ERR_PROT_LOC_CRC_DEL);
-		if (error_factor & M16C_EF_FORME)
+		if (es.error_factor & M16C_EF_FORME)
 			cf->data[2] |= CAN_ERR_PROT_FORM;
-		if (error_factor & M16C_EF_STFE)
+		if (es.error_factor & M16C_EF_STFE)
 			cf->data[2] |= CAN_ERR_PROT_STUFF;
-		if (error_factor & M16C_EF_BITE0)
+		if (es.error_factor & M16C_EF_BITE0)
 			cf->data[2] |= CAN_ERR_PROT_BIT0;
-		if (error_factor & M16C_EF_BITE1)
+		if (es.error_factor & M16C_EF_BITE1)
 			cf->data[2] |= CAN_ERR_PROT_BIT1;
-		if (error_factor & M16C_EF_TRE)
+		if (es.error_factor & M16C_EF_TRE)
 			cf->data[2] |= CAN_ERR_PROT_TX;
 	}
 
-	cf->data[6] = txerr;
-	cf->data[7] = rxerr;
-
-	priv->bec.txerr = txerr;
-	priv->bec.rxerr = rxerr;
-
-	priv->can.state = new_state;
+	cf->data[6] = es.txerr;
+	cf->data[7] = es.rxerr;
 
 	stats->rx_packets++;
 	stats->rx_bytes += cf->can_dlc;
@@ -792,6 +822,9 @@ static void kvaser_usb_rx_can_err(const struct kvaser_usb_net_priv *priv,
 	}
 
 	if (msg->u.rx_can.flag & MSG_FLAG_OVERRUN) {
+		stats->rx_over_errors++;
+		stats->rx_errors++;
+
 		skb = alloc_can_err_skb(priv->netdev, &cf);
 		if (!skb) {
 			stats->rx_dropped++;
@@ -801,9 +834,6 @@ static void kvaser_usb_rx_can_err(const struct kvaser_usb_net_priv *priv,
 		cf->can_id |= CAN_ERR_CRTL;
 		cf->data[1] = CAN_ERR_CRTL_RX_OVERFLOW;
 
-		stats->rx_over_errors++;
-		stats->rx_errors++;
-
 		stats->rx_packets++;
 		stats->rx_bytes += cf->can_dlc;
 		netif_rx(skb);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH 1/1] update ip-sysctl.txt documentation (v2)
From: David Miller @ 2015-01-12 20:39 UTC (permalink / raw)
  To: ani; +Cc: corbet, edumazet, linux-doc, linux-kernel, P, netdev, fruggeri
In-Reply-To: <1420674356-32210-1-git-send-email-ani@arista.com>

From: Ani Sinha <ani@arista.com>
Date: Wed,  7 Jan 2015 15:45:56 -0800

> Update documentation to reflect the fact that
> /proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
> 
> Signed-off-by: Ani Sinha <ani@arista.com>

Applied, thank you.

^ permalink raw reply

* Re: [patch -next] net: eth: xgene: devm_ioremap() returns NULL on error
From: David Miller @ 2015-01-12 20:40 UTC (permalink / raw)
  To: dan.carpenter
  Cc: isubramanian, fkan, kchudgar, grant.likely, robh+dt, netdev,
	kernel-janitors
In-Reply-To: <20150108105211.GB10597@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Thu, 8 Jan 2015 13:52:12 +0300

> devm_ioremap() returns NULL on failure, it doesn't return an ERR_PTR.
> 
> Fixes: de7b5b3d790a ('net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI')
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox