Netdev List


* [PATCH iproute2/net-next 0/3] tc: flower: Support matching on ICMP
From: Simon Horman @ 2016-12-02 18:24 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Jamal Hadi Salim, Jiri Pirko, Simon Horman

Add support for matching on ICMP type and code to flower. This is modeled
on existing support for matching on L4 ports.

The second patch provides a minor cleanup which is in keeping with
the style used in the last patch.

This is marked as an RFC to match the same designation given to the
corresponding kernel patches.

Based on iproute2/net-next with the following applied:
* [PATCH iproute2/net-next v2 0/4] tc: flower: SCTP and other port fixes

Changes since RFC:
* Update names of enums (Jiri)
* Use enums
* Drop RFC designation


Simon Horman (3):
  tc: flower: update headers for TCA_FLOWER_KEY_ICMP*
  tc: flower: introduce enum flower_endpoint
  tc: flower: support matching on ICMP type and code

 include/linux/pkt_cls.h |  10 ++++
 man/man8/tc-flower.8    |  20 ++++++--
 tc/f_flower.c           | 125 ++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 136 insertions(+), 19 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply

* Re: [PATCH net-next v2] net: thunderx: Fix transmit queue timeout issue
From: David Miller @ 2016-12-02 18:33 UTC (permalink / raw)
  To: sunil.kovvuri; +Cc: netdev, linux-kernel, linux-arm-kernel, sgoutham
In-Reply-To: <1480596868-17693-1-git-send-email-sunil.kovvuri@gmail.com>

From: sunil.kovvuri@gmail.com
Date: Thu,  1 Dec 2016 18:24:28 +0530

> From: Sunil Goutham <sgoutham@cavium.com>
> 
> Transmit queue timeout issue is seen in two cases
> - Due to a race condition between setting stop_queue at xmit()
>   and checking for stopped_queue in NAPI poll routine, at times
>   transmission from a SQ comes to a halt. This is fixed
>   by using barriers and also adding a check for SQ free descriptors,
>   in case SQ is stopped and there are only CQE_RX i.e no CQE_TX.
> - Contrary to an assumption, a HW errata where HW doesn't stop transmission
>   even though there are not enough CQEs available for a CQE_TX is
>   not fixed in T88 pass 2.x. This results in a Qset error with
>   'CQ_WR_FULL' stalling transmission. This is fixed by adjusting
>   RXQ's  RED levels for CQ level such that there is always enough
>   space left for CQE_TXs.
> 
> Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
> ---
> v2: As suggested by David, replaced netif_tx_start_queue with 
>     netif_tx_wake_queue.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] ip6_offload: check segs for NULL in ipv6_gso_segment.
From: David Miller @ 2016-12-02 18:36 UTC (permalink / raw)
  To: asavkov; +Cc: netdev, linux-kernel, jstancek, steffen.klassert,
	alexander.h.duyck
In-Reply-To: <1480597564-32355-1-git-send-email-asavkov@redhat.com>

From: Artem Savkov <asavkov@redhat.com>
Date: Thu,  1 Dec 2016 14:06:04 +0100

> segs needs to be checked for being NULL in ipv6_gso_segment() before calling
> skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
 ...
> Signed-off-by: Artem Savkov <asavkov@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] mlx4: fix use-after-free in mlx4_en_fold_software_stats()
From: David Miller @ 2016-12-02 18:36 UTC (permalink / raw)
  To: eric.dumazet; +Cc: brouer, saeedm, netdev, tariqt
In-Reply-To: <1480597326.18162.276.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 01 Dec 2016 05:02:06 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> My recent commit to get more precise rx/tx counters in ndo_get_stats64()
> can lead to crashes at device dismantle, as Jesper found out.
> 
> We must prevent mlx4_en_fold_software_stats() trying to access
> tx/rx rings if they are deleted.
> 
> Fix this by adding a test against priv->port_up in
> mlx4_en_fold_software_stats()
> 
> Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
> allows us to eventually broadcast the latest/current counters to
> rtnetlink monitors.
> 
> Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net/sched: cls_flower: Support matching on ICMP type and code
From: Jiri Pirko @ 2016-12-02 18:38 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Miller, netdev, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <1480701951-3686-3-git-send-email-simon.horman@netronome.com>

Fri, Dec 02, 2016 at 07:05:51PM CET, simon.horman@netronome.com wrote:
>Support matching on ICMP type and code.
>
>Example usage:
>
>tc qdisc add dev eth0 ingress
>
>tc filter add dev eth0 protocol ip parent ffff: flower \
>	indev eth0 ip_proto icmp type 8 code 0 action drop
>
>tc filter add dev eth0 protocol ipv6 parent ffff: flower \
>	indev eth0 ip_proto icmpv6 type 128 code 0 action drop
>
>Signed-off-by: Simon Horman <simon.horman@netronome.com>
>---
> include/net/flow_dissector.h | 24 ++++++++++++++++++++++--
> include/uapi/linux/pkt_cls.h | 10 ++++++++++
> net/sched/cls_flower.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 74 insertions(+), 2 deletions(-)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 8880025914e3..5540dfa18872 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -199,10 +199,30 @@ struct flow_keys_digest {
> void make_flow_keys_digest(struct flow_keys_digest *digest,
> 			   const struct flow_keys *flow);
> 
>+static inline bool flow_protos_are_icmpv4(__be16 n_proto, u8 ip_proto)
>+{
>+	return n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP;
>+}
>+
>+static inline bool flow_protos_are_icmpv6(__be16 n_proto, u8 ip_proto)
>+{
>+	return n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6;
>+}
>+
> static inline bool flow_protos_are_icmp_any(__be16 n_proto, u8 ip_proto)
> {
>-	return (n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP) ||
>-		(n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6);
>+	return flow_protos_are_icmpv4(n_proto, ip_proto) ||
>+		flow_protos_are_icmpv6(n_proto, ip_proto);
>+}
>+
>+static inline bool flow_basic_key_is_icmpv4(const struct flow_dissector_key_basic *basic)
>+{
>+	return flow_protos_are_icmpv4(basic->n_proto, basic->ip_proto);
>+}
>+
>+static inline bool flow_basic_key_is_icmpv6(const struct flow_dissector_key_basic *basic)
>+{
>+	return flow_protos_are_icmpv6(basic->n_proto, basic->ip_proto);
> }
> 

This hunk looks like it should be squashed to the previous patch.


> static inline bool flow_keys_are_icmp_any(const struct flow_keys *keys)
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 86786d45ee66..58160fe80b80 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -457,6 +457,16 @@ enum {
> 	TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,	/* be16 */
> 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT,	/* be16 */
> 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,	/* be16 */
>+
>+	TCA_FLOWER_KEY_ICMPV4_CODE,	/* u8 */
>+	TCA_FLOWER_KEY_ICMPV4_CODE_MASK,/* u8 */
>+	TCA_FLOWER_KEY_ICMPV4_TYPE,	/* u8 */
>+	TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,/* u8 */
>+	TCA_FLOWER_KEY_ICMPV6_CODE,	/* u8 */
>+	TCA_FLOWER_KEY_ICMPV6_CODE_MASK,/* u8 */
>+	TCA_FLOWER_KEY_ICMPV6_TYPE,	/* u8 */
>+	TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */
>+
> 	__TCA_FLOWER_MAX,
> };
> 
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index e8dd09af0d0c..412efa7de226 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -355,6 +355,14 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
> 	[TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK]	= { .type = NLA_U16 },
> 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]	= { .type = NLA_U16 },
> 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]	= { .type = NLA_U16 },
>+	[TCA_FLOWER_KEY_ICMPV4_TYPE]	= { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV4_TYPE_MASK] = { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV4_CODE]	= { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV4_CODE_MASK] = { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV6_TYPE]	= { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV6_TYPE_MASK] = { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV6_CODE]	= { .type = NLA_U8 },
>+	[TCA_FLOWER_KEY_ICMPV6_CODE_MASK] = { .type = NLA_U8 },
> };
> 
> static void fl_set_key_val(struct nlattr **tb,
>@@ -471,6 +479,20 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
> 		fl_set_key_val(tb, &key->tp.dst, TCA_FLOWER_KEY_SCTP_DST,
> 			       &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
> 			       sizeof(key->tp.dst));
>+	} else if (flow_basic_key_is_icmpv4(&key->basic)) {
>+		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>+			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>+			       sizeof(key->tp.type));
>+		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+			       sizeof(key->tp.code));
>+	} else if (flow_basic_key_is_icmpv6(&key->basic)) {
>+		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>+			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>+			       sizeof(key->tp.type));
>+		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV6_CODE,
>+			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>+			       sizeof(key->tp.code));
> 	}
> 
> 	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
>@@ -943,6 +965,26 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
> 				  &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
> 				  sizeof(key->tp.dst))))
> 		goto nla_put_failure;
>+	else if (flow_basic_key_is_icmpv4(&key->basic) &&
>+		 (fl_dump_key_val(skb, &key->tp.type,
>+				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->tp.type,
>+				  TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>+				  sizeof(key->tp.type)) ||
>+		  fl_dump_key_val(skb, &key->tp.code,
>+				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->tp.code,
>+				  TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+				  sizeof(key->tp.code))))
>+		goto nla_put_failure;
>+	else if (flow_basic_key_is_icmpv6(&key->basic) &&
>+		 (fl_dump_key_val(skb, &key->tp.type,
>+				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->tp.type,
>+				  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>+				  sizeof(key->tp.type)) ||
>+		  fl_dump_key_val(skb, &key->tp.code,
>+				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->tp.code,
>+				  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>+				  sizeof(key->tp.code))))
>+		goto nla_put_failure;
> 
> 	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
> 	    (fl_dump_key_val(skb, &key->enc_ipv4.src,
>-- 
>2.7.0.rc3.207.g0ac5344
>
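
For readers following the example rules in the patch description, the
sketch below illustrates the general key/mask comparison that a
classifier of this kind applies to the ICMP type and code fields. It is
a userspace illustration only: struct icmp_match and icmp_match_hit()
are hypothetical names, not part of cls_flower, and the kernel's real
lookup path is organised differently.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of a type/code key with per-field masks. */
struct icmp_match {
	uint8_t type, type_mask;
	uint8_t code, code_mask;
};

/* Return 1 when the packet's ICMP type and code match under the masks. */
static int icmp_match_hit(const struct icmp_match *m, uint8_t type, uint8_t code)
{
	return ((type & m->type_mask) == (m->type & m->type_mask)) &&
	       ((code & m->code_mask) == (m->code & m->code_mask));
}
```

With type 8, code 0 and both masks set to 0xff, an echo request matches
and an echo reply (type 0) does not. A mask of 0 makes that field a
wildcard.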

^ permalink raw reply

* bpf bounded loops. Was: [flamebait] xdp
From: Alexei Starovoitov @ 2016-12-02 18:39 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller
In-Reply-To: <9b4264f8-26b9-a611-56f0-0840cecf9c44@stressinduktion.org>

On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
> like") and the problematic of parsing DNS packets in XDP due to string
> processing and looping inside eBPF.

Hannes,
Not too long ago you proposed a very interesting idea to add
support for bounded loops without adding any new bpf instructions and
changing llvm (which was way better than my 'rep' like instructions
I was experimenting with). I thought systemtap guys also wanted bounded
loops and you were cooperating on the design, so I gave up on my work and
was expecting an imminent patch from you. I guess it sounds like you now
believe that bounded loops are impossible, or do I misunderstand your statement?

As far as pattern search for DNS packets...
it was requested by Cloudflare guys back in March:
https://github.com/iovisor/bcc/issues/471
and it is useful for several tracing use cases as well.
Unfortunately no one had time to implement it yet.

^ permalink raw reply

* Re: arp_filter and IPv6 ND
From: Hannes Frederic Sowa @ 2016-12-02 18:39 UTC (permalink / raw)
  To: Saku Ytti; +Cc: netdev
In-Reply-To: <CAAeewD_erNdBw-zjPP9iFuju6FDgAgWrMKhMXPb58nqa0r22rA@mail.gmail.com>

Hi,

On 02.12.2016 18:51, Saku Ytti wrote:
> On 2 December 2016 at 18:45, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> 
>> next-hop-self attribute on your neighbor in that direction? BGP in
>> general doesn't lead to ND entry installs, protocols like IS-IS afair
>> can short circuit here.
> 
> That's the whole problem, Linux does not think of ND or ARP as
> interface specific thing, but as global thing. ND and ARP will happily
> answer to query from any interface if any other interface has said IP.
> I'm not sure why the Loopback ended up in Cisco ND Cache, answer is
> either Cisco queried for it or Linux did gratuitous answer. I believe
> gratuitous.
> 
>> Hmm, I would keep the Loopback announcements out of the BGP.
> 
> It's extremely common way to do anycast, but not interesting for the
> topic at hand.

Okay, sorry, I understood that you terminate the bgp sessions on those
IPs and use them for router-ids. But they are merely service IPs bound
to the loopback interface. I take it back and even for end system bgp
speakers/service announcement that is fine.

>> For enterprise and cloud stuff it is certainly very surprising, as some
>> isolations don't work as expected. OTOH it is really easy to build up
>> home networks and things are more plug and play.
> 
> Can you give me practical example when the behaviour is desirable, my
> imagination is failing me. I'm not arguing, I just want to understand
> it, as I've never had the need myself.

The major difference is that you e.g. keep connectivity in some
scenarios where strong end systems would fail.

E.g. you can use IP addresses bound to other interfaces to send replies
on another interface. This can be useful if you have a limited amount of
IP addresses on the system but much more interfaces. Especially if they
are limited in scope, like in IPv6.

Basically Cisco's feature of "unnumbered interface" is always provided
in Linux. And there are certainly cases where you would want to use it,
e.g. emulate private-vlan feature for network separation.

Also in the BGP setup, you might have it easier to establish loopback
neighbor contact by just using static on-link routes, without caring
about more complex numbering there (otherwise you pretty soon introduce
OSPF or some other routing protocol to do the recursive forward resolution).

> I've never run into a setup which needs it, but cursory googling shows
> several people having broken networks because of the behaviour. If it
> is needed, I'm sure it's an esoteric setup, and perhaps a saner default
> would be that extra sysctl config is needed to get this interface
> agnostic ARP/ND behaviour.

Yes, it is a very problematic situation at internet exchanges and weak
end behavior must be disabled there as it causes havoc.

As global IPv6 addresses are more or less global, such problems actually
shouldn't exist, as no conflicting IP addresses should show up. Link
Local addresses are anyway handled in a strong end manner. Thus if the
Cisco router would install your routing entry you would probably not
have noticed. :)

>> Some RFCs require that for some router implementations (CPE), on the
>> other hand weak end model in Linux was probably inherited from IPv4. The
>> addition of duplicate address detection (which of course only makes
>> sense in strong end systems) to IPv6, basically shows that IPv6 is more
>> or less designed to be a strong end system model.
>>
>> Anyway, a patch to suppress ndisc requests on those interfaces will
>> probably be accepted.
> 
> Grand, not that I feel comfortable writing it. I'd rather see the
> whole suppression functionality moved to neighbour.c from being AFI
> specific.

Yes sure, please provide a patch. A separate sysctl is necessary anyway
because the current one is within the ipv4 procfs directory hierarchy.

Bye,
Hannes

^ permalink raw reply

* Re: [PATH net v2] cdc_ether: Fix handling connection notification
From: David Miller @ 2016-12-02 18:40 UTC (permalink / raw)
  To: kristian.evensen; +Cc: oliver, linux-usb, netdev, linux-kernel
In-Reply-To: <20161201132317.32324-1-kristian.evensen@gmail.com>

From: Kristian Evensen <kristian.evensen@gmail.com>
Date: Thu,  1 Dec 2016 14:23:17 +0100

> Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
> introduced a work-around in usbnet_cdc_status() for devices that exported
> cdc carrier on twice on connect. Before the commit, this behavior caused
> the link state to be incorrect. It was assumed that all CDC Ethernet
> devices would either export this behavior, or send one off and then one on
> notification (which seems to be the default behavior).
> 
> Unfortunately, it turns out multiple devices send a connection
> notification multiple times per second (via an interrupt), even when
> connection state does not change. This has been observed with several
> different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
> After bfe9b9d2df66, the link state has been set as down and then up for
> each notification. This has caused a flood of Netlink NEWLINK messages and
> syslog to be flooded with messages similar to:
> 
> cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped
> 
> This commit fixes the behavior by reverting usbnet_cdc_status() to how it
> was before bfe9b9d2df66. The work-around has been moved to a separate
> status-function which is only called when a known, affected device is
> detected.
> 
> v1->v2:
> 
> * Do not open-code netif_carrier_ok() (thanks Henning Schild).
> * Call netif_carrier_off() instead of usb_link_change(). This prevents
> calling schedule_work() twice without giving the work queue a chance to be
> processed (thanks Bjørn Mork).
> 
> Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
> Reported-by: Henning Schild <henning.schild@siemens.com>
> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH 1/1] NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
From: David Miller @ 2016-12-02 18:42 UTC (permalink / raw)
  To: dnlplm; +Cc: bjorn, netdev
In-Reply-To: <1480607525-23044-2-git-send-email-dnlplm@gmail.com>

From: Daniele Palmas <dnlplm@gmail.com>
Date: Thu,  1 Dec 2016 16:52:05 +0100

> This patch adds support for PID 0x1040 of Telit LE922A.
> 
> The qmi adapter requires DTR to be set in order to work properly,
> so QMI_WWAN_QUIRK_DTR has been enabled.
> 
> Signed-off-by: Daniele Palmas <dnlplm@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] geneve: avoid use-after-free of skb->data
From: John W. Linville @ 2016-12-02 18:33 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: netdev
In-Reply-To: <027c88dd060f5ca4535cb346db125829b2181a88.1480675406.git.sd@queasysnail.net>

On Fri, Dec 02, 2016 at 04:49:29PM +0100, Sabrina Dubroca wrote:
> geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
> makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
> only needed as an argument to ip_tunnel_ecn_encap(), move this
> directly in the function call.
> 
> Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Reviewed-by: John W. Linville <linville@tuxdriver.com>

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
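
The hazard described in the commit message is the usual one with any
buffer that can be reallocated: a pointer taken before the expansion no
longer refers to live storage afterwards. A minimal userspace sketch of
the pattern follows. struct pkt, pkt_expand_head() and
pkt_tos_after_expand() are hypothetical stand-ins, not the geneve or
skb API; the point is only that the header is read at the point of use,
after the buffer may have moved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer whose storage may move. */
struct pkt {
	unsigned char *data;
	size_t len;
};

/* Grow headroom by `extra` bytes; like a reallocation, this moves data. */
static int pkt_expand_head(struct pkt *p, size_t extra)
{
	unsigned char *bigger = malloc(p->len + extra);

	if (!bigger)
		return -1;
	memcpy(bigger + extra, p->data, p->len);
	free(p->data);		/* any pointer into the old storage is now stale */
	p->data = bigger;
	p->len += extra;
	return 0;
}

/* Read a header byte at the point of use, after any expansion has happened. */
static int pkt_tos_after_expand(struct pkt *p, size_t hdr_off, size_t extra)
{
	if (pkt_expand_head(p, extra))
		return -1;
	return p->data[hdr_off + extra];	/* re-derived from the new storage */
}
```

Caching `p->data + hdr_off` before calling pkt_expand_head() and
dereferencing it afterwards would be the userspace equivalent of the
bug being fixed.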

^ permalink raw reply

* Re: [PATCH v7 net-next 0/6] net: Add bpf support for sockets
From: David Miller @ 2016-12-02 18:46 UTC (permalink / raw)
  To: dsa; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
In-Reply-To: <1480610888-31082-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu,  1 Dec 2016 08:48:02 -0800

> The recently added VRF support in Linux leverages the bind-to-device
> API for programs to specify an L3 domain for a socket. While
> SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
> program has support for it. Even for those programs that do support it,
> the API requires processes to be started as root (CAP_NET_RAW) which
> is not desirable from a general security perspective.
> 
> This patch set leverages Daniel Mack's work to attach bpf programs to
> a cgroup to provide a capability to set sk_bound_dev_if for all
> AF_INET{6} sockets opened by a process in a cgroup when the sockets
> are allocated.
 ...

Series applied, thanks David.

^ permalink raw reply

* Re: [PATCH v6 net-next 0/7] Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
From: David Miller @ 2016-12-02 18:52 UTC (permalink / raw)
  To: gregory.clement
  Cc: linux-kernel, netdev, jszhang, arnd, jason, andrew,
	sebastian.hesselbarth, thomas.petazzoni, linux-arm-kernel, nadavh,
	mw, dima, yelena
In-Reply-To: <cover.dd374b7aaa358be0211d7ead81129a399fa692f4.1480611779.git-series.gregory.clement@free-electrons.com>

From: Gregory CLEMENT <gregory.clement@free-electrons.com>
Date: Thu,  1 Dec 2016 18:03:03 +0100

> The Armada 37xx is a new ARMv8 SoC from Marvell using the same network
> controller as the older Armada 370/38x/XP SoCs. This series adapts the
> driver in order to be able to use it on this new SoC. The main changes
> are:
> 
> - 64-bits support: the first patches allow using the driver on a 64-bit
>   architecture.
> 
> - MBUS support: the mbus configuration is different on Armada 37xx
>   from the older SoCs.
> 
> - per cpu interrupt: Armada 37xx does not support per cpu interrupt for
>   the NETA IP, the non-per-CPU behavior was added back.
> 
> The first patch is an optimization in the rx path in swbm mode.
> The second patch removes unnecessary allocation for HWBM.
> The first item is solved by patches 4 and 5.
> The 2 last items are solved by patch 6.
> In patch 7 the dt support is added.
> 
> Besides Armada 37xx, this series has again been tested on Armada XP
> and Armada 38x (with Hardware Buffer Management and with Software
> Buffer Management).
 ...

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net v2] net: bcmgenet: Utilize correct struct device for all DMA operations
From: David Miller @ 2016-12-02 18:54 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, pgynther, jaedon.shin, opendmb
In-Reply-To: <1480614345-5827-1-git-send-email-florian.fainelli@broadcom.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu,  1 Dec 2016 09:45:45 -0800

> From: Florian Fainelli <f.fainelli@gmail.com>
> 
> __bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
> same struct device during unmap that was used for the map operation,
> which makes DMA-API debugging warn about it. Fix this by always using
> &priv->pdev->dev throughout the driver, using an identical device
> reference for all map/unmap calls.
> 
> Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* [PATCH net] net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
From: Eric Dumazet @ 2016-12-02 17:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Andrey Konovalov

From: Eric Dumazet <edumazet@google.com>

CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...

Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.

This needs to be backported to all known linux kernels.

Again, many thanks to syzkaller team for discovering this gem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
---
 net/core/sock.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 5e3ca414357e2404db28eeacc5e9306051161493..00a074dbfe9bf169c2b81498e6ae265199745b22 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -715,7 +715,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		val = min_t(u32, val, sysctl_wmem_max);
 set_sndbuf:
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
-		sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);
+		sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
 		/* Wake up sending tasks if we upped the value. */
 		sk->sk_write_space(sk);
 		break;
@@ -751,7 +751,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		 * returning the value we actually used in getsockopt
 		 * is the most desirable behavior.
 		 */
-		sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
+		sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
 		break;
 
 	case SO_RCVBUFFORCE:
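
To see why the comparison type matters, here is a small userspace
sketch of the two clamping variants. It is not the kernel code:
clamp_as_u32(), clamp_as_int() and MIN_BUF are hypothetical names, and
MIN_BUF is an illustrative floor rather than the real SOCK_MIN_SNDBUF.
Converting an out-of-range unsigned value back to int is
implementation-defined in C; common compilers wrap it to a negative
number, which is the behaviour assumed here.

```c
#include <stdint.h>

/* Illustrative floor only; the kernel derives SOCK_MIN_SNDBUF differently. */
#define MIN_BUF 4608

/* Old behaviour: compare as unsigned, then store into a signed int field. */
static int clamp_as_u32(int val)
{
	uint32_t doubled = (uint32_t)val * 2u;
	uint32_t lo = (uint32_t)MIN_BUF;
	uint32_t picked = doubled > lo ? doubled : lo;

	return (int)picked;	/* huge unsigned values come back negative */
}

/* Patched behaviour: compare as signed, so negative inputs hit the floor. */
static int clamp_as_int(int val)
{
	/* Double in unsigned arithmetic so the sketch avoids signed overflow. */
	int doubled = (int)((uint32_t)val * 2u);

	return doubled > MIN_BUF ? doubled : MIN_BUF;
}
```

For val = -1 the unsigned variant keeps the doubled value and the
stored result is negative, while the signed variant returns the floor.
For ordinary positive sizes both variants agree.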

^ permalink raw reply related

* Re: [PATCH v3] sh_eth: remove unchecked interrupts for RZ/A1
From: David Miller @ 2016-12-02 18:55 UTC (permalink / raw)
  To: chris.brandt
  Cc: sergei.shtylyov, horms+renesas, geert+renesas, netdev,
	linux-renesas-soc
In-Reply-To: <20161201183214.30196-1-chris.brandt@renesas.com>

From: Chris Brandt <chris.brandt@renesas.com>
Date: Thu,  1 Dec 2016 13:32:14 -0500

> When streaming a lot of data and the RZ/A1 can't keep up, some status bits
> will get set that are not being checked or cleared which cause the
> following messages and the Ethernet driver to stop working. This
> patch fixes that issue.
> 
> irq 21: nobody cared (try booting with the "irqpoll" option)
> handlers:
> [<c036b71c>] sh_eth_interrupt
> Disabling IRQ #21
> 
> Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100")
> Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: pull-request: wireless-drivers-next 2016-12-01
From: David Miller @ 2016-12-02 18:58 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <877f7jsdmm.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Thu, 01 Dec 2016 20:33:37 +0200

> here's another pull request for net-next. Nothing special to mention
> about, the details are in the signed tag below.
> 
> This time there's a trivial conflict in
> drivers/net/wireless/ath/ath10k/mac.c:
> 
> <<<<<<< HEAD
> 	ieee80211_hw_set(ar->hw, SUPPORTS_TX_FRAG);
> =======
> 	ieee80211_hw_set(ar->hw, REPORTS_LOW_ACK);
>>>>>>>> d5fb3a138048798ce4cc4b4ced47d07d1794c577
> 
> We want to have both flags enabled in ath10k.
> 
> I'm planning to submit at least one more pull request, if Linus gives us
> one more week I might send even two. For example there are patches to
> convert wcn36xx to use the real SMD bus subsystem but they depend on few
> arm-soc patches. I'll send a separate email about that, they are not
> part of this pull request.
> 
> Please let me know if there are any problems.

Pulled, thanks so much for the heads up about the ath10k merge conflict.

^ permalink raw reply

* Re: [patch] net: renesas: ravb: unintialized return value
From: David Miller @ 2016-12-02 19:00 UTC (permalink / raw)
  To: dan.carpenter
  Cc: sergei.shtylyov, johan, ykaneko0929, kazuya.mizuguchi.ks,
	horms+renesas, wsa+renesas, andrew, tremyfr,
	niklas.soderlund+renesas, arnd, netdev, linux-renesas-soc,
	kernel-janitors
In-Reply-To: <20161201205744.GB10701@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Thu, 1 Dec 2016 23:57:44 +0300

> We want to set the other "err" variable here so that we can return it
> later.  My version of GCC misses this issue but I caught it with a
> static checker.
> 
> Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH] iproute2: ss: escape all null bytes in abstract unix domain socket
From: Eric Dumazet @ 2016-12-02 18:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Isaac Boukris, davem, netdev, linux-kernel
In-Reply-To: <20161112101729.4d400929@samsung9>

On Sat, 2016-11-12 at 10:17 +0300, Stephen Hemminger wrote:
> On Sat, 29 Oct 2016 22:20:19 +0300
> Isaac Boukris <iboukris@gmail.com> wrote:
> 
> > Abstract unix domain socket may embed null characters,
> > these should be translated to '@' when printed by ss the
> > same way the null prefix is currently being translated.
> > 
> > Signed-off-by: Isaac Boukris <iboukris@gmail.com>
> 
> Applied

Probably not a good idea to have :

                       for (int i = 0; i < len; i++)
                               if (name[i] == '\0')
                                       name[i] = '@';

ss.c: In function 'unix_show_sock':
ss.c:3128:4: error: 'for' loop initial declarations are only allowed in C99 mode
ss.c:3128:4: note: use option -std=c99 or -std=gnu99 to compile your code
make[1]: *** [ss.o] Error 1
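
One way to keep the loop buildable without -std=c99 is to declare the
index at the start of the enclosing block. The sketch below shows the
idea in isolation; escape_nul_bytes() is a hypothetical helper name and
is not taken from ss.c.

```c
#include <stddef.h>

/* Replace every NUL byte in the first `len` bytes of `name` with '@'. */
static void escape_nul_bytes(char *name, size_t len)
{
	size_t i;	/* declared up front so pre-C99 compilers accept it */

	for (i = 0; i < len; i++)
		if (name[i] == '\0')
			name[i] = '@';
}
```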

^ permalink raw reply

* Re: pull-request: can 2016-12-02
From: David Miller @ 2016-12-02 19:02 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <20161202082931.22270-1-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Fri,  2 Dec 2016 09:29:29 +0100

> this is a pull request for net/master.
> 
> There are two patches by Stephane Grosjean, who adds support for the new
> PCAN-USB X6 USB interface to the pcan_usb driver.

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH net v3] tipc: check minimum bearer MTU
From: David Miller @ 2016-12-02 19:03 UTC (permalink / raw)
  To: mkubecek; +Cc: jon.maloy, zhangqian-c, netdev, linux-kernel, tipc-discussion,
	ben
In-Reply-To: <20161202083341.BB955A0F33@unicorn.suse.cz>

From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri,  2 Dec 2016 09:33:41 +0100 (CET)

> Qian Zhang (张谦) reported a potential socket buffer overflow in
> tipc_msg_build() which is also known as CVE-2016-8632: due to
> insufficient checks, a buffer overflow can occur if MTU is too short for
> even tipc headers. As anyone can set device MTU in a user/net namespace,
> this issue can be abused by a regular user.
> 
> As agreed in the discussion on Ben Hutchings' original patch, we should
> check the MTU at the moment a bearer is attached rather than for each
> processed packet. We also need to repeat the check when bearer MTU is
> adjusted to new device MTU. UDP case also needs a check to avoid
> overflow when calculating bearer MTU.
> 
> Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net] net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
From: David Miller @ 2016-12-02 19:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, andreyknvl
In-Reply-To: <1480700693.18162.378.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Dec 2016 09:44:53 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> CAP_NET_ADMIN users should not be allowed to set negative
> sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
> corruptions, crashes, OOM...
> 
> Note that before commit 82981930125a ("net: cleanups in
> sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
> and SO_RCVBUF were vulnerable.
> 
> This needs to be backported to all known linux kernels.
> 
> Again, many thanks to syzkaller team for discovering this gem.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>

Applied and queued up for -stable, thanks Eric.
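
As a rough userspace illustration of why a negative buffer size is dangerous, the sketch below shows one plausible way such a value can survive a minimum-size check: if the doubled value is compared as unsigned, a negative number looks huge and wins. The doubling, the function names and the TOY_MIN_SNDBUF floor are assumptions made for this example; they are not taken from the patch, which is not quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder floor; the kernel's real minimum is not shown in this thread. */
#define TOY_MIN_SNDBUF 2048

/*
 * Unsigned comparison: a negative doubled value is seen as a very large
 * number, so it is chosen over the floor and stored as a negative size.
 * Callers are assumed to keep |val| <= INT_MAX / 2 so that val * 2 itself
 * does not overflow.
 */
static int pick_sndbuf_unsigned_cmp(int val)
{
	int doubled = val * 2;

	return ((uint32_t)doubled > (uint32_t)TOY_MIN_SNDBUF) ?
		doubled : TOY_MIN_SNDBUF;
}

/* Signed comparison: a negative request is clamped up to the floor. */
static int pick_sndbuf_signed_cmp(int val)
{
	int doubled = val * 2;

	return (doubled > TOY_MIN_SNDBUF) ? doubled : TOY_MIN_SNDBUF;
}
```

With a request of -1, the unsigned variant returns -2 while the signed variant returns the floor; for ordinary positive requests both agree.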

^ permalink raw reply

* Re: [PATCHv2 net-next 1/4] net: dsa: mv88e6xxx: Implement mv88e6390 tag remap
From: Vivien Didelot @ 2016-12-02 19:15 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: netdev, Andrew Lunn
In-Reply-To: <1480701779-30633-2-git-send-email-andrew@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> +/* Offset 0x18: Port IEEE Priority Remapping Registers [0-3]
> + * Offset 0x19: Port IEEE Priority Remapping Registers [4-7]
> + */
> +
> +int mv88e6095_port_tag_remap(struct mv88e6xxx_chip *chip, int port)
> +{
> +	int err;
> +
> +	/* Use a direct priority mapping for all IEEE tagged frames */
> +	err = mv88e6xxx_port_write(chip, port, PORT_TAG_REGMAP_0123, 0x3210);
> +	if (err)
> +		return err;
> +
> +	return mv88e6xxx_port_write(chip, port, PORT_TAG_REGMAP_4567, 0x7654);
> +}
> +
> +static int mv88e6xxx_port_ieeepmt_write(struct mv88e6xxx_chip *chip,
> +					int port, u16 table,
> +					u8 pointer, u16 data)
> +{
> +	u16 reg;
> +
> +	reg = PORT_IEEE_PRIO_MAP_TABLE_UPDATE |
> +		table |
> +		(pointer << PORT_IEEE_PRIO_MAP_TABLE_POINTER_SHIFT) |
> +		data;
> +
> +	return mv88e6xxx_port_write(chip, port, PORT_IEEE_PRIO_MAP_TABLE, reg);
> +}
> +

I'll send a delta patch to introduce mv88e6xxx_port_update() so we'll
benefit from the free wait on update bit 15.

> +int mv88e6390_port_tag_remap(struct mv88e6xxx_chip *chip, int port)
> +{
> +	int err, i;
> +
> +	for (i = 0; i <= 7; i++) {
> +		err = mv88e6xxx_port_ieeepmt_write(
> +			chip, port, PORT_IEEE_PRIO_MAP_TABLE_INGRESS_PCP,
> +			i, (i | i << 4));

So here you are also mapping the frame's IEEE QPRI (offset 4); this is a
bit inconsistent compared to mv88e6095_port_tag_remap, which doesn't.

But it seems like these functions are only used at the moment to write
the default values, so I guess it doesn't really matter right now...

> +		if (err)
> +			return err;
> +
> +		err = mv88e6xxx_port_ieeepmt_write(
> +			chip, port, PORT_IEEE_PRIO_MAP_TABLE_EGRESS_GREEN_PCP,
> +			i, i);
> +		if (err)
> +			return err;
> +
> +		err = mv88e6xxx_port_ieeepmt_write(
> +			chip, port, PORT_IEEE_PRIO_MAP_TABLE_EGRESS_YELLOW_PCP,
> +			i, i);
> +		if (err)
> +			return err;
> +
> +		err = mv88e6xxx_port_ieeepmt_write(
> +			chip, port, PORT_IEEE_PRIO_MAP_TABLE_EGRESS_AVB_PCP,
> +			i, i);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Thanks,

        Vivien
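
To make the default mappings discussed in this review easier to see, here is a small standalone sketch of the values involved. It assumes that the 0x3210 and 0x7654 register values hold one priority per nibble, and it only reproduces the data words passed in the quoted loops. The toy_* names are hypothetical, and no register offsets or bit definitions from the driver are implied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout: each 4-bit slot of the 16-bit remap register holds the
 * priority that the corresponding incoming priority is mapped to.
 * slot must be in the range 0..3.
 */
static unsigned int toy_remap_lookup(uint16_t reg, unsigned int slot)
{
	return (reg >> (4 * slot)) & 0xf;
}

/*
 * Data word used for ingress PCP entry i in the quoted 6390 loop:
 * the low bits carry i and the field at offset 4 carries i as well.
 */
static uint16_t toy_ingress_pcp_data(unsigned int i)
{
	return (uint16_t)(i | (i << 4));
}

/* Data word used for the egress tables in the quoted loop: just i. */
static uint16_t toy_egress_pcp_data(unsigned int i)
{
	return (uint16_t)i;
}
```

Under that nibble assumption, 0x3210 maps priorities 0-3 to themselves and 0x7654 does the same for 4-7, whereas the 6390 ingress entries additionally fill the field at offset 4.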

^ permalink raw reply

* Re: [PATCH net-next 2/2] net/sched: cls_flower: Support matching on ICMP type and code
From: Simon Horman @ 2016-12-02 19:17 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, netdev, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20161202183848.GF1883@nanopsycho.orion>

On Fri, Dec 02, 2016 at 07:38:48PM +0100, Jiri Pirko wrote:
> Fri, Dec 02, 2016 at 07:05:51PM CET, simon.horman@netronome.com wrote:
> >Support matching on ICMP type and code.
> >
> >Example usage:
> >
> >tc qdisc add dev eth0 ingress
> >
> >tc filter add dev eth0 protocol ip parent ffff: flower \
> >	indev eth0 ip_proto icmp type 8 code 0 action drop
> >
> >tc filter add dev eth0 protocol ipv6 parent ffff: flower \
> >	indev eth0 ip_proto icmpv6 type 128 code 0 action drop
> >
> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >---
> > include/net/flow_dissector.h | 24 ++++++++++++++++++++++--
> > include/uapi/linux/pkt_cls.h | 10 ++++++++++
> > net/sched/cls_flower.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 74 insertions(+), 2 deletions(-)
> >
> >diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> >index 8880025914e3..5540dfa18872 100644
> >--- a/include/net/flow_dissector.h
> >+++ b/include/net/flow_dissector.h
> >@@ -199,10 +199,30 @@ struct flow_keys_digest {
> > void make_flow_keys_digest(struct flow_keys_digest *digest,
> > 			   const struct flow_keys *flow);
> > 
> >+static inline bool flow_protos_are_icmpv4(__be16 n_proto, u8 ip_proto)
> >+{
> >+	return n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP;
> >+}
> >+
> >+static inline bool flow_protos_are_icmpv6(__be16 n_proto, u8 ip_proto)
> >+{
> >+	return n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6;
> >+}
> >+
> > static inline bool flow_protos_are_icmp_any(__be16 n_proto, u8 ip_proto)
> > {
> >-	return (n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP) ||
> >-		(n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6);
> >+	return flow_protos_are_icmpv4(n_proto, ip_proto) ||
> >+		flow_protos_are_icmpv6(n_proto, ip_proto);
> >+}
> >+
> >+static inline bool flow_basic_key_is_icmpv4(const struct flow_dissector_key_basic *basic)
> >+{
> >+	return flow_protos_are_icmpv4(basic->n_proto, basic->ip_proto);
> >+}
> >+
> >+static inline bool flow_basic_key_is_icmpv6(const struct flow_dissector_key_basic *basic)
> >+{
> >+	return flow_protos_are_icmpv6(basic->n_proto, basic->ip_proto);
> > }
> > 
> 
> This hunk looks like it should be squashed to the previous patch.

I included it in this patch as it is where these helpers are used
for the first time. I can shuffle it into the first patch if you prefer;
I agree it does make sense to put all the dissector changes there.
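
For readers following along outside the kernel tree, the helpers quoted above and the type/code match from the example usage could be sketched in userspace roughly as follows. This is only an illustration: the toy_* names and the match structure are hypothetical, and the protocol values are compared in host byte order here, whereas the quoted helpers compare n_proto against htons() values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Well-known protocol numbers, written in host byte order for this sketch. */
#define TOY_ETH_P_IP       0x0800
#define TOY_ETH_P_IPV6     0x86DD
#define TOY_IPPROTO_ICMP   1
#define TOY_IPPROTO_ICMPV6 58

/* Hypothetical key/mask pair for an ICMP type and code match. */
struct toy_icmp_match {
	uint8_t type;
	uint8_t code;
	uint8_t type_mask;
	uint8_t code_mask;
};

static bool toy_protos_are_icmpv4(uint16_t eth_proto, uint8_t ip_proto)
{
	return eth_proto == TOY_ETH_P_IP && ip_proto == TOY_IPPROTO_ICMP;
}

static bool toy_protos_are_icmpv6(uint16_t eth_proto, uint8_t ip_proto)
{
	return eth_proto == TOY_ETH_P_IPV6 && ip_proto == TOY_IPPROTO_ICMPV6;
}

static bool toy_protos_are_icmp_any(uint16_t eth_proto, uint8_t ip_proto)
{
	return toy_protos_are_icmpv4(eth_proto, ip_proto) ||
		toy_protos_are_icmpv6(eth_proto, ip_proto);
}

/* A packet matches when type and code agree with the key under the masks. */
static bool toy_icmp_matches(const struct toy_icmp_match *m,
			     uint8_t type, uint8_t code)
{
	return ((type & m->type_mask) == (m->type & m->type_mask)) &&
		((code & m->code_mask) == (m->code & m->code_mask));
}
```

A key of type 8, code 0 with full masks corresponds to the IPv4 echo request in the first example filter; type 128, code 0 is the ICMPv6 counterpart.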

^ permalink raw reply

* Re: [PATCH net] geneve: avoid use-after-free of skb->data
From: David Miller @ 2016-12-02 19:09 UTC (permalink / raw)
  To: sd; +Cc: netdev, linville
In-Reply-To: <027c88dd060f5ca4535cb346db125829b2181a88.1480675406.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Fri,  2 Dec 2016 16:49:29 +0100

> geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
> makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
> only needed as an argument to ip_tunnel_ecn_encap(), move this
> directly in the function call.
> 
> Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied and queued up for -stable, thanks.

This bug happens so many times that I think it might be time for
a debugging mode for pskb_expand_head() that unconditionally
reallocates the skb->data buffer regardless of whether it's
necessary or not and somehow unmaps the previous buffer to
force a trap on stale pointers.

Better ideas welcome, of course :)
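
The class of bug and the debugging idea above can be modelled in userspace roughly like this. It is a toy analogue only, not kernel code: struct toy_buf and its helpers are hypothetical stand-ins for an skb, and the expand helper always moves the data to a fresh allocation, in the spirit of the proposed debug mode. Callers that remember an offset and recompute the pointer stay correct; a raw pointer saved before the expansion would be stale afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a packet buffer with headroom in front of the data. */
struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	size_t headroom;	/* bytes in front of the data */
	size_t len;		/* bytes of data */
};

static int toy_buf_init(struct toy_buf *b, size_t headroom, size_t len)
{
	b->head = calloc(1, headroom + len);
	if (!b->head)
		return -1;
	b->headroom = headroom;
	b->len = len;
	return 0;
}

/* Always recompute the data pointer from the current allocation. */
static unsigned char *toy_buf_data(const struct toy_buf *b)
{
	return b->head + b->headroom;
}

/*
 * Debug-style expansion: unconditionally move the data to a new
 * allocation and free the old one, so any pointer saved before this
 * call no longer refers to valid memory.
 */
static int toy_buf_expand_head(struct toy_buf *b, size_t extra)
{
	unsigned char *nhead = calloc(1, b->headroom + extra + b->len);

	if (!nhead)
		return -1;
	memcpy(nhead + b->headroom + extra, toy_buf_data(b), b->len);
	free(b->head);
	b->head = nhead;
	b->headroom += extra;
	return 0;
}
```

The safe pattern is to keep the offset of a header relative to the data and call toy_buf_data() again after any expansion, which mirrors moving the ip_hdr(skb) lookup directly into the function call as the patch description says.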

^ permalink raw reply

* Re: bpf bounded loops. Was: [flamebait] xdp
From: Hannes Frederic Sowa @ 2016-12-02 19:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller
In-Reply-To: <20161202183903.GC54949@ast-mbp.thefacebook.com>

Hi,

On 02.12.2016 19:39, Alexei Starovoitov wrote:
> On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
>> like") and the problem of parsing DNS packets in XDP due to string
>> processing and looping inside eBPF.
> 
> Hannes,
> Not too long ago you proposed a very interesting idea to add
> support for bounded loops without adding any new bpf instructions and
> changing llvm (which was way better than my 'rep' like instructions
> I was experimenting with). I thought systemtap guys also wanted bounded
> loops and you were cooperating on the design, so I gave up on my work and
> was expecting an imminent patch from you. I guess it sounds like you now
> believe that bounded loops are impossible or I misunderstand your statement?

Your argument was that it would need a new verifier as the current first
pass checks that we indeed can lay out the basic blocks as a DAG which
the second pass depends on. This would be violated.

Because eBPF is available to unprivileged users, this would need a lot
of effort to rewrite and verify (or indeed keep two verifiers in the
kernel for priv and non-priv). The verifier itself is exposed to
unprivileged users.

Also, by design, if we keep the current limits, this would not give you
more instructions to operate on compared to the flattened version of the
program; it would merely reduce the number of optimizations in LLVM
that let the verifier reject the program.

Only enabling the relaxed verifier for root users thus seemed
problematic, as programs wouldn't be portable between unprivileged and
privileged users.

> As far as pattern search for DNS packets...
> it was requested by Cloudflare guys back in March:
> https://github.com/iovisor/bcc/issues/471
> and it is useful for several tracing use cases as well.
> Unfortunately no one had time to implement it yet.

The string operations you proposed, on the other hand, which would count
as one eBPF instruction, would give a lot more flexibility and allow
more cycles to burn, but don't help with parsing binary protocols like
IPv6 extension headers.

Bye,
Hannes
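
As an illustration of the kind of bounded loop meant here, the sketch below walks a chain of IPv6 extension headers with a fixed compile-time iteration limit, so a compiler could in principle flatten it into straight-line code. It is plain C, not an eBPF program, and nothing here is claimed to pass the verifier. The function name and the TOY_MAX_EXT_HDRS limit are assumptions made for the example, and only hop-by-hop, routing, fragment and destination options headers are treated as extension headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EXT_HDRS 8	/* arbitrary bound chosen for this sketch */

#define TOY_NEXTHDR_HOP      0
#define TOY_NEXTHDR_ROUTING  43
#define TOY_NEXTHDR_FRAGMENT 44
#define TOY_NEXTHDR_DEST     60

static int toy_is_ext_hdr(uint8_t nexthdr)
{
	return nexthdr == TOY_NEXTHDR_HOP || nexthdr == TOY_NEXTHDR_ROUTING ||
		nexthdr == TOY_NEXTHDR_FRAGMENT || nexthdr == TOY_NEXTHDR_DEST;
}

/*
 * buf points just past the fixed IPv6 header and nexthdr is that header's
 * Next Header value. Returns the upper-layer protocol and stores its
 * offset in *off, or returns -1 if the data is truncated or the chain is
 * longer than TOY_MAX_EXT_HDRS.
 */
static int toy_skip_ext_hdrs(const uint8_t *buf, size_t len,
			     uint8_t nexthdr, size_t *off)
{
	size_t pos = 0;
	int i;

	for (i = 0; i < TOY_MAX_EXT_HDRS; i++) {
		size_t hdrlen;

		if (!toy_is_ext_hdr(nexthdr))
			break;
		if (len - pos < 8)	/* pos never exceeds len */
			return -1;
		/* Fragment headers are fixed at 8 bytes; the others carry
		 * their length in 8-octet units, not counting the first 8.
		 */
		hdrlen = (nexthdr == TOY_NEXTHDR_FRAGMENT) ?
			8 : ((size_t)buf[pos + 1] + 1) * 8;
		if (hdrlen > len - pos)
			return -1;
		nexthdr = buf[pos];
		pos += hdrlen;
	}
	if (toy_is_ext_hdr(nexthdr))
		return -1;	/* chain exceeded the fixed bound */
	*off = pos;
	return nexthdr;
}
```

The fixed bound is what keeps the control flow expressible as a DAG once unrolled, at the cost of rejecting packets with longer header chains.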

^ permalink raw reply
