Netdev List


* [PATCHv2 net-next 1/5] ipv4: add __ip_queue_xmit() that supports tos param
From: Xin Long @ 2018-07-02 10:21 UTC
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, davem, hideaki.yoshifuji
In-Reply-To: <cover.1530526661.git.lucien.xin@gmail.com>

This patch introduces __ip_queue_xmit(), which lets callers pass a
tos parameter directly instead of having to set inet->tos. For
ipv6, ip6_xmit() already accepts a tclass parameter.

It's needed when a transport protocol doesn't use inet->tos, as
with sctp's per-transport dscp, which will be added in the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/ip.h     | 9 ++++++++-
 net/ipv4/ip_output.c | 9 +++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 0d2281b..09da79d 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -148,7 +148,8 @@ void ip_send_check(struct iphdr *ip);
 int __ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 int ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 
-int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
+int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
+		    __u8 tos);
 void ip_init(void);
 int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		   int getfrag(void *from, char *to, int offset, int len,
@@ -174,6 +175,12 @@ struct sk_buff *ip_make_skb(struct sock *sk, struct flowi4 *fl4,
 			    struct ipcm_cookie *ipc, struct rtable **rtp,
 			    struct inet_cork *cork, unsigned int flags);
 
+static inline int ip_queue_xmit(struct sock *sk, struct sk_buff *skb,
+				struct flowi *fl)
+{
+	return __ip_queue_xmit(sk, skb, fl, inet_sk(sk)->tos);
+}
+
 static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
 {
 	return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b3308e9..188cc58 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -423,7 +423,8 @@ static void ip_copy_addrs(struct iphdr *iph, const struct flowi4 *fl4)
 }
 
 /* Note: skb->sk can be different from sk, in case of tunnels */
-int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
+int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
+		    __u8 tos)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct net *net = sock_net(sk);
@@ -462,7 +463,7 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 					   inet->inet_dport,
 					   inet->inet_sport,
 					   sk->sk_protocol,
-					   RT_CONN_FLAGS(sk),
+					   RT_CONN_FLAGS_TOS(sk, tos),
 					   sk->sk_bound_dev_if);
 		if (IS_ERR(rt))
 			goto no_route;
@@ -478,7 +479,7 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 	skb_push(skb, sizeof(struct iphdr) + (inet_opt ? inet_opt->opt.optlen : 0));
 	skb_reset_network_header(skb);
 	iph = ip_hdr(skb);
-	*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
+	*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (tos & 0xff));
 	if (ip_dont_fragment(sk, &rt->dst) && !skb->ignore_df)
 		iph->frag_off = htons(IP_DF);
 	else
@@ -511,7 +512,7 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 	kfree_skb(skb);
 	return -EHOSTUNREACH;
 }
-EXPORT_SYMBOL(ip_queue_xmit);
+EXPORT_SYMBOL(__ip_queue_xmit);
 
 static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 {
-- 
2.1.0
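
For readers following the header arithmetic in the last ip_output.c
hunk, here is a minimal userspace sketch of how the first 16 bits of the
IPv4 header are composed from the version, the IHL and the tos argument.
The name ipv4_first_word() is made up for illustration and is not part
of the patch; the sketch returns the value in host byte order, whereas
the kernel stores htons() of it.

```c
#include <stdint.h>

/* Hypothetical helper, not in the kernel: build the first 16 bits of an
 * IPv4 header (version 4, IHL of 5 words, then the TOS byte) in host
 * byte order.  __ip_queue_xmit() writes htons() of the same expression
 * at the start of the header.
 */
static uint16_t ipv4_first_word(uint8_t tos)
{
	return (uint16_t)((4 << 12) | (5 << 8) | (tos & 0xff));
}
```

With tos taken from the new argument rather than inet->tos, the same
socket can emit packets with different TOS bytes per call.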


* [PATCHv2 net-next 2/5] sctp: add support for dscp and flowlabel per transport
From: Xin Long @ 2018-07-02 10:21 UTC
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, davem, hideaki.yoshifuji
In-Reply-To: <cover.1530526661.git.lucien.xin@gmail.com>

Like some other per-transport params, flowlabel and dscp are added
to transport, asoc and sctp_sock. By default, a transport takes its
value from the asoc, and the asoc takes its value from sctp_sock.
flowlabel only works for ipv6 transports.

Besides being passed down in sctp_xmit, they also need to be set in
flow4/6 before the route lookup in get_dst.

Note that it uses '& 0x100000' to check if flowlabel is set and
'& 0x1' (the lowest tos bit is unused) to check if dscp is set by
users, so that they can be set to 0 by the sockopt in the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/sctp.h       |  7 +++++++
 include/net/sctp/structs.h |  9 +++++++++
 net/sctp/associola.c       |  7 +++++++
 net/sctp/ipv6.c            | 11 +++++++++--
 net/sctp/protocol.c        | 16 ++++++++++++----
 5 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index b36c766..83d9434 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -801,4 +801,11 @@ struct sctp_strreset_resptsn {
 	__be32 receivers_next_tsn;
 };
 
+enum {
+	SCTP_DSCP_SET_MASK = 0x1,
+	SCTP_DSCP_VAL_MASK = 0xfc,
+	SCTP_FLOWLABEL_SET_MASK = 0x100000,
+	SCTP_FLOWLABEL_VAL_MASK = 0xfffff
+};
+
 #endif /* __LINUX_SCTP_H__ */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 701a517..ab869e0 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -193,6 +193,9 @@ struct sctp_sock {
 	/* This is the max_retrans value for new associations. */
 	__u16 pathmaxrxt;
 
+	__u32 flowlabel;
+	__u8  dscp;
+
 	/* The initial Path MTU to use for new associations. */
 	__u32 pathmtu;
 
@@ -895,6 +898,9 @@ struct sctp_transport {
 	 */
 	__u16 pathmaxrxt;
 
+	__u32 flowlabel;
+	__u8  dscp;
+
 	/* This is the partially failed retrans value for the transport
 	 * and will be initialized from the assocs value.  This can be changed
 	 * using the SCTP_PEER_ADDR_THLDS socket option
@@ -1772,6 +1778,9 @@ struct sctp_association {
 	 */
 	__u16 pathmaxrxt;
 
+	__u32 flowlabel;
+	__u8  dscp;
+
 	/* Flag that path mtu update is pending */
 	__u8   pmtu_pending;
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 5d5a162..16ecfbc 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -115,6 +115,9 @@ static struct sctp_association *sctp_association_init(
 	/* Initialize path max retrans value. */
 	asoc->pathmaxrxt = sp->pathmaxrxt;
 
+	asoc->flowlabel = sp->flowlabel;
+	asoc->dscp = sp->dscp;
+
 	/* Initialize default path MTU. */
 	asoc->pathmtu = sp->pathmtu;
 
@@ -647,6 +650,10 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 	peer->sackdelay = asoc->sackdelay;
 	peer->sackfreq = asoc->sackfreq;
 
+	if (addr->sa.sa_family == AF_INET6)
+		peer->flowlabel = asoc->flowlabel;
+	peer->dscp = asoc->dscp;
+
 	/* Enable/disable heartbeat, SACK delay, and path MTU discovery
 	 * based on association setting.
 	 */
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 7339918..772513d 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -209,12 +209,17 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
 	struct sock *sk = skb->sk;
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct flowi6 *fl6 = &transport->fl.u.ip6;
+	__u8 tclass = np->tclass;
 	int res;
 
 	pr_debug("%s: skb:%p, len:%d, src:%pI6 dst:%pI6\n", __func__, skb,
 		 skb->len, &fl6->saddr, &fl6->daddr);
 
-	IP6_ECN_flow_xmit(sk, fl6->flowlabel);
+	if (transport->dscp & SCTP_DSCP_SET_MASK)
+		tclass = transport->dscp & SCTP_DSCP_VAL_MASK;
+
+	if (INET_ECN_is_capable(tclass))
+		IP6_ECN_flow_xmit(sk, fl6->flowlabel);
 
 	if (!(transport->param_flags & SPP_PMTUD_ENABLE))
 		skb->ignore_df = 1;
@@ -223,7 +228,7 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
 
 	rcu_read_lock();
 	res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt),
-		       np->tclass);
+		       tclass);
 	rcu_read_unlock();
 	return res;
 }
@@ -254,6 +259,8 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 		fl6->flowi6_oif = daddr->v6.sin6_scope_id;
 	else if (asoc)
 		fl6->flowi6_oif = asoc->base.sk->sk_bound_dev_if;
+	if (t->flowlabel & SCTP_FLOWLABEL_SET_MASK)
+		fl6->flowlabel = htonl(t->flowlabel & SCTP_FLOWLABEL_VAL_MASK);
 
 	pr_debug("%s: dst=%pI6 ", __func__, &fl6->daddr);
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 5dffbc4..d57fd30 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -426,13 +426,16 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 	struct dst_entry *dst = NULL;
 	union sctp_addr *daddr = &t->ipaddr;
 	union sctp_addr dst_saddr;
+	__u8 tos = inet_sk(sk)->tos;
 
+	if (t->dscp & SCTP_DSCP_SET_MASK)
+		tos = t->dscp & SCTP_DSCP_VAL_MASK;
 	memset(fl4, 0x0, sizeof(struct flowi4));
 	fl4->daddr  = daddr->v4.sin_addr.s_addr;
 	fl4->fl4_dport = daddr->v4.sin_port;
 	fl4->flowi4_proto = IPPROTO_SCTP;
 	if (asoc) {
-		fl4->flowi4_tos = RT_CONN_FLAGS(asoc->base.sk);
+		fl4->flowi4_tos = RT_CONN_FLAGS_TOS(asoc->base.sk, tos);
 		fl4->flowi4_oif = asoc->base.sk->sk_bound_dev_if;
 		fl4->fl4_sport = htons(asoc->base.bind_addr.port);
 	}
@@ -495,7 +498,7 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 		fl4->fl4_sport = laddr->a.v4.sin_port;
 		flowi4_update_output(fl4,
 				     asoc->base.sk->sk_bound_dev_if,
-				     RT_CONN_FLAGS(asoc->base.sk),
+				     RT_CONN_FLAGS_TOS(asoc->base.sk, tos),
 				     daddr->v4.sin_addr.s_addr,
 				     laddr->a.v4.sin_addr.s_addr);
 
@@ -971,16 +974,21 @@ static inline int sctp_v4_xmit(struct sk_buff *skb,
 			       struct sctp_transport *transport)
 {
 	struct inet_sock *inet = inet_sk(skb->sk);
+	__u8 dscp = inet->tos;
 
 	pr_debug("%s: skb:%p, len:%d, src:%pI4, dst:%pI4\n", __func__, skb,
-		 skb->len, &transport->fl.u.ip4.saddr, &transport->fl.u.ip4.daddr);
+		 skb->len, &transport->fl.u.ip4.saddr,
+		 &transport->fl.u.ip4.daddr);
+
+	if (transport->dscp & SCTP_DSCP_SET_MASK)
+		dscp = transport->dscp & SCTP_DSCP_VAL_MASK;
 
 	inet->pmtudisc = transport->param_flags & SPP_PMTUD_ENABLE ?
 			 IP_PMTUDISC_DO : IP_PMTUDISC_DONT;
 
 	SCTP_INC_STATS(sock_net(&inet->sk), SCTP_MIB_OUTSCTPPACKS);
 
-	return ip_queue_xmit(&inet->sk, skb, &transport->fl);
+	return __ip_queue_xmit(&inet->sk, skb, &transport->fl, dscp);
 }
 
 static struct sctp_af sctp_af_inet;
-- 
2.1.0
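
The "set" marker described in the commit message can be sketched in
userspace as follows. The mask values are copied from the enum this
patch adds; the helper names dscp_store(), effective_tos() and
flowlabel_store() are hypothetical and exist only for this illustration.
The store helpers mirror what patch 3/5 does in the sockopt path, and
effective_tos() mirrors the selection made in sctp_v4_xmit().

```c
#include <stdint.h>

/* Values copied from the enum added to include/linux/sctp.h above. */
enum {
	SCTP_DSCP_SET_MASK = 0x1,
	SCTP_DSCP_VAL_MASK = 0xfc,
	SCTP_FLOWLABEL_SET_MASK = 0x100000,
	SCTP_FLOWLABEL_VAL_MASK = 0xfffff
};

/* Hypothetical helper: keep the 6 DSCP bits and mark the value as set. */
static uint8_t dscp_store(uint8_t user_dscp)
{
	return (uint8_t)((user_dscp & SCTP_DSCP_VAL_MASK) | SCTP_DSCP_SET_MASK);
}

/* Hypothetical helper: pick the TOS byte the way sctp_v4_xmit() does,
 * falling back to the socket-wide value when nothing was set.
 */
static uint8_t effective_tos(uint8_t stored_dscp, uint8_t inet_tos)
{
	if (stored_dscp & SCTP_DSCP_SET_MASK)
		return stored_dscp & SCTP_DSCP_VAL_MASK;
	return inet_tos;
}

/* Hypothetical helper: same idea for the 20-bit flow label. */
static uint32_t flowlabel_store(uint32_t user_label)
{
	return (user_label & SCTP_FLOWLABEL_VAL_MASK) | SCTP_FLOWLABEL_SET_MASK;
}
```

The point of the extra bit is that a stored value of 0 ("never set")
and a user-chosen DSCP or flow label of 0 remain distinguishable.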


* [PATCHv2 net-next 3/5] sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams
From: Xin Long @ 2018-07-02 10:21 UTC
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, davem, hideaki.yoshifuji
In-Reply-To: <cover.1530526661.git.lucien.xin@gmail.com>

spp_ipv6_flowlabel and spp_dscp are added to sctp_paddrparams in
this patch so that users can set the sctp_sock/asoc/transport dscp
and flowlabel with spp_flags SPP_IPV6_FLOWLABEL or SPP_DSCP via
SCTP_PEER_ADDR_PARAMS, as described in section 8.1.12 of RFC6458.

As said in the last patch, it uses '| 0x100000' or '| 0x1' to mark
that flowlabel or dscp is set, so that their values can be set
to 0.

Note that to guarantee that an old app built with old kernel
headers still works on a newer kernel, the param check in
sctp_g/setsockopt_peer_addr_params() is also improved, following
the way sctp_g/setsockopt_delayed_ack() and some other sockopts
that accept two types of params do it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/uapi/linux/sctp.h |   4 ++
 net/sctp/socket.c         | 177 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 175 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index c02986a..b479db5 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -763,6 +763,8 @@ enum  sctp_spp_flags {
 	SPP_SACKDELAY_DISABLE = 1<<6,	/*Disable SACK*/
 	SPP_SACKDELAY = SPP_SACKDELAY_ENABLE | SPP_SACKDELAY_DISABLE,
 	SPP_HB_TIME_IS_ZERO = 1<<7,	/* Set HB delay to 0 */
+	SPP_IPV6_FLOWLABEL = 1<<8,
+	SPP_DSCP = 1<<9,
 };
 
 struct sctp_paddrparams {
@@ -773,6 +775,8 @@ struct sctp_paddrparams {
 	__u32			spp_pathmtu;
 	__u32			spp_sackdelay;
 	__u32			spp_flags;
+	__u32			spp_ipv6_flowlabel;
+	__u8			spp_dscp;
 } __attribute__((packed, aligned(4)));
 
 /*
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index bf11f9c..452029f 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2393,6 +2393,8 @@ static int sctp_setsockopt_autoclose(struct sock *sk, char __user *optval,
  *     uint32_t                spp_pathmtu;
  *     uint32_t                spp_sackdelay;
  *     uint32_t                spp_flags;
+ *     uint32_t                spp_ipv6_flowlabel;
+ *     uint8_t                 spp_dscp;
  * };
  *
  *   spp_assoc_id    - (one-to-many style socket) This is filled in the
@@ -2472,6 +2474,45 @@ static int sctp_setsockopt_autoclose(struct sock *sk, char __user *optval,
  *                     also that this field is mutually exclusive to
  *                     SPP_SACKDELAY_ENABLE, setting both will have undefined
  *                     results.
+ *
+ *                     SPP_IPV6_FLOWLABEL:  Setting this flag enables the
+ *                     setting of the IPV6 flow label value.  The value is
+ *                     contained in the spp_ipv6_flowlabel field.
+ *                     Upon retrieval, this flag will be set to indicate that
+ *                     the spp_ipv6_flowlabel field has a valid value returned.
+ *                     If a specific destination address is set (in the
+ *                     spp_address field), then the value returned is that of
+ *                     the address.  If just an association is specified (and
+ *                     no address), then the association's default flow label
+ *                     is returned.  If neither an association nor a destination
+ *                     is specified, then the socket's default flow label is
+ *                     returned.  For non-IPv6 sockets, this flag will be left
+ *                     cleared.
+ *
+ *                     SPP_DSCP:  Setting this flag enables the setting of the
+ *                     Differentiated Services Code Point (DSCP) value
+ *                     associated with either the association or a specific
+ *                     address.  The value is obtained in the spp_dscp field.
+ *                     Upon retrieval, this flag will be set to indicate that
+ *                     the spp_dscp field has a valid value returned.  If a
+ *                     specific destination address is set when called (in the
+ *                     spp_address field), then that specific destination
+ *                     address's DSCP value is returned.  If just an association
+ *                     is specified, then the association's default DSCP is
+ *                     returned.  If neither an association nor a destination is
+ *                     specified, then the socket's default DSCP is returned.
+ *
+ *   spp_ipv6_flowlabel
+ *                   - This field is used in conjunction with the
+ *                     SPP_IPV6_FLOWLABEL flag and contains the IPv6 flow label.
+ *                     The 20 least significant bits are used for the flow
+ *                     label.  This setting has precedence over any IPv6-layer
+ *                     setting.
+ *
+ *   spp_dscp        - This field is used in conjunction with the SPP_DSCP flag
+ *                     and contains the DSCP.  The 6 most significant bits are
+ *                     used for the DSCP.  This setting has precedence over any
+ *                     IPv4- or IPv6- layer setting.
  */
 static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
 				       struct sctp_transport   *trans,
@@ -2611,6 +2652,51 @@ static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
 		}
 	}
 
+	if (params->spp_flags & SPP_IPV6_FLOWLABEL) {
+		if (trans && trans->ipaddr.sa.sa_family == AF_INET6) {
+			trans->flowlabel = params->spp_ipv6_flowlabel &
+					   SCTP_FLOWLABEL_VAL_MASK;
+			trans->flowlabel |= SCTP_FLOWLABEL_SET_MASK;
+		} else if (asoc) {
+			list_for_each_entry(trans,
+					    &asoc->peer.transport_addr_list,
+					    transports) {
+				if (trans->ipaddr.sa.sa_family != AF_INET6)
+					continue;
+				trans->flowlabel = params->spp_ipv6_flowlabel &
+						   SCTP_FLOWLABEL_VAL_MASK;
+				trans->flowlabel |= SCTP_FLOWLABEL_SET_MASK;
+			}
+			asoc->flowlabel = params->spp_ipv6_flowlabel &
+					  SCTP_FLOWLABEL_VAL_MASK;
+			asoc->flowlabel |= SCTP_FLOWLABEL_SET_MASK;
+		} else if (sctp_opt2sk(sp)->sk_family == AF_INET6) {
+			sp->flowlabel = params->spp_ipv6_flowlabel &
+					SCTP_FLOWLABEL_VAL_MASK;
+			sp->flowlabel |= SCTP_FLOWLABEL_SET_MASK;
+		}
+	}
+
+	if (params->spp_flags & SPP_DSCP) {
+		if (trans) {
+			trans->dscp = params->spp_dscp & SCTP_DSCP_VAL_MASK;
+			trans->dscp |= SCTP_DSCP_SET_MASK;
+		} else if (asoc) {
+			list_for_each_entry(trans,
+					    &asoc->peer.transport_addr_list,
+					    transports) {
+				trans->dscp = params->spp_dscp &
+					      SCTP_DSCP_VAL_MASK;
+				trans->dscp |= SCTP_DSCP_SET_MASK;
+			}
+			asoc->dscp = params->spp_dscp & SCTP_DSCP_VAL_MASK;
+			asoc->dscp |= SCTP_DSCP_SET_MASK;
+		} else {
+			sp->dscp = params->spp_dscp & SCTP_DSCP_VAL_MASK;
+			sp->dscp |= SCTP_DSCP_SET_MASK;
+		}
+	}
+
 	return 0;
 }
 
@@ -2625,11 +2711,18 @@ static int sctp_setsockopt_peer_addr_params(struct sock *sk,
 	int error;
 	int hb_change, pmtud_change, sackdelay_change;
 
-	if (optlen != sizeof(struct sctp_paddrparams))
+	if (optlen == sizeof(params)) {
+		if (copy_from_user(&params, optval, optlen))
+			return -EFAULT;
+	} else if (optlen == ALIGN(offsetof(struct sctp_paddrparams,
+					    spp_ipv6_flowlabel), 4)) {
+		if (copy_from_user(&params, optval, optlen))
+			return -EFAULT;
+		if (params.spp_flags & (SPP_DSCP | SPP_IPV6_FLOWLABEL))
+			return -EINVAL;
+	} else {
 		return -EINVAL;
-
-	if (copy_from_user(&params, optval, optlen))
-		return -EFAULT;
+	}
 
 	/* Validate flags and value parameters. */
 	hb_change        = params.spp_flags & SPP_HB;
@@ -5453,6 +5546,45 @@ static int sctp_getsockopt_peeloff_flags(struct sock *sk, int len,
  *                     also that this field is mutually exclusive to
  *                     SPP_SACKDELAY_ENABLE, setting both will have undefined
  *                     results.
+ *
+ *                     SPP_IPV6_FLOWLABEL:  Setting this flag enables the
+ *                     setting of the IPV6 flow label value.  The value is
+ *                     contained in the spp_ipv6_flowlabel field.
+ *                     Upon retrieval, this flag will be set to indicate that
+ *                     the spp_ipv6_flowlabel field has a valid value returned.
+ *                     If a specific destination address is set (in the
+ *                     spp_address field), then the value returned is that of
+ *                     the address.  If just an association is specified (and
+ *                     no address), then the association's default flow label
+ *                     is returned.  If neither an association nor a destination
+ *                     is specified, then the socket's default flow label is
+ *                     returned.  For non-IPv6 sockets, this flag will be left
+ *                     cleared.
+ *
+ *                     SPP_DSCP:  Setting this flag enables the setting of the
+ *                     Differentiated Services Code Point (DSCP) value
+ *                     associated with either the association or a specific
+ *                     address.  The value is obtained in the spp_dscp field.
+ *                     Upon retrieval, this flag will be set to indicate that
+ *                     the spp_dscp field has a valid value returned.  If a
+ *                     specific destination address is set when called (in the
+ *                     spp_address field), then that specific destination
+ *                     address's DSCP value is returned.  If just an association
+ *                     is specified, then the association's default DSCP is
+ *                     returned.  If neither an association nor a destination is
+ *                     specified, then the socket's default DSCP is returned.
+ *
+ *   spp_ipv6_flowlabel
+ *                   - This field is used in conjunction with the
+ *                     SPP_IPV6_FLOWLABEL flag and contains the IPv6 flow label.
+ *                     The 20 least significant bits are used for the flow
+ *                     label.  This setting has precedence over any IPv6-layer
+ *                     setting.
+ *
+ *   spp_dscp        - This field is used in conjunction with the SPP_DSCP flag
+ *                     and contains the DSCP.  The 6 most significant bits are
+ *                     used for the DSCP.  This setting has precedence over any
+ *                     IPv4- or IPv6- layer setting.
  */
 static int sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
 					    char __user *optval, int __user *optlen)
@@ -5462,9 +5594,15 @@ static int sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
 	struct sctp_association *asoc = NULL;
 	struct sctp_sock        *sp = sctp_sk(sk);
 
-	if (len < sizeof(struct sctp_paddrparams))
+	if (len >= sizeof(params))
+		len = sizeof(params);
+	else if (len >= ALIGN(offsetof(struct sctp_paddrparams,
+				       spp_ipv6_flowlabel), 4))
+		len = ALIGN(offsetof(struct sctp_paddrparams,
+				     spp_ipv6_flowlabel), 4);
+	else
 		return -EINVAL;
-	len = sizeof(struct sctp_paddrparams);
+
 	if (copy_from_user(&params, optval, len))
 		return -EFAULT;
 
@@ -5499,6 +5637,15 @@ static int sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
 
 		/*draft-11 doesn't say what to return in spp_flags*/
 		params.spp_flags      = trans->param_flags;
+		if (trans->flowlabel & SCTP_FLOWLABEL_SET_MASK) {
+			params.spp_ipv6_flowlabel = trans->flowlabel &
+						    SCTP_FLOWLABEL_VAL_MASK;
+			params.spp_flags |= SPP_IPV6_FLOWLABEL;
+		}
+		if (trans->dscp & SCTP_DSCP_SET_MASK) {
+			params.spp_dscp	= trans->dscp & SCTP_DSCP_VAL_MASK;
+			params.spp_flags |= SPP_DSCP;
+		}
 	} else if (asoc) {
 		/* Fetch association values. */
 		params.spp_hbinterval = jiffies_to_msecs(asoc->hbinterval);
@@ -5508,6 +5655,15 @@ static int sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
 
 		/*draft-11 doesn't say what to return in spp_flags*/
 		params.spp_flags      = asoc->param_flags;
+		if (asoc->flowlabel & SCTP_FLOWLABEL_SET_MASK) {
+			params.spp_ipv6_flowlabel = asoc->flowlabel &
+						    SCTP_FLOWLABEL_VAL_MASK;
+			params.spp_flags |= SPP_IPV6_FLOWLABEL;
+		}
+		if (asoc->dscp & SCTP_DSCP_SET_MASK) {
+			params.spp_dscp	= asoc->dscp & SCTP_DSCP_VAL_MASK;
+			params.spp_flags |= SPP_DSCP;
+		}
 	} else {
 		/* Fetch socket values. */
 		params.spp_hbinterval = sp->hbinterval;
@@ -5517,6 +5673,15 @@ static int sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
 
 		/*draft-11 doesn't say what to return in spp_flags*/
 		params.spp_flags      = sp->param_flags;
+		if (sp->flowlabel & SCTP_FLOWLABEL_SET_MASK) {
+			params.spp_ipv6_flowlabel = sp->flowlabel &
+						    SCTP_FLOWLABEL_VAL_MASK;
+			params.spp_flags |= SPP_IPV6_FLOWLABEL;
+		}
+		if (sp->dscp & SCTP_DSCP_SET_MASK) {
+			params.spp_dscp	= sp->dscp & SCTP_DSCP_VAL_MASK;
+			params.spp_flags |= SPP_DSCP;
+		}
 	}
 
 	if (copy_to_user(optval, &params, len))
-- 
2.1.0
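
The backward-compatible length check in the setsockopt path can be
illustrated with a small userspace sketch. Everything named here is an
assumption made for the example: struct paddrparams_sketch is a
hand-written mirror of the patched struct sctp_paddrparams, spp_address
is a 128-byte array standing in for struct sockaddr_storage (its size on
Linux), ALIGN_UP() stands in for the kernel's ALIGN(), and check_optlen()
is a hypothetical helper. The attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct sctp_paddrparams after this patch. */
struct paddrparams_sketch {
	int32_t  spp_assoc_id;
	char     spp_address[128];	/* stand-in for sockaddr_storage */
	uint32_t spp_hbinterval;
	uint16_t spp_pathmaxrxt;
	uint32_t spp_pathmtu;
	uint32_t spp_sackdelay;
	uint32_t spp_flags;
	uint32_t spp_ipv6_flowlabel;
	uint8_t  spp_dscp;
} __attribute__((packed, aligned(4)));

#define SPP_IPV6_FLOWLABEL (1 << 8)
#define SPP_DSCP           (1 << 9)

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Size of the structure as seen by apps built against old headers. */
#define OLD_PARAMS_LEN \
	ALIGN_UP(offsetof(struct paddrparams_sketch, spp_ipv6_flowlabel), 4)

/* Returns 0 if (optlen, flags) would be accepted, -1 for the cases
 * where the setsockopt path above returns -EINVAL.
 */
static int check_optlen(size_t optlen, uint32_t spp_flags)
{
	if (optlen == sizeof(struct paddrparams_sketch))
		return 0;
	if (optlen == OLD_PARAMS_LEN)
		return (spp_flags & (SPP_DSCP | SPP_IPV6_FLOWLABEL)) ? -1 : 0;
	return -1;
}
```

Under these assumptions the two accepted lengths are 156 bytes (new
layout) and 152 bytes (old layout). An old-layout caller that sets one
of the new flags is rejected, since the fields those flags refer to lie
beyond the buffer it passed.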


* [PATCHv2 net-next 4/5] sctp: add support for setting flowlabel when adding a transport
From: Xin Long @ 2018-07-02 10:21 UTC
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, davem, hideaki.yoshifuji
In-Reply-To: <cover.1530526661.git.lucien.xin@gmail.com>

struct sockaddr_in6 has the member sin6_flowinfo, which includes the
ipv6 flowlabel, so sctp should also support setting the flowlabel
when adding a transport whose ipaddr comes from userspace.

Note that addrinfo in sctp_sendmsg uses struct in6_addr for
the secondary addrs, which doesn't contain sin6_flowinfo, so
it needs to copy sin6_flowinfo from the primary addr.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/associola.c | 12 ++++++++++--
 net/sctp/socket.c    |  5 +++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 16ecfbc..297d9cf 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -650,8 +650,16 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 	peer->sackdelay = asoc->sackdelay;
 	peer->sackfreq = asoc->sackfreq;
 
-	if (addr->sa.sa_family == AF_INET6)
-		peer->flowlabel = asoc->flowlabel;
+	if (addr->sa.sa_family == AF_INET6) {
+		__be32 info = addr->v6.sin6_flowinfo;
+
+		if (info) {
+			peer->flowlabel = ntohl(info & IPV6_FLOWLABEL_MASK);
+			peer->flowlabel |= SCTP_FLOWLABEL_SET_MASK;
+		} else {
+			peer->flowlabel = asoc->flowlabel;
+		}
+	}
 	peer->dscp = asoc->dscp;
 
 	/* Enable/disable heartbeat, SACK delay, and path MTU discovery
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 452029f..2607b50 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1697,6 +1697,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
 	struct sctp_association *asoc;
 	enum sctp_scope scope;
 	struct cmsghdr *cmsg;
+	__be32 flowinfo = 0;
 	struct sctp_af *af;
 	int err;
 
@@ -1781,6 +1782,9 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
 	if (!cmsgs->addrs_msg)
 		return 0;
 
+	if (daddr->sa.sa_family == AF_INET6)
+		flowinfo = daddr->v6.sin6_flowinfo;
+
 	/* sendv addr list parse */
 	for_each_cmsghdr(cmsg, cmsgs->addrs_msg) {
 		struct sctp_transport *transport;
@@ -1813,6 +1817,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
 			}
 
 			dlen = sizeof(struct in6_addr);
+			daddr->v6.sin6_flowinfo = flowinfo;
 			daddr->v6.sin6_family = AF_INET6;
 			daddr->v6.sin6_port = htons(asoc->peer.port);
 			memcpy(&daddr->v6.sin6_addr, CMSG_DATA(cmsg), dlen);
-- 
2.1.0
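
The new branch in sctp_assoc_add_peer() can be sketched in userspace as
below. The helper name peer_flowlabel() and the FLOWLABEL_BITS macro are
hypothetical; the sketch masks after converting to host order, which
should give the same result as the kernel's ntohl(info &
IPV6_FLOWLABEL_MASK), assuming that mask covers the low 20 bits of the
network-order field.

```c
#include <stdint.h>
#include <arpa/inet.h>

#define SCTP_FLOWLABEL_SET_MASK 0x100000u
#define FLOWLABEL_BITS          0x000fffffu	/* low 20 bits, host order */

/* Hypothetical helper mirroring the new branch in sctp_assoc_add_peer():
 * flowinfo_be is sin6_flowinfo in network byte order, asoc_flowlabel is
 * the association default, already in the stored "set bit" encoding.
 */
static uint32_t peer_flowlabel(uint32_t flowinfo_be, uint32_t asoc_flowlabel)
{
	if (flowinfo_be)
		return (ntohl(flowinfo_be) & FLOWLABEL_BITS) |
		       SCTP_FLOWLABEL_SET_MASK;
	return asoc_flowlabel;
}
```

As in the patch, any non-zero flowinfo takes the first branch, so bits
above the 20-bit label are dropped by the mask but still cause the
transport's label to be marked as set.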


* [PATCHv2 net-next 5/5] sctp: check for ipv6_pinfo legal sndflow with flowlabel in sctp_v6_get_dst
From: Xin Long @ 2018-07-02 10:21 UTC
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, davem, hideaki.yoshifuji
In-Reply-To: <cover.1530526661.git.lucien.xin@gmail.com>

A transport with an illegal flowlabel should not be allowed to send
packets. Other transport protocols already deny this.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/ipv6.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 772513d..d83ddc4 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -262,6 +262,15 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 	if (t->flowlabel & SCTP_FLOWLABEL_SET_MASK)
 		fl6->flowlabel = htonl(t->flowlabel & SCTP_FLOWLABEL_VAL_MASK);
 
+	if (np->sndflow && (fl6->flowlabel & IPV6_FLOWLABEL_MASK)) {
+		struct ip6_flowlabel *flowlabel;
+
+		flowlabel = fl6_sock_lookup(sk, fl6->flowlabel);
+		if (!flowlabel)
+			goto out;
+		fl6_sock_release(flowlabel);
+	}
+
 	pr_debug("%s: dst=%pI6 ", __func__, &fl6->daddr);
 
 	if (asoc)
-- 
2.1.0
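
The rule this hunk enforces can be modelled in a few lines of userspace
C. This is only a sketch under stated assumptions, not kernel code:
flowlabel_may_send() is a hypothetical name, and the plain array of
registered labels stands in for whatever per-socket state
fl6_sock_lookup() consults in the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: when sndflow is on and the label is
 * non-zero, the label must be one the socket has registered (the role
 * fl6_sock_lookup() plays in the hunk above).  Returns 1 if sending is
 * allowed, 0 if the route lookup should bail out.
 */
static int flowlabel_may_send(int sndflow, uint32_t label,
			      const uint32_t *registered, size_t n)
{
	size_t i;

	if (!sndflow || label == 0)
		return 1;
	for (i = 0; i < n; i++)
		if (registered[i] == label)
			return 1;
	return 0;
}
```

In the model, a socket without sndflow, or a transport with a zero
label, is never blocked; only a non-zero label unknown to the socket is.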


* Re: [PATCH net] tcp: prevent bogus FRTO undos with non-SACK flows
From: Ilpo Järvinen @ 2018-07-02 10:26 UTC
  To: Neal Cardwell; +Cc: Netdev, Yuchung Cheng, Eric Dumazet, Michal Kubecek
In-Reply-To: <CADVnQymia_drxUkUP1V8MvqCObyKUXhg+xjrssef9S5D8GYnKw@mail.gmail.com>

On Sat, 30 Jun 2018, Neal Cardwell wrote:

> As I mentioned, I ran your patch through all our team's TCP
> packetdrill tests, and it passes all of the tests. One of our tests
> needed updating, because if there is a non-SACK connection with a
> spurious RTO due to a delayed flight of ACKs then the FRTO undo now
> happens one ACK later (when we get an ACK that doesn't cover a
> retransmit). But that seems fine to me.

Yes, this is what is wanted. The non-SACK FRTO cannot make a decision on
the first cumulative ACK because that could be (and often is) triggered by
the retransmit; it can decide only from the next ACK after that.

Even with SACK FRTO, there is a hazard in doing it that early, as tail ACK
losses can lead to discovery of newly SACKed skbs from the ACK of the
retransmitted segment. For that to occur, however, the cumulative ACK
cannot cover those skbs, implying more holes that need to be recovered.
Therefore, the window reduction will eventually occur anyway, but it would
still first do a bogus undo in that case too.

> I also cooked the new packetdrill test below to explicitly cover this
> case you are addressing (please let me know if you have an alternate
> suggestion).
> 
> Tested-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> 
> Thanks!
> neal
> 
> ---
> 
>     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
> 
>    +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
>  +.02 < . 1:1(0) ack 1 win 257
>    +0 accept(3, ..., ...) = 4
> 
> // Send 3 packets. First is really lost. And the dupacks
> // for the data packets that arrived at the receiver are slow in arriving.
>    +0 write(4, ..., 3000) = 3000
>    +0 > P. 1:3001(3000) ack 1
> 
> // RTO and retransmit head. This fills a real loss.
>  +.22 > . 1:1001(1000) ack 1
> 
> // Dupacks for packets 2 and 3 arrive.
> +.02  < . 1:1(0) ack 1 win 257
>    +0 < . 1:1(0) ack 1 win 257
> 
> // The cumulative ACK for all the data arrives. We do not undo, because
> // this is a non-SACK connection, and retransmitted data was ACKed.
> // It's good that there's no FRTO undo, since a packet was really lost.
> // Because this is non-SACK, tcp_try_undo_recovery() holds CA_Loss
> // until something beyond high_seq is ACKed.
> +.005 < . 1:1(0) ack 3001 win 257
>    +0 %{ assert tcpi_ca_state == TCP_CA_Loss, tcpi_ca_state }%
>    +0 %{ assert tcpi_snd_cwnd == 4, tcpi_snd_cwnd }%
>    +0 %{ assert tcpi_snd_ssthresh == 7, tcpi_snd_ssthresh }%

I think that the snd_cwnd is still fishy there, but fixing that would
also require the other patch from my series (cwnd was 1, so it should be 2
after the cumulative ACK).


-- 
 i.


* Re: [RFC PATCH] ipv6: make ipv6_renew_options() interrupt/kernel safe
From: Paul Moore @ 2018-07-02 11:03 UTC
  To: netdev; +Cc: Al Viro, selinux, linux-security-module
In-Reply-To: <153050046203.740.13741366203375982437.stgit@chester>

On July 1, 2018 11:01:04 PM Paul Moore <pmoore@redhat.com> wrote:

> From: Paul Moore <paul@paul-moore.com>
>
> At present the ipv6_renew_options_kern() function ends up calling into
> access_ok() which is problematic if done from inside an interrupt as
> access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
> (x86-64 is affected).  Example warning/backtrace is shown below:
>
> WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
> ...
> Call Trace:
>  <IRQ>
>  ipv6_renew_option+0xb2/0xf0
>  ipv6_renew_options+0x26a/0x340
>  ipv6_renew_options_kern+0x2c/0x40
>  calipso_req_setattr+0x72/0xe0
>  netlbl_req_setattr+0x126/0x1b0
>  selinux_netlbl_inet_conn_request+0x80/0x100
>  selinux_inet_conn_request+0x6d/0xb0
>  security_inet_conn_request+0x32/0x50
>  tcp_conn_request+0x35f/0xe00
>  ? __lock_acquire+0x250/0x16c0
>  ? selinux_socket_sock_rcv_skb+0x1ae/0x210
>  ? tcp_rcv_state_process+0x289/0x106b
>  tcp_rcv_state_process+0x289/0x106b
>  ? tcp_v6_do_rcv+0x1a7/0x3c0
>  tcp_v6_do_rcv+0x1a7/0x3c0
>  tcp_v6_rcv+0xc82/0xcf0
>  ip6_input_finish+0x10d/0x690
>  ip6_input+0x45/0x1e0
>  ? ip6_rcv_finish+0x1d0/0x1d0
>  ipv6_rcv+0x32b/0x880
>  ? ip6_make_skb+0x1e0/0x1e0
>  __netif_receive_skb_core+0x6f2/0xdf0
>  ? process_backlog+0x85/0x250
>  ? process_backlog+0x85/0x250
>  ? process_backlog+0xec/0x250
>  process_backlog+0xec/0x250
>  net_rx_action+0x153/0x480
>  __do_softirq+0xd9/0x4f7
>  do_softirq_own_stack+0x2a/0x40
>  </IRQ>
>  ...
>
> While not present in the backtrace, ipv6_renew_option() ends up calling
> access_ok() via the following chain:
>
>  access_ok()
>  _copy_from_user()
>  copy_from_user()
>  ipv6_renew_option()
>
> The fix presented in this patch is to perform the userspace copy
> earlier in the call chain such that it is only called when the option
> data is actually coming from userspace; that place is
> do_ipv6_setsockopt().  Not only does this solve the problem seen in
> the backtrace above, it also allows us to simplify the code quite a
> bit by removing ipv6_renew_options_kern() completely.  We also take
> this opportunity to clean up ipv6_renew_options()/ipv6_renew_option()
> a small amount.
>
> This patch is heavily based on a rough patch by Al Viro.  I've taken
> his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
> to a memdup_user() call, made better use of the e_inval jump target in
> the same function, and cleaned up the use of ipv6_renew_option() by
> ipv6_renew_options().
>
> CC: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
> include/net/ipv6.h       |    9 ----
> net/ipv6/calipso.c       |    9 +---
> net/ipv6/exthdrs.c       |  108 ++++++++++++----------------------------------
> net/ipv6/ipv6_sockglue.c |   27 ++++++++----
> 4 files changed, 50 insertions(+), 103 deletions(-)


Hold off on this patch; while it worked for me, I just received a bug report from Intel's 0day robot that I want to chase down.

> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 16475c269749..d02881e4ad1f 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -355,14 +355,7 @@ struct ipv6_txoptions *ipv6_dup_options(struct sock *sk,
> struct ipv6_txoptions *ipv6_renew_options(struct sock *sk,
> 	 struct ipv6_txoptions *opt,
> 	 int newtype,
> -	 struct ipv6_opt_hdr __user *newopt,
> -	 int newoptlen);
> -struct ipv6_txoptions *
> -ipv6_renew_options_kern(struct sock *sk,
> -	struct ipv6_txoptions *opt,
> -	int newtype,
> -	struct ipv6_opt_hdr *newopt,
> -	int newoptlen);
> +	 struct ipv6_opt_hdr *newopt);
> struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
> 	 struct ipv6_txoptions *opt);
>
> diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
> index 1323b9679cf7..1c0bb9fb76e6 100644
> --- a/net/ipv6/calipso.c
> +++ b/net/ipv6/calipso.c
> @@ -799,8 +799,7 @@ static int calipso_opt_update(struct sock *sk, struct ipv6_opt_hdr *hop)
> {
> 	struct ipv6_txoptions *old = txopt_get(inet6_sk(sk)), *txopts;
>
> -	txopts = ipv6_renew_options_kern(sk, old, IPV6_HOPOPTS,
> -	hop, hop ? ipv6_optlen(hop) : 0);
> +	txopts = ipv6_renew_options(sk, old, IPV6_HOPOPTS, hop);
> 	txopt_put(old);
> 	if (IS_ERR(txopts))
> 	return PTR_ERR(txopts);
> @@ -1222,8 +1221,7 @@ static int calipso_req_setattr(struct request_sock *req,
> 	if (IS_ERR(new))
> 	return PTR_ERR(new);
>
> -	txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS,
> -	new, new ? ipv6_optlen(new) : 0);
> +	txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new);
>
> 	kfree(new);
>
> @@ -1260,8 +1258,7 @@ static void calipso_req_delattr(struct request_sock *req)
> 	if (calipso_opt_del(req_inet->ipv6_opt->hopopt, &new))
> 	return; /* Nothing to do */
>
> -	txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS,
> -	new, new ? ipv6_optlen(new) : 0);
> +	txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new);
>
> 	if (!IS_ERR(txopts)) {
> 	txopts = xchg(&req_inet->ipv6_opt, txopts);
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index 5bc2bf3733ab..1e1d9bc2fd3d 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -1015,29 +1015,21 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt)
> }
> EXPORT_SYMBOL_GPL(ipv6_dup_options);
>
> -static int ipv6_renew_option(void *ohdr,
> -	    struct ipv6_opt_hdr __user *newopt, int newoptlen,
> -	    int inherit,
> -	    struct ipv6_opt_hdr **hdr,
> -	    char **p)
> +static void ipv6_renew_option(int renewtype,
> +	     struct ipv6_opt_hdr **dest,
> +	     struct ipv6_opt_hdr *old,
> +	     struct ipv6_opt_hdr *new,
> +	     int newtype, char **p)
> {
> -	if (inherit) {
> -	if (ohdr) {
> -	memcpy(*p, ohdr, ipv6_optlen((struct ipv6_opt_hdr *)ohdr));
> -	*hdr = (struct ipv6_opt_hdr *)*p;
> -	*p += CMSG_ALIGN(ipv6_optlen(*hdr));
> -	}
> -	} else {
> -	if (newopt) {
> -	if (copy_from_user(*p, newopt, newoptlen))
> -	return -EFAULT;
> -	*hdr = (struct ipv6_opt_hdr *)*p;
> -	if (ipv6_optlen(*hdr) > newoptlen)
> -	return -EINVAL;
> -	*p += CMSG_ALIGN(newoptlen);
> -	}
> -	}
> -	return 0;
> +	struct ipv6_opt_hdr *src;
> +
> +	src = (renewtype == newtype ? new : old);
> +	if (!src)
> +	return;
> +
> +	memcpy(*p, src, ipv6_optlen(src));
> +	*dest = (struct ipv6_opt_hdr *)*p;
> +	*p += CMSG_ALIGN(ipv6_optlen(*dest));
> }
>
> /**
> @@ -1063,13 +1055,11 @@ static int ipv6_renew_option(void *ohdr,
>  */
> struct ipv6_txoptions *
> ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
> -	  int newtype,
> -	  struct ipv6_opt_hdr __user *newopt, int newoptlen)
> +	  int newtype, struct ipv6_opt_hdr *newopt)
> {
> 	int tot_len = 0;
> 	char *p;
> 	struct ipv6_txoptions *opt2;
> -	int err;
>
> 	if (opt) {
> 	if (newtype != IPV6_HOPOPTS && opt->hopopt)
> @@ -1082,8 +1072,8 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
> 	tot_len += CMSG_ALIGN(ipv6_optlen(opt->dst1opt));
> 	}
>
> -	if (newopt && newoptlen)
> -	tot_len += CMSG_ALIGN(newoptlen);
> +	if (newopt)
> +	tot_len += CMSG_ALIGN(ipv6_optlen(newopt));
>
> 	if (!tot_len)
> 	return NULL;
> @@ -1098,29 +1088,16 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
> 	opt2->tot_len = tot_len;
> 	p = (char *)(opt2 + 1);
>
> -	err = ipv6_renew_option(opt ? opt->hopopt : NULL, newopt, newoptlen,
> -	newtype != IPV6_HOPOPTS,
> -	&opt2->hopopt, &p);
> -	if (err)
> -	goto out;
> -
> -	err = ipv6_renew_option(opt ? opt->dst0opt : NULL, newopt, newoptlen,
> -	newtype != IPV6_RTHDRDSTOPTS,
> -	&opt2->dst0opt, &p);
> -	if (err)
> -	goto out;
> -
> -	err = ipv6_renew_option(opt ? opt->srcrt : NULL, newopt, newoptlen,
> -	newtype != IPV6_RTHDR,
> -	(struct ipv6_opt_hdr **)&opt2->srcrt, &p);
> -	if (err)
> -	goto out;
> -
> -	err = ipv6_renew_option(opt ? opt->dst1opt : NULL, newopt, newoptlen,
> -	newtype != IPV6_DSTOPTS,
> -	&opt2->dst1opt, &p);
> -	if (err)
> -	goto out;
> +	ipv6_renew_option(IPV6_HOPOPTS, &opt2->hopopt, opt->hopopt,
> +	 newopt, newtype, &p);
> +	ipv6_renew_option(IPV6_RTHDRDSTOPTS, &opt2->dst0opt, opt->dst0opt,
> +	 newopt, newtype, &p);
> +	ipv6_renew_option(IPV6_RTHDR,
> +	 (struct ipv6_opt_hdr **)&opt2->srcrt,
> +	 (struct ipv6_opt_hdr *)opt->srcrt,
> +	 newopt, newtype, &p);
> +	ipv6_renew_option(IPV6_DSTOPTS, &opt2->dst1opt, opt->dst1opt,
> +	 newopt, newtype, &p);
>
> 	opt2->opt_nflen = (opt2->hopopt ? ipv6_optlen(opt2->hopopt) : 0) +
> 	 (opt2->dst0opt ? ipv6_optlen(opt2->dst0opt) : 0) +
> @@ -1128,37 +1105,6 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
> 	opt2->opt_flen = (opt2->dst1opt ? ipv6_optlen(opt2->dst1opt) : 0);
>
> 	return opt2;
> -out:
> -	sock_kfree_s(sk, opt2, opt2->tot_len);
> -	return ERR_PTR(err);
> -}
> -
> -/**
> - * ipv6_renew_options_kern - replace a specific ext hdr with a new one.
> - *
> - * @sk: sock from which to allocate memory
> - * @opt: original options
> - * @newtype: option type to replace in @opt
> - * @newopt: new option of type @newtype to replace (kernel-mem)
> - * @newoptlen: length of @newopt
> - *
> - * See ipv6_renew_options().  The difference is that @newopt is
> - * kernel memory, rather than user memory.
> - */
> -struct ipv6_txoptions *
> -ipv6_renew_options_kern(struct sock *sk, struct ipv6_txoptions *opt,
> -	int newtype, struct ipv6_opt_hdr *newopt,
> -	int newoptlen)
> -{
> -	struct ipv6_txoptions *ret_val;
> -	const mm_segment_t old_fs = get_fs();
> -
> -	set_fs(KERNEL_DS);
> -	ret_val = ipv6_renew_options(sk, opt, newtype,
> -	    (struct ipv6_opt_hdr __user *)newopt,
> -	    newoptlen);
> -	set_fs(old_fs);
> -	return ret_val;
> }
>
> struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 4d780c7f0130..c95c3486d904 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -398,6 +398,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> 	case IPV6_DSTOPTS:
> 	{
> 	struct ipv6_txoptions *opt;
> +	struct ipv6_opt_hdr *new = NULL;
> +
> +	/* hop-by-hop / destination options are privileged option */
> +	retv = -EPERM;
> +	if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
> +	break;
>
> 	/* remove any sticky options header with a zero option
> 	* length, per RFC3542.
> @@ -409,17 +415,22 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
> 	else if (optlen < sizeof(struct ipv6_opt_hdr) ||
> 	optlen & 0x7 || optlen > 8 * 255)
> 	goto e_inval;
> -
> -	/* hop-by-hop / destination options are privileged option */
> -	retv = -EPERM;
> -	if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
> -	break;
> +	else {
> +	new = memdup_user(optval, optlen);
> +	if (IS_ERR(new)) {
> +	retv = PTR_ERR(new);
> +	break;
> +	}
> +	if (unlikely(ipv6_optlen(new) > optlen)) {
> +	kfree(new);
> +	goto e_inval;
> +	}
> +	}
>
> 	opt = rcu_dereference_protected(np->opt,
> 	lockdep_sock_is_held(sk));
> -	opt = ipv6_renew_options(sk, opt, optname,
> -	(struct ipv6_opt_hdr __user *)optval,
> -	optlen);
> +	opt = ipv6_renew_options(sk, opt, optname, new);
> +	kfree(new);
> 	if (IS_ERR(opt)) {
> 	retv = PTR_ERR(opt);
> 	break;
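
For readers following the reworked helper, here is a minimal userspace
sketch of its copy-and-advance step. It is not the kernel code: the
demo_* names and the 8-byte struct are assumptions made for illustration,
and the CMSG_ALIGN() rounding is left out because the demo lengths are
already multiples of 8. The point it shows is that the write cursor is
passed as char **, so it has to be advanced through *p for the caller to
see the new position.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte stand-in for struct ipv6_opt_hdr plus padding. */
struct demo_opt_hdr {
	uint8_t nexthdr;
	uint8_t hdrlen;		/* length in 8-octet units, not counting the first */
	uint8_t data[6];
};

/* Length in bytes, mirroring ipv6_optlen(): (hdrlen + 1) * 8. */
static size_t demo_optlen(const struct demo_opt_hdr *h)
{
	return ((size_t)h->hdrlen + 1) << 3;
}

/*
 * Pick the new header when this slot is the one being replaced,
 * otherwise inherit the old one; copy it to *p and advance the cursor.
 */
static void demo_renew_option(int renewtype, struct demo_opt_hdr **dest,
			      struct demo_opt_hdr *old,
			      struct demo_opt_hdr *new_hdr,
			      int newtype, char **p)
{
	struct demo_opt_hdr *src = (renewtype == newtype) ? new_hdr : old;
	size_t len;

	if (!src)
		return;

	len = demo_optlen(src);
	memcpy(*p, src, len);
	*dest = (struct demo_opt_hdr *)*p;
	*p += len;	/* advance the caller's cursor, not the local pointer */
}
```

Calling the helper twice on one buffer places the second header directly
after the first, which only works when the cursor update reaches the
caller.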


* Re: [PATCH v3 1/4] Simplify usbnet_cdc_update_filter
From: Miguel Rodríguez Pérez @ 2018-07-02 11:19 UTC (permalink / raw)
  To: Oliver Neukum, gregkh, linux-usb, netdev
In-Reply-To: <1530519944.18402.10.camel@suse.com>

I get a panic if I remove this patch, because intf is NULL for
cdc_ncm devices. I'll send an updated patch that solves this issue while
still using usb_control_msg.

On 02/07/18 10:25, Oliver Neukum wrote:
> On Sun, 2018-07-01 at 11:05 +0200, Miguel Rodríguez Pérez wrote:
>> Remove some unneeded variables to make the code easier to read
>> and replace the generic usb_control_msg function with the
>> more specific usbnet_write_cmd.
>>
>> Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
> 
> No,
> 
> sorry, but this is not good. The reason is a bit subtle.
> Drivers need to reset the filters when handling post_reset()
> [ and reset_resume() ]. usbnet_write_cmd() falls back to
> kmemdup() with GFP_KERNEL. Usbnet is a framework with class
> drivers and some of the devices we drive have a storage
> interface. Hence we are on the block error handling path here.
> 
> The simplest solution is to leave out this patch in the sequence.
> 
> 	Regards
> 		Oliver
> 
> 
> NACKED-BY: Oliver Neukum <oneukum@suse.com>
> 
> 
>> ---
>>  drivers/net/usb/cdc_ether.c | 15 +++++----------
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
>> index 178b956501a7..815ed0dc18fe 100644
>> --- a/drivers/net/usb/cdc_ether.c
>> +++ b/drivers/net/usb/cdc_ether.c
>> @@ -77,9 +77,7 @@ static const u8 mbm_guid[16] = {
>>  
>>  static void usbnet_cdc_update_filter(struct usbnet *dev)
>>  {
>> -	struct cdc_state	*info = (void *) &dev->data;
>> -	struct usb_interface	*intf = info->control;
>> -	struct net_device	*net = dev->net;
>> +	struct net_device *net = dev->net;
>>  
>>  	u16 cdc_filter = USB_CDC_PACKET_TYPE_DIRECTED
>>  			| USB_CDC_PACKET_TYPE_BROADCAST;
>> @@ -93,16 +91,13 @@ static void usbnet_cdc_update_filter(struct usbnet *dev)
>>  	if (!netdev_mc_empty(net) || (net->flags & IFF_ALLMULTI))
>>  		cdc_filter |= USB_CDC_PACKET_TYPE_ALL_MULTICAST;
>>  
>> -	usb_control_msg(dev->udev,
>> -			usb_sndctrlpipe(dev->udev, 0),
>> +	usbnet_write_cmd(dev,
>>  			USB_CDC_SET_ETHERNET_PACKET_FILTER,
>> -			USB_TYPE_CLASS | USB_RECIP_INTERFACE,
>> +			USB_TYPE_CLASS | USB_DIR_OUT | USB_RECIP_INTERFACE,
>>  			cdc_filter,
>> -			intf->cur_altsetting->desc.bInterfaceNumber,
>> +			dev->intf->cur_altsetting->desc.bInterfaceNumber,
>>  			NULL,
>> -			0,
>> -			USB_CTRL_SET_TIMEOUT
>> -		);
>> +			0);
>>  }
>>  
>>  /* probes control interface, claims data interface, collects the bulk
> 

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo


* [PATCH v4 1/4] Use dev->intf to get interface information
From: Miguel Rodríguez Pérez @ 2018-07-02 11:28 UTC (permalink / raw)
  To: oliver, linux-usb, netdev, gregkh; +Cc: Miguel Rodríguez Pérez
In-Reply-To: <e3e25aa6-9f9a-0643-a644-e8efdf12a562@det.uvigo.gal>

usbnet_cdc_update_filter was getting the interface number from the
usb_interface struct in cdc_state->control. However, cdc_ncm does
not initialize that structure in its bind function, but uses
cdc_ncm_cts instead. Getting intf directly from struct usbnet solves
the problem.

Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
---
 drivers/net/usb/cdc_ether.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 178b956501a7..beac02cbde51 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -77,8 +77,6 @@ static const u8 mbm_guid[16] = {
 
 static void usbnet_cdc_update_filter(struct usbnet *dev)
 {
-	struct cdc_state	*info = (void *) &dev->data;
-	struct usb_interface	*intf = info->control;
 	struct net_device	*net = dev->net;
 
 	u16 cdc_filter = USB_CDC_PACKET_TYPE_DIRECTED
@@ -98,7 +96,7 @@ static void usbnet_cdc_update_filter(struct usbnet *dev)
 			USB_CDC_SET_ETHERNET_PACKET_FILTER,
 			USB_TYPE_CLASS | USB_RECIP_INTERFACE,
 			cdc_filter,
-			intf->cur_altsetting->desc.bInterfaceNumber,
+			dev->intf->cur_altsetting->desc.bInterfaceNumber,
 			NULL,
 			0,
 			USB_CTRL_SET_TIMEOUT
-- 
2.17.1


* Re: [PATCH] wlcore: Fix memory leak in wlcore_cmd_wait_for_event_or_timeout
From: Tony Lindgren @ 2018-07-02 11:30 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Kalle Valo, David S. Miller, linux-wireless, netdev, linux-kernel
In-Reply-To: <20180628130809.GA8147@embeddedor.com>

* Gustavo A. R. Silva <gustavo@embeddedor.com> [180628 13:11]:
> In case memory resources for *events_vector* were allocated, release
> them before return.
> 
> Addresses-Coverity-ID: 1470194 ("Resource leak")
> Fixes: 4ec7cece87b3 ("wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()")
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Thanks for catching this one:

Acked-by: Tony Lindgren <tony@atomide.com>

> ---
>  drivers/net/wireless/ti/wlcore/cmd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
> index 836c616..9039687 100644
> --- a/drivers/net/wireless/ti/wlcore/cmd.c
> +++ b/drivers/net/wireless/ti/wlcore/cmd.c
> @@ -195,8 +195,7 @@ int wlcore_cmd_wait_for_event_or_timeout(struct wl1271 *wl,
>  	ret = pm_runtime_get_sync(wl->dev);
>  	if (ret < 0) {
>  		pm_runtime_put_noidle(wl->dev);
> -
> -		return ret;
> +		goto free_vector;
>  	}
>  
>  	do {
> @@ -232,6 +231,7 @@ int wlcore_cmd_wait_for_event_or_timeout(struct wl1271 *wl,
>  out:
>  	pm_runtime_mark_last_busy(wl->dev);
>  	pm_runtime_put_autosuspend(wl->dev);
> +free_vector:
>  	kfree(events_vector);
>  	return ret;
>  }
> -- 
> 2.7.4
> 
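
The fix above follows the usual single-exit cleanup pattern. A minimal
userspace sketch of it, under stated assumptions: demo_power_up() is a
hypothetical stand-in for pm_runtime_get_sync(), and the allocation
counter exists only so the absence of a leak can be observed. None of
the demo_* names come from the driver.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for pm_runtime_get_sync(); not a kernel API. */
static int demo_power_up(int fail)
{
	return fail ? -EIO : 0;
}

/* Counts live allocations so a leak would be visible to the caller. */
static int demo_live_allocs;

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p)
		demo_live_allocs--;
	free(p);
}

/*
 * Every exit taken after the allocation succeeds goes through the
 * free_vector label, so the buffer is released on the error path too.
 */
static int demo_wait_for_event(int fail_power_up)
{
	unsigned int *events_vector;
	int ret;

	events_vector = demo_alloc(sizeof(*events_vector));
	if (!events_vector)
		return -ENOMEM;

	ret = demo_power_up(fail_power_up);
	if (ret < 0)
		goto free_vector;	/* an early "return ret" here would leak */

	*events_vector = 0;	/* placeholder for the real polling loop */

free_vector:
	demo_free(events_vector);
	return ret;
}
```

Whether the power-up step succeeds or fails, demo_live_allocs is back at
zero when demo_wait_for_event() returns.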


* Re: [PATCH net] net: fix use-after-free in GRO with ESP
From: David Miller @ 2018-07-02 11:34 UTC (permalink / raw)
  To: sd; +Cc: netdev, sbrivio, steffen.klassert
In-Reply-To: <62edcac57b52aa0546936700f7a0b50a2327806a.1530368567.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Sat, 30 Jun 2018 17:38:55 +0200

> Since the addition of GRO for ESP, gro_receive can consume the skb and
> return -EINPROGRESS. In that case, the lower layer GRO handler cannot
> touch the skb anymore.
> 
> Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted
> some of the gro_receive handlers that can lead to ESP's gro_receive so
> that they wouldn't access the skb when -EINPROGRESS is returned, but
> missed other spots, mainly in tunneling protocols.
> 
> This patch finishes the conversion to using skb_gro_flush_final(), and
> adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
> GUE.
> 
> Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

Applied and queued up for -stable, thank you.


* Re: [PATCH][next] netdevsim: fix sa_idx out of bounds check
From: David Miller @ 2018-07-02 11:36 UTC (permalink / raw)
  To: colin.king; +Cc: jakub.kicinski, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180630203924.5121-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Sat, 30 Jun 2018 21:39:24 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Currently if sa_idx is equal to NSIM_IPSEC_MAX_SA_COUNT then
> an out-of-bounds read on ipsec->sa will occur. Fix the
> incorrect bounds check by using >= rather than >.
> 
> Detected by CoverityScan, CID#1470226 ("Out-of-bounds-read")
> 
> Fixes: 7699353da875 ("netdevsim: add ipsec offload testing")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied, thank you.
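
For reference, the off-by-one described in the commit message can be
sketched in plain C as follows. This is an illustration only:
DEMO_MAX_SA_COUNT and the demo_* names are hypothetical stand-ins, not
the netdevsim definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_SA_COUNT 4	/* hypothetical stand-in for NSIM_IPSEC_MAX_SA_COUNT */

struct demo_sa {
	bool used;
};

static struct demo_sa demo_table[DEMO_MAX_SA_COUNT];

/*
 * Valid indices are 0 .. DEMO_MAX_SA_COUNT - 1, so an index equal to
 * the count must be rejected as well. Returns NULL when out of range.
 */
static struct demo_sa *demo_sa_lookup(unsigned int idx)
{
	if (idx >= DEMO_MAX_SA_COUNT)	/* ">" would let idx == count through */
		return NULL;
	return &demo_table[idx];
}
```

With ">" in place of ">=", a lookup at index DEMO_MAX_SA_COUNT would
return a pointer one element past the end of the table.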


* Re: [PATCH net] ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
From: David Miller @ 2018-07-02 11:37 UTC (permalink / raw)
  To: ebiggers3; +Cc: netdev, david.lebrun, linux-crypto, ebiggers
In-Reply-To: <20180630222656.333-1-ebiggers3@gmail.com>

From: Eric Biggers <ebiggers3@gmail.com>
Date: Sat, 30 Jun 2018 15:26:56 -0700

> From: Eric Biggers <ebiggers@google.com>
> 
> The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
> not 'gfp_t'.  So don't pass GFP_KERNEL to it.
> 
> Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Oops, applied and queued up for -stable, thanks!


* Re: [PATCHv2 net] ipvlan: call dev_change_flags when ipvlan mode is reset
From: David Miller @ 2018-07-02 11:38 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, sbrivio, pabeni, maheshb, xiyou.wangcong, sd
In-Reply-To: <1530433281-22743-1-git-send-email-liuhangbin@gmail.com>

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Sun,  1 Jul 2018 16:21:21 +0800

> After we change the ipvlan mode from l3 to l2, or vice versa, we only
> reset IFF_NOARP flag, but don't flush the ARP table cache, which will
> cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2().
> Then the message will not leave the host.
> 
> Here is the reproducer on local host:
> 
> ip link set eth1 up
> ip addr add 192.168.1.1/24 dev eth1
> ip link add link eth1 ipvlan1 type ipvlan mode l3
> 
> ip netns add net1
> ip link set ipvlan1 netns net1
> ip netns exec net1 ip link set ipvlan1 up
> ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1
> 
> ip route add 192.168.2.0/24 via 192.168.1.2
> ping 192.168.2.2 -c 2
> 
> ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
> ping 192.168.2.2 -c 2
> 
> Add the same configuration on remote host. After we set the mode to l2,
> we could find that the src/dst MAC addresses are the same on eth1:
> 
> 21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
>     192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64
> 
> Fix this by calling dev_change_flags(), which will call netdevice notifier
> with flag change info.
> 
> v2:
> a) As pointed out by Wang Cong, check the return value of dev_change_flags()
> when changing dev flags.
> b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
> So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.
> 
> Reported-by: Jianlin Shi <jishi@redhat.com>
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
> Fixes: 2ad7bf3638411 ("ipvlan: Initial check-in of the IPVLAN driver.")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Applied, thank you.


* [PATCH][net-next] net: increase MAX_GRO_SKBS to 64
From: Li RongQing @ 2018-07-02 11:41 UTC (permalink / raw)
  To: netdev

After 07d78363dcffd [net: Convert NAPI gro list into a small hash table]
there are 8 hash buckets, which allow more flows to be held for merging.

Keep each bucket at the original list length by increasing MAX_GRO_SKBS to 64.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/core/dev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 08d58e0debe5..ac315e41d5e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -149,8 +149,7 @@
 
 #include "net-sysfs.h"
 
-/* Instead of increasing this, you should create a hash table. */
-#define MAX_GRO_SKBS 8
+#define MAX_GRO_SKBS 64
 
 /* This should be increased if a protocol with a bigger head is added. */
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
-- 
2.16.2


* Re: [PATCH][net-next] net: increase MAX_GRO_SKBS to 64
From: David Miller @ 2018-07-02 11:44 UTC (permalink / raw)
  To: lirongqing; +Cc: netdev, eric.dumazet
In-Reply-To: <1530531703-11368-1-git-send-email-lirongqing@baidu.com>

From: Li RongQing <lirongqing@baidu.com>
Date: Mon,  2 Jul 2018 19:41:43 +0800

> After 07d78363dcffd [net: Convert NAPI gro list into a small hash table]
> there are 8 hash buckets, which allow more flows to be held for merging.
> 
> Keep each bucket at the original list length by increasing MAX_GRO_SKBS to 64.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

I would like to hear some feedback from Eric, 64 might be too big.


* Re: [PATCH net] sctp: fix the issue that pathmtu may be set lower than MINSEGMENT
From: Neil Horman @ 2018-07-02 11:45 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner,
	syzkaller
In-Reply-To: <0928f7dda7db59c8aa01e97a87792d9e643e70ab.1530514276.git.lucien.xin@gmail.com>

On Mon, Jul 02, 2018 at 02:51:16PM +0800, Xin Long wrote:
> After commit b6c5734db070 ("sctp: fix the handling of ICMP Frag Needed
> for too small MTUs"), sctp_transport_update_pmtu would refetch pathmtu
> from the dst and set it to transport's pathmtu without any check.
> 
> The new pathmtu may be lower than MINSEGMENT if the dst is obsolete and
> updated by .get_dst() in sctp_transport_update_pmtu.
> 
> Syzbot reported a warning in sctp_mtu_payload caused by this.
> 
> This fix uses the refetched pathmtu only when it's greater than the
> frag_needed pmtu.
> 
> Fixes: b6c5734db070 ("sctp: fix the handling of ICMP Frag Needed for too small MTUs")
> Reported-by: syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/transport.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index 445b7ef..ddfb687 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -282,7 +282,10 @@ bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu)
>  
>  	if (dst) {
>  		/* Re-fetch, as under layers may have a higher minimum size */
> -		pmtu = SCTP_TRUNC4(dst_mtu(dst));
> +		u32 mtu = SCTP_TRUNC4(dst_mtu(dst));
> +
> +		if (pmtu < mtu)
> +			pmtu = mtu;
nit, but why not u32 mtu = min(pmtu, SCTP_TRUNC4(dst_mtu(dst))) here ?

Neil

>  		change = t->pathmtu != pmtu;
>  	}
>  	t->pathmtu = pmtu;
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
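
A note for anyone condensing the quoted hunk into one line: the
conditional there keeps the larger of the two values, so the compact
form is a max-style expression. The sketch below checks that in
userspace. DEMO_TRUNC4() assumes SCTP_TRUNC4() rounds down to a multiple
of 4, and the demo_* names are illustrative only, not taken from the
SCTP code.

```c
#include <stdint.h>

/* Round down to a multiple of 4 (assumed behaviour of SCTP_TRUNC4()). */
#define DEMO_TRUNC4(x) ((x) & ~(uint32_t)3)

/* Form used in the quoted hunk: raise pmtu to the refetched value if larger. */
static uint32_t demo_pick_conditional(uint32_t pmtu, uint32_t dst_mtu)
{
	uint32_t mtu = DEMO_TRUNC4(dst_mtu);

	if (pmtu < mtu)
		pmtu = mtu;
	return pmtu;
}

/* One-line equivalent: the larger of the two values. */
static uint32_t demo_pick_max(uint32_t pmtu, uint32_t dst_mtu)
{
	uint32_t mtu = DEMO_TRUNC4(dst_mtu);

	return pmtu > mtu ? pmtu : mtu;
}
```

Both helpers return the same result for any pair of inputs; a min-style
expression would instead keep the smaller value.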


* RE: [RFC net-next 15/15] net: lora: Add Semtech SX1301
From: Ben Whitten @ 2018-07-02 11:51 UTC (permalink / raw)
  To: Andreas Färber, netdev@vger.kernel.org
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Jian-Hong Pan, Jiri Pirko,
	Marcel Holtmann, David S . Miller, Matthias Brugger, Janus Piwek,
	Michael Röder, Dollar Chen, Ken Yu, Steve deRosier,
	Mark Brown, linux-spi@vger.kernel.org
In-Reply-To: <20180701110804.32415-16-afaerber@suse.de>

Hi Andreas,

Excellent work on doing this. I have also been working on this
personally, on and off, for some time.
Have a look at my repository [1] for sx1301 and sx1257 drivers.
I use regmap's capability of switching pages, which should simplify
your driver considerably. I also have a full register map and bit fields.

I have also been trying to use the clk framework to capture the various
routing that the cards have.

I will dig into this series this evening.

[1] https://github.com/BWhitten/linux-stable/tree/971aadc8fdfe842020d912449bdd71b33d576fe3/drivers/net/lora


> Subject: [RFC net-next 15/15] net: lora: Add Semtech SX1301
> 
> The Semtech SX1301 was the first multi-channel LoRa "concentrator".
> It uses a SPI interface to the host as well as a dual SPI interface to
> its radios. These two have been implemented as spi_controller, so that
> the Device Tree can specify whether the respective module uses two
> SX1257, two SX1255 or a combination of these or some unforeseen chipset.
> 
> This implementation is the most recent - initialization is not yet
> complete, it will need to load firmware into the two on-chip MCUs.
> 
> Unfortunately there is no full datasheet with register descriptions,
> only a BSD-licensed userspace HAL implementation using spidev devices.
> Therefore some register names are unknown.
> 
> Cc: Ben Whitten <ben.whitten@lairdtech.com>
> Cc: Steve deRosier <derosier@gmail.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Michael Röder <michael.roeder@avnet.eu>
> Cc: Ken Yu (禹凯) <ken.yu@rakwireless.com>
> Cc: linux-spi@vger.kernel.org
> Signed-off-by: Andreas Färber <afaerber@suse.de>
> ---
>  drivers/net/lora/Kconfig  |   7 +
>  drivers/net/lora/Makefile |   3 +
>  drivers/net/lora/sx1301.c | 446 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 456 insertions(+)
>  create mode 100644 drivers/net/lora/sx1301.c
> 
> diff --git a/drivers/net/lora/Kconfig b/drivers/net/lora/Kconfig
> index 68c7480d7812..950450e353b4 100644
> --- a/drivers/net/lora/Kconfig
> +++ b/drivers/net/lora/Kconfig
> @@ -45,6 +45,13 @@ config LORA_SX1276
>  	help
>  	  Semtech SX1272/1276/1278
> 
> +config LORA_SX1301
> +	tristate "Semtech SX1301 SPI driver"
> +	default y
> +	depends on SPI
> +	help
> +	  Semtech SX1301
> +
>  config LORA_USI
>  	tristate "USI WM-SG-SM-42 driver"
>  	default y
> diff --git a/drivers/net/lora/Makefile b/drivers/net/lora/Makefile
> index 44c578bde7d5..1cc1e3aa189b 100644
> --- a/drivers/net/lora/Makefile
> +++ b/drivers/net/lora/Makefile
> @@ -22,6 +22,9 @@ lora-sx1257-y := sx1257.o
>  obj-$(CONFIG_LORA_SX1276) += lora-sx1276.o
>  lora-sx1276-y := sx1276.o
> 
> +obj-$(CONFIG_LORA_SX1301) += lora-sx1301.o
> +lora-sx1301-y := sx1301.o
> +
>  obj-$(CONFIG_LORA_USI) += lora-usi.o
>  lora-usi-y := usi.o
> 
> diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
> new file mode 100644
> index 000000000000..5c936c1116d1
> --- /dev/null
> +++ b/drivers/net/lora/sx1301.c
> @@ -0,0 +1,446 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Semtech SX1301 LoRa concentrator
> + *
> + * Copyright (c) 2018 Andreas Färber
> + *
> + * Based on SX1301 HAL code:
> + * Copyright (c) 2013 Semtech-Cycleo
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/delay.h>
> +#include <linux/lora.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_gpio.h>
> +#include <linux/lora/dev.h>
> +#include <linux/spi/spi.h>
> +
> +#define REG_PAGE_RESET			0
> +#define REG_VERSION			1
> +#define REG_2_SPI_RADIO_A_DATA		33
> +#define REG_2_SPI_RADIO_A_DATA_READBACK	34
> +#define REG_2_SPI_RADIO_A_ADDR		35
> +#define REG_2_SPI_RADIO_A_CS		37
> +#define REG_2_SPI_RADIO_B_DATA		38
> +#define REG_2_SPI_RADIO_B_DATA_READBACK	39
> +#define REG_2_SPI_RADIO_B_ADDR		40
> +#define REG_2_SPI_RADIO_B_CS		42
> +
> +#define REG_PAGE_RESET_SOFT_RESET	BIT(7)
> +
> +#define REG_16_GLOBAL_EN		BIT(3)
> +
> +#define REG_17_CLK32M_EN		BIT(0)
> +
> +#define REG_2_43_RADIO_A_EN		BIT(0)
> +#define REG_2_43_RADIO_B_EN		BIT(1)
> +#define REG_2_43_RADIO_RST		BIT(2)
> +
> +struct spi_sx1301 {
> +	struct spi_device *parent;
> +	u8 page;
> +	u8 regs;
> +};
> +
> +struct sx1301_priv {
> +	struct lora_priv lora;
> +	struct gpio_desc *rst_gpio;
> +	u8 cur_page;
> +	struct spi_controller *radio_a_ctrl, *radio_b_ctrl;
> +};
> +
> +static int sx1301_read(struct spi_device *spi, u8 reg, u8 *val)
> +{
> +	u8 addr = reg & 0x7f;
> +	return spi_write_then_read(spi, &addr, 1, val, 1);
> +}
> +
> +static int sx1301_write(struct spi_device *spi, u8 reg, u8 val)
> +{
> +	u8 buf[2];
> +
> +	buf[0] = reg | BIT(7);
> +	buf[1] = val;
> +	return spi_write(spi, buf, 2);
> +}
> +
> +static int sx1301_page_switch(struct spi_device *spi, u8 page)
> +{
> +	struct sx1301_priv *priv = spi_get_drvdata(spi);
> +	int ret;
> +
> +	if (priv->cur_page == page)
> +		return 0;
> +
> +	dev_dbg(&spi->dev, "switching to page %u\n", (unsigned)page);
> +	ret = sx1301_write(spi, REG_PAGE_RESET, page & 0x3);
> +	if (ret) {
> +		dev_err(&spi->dev, "switching to page %u failed\n", (unsigned)page);
> +		return ret;
> +	}
> +
> +	priv->cur_page = page;
> +
> +	return 0;
> +}
> +
> +static int sx1301_soft_reset(struct spi_device *spi)
> +{
> +	return sx1301_write(spi, REG_PAGE_RESET, REG_PAGE_RESET_SOFT_RESET);
> +}
> +
> +#define REG_RADIO_X_DATA		0
> +#define REG_RADIO_X_DATA_READBACK	1
> +#define REG_RADIO_X_ADDR		2
> +#define REG_RADIO_X_CS			4
> +
> +static int sx1301_radio_set_cs(struct spi_controller *ctrl, bool enable)
> +{
> +	struct spi_sx1301 *ssx = spi_controller_get_devdata(ctrl);
> +	u8 cs;
> +	int ret;
> +
> +	dev_dbg(&ctrl->dev, "setting CS to %s\n", enable ? "1" : "0");
> +
> +	ret = sx1301_page_switch(ssx->parent, ssx->page);
> +	if (ret) {
> +		dev_warn(&ctrl->dev, "failed to switch page for CS (%d)\n", ret);
> +		return ret;
> +	}
> +
> +	ret = sx1301_read(ssx->parent, ssx->regs + REG_RADIO_X_CS, &cs);
> +	if (ret) {
> +		dev_warn(&ctrl->dev, "failed to read CS (%d)\n", ret);
> +		cs = 0;
> +	}
> +
> +	if (enable)
> +		cs |= BIT(0);
> +	else
> +		cs &= ~BIT(0);
> +
> +	ret = sx1301_write(ssx->parent, ssx->regs + REG_RADIO_X_CS, cs);
> +	if (ret)
> +		dev_warn(&ctrl->dev, "failed to write CS (%d)\n", ret);
> +
> +	return 0;
> +}
> +
> +static void sx1301_radio_spi_set_cs(struct spi_device *spi, bool enable)
> +{
> +	int ret;
> +
> +	dev_dbg(&spi->dev, "setting SPI CS to %s\n", enable ? "1" : "0");
> +
> +	if (enable)
> +		return;
> +
> +	ret = sx1301_radio_set_cs(spi->controller, enable);
> +	if (ret)
> +		dev_warn(&spi->dev, "failed to write CS (%d)\n", ret);
> +}
> +
> +static int sx1301_radio_spi_transfer_one(struct spi_controller *ctrl,
> +	struct spi_device *spi, struct spi_transfer *xfr)
> +{
> +	struct spi_sx1301 *ssx = spi_controller_get_devdata(ctrl);
> +	const u8 *tx_buf = xfr->tx_buf;
> +	u8 *rx_buf = xfr->rx_buf;
> +	int ret;
> +
> +	if (xfr->len == 0 || xfr->len > 3)
> +		return -EINVAL;
> +
> +	dev_dbg(&spi->dev, "transferring one (%u)\n", xfr->len);
> +
> +	ret = sx1301_page_switch(ssx->parent, ssx->page);
> +	if (ret) {
> +		dev_err(&spi->dev, "failed to switch page for transfer (%d)\n", ret);
> +		return ret;
> +	}
> +
> +	if (tx_buf) {
> +		ret = sx1301_write(ssx->parent, ssx->regs + REG_RADIO_X_ADDR, tx_buf ? tx_buf[0] : 0);
> +		if (ret) {
> +			dev_err(&spi->dev, "SPI radio address write failed\n");
> +			return ret;
> +		}
> +
> +		ret = sx1301_write(ssx->parent, ssx->regs + REG_RADIO_X_DATA, (tx_buf && xfr->len >= 2) ? tx_buf[1] : 0);
> +		if (ret) {
> +			dev_err(&spi->dev, "SPI radio data write failed\n");
> +			return ret;
> +		}
> +
> +		ret = sx1301_radio_set_cs(ctrl, true);
> +		if (ret) {
> +			dev_err(&spi->dev, "SPI radio CS set failed\n");
> +			return ret;
> +		}
> +
> +		ret = sx1301_radio_set_cs(ctrl, false);
> +		if (ret) {
> +			dev_err(&spi->dev, "SPI radio CS unset failed\n");
> +			return ret;
> +		}
> +	}
> +
> +	if (rx_buf) {
> +		ret = sx1301_read(ssx->parent, ssx->regs + REG_RADIO_X_DATA_READBACK, &rx_buf[xfr->len - 1]);
> +		if (ret) {
> +			dev_err(&spi->dev, "SPI radio data read failed\n");
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static void sx1301_radio_setup(struct spi_controller *ctrl)
> +{
> +	ctrl->mode_bits = SPI_CS_HIGH | SPI_NO_CS;
> +	ctrl->bits_per_word_mask = SPI_BPW_MASK(8);
> +	ctrl->num_chipselect = 1;
> +	ctrl->set_cs = sx1301_radio_spi_set_cs;
> +	ctrl->transfer_one = sx1301_radio_spi_transfer_one;
> +}
> +
> +static int sx1301_probe(struct spi_device *spi)
> +{
> +	struct net_device *netdev;
> +	struct sx1301_priv *priv;
> +	struct spi_sx1301 *radio;
> +	struct gpio_desc *rst;
> +	int ret;
> +	u8 val;
> +
> +	rst = devm_gpiod_get_optional(&spi->dev, "reset", GPIOD_OUT_LOW);
> +	if (IS_ERR(rst))
> +		return PTR_ERR(rst);
> +
> +	gpiod_set_value_cansleep(rst, 1);
> +	msleep(100);
> +	gpiod_set_value_cansleep(rst, 0);
> +	msleep(100);
> +
> +	spi->bits_per_word = 8;
> +	spi_setup(spi);
> +
> +	ret = sx1301_read(spi, REG_VERSION, &val);
> +	if (ret) {
> +		dev_err(&spi->dev, "version read failed\n");
> +		goto err_version;
> +	}
> +
> +	if (val != 103) {
> +		dev_err(&spi->dev, "unexpected version: %u\n", val);
> +		ret = -ENXIO;
> +		goto err_version;
> +	}
> +
> +	netdev = alloc_loradev(sizeof(*priv));
> +	if (!netdev) {
> +		ret = -ENOMEM;
> +		goto err_alloc_loradev;
> +	}
> +
> +	priv = netdev_priv(netdev);
> +	priv->rst_gpio = rst;
> +	priv->cur_page = 0xff;
> +
> +	spi_set_drvdata(spi, netdev);
> +	SET_NETDEV_DEV(netdev, &spi->dev);
> +
> +	ret = sx1301_write(spi, REG_PAGE_RESET, 0);
> +	if (ret) {
> +		dev_err(&spi->dev, "page/reset write failed\n");
> +		return ret;
> +	}
> +
> +	ret = sx1301_soft_reset(spi);
> +	if (ret) {
> +		dev_err(&spi->dev, "soft reset failed\n");
> +		return ret;
> +	}
> +
> +	ret = sx1301_read(spi, 16, &val);
> +	if (ret) {
> +		dev_err(&spi->dev, "16 read failed\n");
> +		return ret;
> +	}
> +
> +	val &= ~REG_16_GLOBAL_EN;
> +
> +	ret = sx1301_write(spi, 16, val);
> +	if (ret) {
> +		dev_err(&spi->dev, "16 write failed\n");
> +		return ret;
> +	}
> +
> +	ret = sx1301_read(spi, 17, &val);
> +	if (ret) {
> +		dev_err(&spi->dev, "17 read failed\n");
> +		return ret;
> +	}
> +
> +	val &= ~REG_17_CLK32M_EN;
> +
> +	ret = sx1301_write(spi, 17, val);
> +	if (ret) {
> +		dev_err(&spi->dev, "17 write failed\n");
> +		return ret;
> +	}
> +
> +	ret = sx1301_page_switch(spi, 2);
> +	if (ret) {
> +		dev_err(&spi->dev, "page 2 switch failed\n");
> +		return ret;
> +	}
> +
> +	ret = sx1301_read(spi, 43, &val);
> +	if (ret) {
> +		dev_err(&spi->dev, "2|43 read failed\n");
> +		return ret;
> +	}
> +
> +	val |= REG_2_43_RADIO_B_EN | REG_2_43_RADIO_A_EN;
> +
> +	ret = sx1301_write(spi, 43, val);
> +	if (ret) {
> +		dev_err(&spi->dev, "2|43 write failed\n");
> +		return ret;
> +	}
> +
> +	msleep(500);
> +
> +	ret = sx1301_read(spi, 43, &val);
> +	if (ret) {
> +		dev_err(&spi->dev, "2|43 read failed\n");
> +		return ret;
> +	}
> +
> +	val |= REG_2_43_RADIO_RST;
> +
> +	ret = sx1301_write(spi, 43, val);
> +	if (ret) {
> +		dev_err(&spi->dev, "2|43 write failed\n");
> +		return ret;
> +	}
> +
> +	msleep(5);
> +
> +	ret = sx1301_read(spi, 43, &val);
> +	if (ret) {
> +		dev_err(&spi->dev, "2|43 read failed\n");
> +		return ret;
> +	}
> +
> +	val &= ~REG_2_43_RADIO_RST;
> +
> +	ret = sx1301_write(spi, 43, val);
> +	if (ret) {
> +		dev_err(&spi->dev, "2|43 write failed\n");
> +		return ret;
> +	}
> +
> +	/* radio A */
> +
> +	priv->radio_a_ctrl = spi_alloc_master(&spi->dev, sizeof(*radio));
> +	if (!priv->radio_a_ctrl) {
> +		ret = -ENOMEM;
> +		goto err_radio_a_alloc;
> +	}
> +
> +	sx1301_radio_setup(priv->radio_a_ctrl);
> +	priv->radio_a_ctrl->dev.of_node = of_get_child_by_name(spi->dev.of_node, "radio-a");
> +
> +	radio = spi_controller_get_devdata(priv->radio_a_ctrl);
> +	radio->page = 2;
> +	radio->regs = REG_2_SPI_RADIO_A_DATA;
> +	radio->parent = spi;
> +
> +	dev_info(&spi->dev, "registering radio A SPI\n");
> +
> +	ret = devm_spi_register_controller(&spi->dev, priv->radio_a_ctrl);
> +	if (ret) {
> +		dev_err(&spi->dev, "radio A SPI register failed\n");
> +		goto err_radio_a_register;
> +	}
> +
> +	/* radio B */
> +
> +	priv->radio_b_ctrl = spi_alloc_master(&spi->dev, sizeof(*radio));
> +	if (!priv->radio_b_ctrl) {
> +		ret = -ENOMEM;
> +		goto err_radio_b_alloc;
> +	}
> +
> +	sx1301_radio_setup(priv->radio_b_ctrl);
> +	priv->radio_b_ctrl->dev.of_node = of_get_child_by_name(spi->dev.of_node, "radio-b");
> +
> +	radio = spi_controller_get_devdata(priv->radio_b_ctrl);
> +	radio->page = 2;
> +	radio->regs = REG_2_SPI_RADIO_B_DATA;
> +	radio->parent = spi;
> +
> +	dev_info(&spi->dev, "registering radio B SPI\n");
> +
> +	ret = devm_spi_register_controller(&spi->dev, priv->radio_b_ctrl);
> +	if (ret) {
> +		dev_err(&spi->dev, "radio B SPI register failed\n");
> +		goto err_radio_b_register;
> +	}
> +
> +	dev_info(&spi->dev, "SX1301 module probed\n");
> +
> +	return 0;
> +err_radio_b_register:
> +	spi_controller_put(priv->radio_b_ctrl);
> +err_radio_b_alloc:
> +err_radio_a_register:
> +	spi_controller_put(priv->radio_a_ctrl);
> +err_radio_a_alloc:
> +	free_loradev(netdev);
> +err_alloc_loradev:
> +err_version:
> +	return ret;
> +}
> +
> +static int sx1301_remove(struct spi_device *spi)
> +{
> +	struct net_device *netdev = spi_get_drvdata(spi);
> +
> +	//unregister_loradev(netdev);
> +	free_loradev(netdev);
> +
> +	dev_info(&spi->dev, "SX1301 module removed\n");
> +
> +	return 0;
> +}
> +
> +#ifdef CONFIG_OF
> +static const struct of_device_id sx1301_dt_ids[] = {
> +	{ .compatible = "semtech,sx1301" },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(of, sx1301_dt_ids);
> +#endif
> +
> +static struct spi_driver sx1301_spi_driver = {
> +	.driver = {
> +		.name = "sx1301",
> +		.of_match_table = of_match_ptr(sx1301_dt_ids),
> +	},
> +	.probe = sx1301_probe,
> +	.remove = sx1301_remove,
> +};
> +
> +module_spi_driver(sx1301_spi_driver);
> +
> +MODULE_DESCRIPTION("SX1301 SPI driver");
> +MODULE_AUTHOR("Andreas Färber <afaerber@suse.de>");
> +MODULE_LICENSE("GPL");
> --
> 2.16.4



* [PATCH] net: stmmac_tc: use 64-bit arithmetic instead of 32-bit
From: Gustavo A. R. Silva @ 2018-07-02 12:09 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, David S. Miller
  Cc: netdev, linux-kernel, Gustavo A. R. Silva

Add suffix UL to constant 1024 in order to give the compiler complete
information about the proper arithmetic to use. Notice that this
constant is used in a context that expects an expression of type
u64 (64 bits, unsigned), and the following expressions are currently
being evaluated using 32-bit arithmetic:

qopt->idleslope * 1024 * ptr
qopt->hicredit * 1024 * 8
qopt->locredit * 1024 * 8
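
To illustrate the difference, here is a minimal userspace sketch, not code
from the driver. It assumes 32-bit unsigned operands (the field types of
qopt are not visible in this patch), and scale_credit_32() and
scale_credit_64() are hypothetical names. The sketch uses the ULL suffix
because UL is only 64 bits wide on LP64 targets, whereas ULL is at least
64 bits everywhere.

```c
#include <stdint.h>

/*
 * Hypothetical helper: every operand is 32 bits wide, so the product is
 * computed modulo 2^32 and only widened to 64 bits afterwards. The high
 * bits are already lost by the time the result is returned.
 */
static uint64_t scale_credit_32(uint32_t credit)
{
	return credit * 1024 * 8;
}

/*
 * Hypothetical helper: the ULL suffix makes one operand 64 bits wide, so
 * the usual arithmetic conversions promote the whole product to 64 bits
 * before any multiplication takes place.
 */
static uint64_t scale_credit_64(uint32_t credit)
{
	return credit * 1024ULL * 8;
}
```

With credit = 1000000, the 32-bit variant wraps modulo 2^32 and yields
3897032704, while the 64-bit variant yields the intended 8192000000.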

Addresses-Coverity-ID: 1470246 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1470248 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1470249 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index 0b0fca0..8fedc28 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -321,7 +321,7 @@ static int tc_setup_cbs(struct stmmac_priv *priv,
 	speed_div = (priv->speed == SPEED_100) ? 100000 : 1000000;
 
 	/* Final adjustments for HW */
-	value = qopt->idleslope * 1024 * ptr;
+	value = qopt->idleslope * 1024UL * ptr;
 	do_div(value, speed_div);
 	priv->plat->tx_queues_cfg[queue].idle_slope = value & GENMASK(31, 0);
 
@@ -329,10 +329,10 @@ static int tc_setup_cbs(struct stmmac_priv *priv,
 	do_div(value, speed_div);
 	priv->plat->tx_queues_cfg[queue].send_slope = value & GENMASK(31, 0);
 
-	value = qopt->hicredit * 1024 * 8;
+	value = qopt->hicredit * 1024UL * 8;
 	priv->plat->tx_queues_cfg[queue].high_credit = value & GENMASK(31, 0);
 
-	value = qopt->locredit * 1024 * 8;
+	value = qopt->locredit * 1024UL * 8;
 	priv->plat->tx_queues_cfg[queue].low_credit = value & GENMASK(31, 0);
 
 	ret = stmmac_config_cbs(priv, priv->hw,
-- 
2.7.4


* [PATCH net-next v4 1/4] vhost: lock the vqs one by one
From: xiangxia.m.yue @ 2018-07-02 12:57 UTC (permalink / raw)
  To: jasowang
  Cc: mst, makita.toshiaki, virtualization, netdev, Tonghao Zhang,
	Tonghao Zhang
In-Reply-To: <1530536228-17462-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch changes the locking scheme: instead of locking all vqs
at the same time, lock them one by one. The next patch relies on
this to avoid a deadlock.
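
The difference between the two locking schemes can be sketched in userspace
with pthread mutexes. This is an illustration only, under the assumption
that each queue's work needs just that queue's lock; struct demo_vq and both
function names are hypothetical and not part of vhost.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: a mutex plus per-queue state. */
struct demo_vq {
	pthread_mutex_t mutex;
	int meta_valid;
};

/* Old scheme: take every queue lock, do all the work, release every lock. */
static void demo_reset_all_locked(struct demo_vq *vqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pthread_mutex_lock(&vqs[i].mutex);
	for (i = 0; i < n; i++)
		vqs[i].meta_valid = 0;
	for (i = 0; i < n; i++)
		pthread_mutex_unlock(&vqs[i].mutex);
}

/* New scheme: hold at most one queue lock at any point in time. */
static void demo_reset_one_by_one(struct demo_vq *vqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		pthread_mutex_lock(&vqs[i].mutex);
		vqs[i].meta_valid = 0;
		pthread_mutex_unlock(&vqs[i].mutex);
	}
}
```

Both variants leave the queues in the same state. The second never holds
two queue locks at once, so a path that already owns one queue's mutex can
take another queue's mutex without conflicting with this loop.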

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 895eaa2..4ca9383 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,8 +294,11 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
 		__vhost_vq_meta_reset(d->vqs[i]);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 }
 
 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -887,20 +890,6 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 #define vhost_get_used(vq, x, ptr) \
 	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
 
-static void vhost_dev_lock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_lock_nested(&d->vqs[i]->mutex, i);
-}
-
-static void vhost_dev_unlock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_unlock(&d->vqs[i]->mutex);
-}
-
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -950,7 +939,10 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
 		if (msg->iova <= vq_msg->iova &&
 		    msg->iova + msg->size - 1 > vq_msg->iova &&
 		    vq_msg->type == VHOST_IOTLB_MISS) {
+			mutex_lock(&node->vq->mutex);
 			vhost_poll_queue(&node->vq->poll);
+			mutex_unlock(&node->vq->mutex);
+
 			list_del(&node->node);
 			kfree(node);
 		}
@@ -982,7 +974,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 	int ret = 0;
 
 	mutex_lock(&dev->mutex);
-	vhost_dev_lock_vqs(dev);
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		if (!dev->iotlb) {
@@ -1016,7 +1007,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		break;
 	}
 
-	vhost_dev_unlock_vqs(dev);
 	mutex_unlock(&dev->mutex);
 
 	return ret;
-- 
1.8.3.1


* [PATCH net-next v4 2/4] net: vhost: replace magic number of lock annotation
From: xiangxia.m.yue @ 2018-07-02 12:57 UTC (permalink / raw)
  To: jasowang
  Cc: mst, makita.toshiaki, virtualization, netdev, Tonghao Zhang,
	Tonghao Zhang
In-Reply-To: <1530536228-17462-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Use the VHOST_NET_VQ_XXX constants as the subclass for
mutex_lock_nested() instead of magic numbers.

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e7cf7d2..62bb8e8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -484,7 +484,7 @@ static void handle_tx(struct vhost_net *net)
 	bool zcopy, zcopy_used;
 	int sent_pkts = 0;
 
-	mutex_lock(&vq->mutex);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
@@ -655,7 +655,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		/* Flush batched heads first */
 		vhost_rx_signal_used(rvq);
 		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&vq->mutex, 1);
+		mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
 		vhost_disable_notify(&net->dev, vq);
 
 		preempt_disable();
@@ -789,7 +789,7 @@ static void handle_rx(struct vhost_net *net)
 	__virtio16 num_buffers;
 	int recv_pkts = 0;
 
-	mutex_lock_nested(&vq->mutex, 0);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
-- 
1.8.3.1


* [PATCH net-next v4 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: xiangxia.m.yue @ 2018-07-02 12:57 UTC (permalink / raw)
  To: jasowang
  Cc: mst, makita.toshiaki, virtualization, netdev, Tonghao Zhang,
	Tonghao Zhang
In-Reply-To: <1530536228-17462-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Factor out the generic busy polling logic so that it can be
used in the tx path in the next patch. With this patch, qemu
can also set a different busyloop_timeout for the rx queue.
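
As a rough userspace sketch of the busy polling pattern being factored out
(spin until work shows up or a deadline passes), assuming caller-supplied
clock and readiness callbacks. All names here are hypothetical; the real
helper additionally handles locking, notification and preemption, and this
sketch ignores clock wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: a real caller would pass a monotonic clock and
 * a check of its own queue or socket. */
typedef uint64_t (*poll_clock_fn)(void *ctx);
typedef bool (*poll_ready_fn)(void *ctx);

/* Spin until ready() reports work or the timeout has elapsed.
 * Returns true if work is available when the loop ends. */
static bool busy_poll_until(poll_clock_fn now, poll_ready_fn ready,
			    void *ctx, uint64_t timeout)
{
	uint64_t endtime = now(ctx) + timeout;

	while (now(ctx) < endtime) {
		if (ready(ctx))
			return true;
	}

	/* One last check after the deadline, as late work still counts. */
	return ready(ctx);
}

/* Demo state: a fake clock that advances by one tick on every read, and
 * data that "arrives" once the clock reaches ready_at. */
struct demo_source {
	uint64_t ticks;
	uint64_t ready_at;
};

static uint64_t demo_now(void *ctx)
{
	struct demo_source *s = ctx;

	return s->ticks++;
}

static bool demo_ready(void *ctx)
{
	struct demo_source *s = ctx;

	return s->ticks >= s->ready_at;
}
```

Passing the clock and the readiness check as parameters is what lets one
loop serve both directions: the rx path and the tx path differ only in
which queue they watch and which timeout they apply.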

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 drivers/vhost/net.c | 94 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 55 insertions(+), 39 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 62bb8e8..2790959 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -429,6 +429,52 @@ static int vhost_net_enable_vq(struct vhost_net *n,
 	return vhost_poll_start(poll, sock->file);
 }
 
+static int sk_has_rx_data(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
+	return skb_queue_empty(&sk->sk_receive_queue);
+}
+
+static void vhost_net_busy_poll(struct vhost_net *net,
+				struct vhost_virtqueue *rvq,
+				struct vhost_virtqueue *tvq,
+				bool rx)
+{
+	unsigned long uninitialized_var(endtime);
+	unsigned long busyloop_timeout;
+	struct socket *sock;
+	struct vhost_virtqueue *vq = rx ? tvq : rvq;
+
+	mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
+
+	vhost_disable_notify(&net->dev, vq);
+	sock = rvq->private_data;
+	busyloop_timeout = rx ? rvq->busyloop_timeout : tvq->busyloop_timeout;
+
+	preempt_disable();
+	endtime = busy_clock() + busyloop_timeout;
+	while (vhost_can_busy_poll(tvq->dev, endtime) &&
+	       !(sock && sk_has_rx_data(sock->sk)) &&
+	       vhost_vq_avail_empty(tvq->dev, tvq))
+		cpu_relax();
+	preempt_enable();
+
+	if ((rx && !vhost_vq_avail_empty(&net->dev, vq)) ||
+	    (!rx && (sock && sk_has_rx_data(sock->sk)))) {
+		vhost_poll_queue(&vq->poll);
+	} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+		vhost_disable_notify(&net->dev, vq);
+		vhost_poll_queue(&vq->poll);
+	}
+
+	mutex_unlock(&vq->mutex);
+}
+
+
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_virtqueue *vq,
 				    struct iovec iov[], unsigned int iov_size,
@@ -621,16 +667,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	return len;
 }
 
-static int sk_has_rx_data(struct sock *sk)
-{
-	struct socket *sock = sk->sk_socket;
-
-	if (sock->ops->peek_len)
-		return sock->ops->peek_len(sock);
-
-	return skb_queue_empty(&sk->sk_receive_queue);
-}
-
 static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
 {
 	struct vhost_virtqueue *vq = &nvq->vq;
@@ -645,39 +681,19 @@ static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
 
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
-	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
-	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
-	struct vhost_virtqueue *vq = &nvq->vq;
-	unsigned long uninitialized_var(endtime);
-	int len = peek_head_len(rvq, sk);
+	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
+	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
 
-	if (!len && vq->busyloop_timeout) {
-		/* Flush batched heads first */
-		vhost_rx_signal_used(rvq);
-		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
-		vhost_disable_notify(&net->dev, vq);
+	int len = peek_head_len(rnvq, sk);
 
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-
-		while (vhost_can_busy_poll(&net->dev, endtime) &&
-		       !sk_has_rx_data(sk) &&
-		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax();
-
-		preempt_enable();
-
-		if (!vhost_vq_avail_empty(&net->dev, vq))
-			vhost_poll_queue(&vq->poll);
-		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-			vhost_disable_notify(&net->dev, vq);
-			vhost_poll_queue(&vq->poll);
-		}
+	if (!len && rnvq->vq.busyloop_timeout) {
+		/* Flush batched heads first */
+		vhost_rx_signal_used(rnvq);
 
-		mutex_unlock(&vq->mutex);
+		/* Both tx vq and rx socket were polled here */
+		vhost_net_busy_poll(net, &rnvq->vq, &tnvq->vq, true);
 
-		len = peek_head_len(rvq, sk);
+		len = peek_head_len(rnvq, sk);
 	}
 
 	return len;
-- 
1.8.3.1


* [PATCH net-next v4 4/4] net: vhost: add rx busy polling in tx path
From: xiangxia.m.yue @ 2018-07-02 12:57 UTC (permalink / raw)
  To: jasowang
  Cc: mst, makita.toshiaki, virtualization, netdev, Tonghao Zhang,
	Tonghao Zhang
In-Reply-To: <1530536228-17462-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch improves the guest receive and transmit performance.
On the handle_tx side, we poll the sock receive queue at the
same time. handle_rx does the same.

We set poll-us=100us, and use iperf3 to test bandwidth and
netperf to test throughput and mean latency. While the tests
run, the vhost-net kthread of that VM is always at 100% CPU.
The commands are shown below.

iperf3  -s -D
iperf3  -c IP -i 1 -P 1 -t 20 -M 1400

or
netserver
netperf -H IP -t TCP_RR -l 20 -- -O "THROUGHPUT,MEAN_LATENCY"

host -> guest:
iperf3:
* With the patch:     27.0 Gbits/sec
* Without the patch:  14.4 Gbits/sec

netperf (TCP_RR):
* With the patch:     48039.56 trans/s, 20.64us mean latency
* Without the patch:  46027.07 trans/s, 21.58us mean latency

This patch also improves the guest transmit performance.

guest -> host:
iperf3:
* With the patch:     27.2 Gbits/sec
* Without the patch:  24.4 Gbits/sec

netperf (TCP_RR):
* With the patch:     47963.25 trans/s, 20.71us mean latency
* Without the patch:  45796.70 trans/s, 21.68us mean latency
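
For readers who want the relative gains, a small sketch of the arithmetic
follows. It only restates the figures above as percentages; gain_percent()
is a hypothetical helper name.

```c
/* Relative change of `with` compared to `without`, in percent. */
static double gain_percent(double with, double without)
{
	return (with - without) / without * 100.0;
}
```

Applied to the numbers above, this gives 87.5% for host -> guest iperf3
(27.0 vs 14.4 Gbits/sec), about 11.5% for guest -> host iperf3 (27.2 vs
24.4 Gbits/sec), and roughly 4.4% and 4.7% for the two netperf TCP_RR
transaction rates.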

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 drivers/vhost/net.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2790959..3f26547 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -480,17 +480,13 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct iovec iov[], unsigned int iov_size,
 				    unsigned int *out_num, unsigned int *in_num)
 {
-	unsigned long uninitialized_var(endtime);
+	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
 	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				  out_num, in_num, NULL, NULL);
 
 	if (r == vq->num && vq->busyloop_timeout) {
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-		while (vhost_can_busy_poll(vq->dev, endtime) &&
-		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax();
-		preempt_enable();
+		vhost_net_busy_poll(net, &rnvq->vq, vq, false);
+
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
 	}
-- 
1.8.3.1


* [PATCH net-next v4 0/4] net: vhost: improve performance when enable busyloop
From: xiangxia.m.yue @ 2018-07-02 12:57 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, mst

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

These patches improve the guest receive and transmit performance.
On the handle_tx side, we poll the sock receive queue at the same time.
handle_rx does the same.

For the detailed performance report, see patch 4.

v3 -> v4:
fix some issues

v2 -> v3:
These patches were split from the previous big patch:
http://patchwork.ozlabs.org/patch/934673/

Tonghao Zhang (4):
  vhost: lock the vqs one by one
  net: vhost: replace magic number of lock annotation
  net: vhost: factor out busy polling logic to vhost_net_busy_poll()
  net: vhost: add rx busy polling in tx path

 drivers/vhost/net.c   | 108 ++++++++++++++++++++++++++++----------------------
 drivers/vhost/vhost.c |  24 ++++-------
 2 files changed, 67 insertions(+), 65 deletions(-)

-- 
1.8.3.1


* Re: [PATCH] wlcore: Fix memory leak in wlcore_cmd_wait_for_event_or_timeout
From: Gustavo A. R. Silva @ 2018-07-02 13:01 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Kalle Valo, David S. Miller, linux-wireless, netdev, linux-kernel
In-Reply-To: <20180702113008.GV112168@atomide.com>

On 07/02/2018 06:30 AM, Tony Lindgren wrote:
> * Gustavo A. R. Silva <gustavo@embeddedor.com> [180628 13:11]:
>> In case memory resources for *events_vector* were allocated, release
>> them before return.
>>
>> Addresses-Coverity-ID: 1470194 ("Resource leak")
>> Fixes: 4ec7cece87b3 ("wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()")
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> 
> Thanks for catching this one:
> 
> Acked-by: Tony Lindgren <tony@atomide.com>
> 

Glad to help. :)

Thanks
--
Gustavo

