Netdev List
* [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Roopa Prabhu @ 2016-04-09  6:38 UTC
  To: netdev; +Cc: jhs, davem

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but it returns a lot
more than just stats, which makes it expensive when userspace polls for
stats frequently.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
        [IFLA_STATS_LINK64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

This patch also allows for per-address-family (AF) stats (an example of AF
stats for IPv6 is available in the second patch of the series).

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.
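
As a rough userspace sketch of the two request forms, assuming the header
layout and message number from this patch: everything prefixed with sketch_
or SKETCH_ is a hypothetical local copy or helper, not something an
installed header provides. Only the flags and the ifindex differ between
the single-interface query and the dump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Local copy of the message number this patch assigns (assumption: the
 * installed uapi headers do not carry this patch's definitions).
 */
#define SKETCH_RTM_GETSTATS 94

/* Local copy of the 'struct if_stats_msg' layout from this patch. */
struct sketch_if_stats_msg {
	uint8_t  family;
	uint32_t ifindex;
	uint32_t filter_mask;
};

struct sketch_stats_req {
	struct nlmsghdr nlh;
	struct sketch_if_stats_msg ifsm;
};

/* Fill a request: dump == 0 asks for one interface (ifindex must be > 0,
 * the kernel side in this patch rejects anything else with -EINVAL);
 * dump != 0 asks for all interfaces via NLM_F_DUMP.
 */
static void sketch_build_stats_req(struct sketch_stats_req *req,
				   uint32_t ifindex, uint32_t filter_mask,
				   uint32_t seq, int dump)
{
	assert(req);

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifsm));
	req->nlh.nlmsg_type = SKETCH_RTM_GETSTATS;
	req->nlh.nlmsg_flags = NLM_F_REQUEST;
	if (dump)
		req->nlh.nlmsg_flags |= NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifsm.ifindex = ifindex;
	req->ifsm.filter_mask = filter_mask;
}
```

The filled buffer would then be sent on a NETLINK_ROUTE socket; the socket
handling and reply parsing are left out of the sketch.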

Future possible new types of stat attributes:
- IFLA_MPLS_STATS  (nested. for mpls/mdev stats)
- IFLA_EXTENDED_STATS (nested. extended software netdev stats like bridge,
  vlan, vxlan etc)
- IFLA_EXTENDED_HW_STATS (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
The user has to provide a mask of the stats attributes to query. The mask
is specified in a new header, 'struct if_stats_msg', for stats messages.

Without any attributes in the filter_mask, no stats will be returned.
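
A minimal sketch of how the mask is formed and tested, with the enum and
macro copied from the if_link.h hunk of this patch so that it builds
without patched headers. stats_mask_wants() is a hypothetical helper for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from this patch's if_link.h hunk. */
enum {
	IFLA_STATS_UNSPEC,
	IFLA_STATS_LINK64,
	__IFLA_STATS_MAX,
};

#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)

#define IFLA_STATS_FILTER_BIT(ATTR)	(1 << (ATTR))

/* Hypothetical helper: non-zero if filter_mask requests attribute attr. */
static int stats_mask_wants(uint32_t filter_mask, int attr)
{
	assert(attr > IFLA_STATS_UNSPEC && attr <= IFLA_STATS_MAX);

	return (filter_mask & IFLA_STATS_FILTER_BIT(attr)) != 0;
}
```

Each attribute number maps to one bit, so bit 0 (IFLA_STATS_UNSPEC) is never
meaningful, and because filter_mask is a __u32 only attribute numbers below
32 can be expressed this way.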

This patch has been tested with a modified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/rtnetlink.h        |   5 ++
 include/uapi/linux/if_link.h   |  18 ++++
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c           | 200 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 228 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 2f87c1b..fa68158 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -131,6 +131,11 @@ struct rtnl_af_ops {
 						    const struct nlattr *attr);
 	int			(*set_link_af)(struct net_device *dev,
 					       const struct nlattr *attr);
+	size_t			(*get_link_af_stats_size)(const struct net_device *dev,
+							  u32 filter_mask);
+	int			(*fill_link_af_stats)(struct sk_buff *skb,
+						      const struct net_device *dev,
+						      u32 filter_mask);
 };
 
 void __rtnl_af_unregister(struct rtnl_af_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 9427f17..4cfd029 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -780,4 +780,22 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+	__u8  family;
+	__u32 ifindex;
+	__u32 filter_mask;
+};
+
+enum {
+	IFLA_STATS_UNSPEC,
+	IFLA_STATS_LINK64,
+	__IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)	(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..cc885c4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -139,6 +139,11 @@ enum {
 	RTM_GETNSID = 90,
 #define RTM_GETNSID RTM_GETNSID
 
+	RTM_NEWSTATS = 92,
+#define RTM_NEWSTATS RTM_NEWSTATS
+	RTM_GETSTATS = 94,
+#define RTM_GETSTATS RTM_GETSTATS
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a75f7e9..d1fba58 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3451,6 +3451,203 @@ out:
 	return err;
 }
 
+static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
+			       int type, u32 pid, u32 seq, u32 change,
+			       unsigned int flags, unsigned int filter_mask)
+{
+	const struct rtnl_link_stats64 *stats;
+	struct rtnl_link_stats64 temp;
+	struct if_stats_msg *ifsm;
+	struct nlmsghdr *nlh;
+	struct rtnl_af_ops *af_ops;
+	struct nlattr *attr;
+
+	ASSERT_RTNL();
+
+	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifsm), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	ifsm = nlmsg_data(nlh);
+	ifsm->ifindex = dev->ifindex;
+	ifsm->filter_mask = filter_mask;
+
+	if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK64)) {
+		attr = nla_reserve(skb, IFLA_STATS_LINK64,
+				   sizeof(struct rtnl_link_stats64));
+		if (!attr)
+			goto nla_put_failure;
+
+		stats = dev_get_stats(dev, &temp);
+
+		copy_rtnl_link_stats64(nla_data(attr), stats);
+	}
+
+	list_for_each_entry(af_ops, &rtnl_af_ops, list) {
+		if (af_ops->fill_link_af_stats) {
+			int err;
+
+			err = af_ops->fill_link_af_stats(skb, dev, filter_mask);
+			if (err < 0)
+				goto nla_put_failure;
+		}
+	}
+
+	nlmsg_end(skb, nlh);
+
+	return 0;
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+
+	return -EMSGSIZE;
+}
+
+static const struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
+	[IFLA_STATS_LINK64]	= { .len = sizeof(struct rtnl_link_stats64) },
+};
+
+static size_t rtnl_link_get_af_stats_size(const struct net_device *dev,
+					  u32 filter_mask)
+{
+	struct rtnl_af_ops *af_ops;
+	size_t size = 0;
+
+	list_for_each_entry(af_ops, &rtnl_af_ops, list) {
+		if (af_ops->get_link_af_stats_size)
+			size += af_ops->get_link_af_stats_size(dev,
+							       filter_mask);
+	}
+
+	return size;
+}
+
+static size_t if_nlmsg_stats_size(const struct net_device *dev,
+				  u32 filter_mask)
+{
+	size_t size = 0;
+
+	if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK64))
+		size += nla_total_size(sizeof(struct rtnl_link_stats64));
+
+	size += rtnl_link_get_af_stats_size(dev, filter_mask);
+
+	return size;
+}
+
+static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_stats_msg *ifsm;
+	struct net_device *dev = NULL;
+	struct sk_buff *nskb;
+	u32 filter_mask;
+	int err;
+
+	ifsm = nlmsg_data(nlh);
+	if (ifsm->ifindex > 0)
+		dev = __dev_get_by_index(net, ifsm->ifindex);
+	else
+		return -EINVAL;
+
+	if (!dev)
+		return -ENODEV;
+
+	filter_mask = ifsm->filter_mask;
+	if (!filter_mask)
+		return -EINVAL;
+
+	nskb = nlmsg_new(if_nlmsg_stats_size(dev, filter_mask), GFP_KERNEL);
+	if (!nskb)
+		return -ENOBUFS;
+
+	err = rtnl_fill_statsinfo(nskb, dev, RTM_NEWSTATS,
+				  NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
+				  0, filter_mask);
+	if (err < 0) {
+		/* -EMSGSIZE implies BUG in if_nlmsg_stats_size */
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(nskb);
+	} else {
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+	}
+
+	return err;
+}
+
+static u16 rtnl_stats_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	u16 min_ifinfo_dump_size = 0;
+	struct if_stats_msg *ifsm;
+	u32 filter_mask;
+
+	ifsm = nlmsg_data(nlh);
+	filter_mask = ifsm->filter_mask;
+
+	/* traverse the list of net devices and compute the minimum
+	 * buffer size based upon the filter mask.
+	 */
+	list_for_each_entry(dev, &net->dev_base_head, dev_list) {
+		min_ifinfo_dump_size = max_t(u16, min_ifinfo_dump_size,
+					     if_nlmsg_stats_size(dev,
+								 filter_mask));
+	}
+
+	return min_ifinfo_dump_size;
+}
+
+static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_stats_msg *ifsm;
+	int h, s_h;
+	int idx = 0, s_idx;
+	struct net_device *dev;
+	struct hlist_head *head;
+	unsigned int flags = NLM_F_MULTI;
+	u32 filter_mask = 0;
+	int err;
+
+	s_h = cb->args[0];
+	s_idx = cb->args[1];
+
+	cb->seq = net->dev_base_seq;
+
+	ifsm = nlmsg_data(cb->nlh);
+	filter_mask = ifsm->filter_mask;
+
+	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
+		idx = 0;
+		head = &net->dev_index_head[h];
+		hlist_for_each_entry(dev, head, index_hlist) {
+			if (idx < s_idx)
+				goto cont;
+			err = rtnl_fill_statsinfo(skb, dev, RTM_NEWSTATS,
+						  NETLINK_CB(cb->skb).portid,
+						  cb->nlh->nlmsg_seq, 0,
+						  flags, filter_mask);
+			/* If we ran out of room on the first message,
+			 * we're in trouble
+			 */
+			WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
+
+			if (err < 0)
+				goto out;
+
+			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
+cont:
+			idx++;
+		}
+	}
+out:
+	cb->args[1] = idx;
+	cb->args[0] = h;
+
+	return skb->len;
+}
+
 /* Process one rtnetlink message. */
 
 static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
@@ -3600,4 +3797,7 @@ void __init rtnetlink_init(void)
 	rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, rtnl_bridge_getlink, NULL);
 	rtnl_register(PF_BRIDGE, RTM_DELLINK, rtnl_bridge_dellink, NULL, NULL);
 	rtnl_register(PF_BRIDGE, RTM_SETLINK, rtnl_bridge_setlink, NULL, NULL);
+
+	rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump,
+		      rtnl_stats_calcit);
 }
-- 
1.9.1


* [PATCH net-next v2 0/2] rtnetlink: new message for stats
From: Roopa Prabhu @ 2016-04-09  6:38 UTC
  To: netdev; +Cc: jhs, davem

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This series adds a new RTM_GETSTATS message to query link stats via
netlink from the kernel. RTM_NEWLINK also dumps stats today, but it
returns a lot more than just stats, which makes it expensive when
userspace polls for stats frequently.

RTM_GETSTATS is an attempt to provide a lightweight netlink message
to explicitly query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.


Roopa Prabhu (2):
  rtnetlink: add new RTM_GETSTATS to dump link stats
  ipv6: add support for stats via RTM_GETSTATS

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>

RFC to v1 (apologies for the delay in sending this version out. busy days):
        - Addressed feedback from Dave
                - removed rtnl_link_stats
                - Added hdr struct if_stats_msg to carry ifindex and
                  filter mask
                - new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
        - split the ipv6 patch into a separate patch, need some more eyes on it
        - prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for shorter
          attribute names

v1 - v2:
        - move IFLA_STATS_INET6 declaration to the inet6 patch
        - get rid of RTM_DELSTATS
        - mark ipv6 patch RFC. It can be used as an example for
          other AF stats

 include/net/rtnetlink.h        |   5 +
 include/uapi/linux/if_link.h   |  19 ++++
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/rtnetlink.c           | 201 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/addrconf.c            |  77 ++++++++++++++--
 5 files changed, 301 insertions(+), 8 deletions(-)

-- 
1.9.1


* Re: [PATCHv2 net-next 4/6] sctp: add the sctp_diag.c file
From: Eric Dumazet @ 2016-04-09  5:51 UTC
  To: Xin Long
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <ac19f4966c7df6399c2a3bfa0f7f9589b9ff536f.1460177331.git.lucien.xin@gmail.com>

On Sat, 2016-04-09 at 12:53 +0800, Xin Long wrote:
> This one will implement all the interface of inet_diag, inet_diag_handler.
> which includes sctp_diag_dump, sctp_diag_dump_one and sctp_diag_get_info.


> +static int inet_assoc_diag_fill(struct sock *sk,
> +				struct sctp_association *asoc,
> +				struct sk_buff *skb,
> +				const struct inet_diag_req_v2 *req,
> +				struct user_namespace *user_ns,
> +				int portid, u32 seq, u16 nlmsg_flags,
> +				const struct nlmsghdr *unlh)
> +{
> +	const struct inet_sock *inet = inet_sk(sk);
> +	const struct inet_diag_handler *handler;
> +	int ext = req->idiag_ext;
> +	struct inet_diag_msg *r;
> +	struct nlmsghdr  *nlh;
> +	struct nlattr *attr;
> +	void *info = NULL;
> +	union sctp_addr laddr, paddr;
> +	struct dst_entry *dst;
> +	struct sctp_infox infox;
> +
> +	handler = inet_diag_get_handler(req->sdiag_protocol);
> +	BUG_ON(!handler);
> +
> +	nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
> +			nlmsg_flags);
> +	if (!nlh)
> +		return -EMSGSIZE;
> +
> +	r = nlmsg_data(nlh);
> +	BUG_ON(!sk_fullsock(sk));
> +
> +	laddr = list_entry(asoc->base.bind_addr.address_list.next,
> +			   struct sctp_sockaddr_entry, list)->a;
> +	paddr = asoc->peer.primary_path->ipaddr;
> +	dst = asoc->peer.primary_path->dst;
> +
> +	r->idiag_family = sk->sk_family;
> +	r->id.idiag_sport = htons(asoc->base.bind_addr.port);
> +	r->id.idiag_dport = htons(asoc->peer.port);
> +	r->id.idiag_if = dst ? dst->dev->ifindex : 0;
> +	sock_diag_save_cookie(sk, r->id.idiag_cookie);
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +	if (sk->sk_family == AF_INET6) {
> +		*(struct in6_addr *)r->id.idiag_src = laddr.v6.sin6_addr;
> +		*(struct in6_addr *)r->id.idiag_dst = paddr.v6.sin6_addr;
> +	} else
> +#endif
> +	{
> +		memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
> +		memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
> +
> +		r->id.idiag_src[0] = laddr.v4.sin_addr.s_addr;
> +		r->id.idiag_dst[0] = paddr.v4.sin_addr.s_addr;
> +	}
> +
> +	r->idiag_state = asoc->state;
> +	r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
> +	r->idiag_retrans = asoc->rtx_data_chunks;
> +#define EXPIRES_IN_MS(tmo)  DIV_ROUND_UP((tmo - jiffies) * 1000, HZ)
> +	r->idiag_expires =
> +		EXPIRES_IN_MS(asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX]);
> +#undef EXPIRES_IN_MS
> +
> +	if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
> +		goto errout;
> +
> +	/* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
> +	 * hence this needs to be included regardless of socket family.
> +	 */
> +	if (ext & (1 << (INET_DIAG_TOS - 1)))
> +		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
> +			goto errout;
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +	if (r->idiag_family == AF_INET6) {
> +		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
> +			if (nla_put_u8(skb, INET_DIAG_TCLASS,
> +				       inet6_sk(sk)->tclass) < 0)
> +				goto errout;
> +
> +		if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
> +		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
> +			goto errout;
> +	}
> +#endif
> +
> +	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
> +	r->idiag_inode = sock_i_ino(sk);
> +
> +	if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
> +		struct inet_diag_meminfo minfo = {
> +			.idiag_rmem = sk_rmem_alloc_get(sk),
> +			.idiag_wmem = sk->sk_wmem_queued,
> +			.idiag_fmem = sk->sk_forward_alloc,
> +			.idiag_tmem = sk_wmem_alloc_get(sk),
> +		};
> +

All this code looks familiar.

Why is inet_sk_diag_fill() not used instead?

> +		if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
> +			goto errout;
> +	}
> +
> +	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
> +		if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
> +			goto errout;
> +
> +	if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) {
> +		attr = nla_reserve(skb, INET_DIAG_INFO,
> +				   handler->idiag_info_size);
> +		if (!attr)
> +			goto errout;
> +
> +		info = nla_data(attr);
> +	}
> +	infox.sctpinfo = (struct sctp_info *)info;
> +	infox.asoc = asoc;
> +	handler->idiag_get_info(sk, r, &infox);
> +
> +	if (ext & (1 << (INET_DIAG_CONG - 1)))
> +		if (nla_put_string(skb, INET_DIAG_CONG, "reno") < 0)
> +			goto errout;
> +
> +	if (inet_sctp_fill_laddrs(skb, &asoc->base.bind_addr.address_list))
> +		goto errout;
> +
> +	if (inet_sctp_fill_paddrs(skb, asoc))
> +		goto errout;
> +
> +	nlmsg_end(skb, nlh);
> +	return 0;
> +
> +errout:
> +	nlmsg_cancel(skb, nlh);
> +	return -EMSGSIZE;
> +}
> +
> +static int inet_ep_diag_fill(struct sock *sk, struct sctp_endpoint *ep,
> +			     struct sk_buff *skb,
> +			     const struct inet_diag_req_v2 *req,
> +			     struct user_namespace *user_ns,
> +			     u32 portid, u32 seq, u16 nlmsg_flags,
> +			     const struct nlmsghdr *unlh)
> +{
> +	const struct inet_sock *inet = inet_sk(sk);
> +	const struct inet_diag_handler *handler;
> +	int ext = req->idiag_ext;
> +	struct inet_diag_msg *r;
> +	struct nlmsghdr  *nlh;
> +	struct nlattr *attr;
> +	void *info = NULL;
> +	struct sctp_infox infox;
> +
> +	handler = inet_diag_get_handler(req->sdiag_protocol);
> +	BUG_ON(!handler);
> +
> +	nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
> +			nlmsg_flags);
> +	if (!nlh)
> +		return -EMSGSIZE;
> +
> +	r = nlmsg_data(nlh);
> +	BUG_ON(!sk_fullsock(sk));
> +
> +	inet_diag_msg_common_fill(r, sk);
> +	r->idiag_state = sk->sk_state;
> +	r->idiag_timer = 0;
> +	r->idiag_retrans = 0;
> +
> +	if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
> +		goto errout;
> +
> +	/* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
> +	 * hence this needs to be included regardless of socket family.
> +	 */
> +	if (ext & (1 << (INET_DIAG_TOS - 1)))
> +		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
> +			goto errout;
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +	if (r->idiag_family == AF_INET6) {
> +		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
> +			if (nla_put_u8(skb, INET_DIAG_TCLASS,
> +				       inet6_sk(sk)->tclass) < 0)
> +				goto errout;
> +
> +		if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
> +		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
> +			goto errout;
> +	}
> +#endif
> +
> +	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
> +	r->idiag_inode = sock_i_ino(sk);
> +
> +	if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
> +		struct inet_diag_meminfo minfo = {
> +			.idiag_rmem = sk_rmem_alloc_get(sk),
> +			.idiag_wmem = sk->sk_wmem_queued,
> +			.idiag_fmem = sk->sk_forward_alloc,
> +			.idiag_tmem = sk_wmem_alloc_get(sk),
> +		};
> +

Again, this looks like a lot of duplication.

Also, you missed that INET_DIAG_MEMINFO is kind of obsolete
now that we have sock_diag_put_meminfo().


> +		if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
> +			goto errout;
> +	}
> +
> +	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
> +		if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
> +			goto errout;
> +
> +	if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) {
> +		attr = nla_reserve(skb, INET_DIAG_INFO,
> +				   handler->idiag_info_size);
> +		if (!attr)
> +			goto errout;
> +
> +		info = nla_data(attr);
> +	}
> +	infox.sctpinfo = (struct sctp_info *)info;
> +	infox.asoc = NULL;
> +	handler->idiag_get_info(sk, r, &infox);
> +
> +	if (inet_sctp_fill_laddrs(skb, &ep->base.bind_addr.address_list))
> +		goto errout;
> +
> +	nlmsg_end(skb, nlh);
> +	return 0;
> +
> +errout:
> +	nlmsg_cancel(skb, nlh);
> +	return -EMSGSIZE;
> +}
> +
> +static size_t inet_assoc_attr_size(struct sctp_association *asoc)
> +{
> +	int addrlen = sizeof(struct sockaddr_storage);
> +	int addrcnt = 0;
> +	struct sctp_sockaddr_entry *laddr;
> +
> +	list_for_each_entry_rcu(laddr, &asoc->base.bind_addr.address_list,
> +				list)
> +		addrcnt++;
> +
> +	return	  nla_total_size(sizeof(struct tcp_info))

Are you sure you want to use tcp_info here?

> +		+ nla_total_size(1) /* INET_DIAG_SHUTDOWN */
> +		+ nla_total_size(1) /* INET_DIAG_TOS */
> +		+ nla_total_size(1) /* INET_DIAG_TCLASS */
> +		+ nla_total_size(addrlen * asoc->peer.transport_count)
> +		+ nla_total_size(addrlen * addrcnt)
> +		+ nla_total_size(sizeof(struct inet_diag_meminfo))
> +		+ nla_total_size(sizeof(struct inet_diag_msg))
> +		+ nla_total_size(sizeof(struct sctp_info))
> +		+ 64;
> +}


* [PATCH net-next] net: bcmgenet: use __napi_schedule_irqoff()
From: Eric Dumazet @ 2016-04-09  5:30 UTC
  To: Florian Fainelli; +Cc: David Miller, netdev, Petri Gynther, opendmb
In-Reply-To: <5B66FA2B-6AAE-4FEB-B5FB-C5C9DF48FDB5@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>

bcmgenet_isr1() and bcmgenet_isr0() run in hard IRQ context,
so we do not need to block IRQs again.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index f7b42b9fc979..4367d561a12e 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2493,7 +2493,7 @@ static irqreturn_t bcmgenet_isr1(int irq, void *dev_id)
 
 		if (likely(napi_schedule_prep(&rx_ring->napi))) {
 			rx_ring->int_disable(rx_ring);
-			__napi_schedule(&rx_ring->napi);
+			__napi_schedule_irqoff(&rx_ring->napi);
 		}
 	}
 
@@ -2506,7 +2506,7 @@ static irqreturn_t bcmgenet_isr1(int irq, void *dev_id)
 
 		if (likely(napi_schedule_prep(&tx_ring->napi))) {
 			tx_ring->int_disable(tx_ring);
-			__napi_schedule(&tx_ring->napi);
+			__napi_schedule_irqoff(&tx_ring->napi);
 		}
 	}
 
@@ -2536,7 +2536,7 @@ static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
 
 		if (likely(napi_schedule_prep(&rx_ring->napi))) {
 			rx_ring->int_disable(rx_ring);
-			__napi_schedule(&rx_ring->napi);
+			__napi_schedule_irqoff(&rx_ring->napi);
 		}
 	}
 
@@ -2545,7 +2545,7 @@ static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
 
 		if (likely(napi_schedule_prep(&tx_ring->napi))) {
 			tx_ring->int_disable(tx_ring);
-			__napi_schedule(&tx_ring->napi);
+			__napi_schedule_irqoff(&tx_ring->napi);
 		}
 	}
 


* Re: [PATCH net-next] net: bcmgenet: use napi_complete_done()
From: Eric Dumazet @ 2016-04-09  5:27 UTC
  To: Florian Fainelli; +Cc: David Miller, netdev, Petri Gynther, opendmb
In-Reply-To: <5B66FA2B-6AAE-4FEB-B5FB-C5C9DF48FDB5@gmail.com>

On Fri, 2016-04-08 at 22:19 -0700, Florian Fainelli wrote:

> Along the same line of changes, we could use napi_schedule_irqoff since NAPI is always scheduled from ISR context.

Good point, I'll cook the patch ;)

Thanks!


* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Eric Dumazet @ 2016-04-09  5:19 UTC
  To: Xin Long
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <c507274a984bd1b0a7e7a59d1e825352536efd25.1460177331.git.lucien.xin@gmail.com>

On Sat, 2016-04-09 at 12:53 +0800, Xin Long wrote:
> sctp_diag will dump some important details of sctp's assoc or ep, we use
> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
> it to sctp_diag.ko.
> 


> +int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
> +		       struct sctp_info *info)
> +{
> +	struct sctp_transport *prim;
> +	struct list_head *pos, *temp;
> +	int mask;
> +
> +	memset(info, 0, sizeof(*info));
> +	if (!asoc) {
> +		struct sctp_sock *sp = sctp_sk(sk);
> +
> +		info->sctpi_s_autoclose = sp->autoclose;
> +		info->sctpi_s_adaptation_ind = sp->adaptation_ind;
> +		info->sctpi_s_pd_point = sp->pd_point;
> +		info->sctpi_s_nodelay = sp->nodelay;
> +		info->sctpi_s_disable_fragments = sp->disable_fragments;
> +		info->sctpi_s_v4mapped = sp->v4mapped;
> +		info->sctpi_s_frag_interleave = sp->frag_interleave;
> +
> +		return 0;
> +	}
> +
> +	info->sctpi_tag = asoc->c.my_vtag;
> +	info->sctpi_state = asoc->state;
> +	info->sctpi_rwnd = asoc->a_rwnd;
> +	info->sctpi_unackdata = asoc->unack_data;
> +	info->sctpi_penddata = sctp_tsnmap_pending(&asoc->peer.tsn_map);
> +	info->sctpi_instrms = asoc->c.sinit_max_instreams;
> +	info->sctpi_outstrms = asoc->c.sinit_num_ostreams;
> +	list_for_each_safe(pos, temp, &asoc->base.inqueue.in_chunk_list)
> +		info->sctpi_inqueue++;
> +	list_for_each_safe(pos, temp, &asoc->outqueue.out_chunk_list)
> +		info->sctpi_outqueue++;

Is this safe?

Do you hold the socket lock, or whatever lock protects these lists?


> +	info->sctpi_overall_error = asoc->overall_error_count;
> +	info->sctpi_max_burst = asoc->max_burst;
> +	info->sctpi_maxseg = asoc->frag_point;
> +	info->sctpi_peer_rwnd = asoc->peer.rwnd;
> +	info->sctpi_peer_tag = asoc->c.peer_vtag;
> +
> +	mask = asoc->peer.ecn_capable << 1;
> +	mask = (mask | asoc->peer.ipv4_address) << 1;
> +	mask = (mask | asoc->peer.ipv6_address) << 1;
> +	mask = (mask | asoc->peer.hostname_address) << 1;
> +	mask = (mask | asoc->peer.asconf_capable) << 1;
> +	mask = (mask | asoc->peer.prsctp_capable) << 1;
> +	mask = (mask | asoc->peer.auth_capable);
> +	info->sctpi_peer_capable = mask;
> +	mask = asoc->peer.sack_needed << 1;
> +	mask = (mask | asoc->peer.sack_generation) << 1;
> +	mask = (mask | asoc->peer.zero_window_announced);
> +	info->sctpi_peer_sack = mask;
> +
> +	info->sctpi_isacks = asoc->stats.isacks;
> +	info->sctpi_osacks = asoc->stats.osacks;
> +	info->sctpi_opackets = asoc->stats.opackets;
> +	info->sctpi_ipackets = asoc->stats.ipackets;
> +	info->sctpi_rtxchunks = asoc->stats.rtxchunks;
> +	info->sctpi_outofseqtsns = asoc->stats.outofseqtsns;
> +	info->sctpi_idupchunks = asoc->stats.idupchunks;
> +	info->sctpi_gapcnt = asoc->stats.gapcnt;
> +	info->sctpi_ouodchunks = asoc->stats.ouodchunks;
> +	info->sctpi_iuodchunks = asoc->stats.iuodchunks;
> +	info->sctpi_oodchunks = asoc->stats.oodchunks;
> +	info->sctpi_iodchunks = asoc->stats.iodchunks;
> +	info->sctpi_octrlchunks = asoc->stats.octrlchunks;
> +	info->sctpi_ictrlchunks = asoc->stats.ictrlchunks;
> +
> +	prim = asoc->peer.primary_path;
> +	memcpy(&info->sctpi_p_address, &prim->ipaddr,
> +	       sizeof(struct sockaddr_storage));
> +	info->sctpi_p_state = prim->state;
> +	info->sctpi_p_cwnd = prim->cwnd;
> +	info->sctpi_p_srtt = prim->srtt;
> +	info->sctpi_p_rto = jiffies_to_msecs(prim->rto);
> +	info->sctpi_p_hbinterval = prim->hbinterval;
> +	info->sctpi_p_pathmaxrxt = prim->pathmaxrxt;
> +	info->sctpi_p_sackdelay = jiffies_to_msecs(prim->sackdelay);
> +	info->sctpi_p_ssthresh = prim->ssthresh;
> +	info->sctpi_p_partial_bytes_acked = prim->partial_bytes_acked;
> +	info->sctpi_p_flight_size = prim->flight_size;
> +	info->sctpi_p_error = prim->error_count;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(sctp_get_sctp_info);

info is not guaranteed to be aligned on 8 bytes.

You need to use put_unaligned()

Check commit ff5d749772018 ("tcp: beware of alignments in
tcp_get_info()") for details.
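
As a userspace illustration of the idea only (a sketch, not the kernel
helper itself): put_unaligned() lives in the kernel, so the hypothetical
functions below stand in for it with memcpy(), which makes no assumption
about the alignment of the destination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit put_unaligned(): store val at dst
 * without requiring dst to be 8-byte aligned.
 */
static void store_u64_unaligned(void *dst, uint64_t val)
{
	assert(dst);
	memcpy(dst, &val, sizeof(val));
}

/* Matching load, again valid for any alignment of src. */
static uint64_t load_u64_unaligned(const void *src)
{
	uint64_t val;

	assert(src);
	memcpy(&val, src, sizeof(val));
	return val;
}
```

A direct assignment through a __u64 pointer into a buffer that is only
4-byte aligned, such as a netlink attribute payload, is what the comment
above warns against.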


* Re: [PATCH net-next] net: bcmgenet: use napi_complete_done()
From: Florian Fainelli @ 2016-04-09  5:19 UTC
  To: Eric Dumazet, David Miller; +Cc: netdev, Petri Gynther, opendmb
In-Reply-To: <1460178400.6473.469.camel@edumazet-glaptop3.roam.corp.google.com>

On April 8, 2016 10:06:40 PM PDT, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>From: Eric Dumazet <edumazet@google.com>
>
>By using napi_complete_done(), we allow fine tuning
>of /sys/class/net/ethX/gro_flush_timeout for higher GRO aggregation
>efficiency for a Gbit NIC.
>
>Check commit 24d2e4a50737 ("tg3: use napi_complete_done()") for
>details.
>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Petri Gynther <pgynther@google.com>
>Cc: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

Along the same line of changes, we could use napi_schedule_irqoff since NAPI is always scheduled from ISR context.


>---
> drivers/net/ethernet/broadcom/genet/bcmgenet.c |    2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>index f7b42b9fc979..e823013d3125 100644
>--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>@@ -1735,7 +1735,7 @@ static int bcmgenet_rx_poll(struct napi_struct
>*napi, int budget)
> 	work_done = bcmgenet_desc_rx(ring, budget);
> 
> 	if (work_done < budget) {
>-		napi_complete(napi);
>+		napi_complete_done(napi, work_done);
> 		ring->int_enable(ring);
> 	}
> 


-- 
Florian


* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Eric Dumazet @ 2016-04-09  5:16 UTC
  To: Xin Long
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <c507274a984bd1b0a7e7a59d1e825352536efd25.1460177331.git.lucien.xin@gmail.com>

On Sat, 2016-04-09 at 12:53 +0800, Xin Long wrote:
> sctp_diag will dump some important details of sctp's assoc or ep, we use
> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
> it to sctp_diag.ko.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  include/linux/sctp.h    | 65 +++++++++++++++++++++++++++++++++++++
>  include/net/sctp/sctp.h |  3 ++
>  net/sctp/socket.c       | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 154 insertions(+)
> 
> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
> index a9414fd..a448ebc 100644
> --- a/include/linux/sctp.h
> +++ b/include/linux/sctp.h
> @@ -705,4 +705,69 @@ typedef struct sctp_auth_chunk {
>  	sctp_authhdr_t auth_hdr;
>  } __packed sctp_auth_chunk_t;
>  
> +struct sctp_info {
> +	__u32	sctpi_tag;
> +	__u32	sctpi_state;
> +	__u32	sctpi_rwnd;
> +	__u16	sctpi_unackdata;
> +	__u16	sctpi_penddata;
> +	__u16	sctpi_instrms;
> +	__u16	sctpi_outstrms;
> +	__u32	sctpi_fragmentation_point;
> +	__u32	sctpi_inqueue;
> +	__u32	sctpi_outqueue;
> +	__u32	sctpi_overall_error;
> +	__u32	sctpi_max_burst;
> +	__u32	sctpi_maxseg;
> +	__u32	sctpi_peer_rwnd;
> +	__u32	sctpi_peer_tag;
> +	__u8	sctpi_peer_capable;
> +	__u8	sctpi_peer_sack;
> +
> +	/* assoc status info */
> +	__u64	sctpi_isacks;
> +	__u64	sctpi_osacks;
> +	__u64	sctpi_opackets;
> +	__u64	sctpi_ipackets;
> +	__u64	sctpi_rtxchunks;
> +	__u64	sctpi_outofseqtsns;
> +	__u64	sctpi_idupchunks;
> +	__u64	sctpi_gapcnt;
> +	__u64	sctpi_ouodchunks;
> +	__u64	sctpi_iuodchunks;
> +	__u64	sctpi_oodchunks;
> +	__u64	sctpi_iodchunks;
> +	__u64	sctpi_octrlchunks;
> +	__u64	sctpi_ictrlchunks;
> +
> +	/* primary transport info */
> +	struct sockaddr_storage	sctpi_p_address;
> +	__s32	sctpi_p_state;
> +	__u32	sctpi_p_cwnd;
> +	__u32	sctpi_p_srtt;
> +	__u32	sctpi_p_rto;
> +	__u32	sctpi_p_hbinterval;
> +	__u32	sctpi_p_pathmaxrxt;
> +	__u32	sctpi_p_sackdelay;
> +	__u32	sctpi_p_sackfreq;
> +	__u32	sctpi_p_ssthresh;
> +	__u32	sctpi_p_partial_bytes_acked;
> +	__u32	sctpi_p_flight_size;
> +	__u16	sctpi_p_error;
> +
> +	/* sctp sock info */
> +	__u32	sctpi_s_autoclose;
> +	__u32	sctpi_s_adaptation_ind;
> +	__u32	sctpi_s_pd_point;
> +	__u8	sctpi_s_nodelay;
> +	__u8	sctpi_s_disable_fragments;
> +	__u8	sctpi_s_v4mapped;
> +	__u8	sctpi_s_frag_interleave;
> +};
> +

Lots of holes in this structure...
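
One such hole can be made visible with offsetof(). The sketch below uses a
hypothetical miniature struct that repeats the u8, u8, u64 sequence of
sctpi_peer_capable, sctpi_peer_sack and sctpi_isacks; it is an assumption
for illustration, not the real struct sctp_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature: two u8 fields followed by a u64, as in the
 * quoted struct right before the "assoc status info" block.
 */
struct hole_demo {
	uint32_t tag;
	uint8_t  peer_capable;
	uint8_t  peer_sack;
	uint64_t isacks;
};

/* Bytes of padding the compiler inserts between peer_sack and isacks. */
static size_t hole_before_isacks(void)
{
	size_t end_of_sack = offsetof(struct hole_demo, peer_sack) +
			     sizeof(uint8_t);

	assert(offsetof(struct hole_demo, isacks) >= end_of_sack);
	return offsetof(struct hole_demo, isacks) - end_of_sack;
}
```

On common ABIs the u64 is placed at offset 8, leaving two unused bytes
after peer_sack. Grouping fields by size, or adding explicit padding
members, removes such gaps; a tool like pahole reports them for a full
struct.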


* [GIT] Networking
From: David Miller @ 2016-04-09  5:14 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP,
   from Haishuang Yan.

2) Fix multicast frame handling in mac80211 AP code, from Felix
   Fietkau.

3) mac80211 station hashtable insert errors not handled properly, fix
   from Johannes Berg.

4) Fix TX descriptor count limit handling in e1000, from Alexander Duyck.

5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.

6) Must assign rtnl_link_ops of the device before registering it,
   fix in ip6_tunnel from Thadeu Lima de Souza Cascardo.

7) Memory leak fix in tc action net exit, from WANG Cong.

8) Add missing AF_KCM entries to name tables, from Dexuan Cui.

9) Fix regression in GRE handling of csums wrt. FOU, from Alexander
   Duyck.

10) Fix memory allocation alignment and congestion map corruption in
    RDS, from Shamir Rabinovitch.

11) Fix default qdisc regression in tuntap driver, from Jason Wang.

Please pull, thanks a lot!

The following changes since commit 05cf8077e54b20dddb756eaa26f3aeb5c38dd3cf:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-04-01 20:03:33 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 30d237a6c2e9be1bb816fe8e787b88fd7aad833b:

  Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 (2016-04-08 16:41:28 -0400)

----------------------------------------------------------------
Alexander Duyck (3):
      e1000: Do not overestimate descriptor counts in Tx pre-check
      e1000: Double Tx descriptors needed check for 82544
      GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU

Arik Nemtsov (3):
      mac80211: TDLS: always downgrade invalid chandefs
      mac80211: TDLS: change BW calculation for WIDER_BW peers
      mac80211: recalc min_def chanctx even when chandef is identical

Bastien Philbert (1):
      bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr

Ben Greear (1):
      mac80211: ensure no limits on station rhashtable

Bjorn Helgaas (1):
      Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"

Dave Jones (1):
      af_packet: tone down the Tx-ring unsupported spew.

David S. Miller (3):
      Merge branch 'master' of git://git.kernel.org/.../jkirsher/net-queue
      Revert "bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr"
      Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/.../jberg/mac80211

Dexuan Cui (1):
      net: add the AF_KCM entries to family name tables

Emmanuel Grumbach (2):
      mac80211: don't send deferred frames outside the SP
      mac80211: close the SP when we enqueue frames during the SP

Felix Fietkau (1):
      mac80211: fix AP buffered multicast frames with queue control and txq

Giuseppe CAVALLARO (1):
      stmmac: fix adjust link call in case of a switch is attached

Haishuang Yan (2):
      ipv4: l2tp: fix a potential issue in l2tp_ip_recv
      ipv6: l2tp: fix a potential issue in l2tp_ip6_recv

Hariprasad Shenai (1):
      cxgb4: Add pci device id for chelsio t520-cr adapter

Ilan Peer (1):
      mac80211: Fix BW upgrade for TDLS peers

Jakub Sitnicki (1):
      ipv6: Count in extension headers in skb->network_header

Jason Wang (1):
      tuntap: restore default qdisc

Jeff Mahoney (1):
      mac80211: fix "warning: ‘target_metric’ may be used uninitialized"

Jesse Brandeburg (1):
      i40e: fix errant PCIe bandwidth message

Jiri Benc (1):
      MAINTAINERS: intel-wired-lan list is moderated

Johannes Berg (1):
      mac80211: properly deal with station hashtable insert errors

Jorgen Hansen (1):
      VSOCK: Detach QP check should filter out non matching QPs.

Luis de Bethencourt (2):
      mac80211: add doc for RX_FLAG_DUP_VALIDATED flag
      mac80211: remove description of dropped member

Marcelo Ricardo Leitner (2):
      sctp: flush if we can't fit another DATA chunk
      sctp: use list_* in sctp_list_dequeue

Naveen N. Rao (7):
      samples/bpf: Fix build breakage with map_perf_test_user.c
      samples/bpf: Use llc in PATH, rather than a hardcoded value
      samples/bpf: Enable powerpc support
      lib/test_bpf: Fix JMP_JSET tests
      lib/test_bpf: Add tests for unsigned BPF_JGT
      lib/test_bpf: Add test to check for result of 32-bit add that overflows
      lib/test_bpf: Add additional BPF_ADD tests

Roopa Prabhu (1):
      mpls: find_outdev: check for err ptr in addition to NULL check

Thadeu Lima de Souza Cascardo (1):
      ip6_tunnel: set rtnl_link_ops before calling register_netdevice

WANG Cong (1):
      net_sched: fix a memory leak in tc action

shamir rabinovitch (2):
      RDS: memory allocated must be align to 8
      RDS: fix congestion map corruption for PAGE_SIZE > 4k

stephen hemminger (1):
      bridge, netem: mark mailing lists as moderated

 MAINTAINERS                                        |   6 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h |   1 +
 drivers/net/ethernet/intel/e1000/e1000_main.c      |  21 ++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c        |   1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  22 +++--
 drivers/net/tun.c                                  |   4 +-
 include/linux/netdevice.h                          |   5 +-
 include/net/act_api.h                              |   1 +
 include/net/mac80211.h                             |   2 +
 include/net/sctp/sctp.h                            |   6 +-
 lib/test_bpf.c                                     | 229 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 net/core/dev.c                                     |   1 +
 net/core/netpoll.c                                 |   3 +-
 net/core/sock.c                                    |   9 ++-
 net/ipv4/fou.c                                     |   6 ++
 net/ipv4/gre_offload.c                             |   8 ++
 net/ipv4/ip_gre.c                                  |  13 ++-
 net/ipv6/ip6_output.c                              |   8 +-
 net/ipv6/ip6_tunnel.c                              |   2 +-
 net/l2tp/l2tp_ip.c                                 |   8 +-
 net/l2tp/l2tp_ip6.c                                |   8 +-
 net/mac80211/chan.c                                |   4 +-
 net/mac80211/ieee80211_i.h                         |   4 +
 net/mac80211/mesh_hwmp.c                           |   2 +-
 net/mac80211/sta_info.c                            |  14 ++--
 net/mac80211/sta_info.h                            |   1 -
 net/mac80211/tdls.c                                |  43 ++++++++--
 net/mac80211/tx.c                                  |  13 ++-
 net/mac80211/vht.c                                 |  30 +++++--
 net/mpls/af_mpls.c                                 |   3 +
 net/packet/af_packet.c                             |   2 +-
 net/rds/ib_recv.c                                  |   2 +-
 net/rds/page.c                                     |   4 +-
 net/sctp/output.c                                  |   3 +-
 net/vmw_vsock/vmci_transport.c                     |   4 +-
 samples/bpf/Makefile                               |  12 +--
 samples/bpf/bpf_helpers.h                          |  26 ++++++
 samples/bpf/map_perf_test_user.c                   |   1 +
 samples/bpf/spintest_kern.c                        |   2 +-
 samples/bpf/tracex2_kern.c                         |   4 +-
 samples/bpf/tracex4_kern.c                         |   2 +-
 41 files changed, 448 insertions(+), 92 deletions(-)

^ permalink raw reply

* [PATCH net-next] net: bcmgenet: use napi_complete_done()
From: Eric Dumazet @ 2016-04-09  5:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Petri Gynther, Florian Fainelli

From: Eric Dumazet <edumazet@google.com>

By using napi_complete_done(), we allow fine tuning
of /sys/class/net/ethX/gro_flush_timeout for higher GRO aggregation
efficiency for a Gbit NIC.

Check commit 24d2e4a50737 ("tg3: use napi_complete_done()") for details.
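
The shape of the poll handler after this change can be sketched in user space as follows. This is only an illustration under stated assumptions: struct mock_napi, mock_napi_complete_done(), mock_rx_poll() and poll_and_report() are mock names invented here, not the kernel's types or helpers. The only point is that the number of packets processed is handed to the completion call when the budget was not exhausted.

```c
#include <stdbool.h>

/* Mock stand-ins; these are NOT the kernel's napi_struct or helpers. */
struct mock_napi {
	bool scheduled;		/* still waiting to be polled again? */
	int  reported_work;	/* count passed on completion, -1 if none */
};

static void mock_napi_complete_done(struct mock_napi *napi, int work_done)
{
	napi->reported_work = work_done;
	napi->scheduled = false;
}

/* Poll handler shape: process at most 'budget' of 'pending' packets. */
static int mock_rx_poll(struct mock_napi *napi, int pending, int budget)
{
	int work_done = pending < budget ? pending : budget;

	/* Only complete when the ring was drained within the budget. */
	if (work_done < budget)
		mock_napi_complete_done(napi, work_done);

	return work_done;
}

/* Run one poll on a fresh mock and return what was reported. */
static int poll_and_report(int pending, int budget)
{
	struct mock_napi napi = { .scheduled = true, .reported_work = -1 };

	mock_rx_poll(&napi, pending, budget);
	return napi.reported_work;
}
```

With a budget of 64, a poll that finds 3 packets reports 3 on completion, while a poll that finds 100 uses the whole budget and does not complete at all.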

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petri Gynther <pgynther@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index f7b42b9fc979..e823013d3125 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1735,7 +1735,7 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 	work_done = bcmgenet_desc_rx(ring, budget);
 
 	if (work_done < budget) {
-		napi_complete(napi);
+		napi_complete_done(napi, work_done);
 		ring->int_enable(ring);
 	}
 

^ permalink raw reply related

* [PATCHv2 net-next 6/6] sctp: fix some rhashtable functions using in sctp proc/diag
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <cover.1460177331.git.lucien.xin@gmail.com>

When rhashtable_walk_init() returns an error, no release function should
be called, and when rhashtable_walk_start() returns an error, only
rhashtable_walk_exit() should be invoked to release the resources.

But currently, when sctp_transport_walk_start() returns an error, the
callers still call rhashtable_walk_stop/exit without checking whether
rhashtable_walk_init() or rhashtable_walk_start() failed, which is wrong.

Fix it by calling rhashtable_walk_exit() in sctp_transport_walk_start()
if rhashtable_walk_start() returns an error. Then, if
sctp_transport_walk_start() returns an error, there is no need to call
sctp_transport_walk_stop() any more.

For sctp proc, use 'iter->start_fail' to decide whether to call
rhashtable_walk_stop/exit.
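
The pairing rule described above can be sketched with a user-space mock. This is not the kernel code: struct mock_iter and all mock_* functions are invented for illustration. The sketch assumes, as the patch does by mapping -EAGAIN to 0, that a start returning -EAGAIN has still entered the walk.

```c
#include <errno.h>

/* Mock iterator; NOT the kernel's struct rhashtable_iter. */
struct mock_iter {
	int inited;	/* taken by init, released by exit */
	int started;	/* entered by start, left by stop */
};

static int mock_walk_init(struct mock_iter *it, int fail_with)
{
	if (fail_with)
		return fail_with;	/* nothing acquired */
	it->inited = 1;
	return 0;
}

static int mock_walk_start(struct mock_iter *it, int fail_with)
{
	if (fail_with && fail_with != -EAGAIN)
		return fail_with;	/* walk not entered */
	it->started = 1;		/* assumption: -EAGAIN still enters */
	return fail_with;
}

static void mock_walk_stop(struct mock_iter *it) { it->started = 0; }
static void mock_walk_exit(struct mock_iter *it) { it->inited = 0; }

/* Mirrors the error handling the patch gives the start helper. */
static int mock_transport_walk_start(struct mock_iter *it,
				     int init_err, int start_err)
{
	int err = mock_walk_init(it, init_err);

	if (err)
		return err;		/* init failed: release nothing */

	err = mock_walk_start(it, start_err);
	if (err && err != -EAGAIN) {
		mock_walk_exit(it);	/* start failed: undo init only */
		return err;
	}
	return 0;
}

/* Caller side: stop/exit only after a successful start.
 * Returns how many resources are still held (0 means balanced).
 */
static int held_after_walk(int init_err, int start_err)
{
	struct mock_iter it = { 0, 0 };

	if (mock_transport_walk_start(&it, init_err, start_err) == 0) {
		mock_walk_stop(&it);
		mock_walk_exit(&it);
	}
	return it.inited + it.started;
}
```

In every failure combination the mock ends with nothing held, which is the property the callers rely on once they stop calling the stop helper after a failed start.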

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/proc.c   |  7 ++++++-
 net/sctp/socket.c | 15 ++++++++++-----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 9fe1393..4cb5aed 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -280,6 +280,7 @@ void sctp_eps_proc_exit(struct net *net)
 struct sctp_ht_iter {
 	struct seq_net_private p;
 	struct rhashtable_iter hti;
+	int start_fail;
 };
 
 static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
@@ -287,8 +288,10 @@ static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
 	struct sctp_ht_iter *iter = seq->private;
 	int err = sctp_transport_walk_start(&iter->hti);
 
-	if (err)
+	if (err) {
+		iter->start_fail = 1;
 		return ERR_PTR(err);
+	}
 
 	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
@@ -297,6 +300,8 @@ static void sctp_transport_seq_stop(struct seq_file *seq, void *v)
 {
 	struct sctp_ht_iter *iter = seq->private;
 
+	if (iter->start_fail)
+		return;
 	sctp_transport_walk_stop(&iter->hti);
 }
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index b0bf6c7..473a40c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4298,8 +4298,12 @@ int sctp_transport_walk_start(struct rhashtable_iter *iter)
 		return err;
 
 	err = rhashtable_walk_start(iter);
+	if (err && err != -EAGAIN) {
+		rhashtable_walk_exit(iter);
+		return err;
+	}
 
-	return err == -EAGAIN ? 0 : err;
+	return 0;
 }
 
 void sctp_transport_walk_stop(struct rhashtable_iter *iter)
@@ -4388,11 +4392,12 @@ EXPORT_SYMBOL_GPL(sctp_transport_lookup_process);
 int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
 			    struct net *net, int pos, void *p) {
 	struct rhashtable_iter hti;
-	int err = 0;
 	void *obj;
+	int err;
 
-	if (sctp_transport_walk_start(&hti))
-		goto out;
+	err = sctp_transport_walk_start(&hti);
+	if (err)
+		return err;
 
 	sctp_transport_get_idx(net, &hti, pos);
 	obj = sctp_transport_get_next(net, &hti);
@@ -4406,8 +4411,8 @@ int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
 		if (err)
 			break;
 	}
-out:
 	sctp_transport_walk_stop(&hti);
+
 	return err;
 }
 EXPORT_SYMBOL_GPL(sctp_for_each_transport);
-- 
2.1.0

^ permalink raw reply related

* [PATCHv2 net-next 5/6] sctp: merge the seq_start/next/exits in remaddrs and assocs
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <cover.1460177331.git.lucien.xin@gmail.com>

In sctp proc, the seq_start/next/stop functions for remaddrs and assocs
are identical, so merge them into one shared set.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/proc.c | 45 +++++++++------------------------------------
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index dd8492f..9fe1393 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -282,7 +282,7 @@ struct sctp_ht_iter {
 	struct rhashtable_iter hti;
 };
 
-static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
+static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct sctp_ht_iter *iter = seq->private;
 	int err = sctp_transport_walk_start(&iter->hti);
@@ -293,14 +293,14 @@ static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
 	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
 
-static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
+static void sctp_transport_seq_stop(struct seq_file *seq, void *v)
 {
 	struct sctp_ht_iter *iter = seq->private;
 
 	sctp_transport_walk_stop(&iter->hti);
 }
 
-static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+static void *sctp_transport_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct sctp_ht_iter *iter = seq->private;
 
@@ -367,9 +367,9 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 }
 
 static const struct seq_operations sctp_assoc_ops = {
-	.start = sctp_assocs_seq_start,
-	.next  = sctp_assocs_seq_next,
-	.stop  = sctp_assocs_seq_stop,
+	.start = sctp_transport_seq_start,
+	.next  = sctp_transport_seq_next,
+	.stop  = sctp_transport_seq_stop,
 	.show  = sctp_assocs_seq_show,
 };
 
@@ -406,33 +406,6 @@ void sctp_assocs_proc_exit(struct net *net)
 	remove_proc_entry("assocs", net->sctp.proc_net_sctp);
 }
 
-static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos)
-{
-	struct sctp_ht_iter *iter = seq->private;
-	int err = sctp_transport_walk_start(&iter->hti);
-
-	if (err)
-		return ERR_PTR(err);
-
-	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
-}
-
-static void *sctp_remaddr_seq_next(struct seq_file *seq, void *v, loff_t *pos)
-{
-	struct sctp_ht_iter *iter = seq->private;
-
-	++*pos;
-
-	return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
-}
-
-static void sctp_remaddr_seq_stop(struct seq_file *seq, void *v)
-{
-	struct sctp_ht_iter *iter = seq->private;
-
-	sctp_transport_walk_stop(&iter->hti);
-}
-
 static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 {
 	struct sctp_association *assoc;
@@ -506,9 +479,9 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 }
 
 static const struct seq_operations sctp_remaddr_ops = {
-	.start = sctp_remaddr_seq_start,
-	.next  = sctp_remaddr_seq_next,
-	.stop  = sctp_remaddr_seq_stop,
+	.start = sctp_transport_seq_start,
+	.next  = sctp_transport_seq_next,
+	.stop  = sctp_transport_seq_stop,
 	.show  = sctp_remaddr_seq_show,
 };
 
-- 
2.1.0

^ permalink raw reply related

* [PATCHv2 net-next 4/6] sctp: add the sctp_diag.c file
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <cover.1460177331.git.lucien.xin@gmail.com>

This patch implements the inet_diag interface for SCTP through an
inet_diag_handler, which includes sctp_diag_dump, sctp_diag_dump_one and
sctp_diag_get_info.

It is built as a module and registers the inet_diag_handler when loaded.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/uapi/linux/inet_diag.h |   2 +
 net/sctp/Kconfig               |   4 +
 net/sctp/Makefile              |   1 +
 net/sctp/sctp_diag.c           | 581 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 588 insertions(+)
 create mode 100644 net/sctp/sctp_diag.c

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 68a1f71..f5f3629 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -113,6 +113,8 @@ enum {
 	INET_DIAG_DCTCPINFO,
 	INET_DIAG_PROTOCOL,  /* response attribute only */
 	INET_DIAG_SKV6ONLY,
+	INET_DIAG_LOCALS,
+	INET_DIAG_PEERS,
 };
 
 #define INET_DIAG_MAX INET_DIAG_SKV6ONLY
diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index 71c1a59..d9c04dc 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -99,5 +99,9 @@ config SCTP_COOKIE_HMAC_SHA1
 	select CRYPTO_HMAC if SCTP_COOKIE_HMAC_SHA1
 	select CRYPTO_SHA1 if SCTP_COOKIE_HMAC_SHA1
 
+config INET_SCTP_DIAG
+	depends on INET_DIAG
+	def_tristate INET_DIAG
+
 
 endif # IP_SCTP
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 3b4ffb0..0fca582 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -4,6 +4,7 @@
 
 obj-$(CONFIG_IP_SCTP) += sctp.o
 obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o
+obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o
 
 sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
 	  protocol.o endpointola.o associola.o \
diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
new file mode 100644
index 0000000..86fccc2
--- /dev/null
+++ b/net/sctp/sctp_diag.c
@@ -0,0 +1,581 @@
+#include <linux/module.h>
+#include <linux/inet_diag.h>
+#include <linux/sock_diag.h>
+#include <net/sctp/sctp.h>
+
+extern const struct inet_diag_handler *inet_diag_get_handler(int proto);
+extern void inet_diag_msg_common_fill(struct inet_diag_msg *r,
+				      struct sock *sk);
+
+static int inet_sctp_fill_laddrs(struct sk_buff *skb,
+				 struct list_head *address_list)
+{
+	struct sctp_sockaddr_entry *laddr;
+	int addrlen = sizeof(struct sockaddr_storage);
+	int addrcnt = 0;
+	struct nlattr *attr;
+	void *info = NULL;
+
+	list_for_each_entry_rcu(laddr, address_list, list)
+		addrcnt++;
+
+	attr = nla_reserve(skb, INET_DIAG_LOCALS, addrlen * addrcnt);
+	if (!attr)
+		return -EMSGSIZE;
+
+	info = nla_data(attr);
+	list_for_each_entry_rcu(laddr, address_list, list) {
+		memcpy(info, &laddr->a, addrlen);
+		info += addrlen;
+	}
+
+	return 0;
+}
+
+static int inet_sctp_fill_paddrs(struct sk_buff *skb,
+				 struct sctp_association *asoc)
+{
+	int addrlen = sizeof(struct sockaddr_storage);
+	struct sctp_transport *from;
+	struct nlattr *attr;
+	void *info = NULL;
+
+	attr = nla_reserve(skb, INET_DIAG_PEERS,
+			   addrlen * asoc->peer.transport_count);
+	if (!attr)
+		return -EMSGSIZE;
+
+	info = nla_data(attr);
+	list_for_each_entry(from, &asoc->peer.transport_addr_list,
+			    transports) {
+		memcpy(info, &from->ipaddr, addrlen);
+		info += addrlen;
+	}
+
+	return 0;
+}
+
+static int inet_assoc_diag_fill(struct sock *sk,
+				struct sctp_association *asoc,
+				struct sk_buff *skb,
+				const struct inet_diag_req_v2 *req,
+				struct user_namespace *user_ns,
+				int portid, u32 seq, u16 nlmsg_flags,
+				const struct nlmsghdr *unlh)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	const struct inet_diag_handler *handler;
+	int ext = req->idiag_ext;
+	struct inet_diag_msg *r;
+	struct nlmsghdr  *nlh;
+	struct nlattr *attr;
+	void *info = NULL;
+	union sctp_addr laddr, paddr;
+	struct dst_entry *dst;
+	struct sctp_infox infox;
+
+	handler = inet_diag_get_handler(req->sdiag_protocol);
+	BUG_ON(!handler);
+
+	nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
+			nlmsg_flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	r = nlmsg_data(nlh);
+	BUG_ON(!sk_fullsock(sk));
+
+	laddr = list_entry(asoc->base.bind_addr.address_list.next,
+			   struct sctp_sockaddr_entry, list)->a;
+	paddr = asoc->peer.primary_path->ipaddr;
+	dst = asoc->peer.primary_path->dst;
+
+	r->idiag_family = sk->sk_family;
+	r->id.idiag_sport = htons(asoc->base.bind_addr.port);
+	r->id.idiag_dport = htons(asoc->peer.port);
+	r->id.idiag_if = dst ? dst->dev->ifindex : 0;
+	sock_diag_save_cookie(sk, r->id.idiag_cookie);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6) {
+		*(struct in6_addr *)r->id.idiag_src = laddr.v6.sin6_addr;
+		*(struct in6_addr *)r->id.idiag_dst = paddr.v6.sin6_addr;
+	} else
+#endif
+	{
+		memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
+		memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
+
+		r->id.idiag_src[0] = laddr.v4.sin_addr.s_addr;
+		r->id.idiag_dst[0] = paddr.v4.sin_addr.s_addr;
+	}
+
+	r->idiag_state = asoc->state;
+	r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
+	r->idiag_retrans = asoc->rtx_data_chunks;
+#define EXPIRES_IN_MS(tmo)  DIV_ROUND_UP((tmo - jiffies) * 1000, HZ)
+	r->idiag_expires =
+		EXPIRES_IN_MS(asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX]);
+#undef EXPIRES_IN_MS
+
+	if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
+		goto errout;
+
+	/* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
+	 * hence this needs to be included regardless of socket family.
+	 */
+	if (ext & (1 << (INET_DIAG_TOS - 1)))
+		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
+			goto errout;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (r->idiag_family == AF_INET6) {
+		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
+			if (nla_put_u8(skb, INET_DIAG_TCLASS,
+				       inet6_sk(sk)->tclass) < 0)
+				goto errout;
+
+		if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
+		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
+			goto errout;
+	}
+#endif
+
+	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
+	r->idiag_inode = sock_i_ino(sk);
+
+	if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
+		struct inet_diag_meminfo minfo = {
+			.idiag_rmem = sk_rmem_alloc_get(sk),
+			.idiag_wmem = sk->sk_wmem_queued,
+			.idiag_fmem = sk->sk_forward_alloc,
+			.idiag_tmem = sk_wmem_alloc_get(sk),
+		};
+
+		if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
+			goto errout;
+	}
+
+	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
+		if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
+			goto errout;
+
+	if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) {
+		attr = nla_reserve(skb, INET_DIAG_INFO,
+				   handler->idiag_info_size);
+		if (!attr)
+			goto errout;
+
+		info = nla_data(attr);
+	}
+	infox.sctpinfo = (struct sctp_info *)info;
+	infox.asoc = asoc;
+	handler->idiag_get_info(sk, r, &infox);
+
+	if (ext & (1 << (INET_DIAG_CONG - 1)))
+		if (nla_put_string(skb, INET_DIAG_CONG, "reno") < 0)
+			goto errout;
+
+	if (inet_sctp_fill_laddrs(skb, &asoc->base.bind_addr.address_list))
+		goto errout;
+
+	if (inet_sctp_fill_paddrs(skb, asoc))
+		goto errout;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+
+errout:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static int inet_ep_diag_fill(struct sock *sk, struct sctp_endpoint *ep,
+			     struct sk_buff *skb,
+			     const struct inet_diag_req_v2 *req,
+			     struct user_namespace *user_ns,
+			     u32 portid, u32 seq, u16 nlmsg_flags,
+			     const struct nlmsghdr *unlh)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	const struct inet_diag_handler *handler;
+	int ext = req->idiag_ext;
+	struct inet_diag_msg *r;
+	struct nlmsghdr  *nlh;
+	struct nlattr *attr;
+	void *info = NULL;
+	struct sctp_infox infox;
+
+	handler = inet_diag_get_handler(req->sdiag_protocol);
+	BUG_ON(!handler);
+
+	nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
+			nlmsg_flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	r = nlmsg_data(nlh);
+	BUG_ON(!sk_fullsock(sk));
+
+	inet_diag_msg_common_fill(r, sk);
+	r->idiag_state = sk->sk_state;
+	r->idiag_timer = 0;
+	r->idiag_retrans = 0;
+
+	if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
+		goto errout;
+
+	/* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
+	 * hence this needs to be included regardless of socket family.
+	 */
+	if (ext & (1 << (INET_DIAG_TOS - 1)))
+		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
+			goto errout;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (r->idiag_family == AF_INET6) {
+		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
+			if (nla_put_u8(skb, INET_DIAG_TCLASS,
+				       inet6_sk(sk)->tclass) < 0)
+				goto errout;
+
+		if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
+		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
+			goto errout;
+	}
+#endif
+
+	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
+	r->idiag_inode = sock_i_ino(sk);
+
+	if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
+		struct inet_diag_meminfo minfo = {
+			.idiag_rmem = sk_rmem_alloc_get(sk),
+			.idiag_wmem = sk->sk_wmem_queued,
+			.idiag_fmem = sk->sk_forward_alloc,
+			.idiag_tmem = sk_wmem_alloc_get(sk),
+		};
+
+		if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
+			goto errout;
+	}
+
+	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
+		if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
+			goto errout;
+
+	if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) {
+		attr = nla_reserve(skb, INET_DIAG_INFO,
+				   handler->idiag_info_size);
+		if (!attr)
+			goto errout;
+
+		info = nla_data(attr);
+	}
+	infox.sctpinfo = (struct sctp_info *)info;
+	infox.asoc = NULL;
+	handler->idiag_get_info(sk, r, &infox);
+
+	if (inet_sctp_fill_laddrs(skb, &ep->base.bind_addr.address_list))
+		goto errout;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+
+errout:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static size_t inet_assoc_attr_size(struct sctp_association *asoc)
+{
+	int addrlen = sizeof(struct sockaddr_storage);
+	int addrcnt = 0;
+	struct sctp_sockaddr_entry *laddr;
+
+	list_for_each_entry_rcu(laddr, &asoc->base.bind_addr.address_list,
+				list)
+		addrcnt++;
+
+	return	  nla_total_size(sizeof(struct tcp_info))
+		+ nla_total_size(1) /* INET_DIAG_SHUTDOWN */
+		+ nla_total_size(1) /* INET_DIAG_TOS */
+		+ nla_total_size(1) /* INET_DIAG_TCLASS */
+		+ nla_total_size(addrlen * asoc->peer.transport_count)
+		+ nla_total_size(addrlen * addrcnt)
+		+ nla_total_size(sizeof(struct inet_diag_meminfo))
+		+ nla_total_size(sizeof(struct inet_diag_msg))
+		+ nla_total_size(sizeof(struct sctp_info))
+		+ 64;
+}
+
+/* callback and param */
+struct sctp_comm_param {
+	struct sk_buff *skb;
+	struct netlink_callback *cb;
+	const struct inet_diag_req_v2 *r;
+	const struct nlmsghdr *nlh;
+};
+
+static int sctp_tsp_dump_one(struct sctp_transport *tsp, void *p)
+{
+	struct sctp_association *assoc = tsp->asoc;
+	struct sock *sk = tsp->asoc->base.sk;
+	struct sctp_comm_param *commp = p;
+	struct sk_buff *in_skb = commp->skb;
+	const struct inet_diag_req_v2 *req = commp->r;
+	const struct nlmsghdr *nlh = commp->nlh;
+	struct net *net = sock_net(in_skb->sk);
+	struct sk_buff *rep;
+	int err;
+
+	err = sock_diag_check_cookie(sk, req->id.idiag_cookie);
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+	rep = nlmsg_new(inet_assoc_attr_size(assoc), GFP_KERNEL);
+	if (!rep)
+		goto out;
+
+	err = inet_assoc_diag_fill(sk, assoc, rep, req,
+				   sk_user_ns(NETLINK_CB(in_skb).sk),
+				   NETLINK_CB(in_skb).portid,
+				   nlh->nlmsg_seq, 0, nlh);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(rep);
+		goto out;
+	}
+
+	err = netlink_unicast(net->diag_nlsk, rep, NETLINK_CB(in_skb).portid,
+			      MSG_DONTWAIT);
+	if (err > 0)
+		err = 0;
+out:
+	return err;
+}
+
+static int sctp_tsp_dump(struct sctp_transport *tsp, void *p)
+{
+	struct sctp_endpoint *ep = tsp->asoc->ep;
+	struct sctp_comm_param *commp = p;
+	struct sock *sk = ep->base.sk;
+	struct sk_buff *skb = commp->skb;
+	struct netlink_callback *cb = commp->cb;
+	const struct inet_diag_req_v2 *r = commp->r;
+	struct sctp_association *assoc =
+		list_entry(ep->asocs.next, struct sctp_association, asocs);
+	int err = 0;
+
+	if (tsp->asoc != assoc)
+		goto out;
+
+	if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family)
+		goto out;
+
+	lock_sock(sk);
+	list_for_each_entry(assoc, &ep->asocs, asocs) {
+		if (cb->args[4] < cb->args[1])
+			goto next;
+
+		if (r->id.idiag_sport != htons(assoc->base.bind_addr.port) &&
+		    r->id.idiag_sport)
+			goto next;
+		if (r->id.idiag_dport != htons(assoc->peer.port) &&
+		    r->id.idiag_dport)
+			goto next;
+
+		if (!cb->args[3] &&
+		    inet_ep_diag_fill(sk, ep, skb, r,
+				      sk_user_ns(NETLINK_CB(cb->skb).sk),
+				      NETLINK_CB(cb->skb).portid,
+				      cb->nlh->nlmsg_seq,
+				      NLM_F_MULTI, cb->nlh) < 0) {
+			cb->args[3] = 1;
+			err = 2;
+			goto release;
+		}
+		cb->args[3] = 1;
+
+		if (inet_assoc_diag_fill(sk, assoc, skb, r,
+					 sk_user_ns(NETLINK_CB(cb->skb).sk),
+					 NETLINK_CB(cb->skb).portid,
+					 cb->nlh->nlmsg_seq, 0, cb->nlh) < 0) {
+			err = 2;
+			goto release;
+		}
+next:
+		cb->args[4]++;
+	}
+	cb->args[1] = 0;
+	cb->args[2]++;
+	cb->args[3] = 0;
+	cb->args[4] = 0;
+release:
+	release_sock(sk);
+	return err;
+out:
+	cb->args[2]++;
+	return err;
+}
+
+static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
+{
+	struct sctp_comm_param *commp = p;
+	struct sock *sk = ep->base.sk;
+	struct sk_buff *skb = commp->skb;
+	struct netlink_callback *cb = commp->cb;
+	const struct inet_diag_req_v2 *r = commp->r;
+	struct net *net = sock_net(skb->sk);
+	struct inet_sock *inet = inet_sk(sk);
+	int err = 0;
+
+	if (!net_eq(sock_net(sk), net))
+		goto out;
+
+	if (cb->args[4] < cb->args[1])
+		goto next;
+
+	if (r->sdiag_family != AF_UNSPEC &&
+	    sk->sk_family != r->sdiag_family)
+		goto next;
+
+	if (r->id.idiag_sport != inet->inet_sport &&
+	    r->id.idiag_sport)
+		goto next;
+
+	if (r->id.idiag_dport != inet->inet_dport &&
+	    r->id.idiag_dport)
+		goto next;
+
+	if (inet_ep_diag_fill(sk, ep, skb, r,
+			      sk_user_ns(NETLINK_CB(cb->skb).sk),
+			      NETLINK_CB(cb->skb).portid,
+			      cb->nlh->nlmsg_seq, NLM_F_MULTI,
+			      cb->nlh) < 0) {
+		err = 2;
+		goto out;
+	}
+next:
+	cb->args[4]++;
+out:
+	return err;
+}
+
+/* define the functions for sctp_diag_handler*/
+static void sctp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
+			       void *info)
+{
+	struct sctp_infox *infox = (struct sctp_infox *)info;
+
+	if (infox->asoc) {
+		r->idiag_rqueue = atomic_read(&infox->asoc->rmem_alloc);
+		r->idiag_wqueue = infox->asoc->sndbuf_used;
+	} else {
+		r->idiag_rqueue = sk->sk_ack_backlog;
+		r->idiag_wqueue = sk->sk_max_ack_backlog;
+	}
+	if (infox->sctpinfo)
+		sctp_get_sctp_info(sk, infox->asoc, infox->sctpinfo);
+}
+
+static int sctp_diag_dump_one(struct sk_buff *in_skb,
+			      const struct nlmsghdr *nlh,
+			      const struct inet_diag_req_v2 *req)
+{
+	struct net *net = sock_net(in_skb->sk);
+	union sctp_addr laddr, paddr;
+	struct sctp_comm_param commp = {
+		.skb = in_skb,
+		.r = req,
+		.nlh = nlh,
+	};
+
+	if (req->sdiag_family == AF_INET) {
+		laddr.v4.sin_port = req->id.idiag_sport;
+		laddr.v4.sin_addr.s_addr = req->id.idiag_src[0];
+		laddr.v4.sin_family = AF_INET;
+
+		paddr.v4.sin_port = req->id.idiag_dport;
+		paddr.v4.sin_addr.s_addr = req->id.idiag_dst[0];
+		paddr.v4.sin_family = AF_INET;
+	} else {
+		laddr.v6.sin6_port = req->id.idiag_sport;
+		memcpy(&laddr.v6.sin6_addr, req->id.idiag_src, sizeof(laddr.v6.sin6_addr));
+		laddr.v6.sin6_family = AF_INET6;
+
+		paddr.v6.sin6_port = req->id.idiag_dport;
+		memcpy(&paddr.v6.sin6_addr, req->id.idiag_dst, sizeof(paddr.v6.sin6_addr));
+		paddr.v6.sin6_family = AF_INET6;
+	}
+
+	return sctp_transport_lookup_process(sctp_tsp_dump_one,
+					     net, &laddr, &paddr, &commp);
+}
+
+static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
+			   const struct inet_diag_req_v2 *r, struct nlattr *bc)
+{
+	u32 idiag_states = r->idiag_states;
+	struct net *net = sock_net(skb->sk);
+	struct sctp_comm_param commp = {
+		.skb = skb,
+		.cb = cb,
+		.r = r,
+	};
+
+	/* eps hashtable dumps
+	 * args:
+	 * 0 : set once the listening socks have been traversed
+	 * 1 : the sock position reached by this traversal
+	 * 4 : temporary counter used while traversing the list
+	 */
+	if (cb->args[0] == 0) {
+		if (!(idiag_states & TCPF_LISTEN))
+			goto skip;
+		if (sctp_for_each_endpoint(sctp_ep_dump, &commp))
+			goto done;
+skip:
+		cb->args[0] = 1;
+		cb->args[1] = 0;
+		cb->args[4] = 0;
+	}
+
+	/* asocs by transport hashtable dump
+	 * args:
+	 * 1 : the assoc position reached by this traversal
+	 * 2 : the transport position reached by this traversal
+	 * 3 : set once the ep info of the current asoc has been dumped
+	 * 4 : temporary counter used while traversing the list
+	 */
+	if (!(idiag_states & ~TCPF_LISTEN))
+		goto done;
+	sctp_for_each_transport(sctp_tsp_dump, net, cb->args[2], &commp);
+done:
+	cb->args[1] = cb->args[4];
+	cb->args[4] = 0;
+}
+
+static const struct inet_diag_handler sctp_diag_handler = {
+	.dump		 = sctp_diag_dump,
+	.dump_one	 = sctp_diag_dump_one,
+	.idiag_get_info  = sctp_diag_get_info,
+	.idiag_type	 = IPPROTO_SCTP,
+	.idiag_info_size = sizeof(struct sctp_info),
+};
+
+static int __init sctp_diag_init(void)
+{
+	return inet_diag_register(&sctp_diag_handler);
+}
+
+static void __exit sctp_diag_exit(void)
+{
+	inet_diag_unregister(&sctp_diag_handler);
+}
+
+module_init(sctp_diag_init);
+module_exit(sctp_diag_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 2-132);
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Eric Dumazet @ 2016-04-09  4:53 UTC (permalink / raw)
  To: Petri Gynther
  Cc: David Miller, netdev, Florian Fainelli, opendmb, Jaedon Shin
In-Reply-To: <CAGXr9JEiz3W3HxBJSHbk1pjikTP6M-Dg9o7wNCQpqc1sguZQpw@mail.gmail.com>

On Fri, 2016-04-08 at 21:13 -0700, Petri Gynther wrote:

> What values does the networking core program into BQL dynamic limits
> that my code in netdev->ndo_open() would wipe out?
> 

0 and 0

Clearing these values to 0 and 0 again is defensive programming.

As I said, no BQL enabled driver does that, and we do not want various
drivers implementing BQL in various ways.

Having the same logic is easier for code review and maintenance.

This was proven to work for many years.

> You mentioned the queue init path:
> netdev_init_one_queue() -> dql_init() -> dql_reset()
> 
> that is called when the netdev is created and Tx queues allocated.
> 
> But, does the networking core somewhere set *different* values for BQL
> dynamic limits than what dql_reset() did, before opening the device?
> 
> > For example, tg3 calls netdev_tx_reset_queue() only when freeing tx
> > rings, as it might have freed skb(s) not from normal TX complete path
> > and thus missed appropriate dql_completed().
> >
> 
> Looking at the tg3 driver, it calls:
> tg3_stop()
>   tg3_free_rings()
>     netdev_tx_reset_queue()
> 
> netdev_tx_reset_queue() is called unconditionally, as long as the Tx
> ring exists. So "ip link set dev eth<x> down" would cause it to be
> called.
> 
> Why is it OK to call netdev_tx_reset_queue() from the
> netdev->ndo_stop() path, but not from netdev->ndo_open() path?

Because the core networking stack properly initializes BQL state when a
device is created, so that we do not have to copy the same code over and
over in 100 drivers. This is called code factorization.


Put these calls in bcmgenet_fini_dma(), to follow the BQL model used in
all other drivers.

Thanks.

^ permalink raw reply

* [PATCHv2 net-next 3/6] sctp: export some functions for sctp_diag in inet_diag
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <cover.1460177331.git.lucien.xin@gmail.com>

inet_diag_msg_common_fill() fills in the common part of a diag message.
sctp_diag needs it as well, so export it.

Also add inet_diag_get_handler() so that sctp_diag can access
inet_diag_table.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/inet_diag.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index bd591eb..5a0bfe0 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -66,7 +66,13 @@ static void inet_diag_unlock_handler(const struct inet_diag_handler *handler)
 	mutex_unlock(&inet_diag_table_mutex);
 }
 
-static void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
+const struct inet_diag_handler *inet_diag_get_handler(int proto)
+{
+	return inet_diag_table[proto];
+}
+EXPORT_SYMBOL_GPL(inet_diag_get_handler);
+
+void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 {
 	r->idiag_family = sk->sk_family;
 
@@ -89,6 +95,7 @@ static void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 	r->id.idiag_dst[0] = sk->sk_daddr;
 	}
 }
+EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
 
 static size_t inet_sk_attr_size(void)
 {
-- 
2.1.0

^ permalink raw reply related

* [PATCHv2 net-next 2/6] sctp: export some apis or variables for sctp_diag and reuse some for proc
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <cover.1460177331.git.lucien.xin@gmail.com>

Some core variables in sctp.ko cannot be exported to other modules, so
define APIs to access them instead.

These cover traversal of sctp transports and endpoints.

The transport traversal functions added for sctp_diag can also be used
by sctp proc, because it walks the transports in a similar way.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/sctp.h |  13 +++++
 net/sctp/proc.c         |  80 +++++++------------------------
 net/sctp/socket.c       | 124 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 155 insertions(+), 62 deletions(-)
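
As a rough illustration of the callback approach described in the commit
message (a userspace sketch only, not the kernel code): the table stays
private to one file, and callers pass a function pointer plus an opaque
argument. The names toy_table, toy_for_each and toy_sum_cb are
hypothetical. The sketch does no locking, and it stops the whole walk on
the first non-zero return from the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private table; stands in for a hashtable that is not exported. */
static const int toy_table[] = { 3, 5, 8, 13 };
#define TOY_TABLE_SIZE (sizeof(toy_table) / sizeof(toy_table[0]))

/* Walk every entry, calling cb(entry, p); stop when cb returns non-zero. */
int toy_for_each(int (*cb)(const int *entry, void *p), void *p)
{
	size_t i;
	int err = 0;

	assert(cb != NULL);
	for (i = 0; i < TOY_TABLE_SIZE; i++) {
		err = cb(&toy_table[i], p);
		if (err)
			break;
	}

	return err;
}

/* Example callback: add each entry to *p, abort on an entry above 10. */
int toy_sum_cb(const int *entry, void *p)
{
	int *sum = p;

	if (*entry > 10)
		return -1;
	*sum += *entry;

	return 0;
}
```

A caller only ever sees toy_for_each() and its own callback, which is the
point of the design: the storage layout can change without touching
users in other modules.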

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 36e1eae..c0c4deb 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -116,6 +116,19 @@ extern struct percpu_counter sctp_sockets_allocated;
 int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *);
 struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
 
+int sctp_transport_walk_start(struct rhashtable_iter *iter);
+void sctp_transport_walk_stop(struct rhashtable_iter *iter);
+struct sctp_transport *sctp_transport_get_next(struct net *net,
+			struct rhashtable_iter *iter);
+struct sctp_transport *sctp_transport_get_idx(struct net *net,
+			struct rhashtable_iter *iter, int pos);
+int sctp_transport_lookup_process(int (*cb)(struct sctp_transport *, void *),
+				  struct net *net,
+				  const union sctp_addr *laddr,
+				  const union sctp_addr *paddr, void *p);
+int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
+			    struct net *net, int pos, void *p);
+int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p);
 int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
 		       struct sctp_info *info);
 
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 5cfac8d..dd8492f 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -282,80 +282,31 @@ struct sctp_ht_iter {
 	struct rhashtable_iter hti;
 };
 
-static struct sctp_transport *sctp_transport_get_next(struct seq_file *seq)
-{
-	struct sctp_ht_iter *iter = seq->private;
-	struct sctp_transport *t;
-
-	t = rhashtable_walk_next(&iter->hti);
-	for (; t; t = rhashtable_walk_next(&iter->hti)) {
-		if (IS_ERR(t)) {
-			if (PTR_ERR(t) == -EAGAIN)
-				continue;
-			break;
-		}
-
-		if (net_eq(sock_net(t->asoc->base.sk), seq_file_net(seq)) &&
-		    t->asoc->peer.primary_path == t)
-			break;
-	}
-
-	return t;
-}
-
-static struct sctp_transport *sctp_transport_get_idx(struct seq_file *seq,
-						     loff_t pos)
-{
-	void *obj = SEQ_START_TOKEN;
-
-	while (pos && (obj = sctp_transport_get_next(seq)) && !IS_ERR(obj))
-		pos--;
-
-	return obj;
-}
-
-static int sctp_transport_walk_start(struct seq_file *seq)
-{
-	struct sctp_ht_iter *iter = seq->private;
-	int err;
-
-	err = rhashtable_walk_init(&sctp_transport_hashtable, &iter->hti);
-	if (err)
-		return err;
-
-	err = rhashtable_walk_start(&iter->hti);
-
-	return err == -EAGAIN ? 0 : err;
-}
-
-static void sctp_transport_walk_stop(struct seq_file *seq)
-{
-	struct sctp_ht_iter *iter = seq->private;
-
-	rhashtable_walk_stop(&iter->hti);
-	rhashtable_walk_exit(&iter->hti);
-}
-
 static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	int err = sctp_transport_walk_start(seq);
+	struct sctp_ht_iter *iter = seq->private;
+	int err = sctp_transport_walk_start(&iter->hti);
 
 	if (err)
 		return ERR_PTR(err);
 
-	return sctp_transport_get_idx(seq, *pos);
+	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
 
 static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
 {
-	sctp_transport_walk_stop(seq);
+	struct sctp_ht_iter *iter = seq->private;
+
+	sctp_transport_walk_stop(&iter->hti);
 }
 
 static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct sctp_ht_iter *iter = seq->private;
+
 	++*pos;
 
-	return sctp_transport_get_next(seq);
+	return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
 }
 
 /* Display sctp associations (/proc/net/sctp/assocs). */
@@ -457,24 +408,29 @@ void sctp_assocs_proc_exit(struct net *net)
 
 static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	int err = sctp_transport_walk_start(seq);
+	struct sctp_ht_iter *iter = seq->private;
+	int err = sctp_transport_walk_start(&iter->hti);
 
 	if (err)
 		return ERR_PTR(err);
 
-	return sctp_transport_get_idx(seq, *pos);
+	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
 
 static void *sctp_remaddr_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct sctp_ht_iter *iter = seq->private;
+
 	++*pos;
 
-	return sctp_transport_get_next(seq);
+	return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
 }
 
 static void sctp_remaddr_seq_stop(struct seq_file *seq, void *v)
 {
-	sctp_transport_walk_stop(seq);
+	struct sctp_ht_iter *iter = seq->private;
+
+	sctp_transport_walk_stop(&iter->hti);
 }
 
 static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 8f79f23..b0bf6c7 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4288,6 +4288,130 @@ int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
 }
 EXPORT_SYMBOL_GPL(sctp_get_sctp_info);
 
+/* use callback to avoid exporting the core structure */
+int sctp_transport_walk_start(struct rhashtable_iter *iter)
+{
+	int err;
+
+	err = rhashtable_walk_init(&sctp_transport_hashtable, iter);
+	if (err)
+		return err;
+
+	err = rhashtable_walk_start(iter);
+
+	return err == -EAGAIN ? 0 : err;
+}
+
+void sctp_transport_walk_stop(struct rhashtable_iter *iter)
+{
+	rhashtable_walk_stop(iter);
+	rhashtable_walk_exit(iter);
+}
+
+struct sctp_transport *sctp_transport_get_next(struct net *net,
+					       struct rhashtable_iter *iter)
+{
+	struct sctp_transport *t;
+
+	t = rhashtable_walk_next(iter);
+	for (; t; t = rhashtable_walk_next(iter)) {
+		if (IS_ERR(t)) {
+			if (PTR_ERR(t) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		if (net_eq(sock_net(t->asoc->base.sk), net) &&
+		    t->asoc->peer.primary_path == t)
+			break;
+	}
+
+	return t;
+}
+
+struct sctp_transport *sctp_transport_get_idx(struct net *net,
+					      struct rhashtable_iter *iter,
+					      int pos)
+{
+	void *obj = SEQ_START_TOKEN;
+
+	while (pos && (obj = sctp_transport_get_next(net, iter)) &&
+	       !IS_ERR(obj))
+		pos--;
+
+	return obj;
+}
+
+int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
+			   void *p) {
+	int err = 0;
+	int hash = 0;
+	struct sctp_ep_common *epb;
+	struct sctp_hashbucket *head;
+
+	for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
+	     hash++, head++) {
+		read_lock(&head->lock);
+		sctp_for_each_hentry(epb, &head->chain) {
+			err = cb(sctp_ep(epb), p);
+			if (err)
+				break;
+		}
+		read_unlock(&head->lock);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
+
+int sctp_transport_lookup_process(int (*cb)(struct sctp_transport *, void *),
+				  struct net *net,
+				  const union sctp_addr *laddr,
+				  const union sctp_addr *paddr, void *p)
+{
+	struct sctp_transport *transport;
+	int err = 0;
+
+	rcu_read_lock();
+	transport = sctp_addrs_lookup_transport(net, laddr, paddr);
+	if (!transport || !sctp_transport_hold(transport))
+		goto out;
+	err = cb(transport, p);
+	sctp_transport_put(transport);
+
+out:
+	rcu_read_unlock();
+	return err;
+}
+EXPORT_SYMBOL_GPL(sctp_transport_lookup_process);
+
+int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
+			    struct net *net, int pos, void *p) {
+	struct rhashtable_iter hti;
+	int err = 0;
+	void *obj;
+
+	if (sctp_transport_walk_start(&hti))
+		goto out;
+
+	sctp_transport_get_idx(net, &hti, pos);
+	obj = sctp_transport_get_next(net, &hti);
+	for (; obj && !IS_ERR(obj); obj = sctp_transport_get_next(net, &hti)) {
+		struct sctp_transport *transport = obj;
+
+		if (!sctp_transport_hold(transport))
+			continue;
+		err = cb(transport, p);
+		sctp_transport_put(transport);
+		if (err)
+			break;
+	}
+out:
+	sctp_transport_walk_stop(&hti);
+	return err;
+}
+EXPORT_SYMBOL_GPL(sctp_for_each_transport);
+
 /* 7.2.1 Association Status (SCTP_STATUS)
 
  * Applications can retrieve current status information about an
-- 
2.1.0

^ permalink raw reply related

* [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <cover.1460177331.git.lucien.xin@gmail.com>

sctp_diag will dump some important details of an sctp assoc or ep. Use
struct sctp_info to describe them and sctp_get_sctp_info() to collect
them, and export the latter to sctp_diag.ko.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/sctp.h    | 65 +++++++++++++++++++++++++++++++++++++
 include/net/sctp/sctp.h |  3 ++
 net/sctp/socket.c       | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+)
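
For readers of the dump, the shift sequence used for sctpi_peer_capable
in sctp_get_sctp_info() below implies a fixed bit layout. The following
userspace sketch reproduces that packing and shows one way to test
individual bits. The PEER_CAP_* names and both helper functions are
hypothetical and are not defined by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions implied by the shift sequence in sctp_get_sctp_info(). */
enum {
	PEER_CAP_AUTH     = 1 << 0,
	PEER_CAP_PRSCTP   = 1 << 1,
	PEER_CAP_ASCONF   = 1 << 2,
	PEER_CAP_HOSTNAME = 1 << 3,
	PEER_CAP_IPV6     = 1 << 4,
	PEER_CAP_IPV4     = 1 << 5,
	PEER_CAP_ECN      = 1 << 6,
};

/* Pack seven 0/1 flags with the same shift-and-or steps as the patch. */
uint8_t peer_capable_pack(int ecn, int ipv4, int ipv6, int hostname,
			  int asconf, int prsctp, int auth)
{
	int mask;

	/* Every input must be a single bit, as the kernel bitfields are. */
	assert(((ecn | ipv4 | ipv6 | hostname | asconf | prsctp | auth) & ~1) == 0);

	mask = ecn << 1;
	mask = (mask | ipv4) << 1;
	mask = (mask | ipv6) << 1;
	mask = (mask | hostname) << 1;
	mask = (mask | asconf) << 1;
	mask = (mask | prsctp) << 1;
	mask = (mask | auth);

	return (uint8_t)mask;
}

/* Return 1 if the given PEER_CAP_* flag is set in mask, else 0. */
int peer_capable_has(uint8_t mask, int flag)
{
	return (mask & flag) != 0;
}
```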

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index a9414fd..a448ebc 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -705,4 +705,69 @@ typedef struct sctp_auth_chunk {
 	sctp_authhdr_t auth_hdr;
 } __packed sctp_auth_chunk_t;
 
+struct sctp_info {
+	__u32	sctpi_tag;
+	__u32	sctpi_state;
+	__u32	sctpi_rwnd;
+	__u16	sctpi_unackdata;
+	__u16	sctpi_penddata;
+	__u16	sctpi_instrms;
+	__u16	sctpi_outstrms;
+	__u32	sctpi_fragmentation_point;
+	__u32	sctpi_inqueue;
+	__u32	sctpi_outqueue;
+	__u32	sctpi_overall_error;
+	__u32	sctpi_max_burst;
+	__u32	sctpi_maxseg;
+	__u32	sctpi_peer_rwnd;
+	__u32	sctpi_peer_tag;
+	__u8	sctpi_peer_capable;
+	__u8	sctpi_peer_sack;
+
+	/* assoc status info */
+	__u64	sctpi_isacks;
+	__u64	sctpi_osacks;
+	__u64	sctpi_opackets;
+	__u64	sctpi_ipackets;
+	__u64	sctpi_rtxchunks;
+	__u64	sctpi_outofseqtsns;
+	__u64	sctpi_idupchunks;
+	__u64	sctpi_gapcnt;
+	__u64	sctpi_ouodchunks;
+	__u64	sctpi_iuodchunks;
+	__u64	sctpi_oodchunks;
+	__u64	sctpi_iodchunks;
+	__u64	sctpi_octrlchunks;
+	__u64	sctpi_ictrlchunks;
+
+	/* primary transport info */
+	struct sockaddr_storage	sctpi_p_address;
+	__s32	sctpi_p_state;
+	__u32	sctpi_p_cwnd;
+	__u32	sctpi_p_srtt;
+	__u32	sctpi_p_rto;
+	__u32	sctpi_p_hbinterval;
+	__u32	sctpi_p_pathmaxrxt;
+	__u32	sctpi_p_sackdelay;
+	__u32	sctpi_p_sackfreq;
+	__u32	sctpi_p_ssthresh;
+	__u32	sctpi_p_partial_bytes_acked;
+	__u32	sctpi_p_flight_size;
+	__u16	sctpi_p_error;
+
+	/* sctp sock info */
+	__u32	sctpi_s_autoclose;
+	__u32	sctpi_s_adaptation_ind;
+	__u32	sctpi_s_pd_point;
+	__u8	sctpi_s_nodelay;
+	__u8	sctpi_s_disable_fragments;
+	__u8	sctpi_s_v4mapped;
+	__u8	sctpi_s_frag_interleave;
+};
+
+struct sctp_infox {
+	struct sctp_info *sctpinfo;
+	struct sctp_association *asoc;
+};
+
 #endif /* __LINUX_SCTP_H__ */
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 65521cf..36e1eae 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -116,6 +116,9 @@ extern struct percpu_counter sctp_sockets_allocated;
 int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *);
 struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
 
+int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
+		       struct sctp_info *info);
+
 /*
  * sctp/primitive.c
  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 878d28e..8f79f23 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4202,6 +4202,92 @@ static void sctp_shutdown(struct sock *sk, int how)
 	}
 }
 
+int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
+		       struct sctp_info *info)
+{
+	struct sctp_transport *prim;
+	struct list_head *pos, *temp;
+	int mask;
+
+	memset(info, 0, sizeof(*info));
+	if (!asoc) {
+		struct sctp_sock *sp = sctp_sk(sk);
+
+		info->sctpi_s_autoclose = sp->autoclose;
+		info->sctpi_s_adaptation_ind = sp->adaptation_ind;
+		info->sctpi_s_pd_point = sp->pd_point;
+		info->sctpi_s_nodelay = sp->nodelay;
+		info->sctpi_s_disable_fragments = sp->disable_fragments;
+		info->sctpi_s_v4mapped = sp->v4mapped;
+		info->sctpi_s_frag_interleave = sp->frag_interleave;
+
+		return 0;
+	}
+
+	info->sctpi_tag = asoc->c.my_vtag;
+	info->sctpi_state = asoc->state;
+	info->sctpi_rwnd = asoc->a_rwnd;
+	info->sctpi_unackdata = asoc->unack_data;
+	info->sctpi_penddata = sctp_tsnmap_pending(&asoc->peer.tsn_map);
+	info->sctpi_instrms = asoc->c.sinit_max_instreams;
+	info->sctpi_outstrms = asoc->c.sinit_num_ostreams;
+	list_for_each_safe(pos, temp, &asoc->base.inqueue.in_chunk_list)
+		info->sctpi_inqueue++;
+	list_for_each_safe(pos, temp, &asoc->outqueue.out_chunk_list)
+		info->sctpi_outqueue++;
+	info->sctpi_overall_error = asoc->overall_error_count;
+	info->sctpi_max_burst = asoc->max_burst;
+	info->sctpi_maxseg = asoc->frag_point;
+	info->sctpi_peer_rwnd = asoc->peer.rwnd;
+	info->sctpi_peer_tag = asoc->c.peer_vtag;
+
+	mask = asoc->peer.ecn_capable << 1;
+	mask = (mask | asoc->peer.ipv4_address) << 1;
+	mask = (mask | asoc->peer.ipv6_address) << 1;
+	mask = (mask | asoc->peer.hostname_address) << 1;
+	mask = (mask | asoc->peer.asconf_capable) << 1;
+	mask = (mask | asoc->peer.prsctp_capable) << 1;
+	mask = (mask | asoc->peer.auth_capable);
+	info->sctpi_peer_capable = mask;
+	mask = asoc->peer.sack_needed << 1;
+	mask = (mask | asoc->peer.sack_generation) << 1;
+	mask = (mask | asoc->peer.zero_window_announced);
+	info->sctpi_peer_sack = mask;
+
+	info->sctpi_isacks = asoc->stats.isacks;
+	info->sctpi_osacks = asoc->stats.osacks;
+	info->sctpi_opackets = asoc->stats.opackets;
+	info->sctpi_ipackets = asoc->stats.ipackets;
+	info->sctpi_rtxchunks = asoc->stats.rtxchunks;
+	info->sctpi_outofseqtsns = asoc->stats.outofseqtsns;
+	info->sctpi_idupchunks = asoc->stats.idupchunks;
+	info->sctpi_gapcnt = asoc->stats.gapcnt;
+	info->sctpi_ouodchunks = asoc->stats.ouodchunks;
+	info->sctpi_iuodchunks = asoc->stats.iuodchunks;
+	info->sctpi_oodchunks = asoc->stats.oodchunks;
+	info->sctpi_iodchunks = asoc->stats.iodchunks;
+	info->sctpi_octrlchunks = asoc->stats.octrlchunks;
+	info->sctpi_ictrlchunks = asoc->stats.ictrlchunks;
+
+	prim = asoc->peer.primary_path;
+	memcpy(&info->sctpi_p_address, &prim->ipaddr,
+	       sizeof(struct sockaddr_storage));
+	info->sctpi_p_state = prim->state;
+	info->sctpi_p_cwnd = prim->cwnd;
+	info->sctpi_p_srtt = prim->srtt;
+	info->sctpi_p_rto = jiffies_to_msecs(prim->rto);
+	info->sctpi_p_hbinterval = prim->hbinterval;
+	info->sctpi_p_pathmaxrxt = prim->pathmaxrxt;
+	info->sctpi_p_sackdelay = jiffies_to_msecs(prim->sackdelay);
+	info->sctpi_p_ssthresh = prim->ssthresh;
+	info->sctpi_p_partial_bytes_acked = prim->partial_bytes_acked;
+	info->sctpi_p_flight_size = prim->flight_size;
+	info->sctpi_p_error = prim->error_count;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sctp_get_sctp_info);
+
 /* 7.2.1 Association Status (SCTP_STATUS)
 
  * Applications can retrieve current status information about an
-- 
2.1.0

^ permalink raw reply related

* [PATCHv2 net-next 0/6] sctp: support sctp_diag in kernel
From: Xin Long @ 2016-04-09  4:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem

This patchset adds the sctp_diag module, which implements the diag
interface for sctp in the kernel.

For a listening sctp endpoint, we just dump its ep info.
For an sctp connection, we dump the assoc info and its ep info.

The ss dump will look like:

[iproute2]# ./misc/ss --sctp  -n -l
State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port
LISTEN     0      128      172.16.254.254:8888      *:*
LISTEN     0      5        127.0.0.1:1234           *:*
LISTEN     0      5        127.0.0.1:1234           *:*
  - ESTAB  0      0        127.0.0.1%lo:1234        127.0.0.1:4321
LISTEN     0      128      172.16.254.254:8888      *:*
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.253.253:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.1.1:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.1.2:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.2.1:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.2.2:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.3.1:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.3.2:8888
LISTEN     0      0        127.0.0.1:4321           *:*
  - ESTAB  0      0        127.0.0.1%lo:4321        127.0.0.1:1234

The entries with '- ESTAB' are the assocs; some of them may belong to
the same endpoint. So we dump the parent endpoint first (the entry with
'LISTEN'), then its assocs. The ep and assoc entries are dumped in the
right order so that ss can easily show them in tree format.
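
To show why the ordering helps, here is a small userspace sketch of a
consumer. Because each endpoint record arrives before its assocs, the
tree can be rendered record by record with no lookup or sorting. The
struct toy_rec type and toy_format() are hypothetical and only loosely
mirror the ss output above; this is not the iproute2 implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_kind { TOY_EP, TOY_ASSOC };

/* Hypothetical, already-decoded dump record. */
struct toy_rec {
	enum toy_kind kind;
	const char *local;
	const char *peer;
};

/* Format one record; assocs are indented under the preceding endpoint. */
int toy_format(const struct toy_rec *rec, char *buf, size_t len)
{
	assert(rec != NULL && buf != NULL);

	if (rec->kind == TOY_EP)
		return snprintf(buf, len, "LISTEN %s %s",
				rec->local, rec->peer);

	return snprintf(buf, len, "  - ESTAB %s %s", rec->local, rec->peer);
}
```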

Besides, this patchset also simplifies the sctp proc code, because it
shares similar transport traversal code with sctp diag.

v1->v2:
  1. make inet_diag_get_handler() return the handler as const.
  2. merge 5/7 into 2/7 of v1.

Xin Long (6):
  sctp: add sctp_info dump api for sctp_diag
  sctp: export some apis or variables for sctp_diag and reuse some for
    proc
  sctp: export some functions for sctp_diag in inet_diag
  sctp: add the sctp_diag.c file
  sctp: merge the seq_start/next/exits in remaddrs and assocs
  sctp: fix some rhashtable functions using in sctp proc/diag

 include/linux/sctp.h           |  65 +++++
 include/net/sctp/sctp.h        |  16 ++
 include/uapi/linux/inet_diag.h |   2 +
 net/ipv4/inet_diag.c           |   9 +-
 net/sctp/Kconfig               |   4 +
 net/sctp/Makefile              |   1 +
 net/sctp/proc.c                | 104 ++------
 net/sctp/sctp_diag.c           | 581 +++++++++++++++++++++++++++++++++++++++++
 net/sctp/socket.c              | 215 +++++++++++++++
 9 files changed, 911 insertions(+), 86 deletions(-)
 create mode 100644 net/sctp/sctp_diag.c

-- 
2.1.0

^ permalink raw reply

* Re: [PATCH net-next] ibmvnic: Enable use of multiple tx/rx scrqs
From: David Miller @ 2016-04-09  4:24 UTC (permalink / raw)
  To: jallen; +Cc: tlfalcon, netdev, linuxppc-dev
In-Reply-To: <57053E33.6020706@linux.vnet.ibm.com>

From: John Allen <jallen@linux.vnet.ibm.com>
Date: Wed, 6 Apr 2016 11:49:55 -0500

> Enables the use of multiple transmit and receive scrqs allowing the ibmvnic
> driver to take advantage of multiqueue functionality. To achieve this, the
> driver must implement the process of negotiating the maximum number of
> queues allowed by the server. Initially, the driver will attempt to login
> with the maximum number of tx and rx queues supported by the server. If
> the server fails to allocate the requested number of scrqs, it will return
> partial success in the login response. In this case, we must reinitiate
> the login process from the request capabilities stage and attempt to login
> requesting fewer scrqs.
> 
> Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Petri Gynther @ 2016-04-09  4:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Florian Fainelli, opendmb, Jaedon Shin
In-Reply-To: <1460166979.6473.451.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Apr 8, 2016 at 6:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-04-08 at 18:39 -0700, Petri Gynther wrote:
>> On Fri, Apr 8, 2016 at 1:36 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Petri Gynther <pgynther@google.com>
>> > Date: Tue,  5 Apr 2016 17:50:01 -0700
>> >
>> >> Add Byte Queue Limits (BQL) support to bcmgenet driver.
>> >>
>> >> Signed-off-by: Petri Gynther <pgynther@google.com>
>> >
>> > As Eric Dumazet indicated, your ->ndo_init() code to reset the queues is
>> > probably not necessary at all.
>>
>> I added the netdev_tx_reset_queue(txq) calls to ndo_open() path:
>> netdev->ndo_open()
>>   bcmgenet_open()
>>     bcmgenet_netif_start()
>>       for all Tx queues:
>>         netdev_tx_reset_queue(txq)
>>           clear __QUEUE_STATE_STACK_XOFF
>>           dql_reset()
>>       netif_tx_start_all_queues(dev)
>>         for all Tx queues:
>>           clear __QUEUE_STATE_DRV_XOFF
>>
>> So, I think the call to netdev_tx_reset_queue(txq) is in the right
>> place. It ensures that the Tx queue state is clean when the device is
>> opened.
>
>
> The netdev_tx_reset_queue(txq) calls are only needed in exceptional
> conditions.
>
> Not at device start, as the core networking layer init all txq
> (including their BQL state) properly before giving them to drivers for
> use.
>

What values does the networking core program into BQL dynamic limits
that my code in netdev->ndo_open() would wipe out?

You mentioned the queue init path:
netdev_init_one_queue() -> dql_init() -> dql_reset()

that is called when the netdev is created and Tx queues allocated.

But, does the networking core somewhere set *different* values for BQL
dynamic limits than what dql_reset() did, before opening the device?

> For example, tg3 calls netdev_tx_reset_queue() only when freeing tx
> rings, as it might have freed skb(s) not from normal TX complete path
> and thus missed appropriate dql_completed().
>

Looking at the tg3 driver, it calls:
tg3_stop()
  tg3_free_rings()
    netdev_tx_reset_queue()

netdev_tx_reset_queue() is called unconditionally, as long as the Tx
ring exists. So "ip link set dev eth<x> down" would cause it to be
called.

Why is it OK to call netdev_tx_reset_queue() from the
netdev->ndo_stop() path, but not from netdev->ndo_open() path?

> If you believe BQL drivers need a fix, please elaborate ?
>
> Thanks.
>
>

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Alexander Duyck @ 2016-04-09  2:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Petri Gynther, David Miller, netdev, Florian Fainelli, opendmb,
	Jaedon Shin
In-Reply-To: <1460166979.6473.451.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Apr 8, 2016 at 6:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-04-08 at 18:39 -0700, Petri Gynther wrote:
>> On Fri, Apr 8, 2016 at 1:36 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Petri Gynther <pgynther@google.com>
>> > Date: Tue,  5 Apr 2016 17:50:01 -0700
>> >
>> >> Add Byte Queue Limits (BQL) support to bcmgenet driver.
>> >>
>> >> Signed-off-by: Petri Gynther <pgynther@google.com>
>> >
>> > As Eric Dumazet indicated, your ->ndo_init() code to reset the queues is
>> > probably not necessary at all.
>>
>> I added the netdev_tx_reset_queue(txq) calls to ndo_open() path:
>> netdev->ndo_open()
>>   bcmgenet_open()
>>     bcmgenet_netif_start()
>>       for all Tx queues:
>>         netdev_tx_reset_queue(txq)
>>           clear __QUEUE_STATE_STACK_XOFF
>>           dql_reset()
>>       netif_tx_start_all_queues(dev)
>>         for all Tx queues:
>>           clear __QUEUE_STATE_DRV_XOFF
>>
>> So, I think the call to netdev_tx_reset_queue(txq) is in the right
>> place. It ensures that the Tx queue state is clean when the device is
>> opened.
>
>
> The netdev_tx_reset_queue(txq) calls are only needed in exceptional
> conditions.
>
> Not at device start, as the core networking layer init all txq
> (including their BQL state) properly before giving them to drivers for
> use.
>
> For example, tg3 calls netdev_tx_reset_queue() only when freeing tx
> rings, as it might have freed skb(s) not from normal TX complete path
> and thus missed appropriate dql_completed().
>
> If you believe BQL drivers need a fix, please elaborate ?
>
> Thanks.

For a bit of history on why you might want to do the reset on clean-up
instead of init you might take a look at commit dad8a3b3eaa0 ("igb,
ixgbe: netdev_tx_reset_queue incorrectly called from tx init path").

Basically you want to make certain you flush the queues after bringing
the interface down so that you don't trigger any false hangs from
stalled queues.  The rule with the BQL stuff is that you need to leave
the Tx queue in the state you found it in, instead of wiping it, making
use of it, and then putting it away dirty.
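
A toy userspace model may make the rule concrete. It is a deliberately
simplified sketch, not the kernel's dql code: struct toy_dql, its fields
and all toy_* helpers are hypothetical, and the stall test is a plain
"in-flight bytes >= limit" check. It shows how skbs freed at teardown
without a matching completion leave stale in-flight bytes behind, and
how a reset in the teardown path clears them.

```c
#include <assert.h>

/* Toy stand-in for the kernel's struct dql; the field set is an assumption. */
struct toy_dql {
	unsigned int num_queued;	/* bytes handed to the hardware */
	unsigned int num_completed;	/* bytes reported as transmitted */
	unsigned int limit;		/* in-flight bytes allowed before stalling */
};

/* Account for bytes queued for transmission. */
void toy_sent(struct toy_dql *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/* Account for bytes completed through the normal Tx completion path. */
void toy_completed(struct toy_dql *q, unsigned int bytes)
{
	assert(q->num_queued - q->num_completed >= bytes);
	q->num_completed += bytes;
}

/* A queue looks stalled when the in-flight bytes reach the limit. */
int toy_stalled(const struct toy_dql *q)
{
	return q->num_queued - q->num_completed >= q->limit;
}

/* Models a reset in the teardown path: no stale in-flight bytes remain. */
void toy_reset(struct toy_dql *q)
{
	q->num_queued = 0;
	q->num_completed = 0;
}
```

In this model, queuing 3000 bytes, completing 1500 and then freeing the
rings leaves the queue looking stalled until toy_reset() runs, which is
why the reset belongs next to the code that frees the rings.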

- Alex

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Eric Dumazet @ 2016-04-09  1:56 UTC (permalink / raw)
  To: Petri Gynther
  Cc: David Miller, netdev, Florian Fainelli, opendmb, Jaedon Shin
In-Reply-To: <CAGXr9JE8Qp-u4gjm08oB8nXDoQxFM12=Da2=MZbWcZrRXOCi_w@mail.gmail.com>

On Fri, 2016-04-08 at 18:39 -0700, Petri Gynther wrote:
> On Fri, Apr 8, 2016 at 1:36 PM, David Miller <davem@davemloft.net> wrote:
> > From: Petri Gynther <pgynther@google.com>
> > Date: Tue,  5 Apr 2016 17:50:01 -0700
> >
> >> Add Byte Queue Limits (BQL) support to bcmgenet driver.
> >>
> >> Signed-off-by: Petri Gynther <pgynther@google.com>
> >
> > As Eric Dumazet indicated, your ->ndo_init() code to reset the queues is
> > probably not necessary at all.
> 
> I added the netdev_tx_reset_queue(txq) calls to ndo_open() path:
> netdev->ndo_open()
>   bcmgenet_open()
>     bcmgenet_netif_start()
>       for all Tx queues:
>         netdev_tx_reset_queue(txq)
>           clear __QUEUE_STATE_STACK_XOFF
>           dql_reset()
>       netif_tx_start_all_queues(dev)
>         for all Tx queues:
>           clear __QUEUE_STATE_DRV_XOFF
> 
> So, I think the call to netdev_tx_reset_queue(txq) is in the right
> place. It ensures that the Tx queue state is clean when the device is
> opened.


The netdev_tx_reset_queue(txq) calls are only needed in exceptional
conditions.

Not at device start, as the core networking layer init all txq
(including their BQL state) properly before giving them to drivers for
use.

For example, tg3 calls netdev_tx_reset_queue() only when freeing tx
rings, as it might have freed skb(s) not from normal TX complete path
and thus missed appropriate dql_completed().

If you believe BQL drivers need a fix, please elaborate ?

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Petri Gynther @ 2016-04-09  1:39 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: netdev, Florian Fainelli, opendmb, Jaedon Shin
In-Reply-To: <20160408.163648.2055481752216023669.davem@davemloft.net>

On Fri, Apr 8, 2016 at 1:36 PM, David Miller <davem@davemloft.net> wrote:
> From: Petri Gynther <pgynther@google.com>
> Date: Tue,  5 Apr 2016 17:50:01 -0700
>
>> Add Byte Queue Limits (BQL) support to bcmgenet driver.
>>
>> Signed-off-by: Petri Gynther <pgynther@google.com>
>
> As Eric Dumazet indicated, your ->ndo_init() code to reset the queues is
> probably not necessary at all.

I added the netdev_tx_reset_queue(txq) calls to ndo_open() path:
netdev->ndo_open()
  bcmgenet_open()
    bcmgenet_netif_start()
      for all Tx queues:
        netdev_tx_reset_queue(txq)
          clear __QUEUE_STATE_STACK_XOFF
          dql_reset()
      netif_tx_start_all_queues(dev)
        for all Tx queues:
          clear __QUEUE_STATE_DRV_XOFF

So, I think the call to netdev_tx_reset_queue(txq) is in the right
place. It ensures that the Tx queue state is clean when the device is
opened.

^ permalink raw reply

* RE: [PATCH -v2] drivers: net: ethernet: intel: e1000e: fix ethtool autoneg off for non-copper
From: Brown, Aaron F @ 2016-04-09  1:03 UTC (permalink / raw)
  To: Daniel Walker, Ruinskiy, Dima, Kirsher, Jeffrey T,
	Brandeburg, Jesse, Nelson, Shannon, Wyborny, Carolyn,
	Skidmore, Donald C, Allan, Bruce W, Ronciak, John,
	Williams, Mitch A
  Cc: Steve Shih, xe-kernel@external.cisco.com, Daniel Walker,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1459881004-13992-1-git-send-email-danielwa@cisco.com>

> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Daniel Walker
> Sent: Tuesday, April 5, 2016 11:30 AM
> To: Ruinskiy, Dima <dima.ruinskiy@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Nelson, Shannon
> <shannon.nelson@intel.com>; Wyborny, Carolyn
> <carolyn.wyborny@intel.com>; Skidmore, Donald C
> <donald.c.skidmore@intel.com>; Allan, Bruce W <bruce.w.allan@intel.com>;
> Ronciak, John <john.ronciak@intel.com>; Williams, Mitch A
> <mitch.a.williams@intel.com>
> Cc: Steve Shih <sshih@cisco.com>; xe-kernel@external.cisco.com; Daniel
> Walker <dwalker@fifo99.com>; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH -v2] drivers: net: ethernet: intel: e1000e: fix ethtool autoneg
> off for non-copper
> 
> From: Steve Shih <sshih@cisco.com>
> 
> This patch fixes the issues for disabling auto-negotiation and forcing
> speed and duplex settings for the non-copper media.
> 
> For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID
> for eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent
> e1000_set_settings call would not fail with -EOPNOTSUPP.
> 
> e1000_set_spd_dplx should not automatically turn autoneg back on for forced
> 1000 Mbps full duplex settings for non-copper media.
> 
> Cc: xe-kernel@external.cisco.com
> Cc: Daniel Walker <dwalker@fifo99.com>
> Signed-off-by: Steve Shih <sshih@cisco.com>
> ---
>  drivers/net/ethernet/intel/e1000e/ethtool.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>


* Re: How do I avoid recvmsg races with IP_RECVERR?
From: Andy Lutomirski @ 2016-04-09  0:02 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Andy Lutomirski, Network Development
In-Reply-To: <1433291591.3300318.285256449.713A7E1E@webmail.messagingengine.com>

On Tue, Jun 2, 2015 at 5:33 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Wed, Jun 3, 2015, at 02:03, Andy Lutomirski wrote:
>> On Tue, Jun 2, 2015 at 2:50 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> >> My proposal would be to make the error conversion lazy:
>> >>
>> >> Keeping duplicate data is not a good idea in general: So we shouldn't
>> >> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
>> >> the sk_error_queue and extract the error code from there.
>> >>
>> >> Only if IP_RECVERR was not set, we use sk->sk_err logic.
>> >>
>> >> What do you think?
>> >
>> > I just noticed that this will probably break existing user space
>> > applications which require that icmp errors are transient even with
>> > IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer
>> > and xchg the pointer, hmmm....
>>
>> Do you mean to fix the race like this but to otherwise leave the
>> semantics alone?  That would be an improvement, but it might be nice
>> to also add a non-crappy API for this, too.
>
> Yes, keep current semantics but fix the race you reported.
>
> I currently don't have good proposals for a decent API to handle this
> besides adding some ancillary cmsg data to msg_control. This still would
> not solve the problem fundamentally, as a -EFAULT/-EINVAL return value
> could also mean that msg_control should not be touched, thus we end up
> again relying on errno checking. :/ Thus checking error queue after
> receiving an error indication is my best hunch so far.
>
> Your proposal with MSG_IGNORE_ERROR seems reasonable so far for ping or
> udp, but I haven't fully grasped the TCP semantics of sk->sk_err, yet.

I was looking at this a bit, and I was thinking about adding a new
socket option, but I'm a bit vague on how all this fits together.

One option would be a socket option that simply causes sock_error to
return 0 (and change SO_ERROR to peek at sk_err directly).  But there
seem to be sock_error callers all over the place, and maybe this
change would cause problems.

Another option would be to add a socket option that explicitly turns
off everything that queues soft errors to sk_err.

I think that, for IP datagrams at least, the ideal semantics would be
for soft errors not to affect sk_err and for POLLERR to be set if the
error queue is nonempty.

--Andy
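
For reference, the existing userspace pattern mentioned in the quoted text
(checking the error queue after an error indication) can be sketched as
below. This is a minimal sketch assuming Linux and an IPv4 datagram socket;
enable_recverr() and read_one_sock_error() are hypothetical helper names.
It only shows the current IP_RECVERR / MSG_ERRQUEUE interface and does not
address the sk_err race or implement any of the socket options proposed in
this thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/* Ask the kernel to queue extended errors for this IPv4 socket. */
static int enable_recverr(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/*
 * Pop one entry from the socket error queue.
 * Returns 1 and fills *out if an IP_RECVERR entry was read,
 * 0 if the error queue is empty, and -1 on any other failure.
 */
static int read_one_sock_error(int fd, struct sock_extended_err *out)
{
	char payload[512]; /* receives the offending packet's payload */
	union {
		char buf[512];
		struct cmsghdr align; /* keeps buf aligned for cmsg access */
	} control;
	struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	/* MSG_ERRQUEUE reads from the error queue and does not block. */
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_RECVERR) {
			memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
			return 1;
		}
	}
	return -1; /* a message was read but carried no IP_RECVERR cmsg */
}
```

A caller would typically invoke read_one_sock_error() in a loop after
poll() reports POLLERR or after a send/receive call fails, and inspect
ee_errno and ee_origin in the returned structure.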


