Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v4 3/4] can: kvaser_usb: Add support for the Usbcan-II family
From: Marc Kleine-Budde @ 2015-01-18 20:13 UTC (permalink / raw)
  To: Ahmed S. Darwish, Olivier Sobrie
  Cc: Oliver Hartkopp, Wolfgang Grandegger, David S. Miller,
	Paul Gortmaker, Linux-CAN, netdev, LKML
In-Reply-To: <20150118201221.GA15143@linux>

[-- Attachment #1: Type: text/plain, Size: 2137 bytes --]

On 01/18/2015 09:12 PM, Ahmed S. Darwish wrote:
> Hi!
> 
> On Mon, Jan 12, 2015 at 02:53:02PM +0100, Olivier Sobrie wrote:
>> Hello,
>>
>> On Sun, Jan 11, 2015 at 03:36:12PM -0500, Ahmed S. Darwish wrote:
>>> From: Ahmed S. Darwish <ahmed.darwish@valeo.com>
>>>
> 
> ...
> 
>>> @@ -98,7 +128,13 @@
>>>  #define CMD_START_CHIP_REPLY		27
>>>  #define CMD_STOP_CHIP			28
>>>  #define CMD_STOP_CHIP_REPLY		29
>>> -#define CMD_GET_CARD_INFO2		32
>>> +#define CMD_READ_CLOCK			30
>>> +#define CMD_READ_CLOCK_REPLY		31
>>
>> These two defines are not used.
>>
> 
> They were added for completeness: the only gap in our continuous
> sequence of command IDs from 12 to 39 ;-) No big deal, to be
> removed in the next submission.
> 
> ...
> 
>>> +
>>> +struct kvaser_msg_tx_acknowledge_header {
>>> +	u8 channel;
>>> +	u8 tid;
>>> +};
>>
>> Is this struct really needed? Can't you simply use
>> leaf_msg_tx_acknowledge or usbcan_msg_tx_acknowledge
>> structures to read the header.
>> Same for kvaser_msg_rx_can_header.
>>
> 
> They're added to ensure type-safety throughout the code. Basically
> they're the common part of a command that has different wire format
> between the Leaf and the USBCan, but share a common header.  Such
> notation was only added when it was strictly necessary.
> 
> For example, there are three functions where 'rx_can_header' is
> referenced in the driver, and one function where 'tx_acknowledge_header'
> is referenced. Without such header structure, I'll have to sprinkle
> 3 to 4 extra blocks of:
> 
> 	switch (dev->family) {
> 	       case KVASER_LEAF: 
> 	       case KVASER_USBCAN:
> 	}
> 
> which would be _really_ ugly. The *_header notation ensures that, in
> the body of each function, we're accessing the fields in a very safe
> manner.

+1 Keep it as it is.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH iproute2 0/3] Unify output for inet sockets
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

This is series is about to have output information from /proc and
Netlink in one place.

Some short summary:

    1) Make memory info more readable.
    2) Unify output of inet sockets from /proc and Netlink.
    3) Added new '-H, --human' option to show info in more human format
        which is used for memory info meanwhile.

Vadim Kochan (3):
  ss: Make meminfo look little bit more readable
  ss: Unify inet sockets output
  ss: Unify tcp stats output

 include/utils.h |   3 +
 ip/ipaddress.c  |  31 +--
 lib/utils.c     |  40 ++-
 misc/ss.c       | 737 ++++++++++++++++++++++++++++++--------------------------
 4 files changed, 437 insertions(+), 374 deletions(-)

-- 
2.1.3

^ permalink raw reply

* [PATCH iproute2 1/3] ss: Make meminfo look little bit more readable
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1421613815-6635-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 include/utils.h |  3 +++
 ip/ipaddress.c  | 31 ++--------------------
 lib/utils.c     | 40 +++++++++++++++++++++++++++-
 misc/ss.c       | 82 ++++++++++++++++++++++++++++++++++-----------------------
 4 files changed, 93 insertions(+), 63 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index e1fe7cf..6b0b76c 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -5,6 +5,7 @@
 #include <asm/types.h>
 #include <resolv.h>
 #include <stdlib.h>
+#include <stdbool.h>
 
 #include "libnetlink.h"
 #include "ll_map.h"
@@ -162,4 +163,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 		char **name, char **type, char **link, char **dev,
 		int *group, int *index);
 
+char *sprint_num(char *str, uint64_t num, bool use_iec);
+
 #endif /* __UTILS_H__ */
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index d5e863d..5a2a956 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -362,41 +362,14 @@ static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 
 static void print_num(FILE *fp, unsigned width, uint64_t count)
 {
-	const char *prefix = "kMGTPE";
-	const unsigned int base = use_iec ? 1024 : 1000;
-	uint64_t powi = 1;
-	uint16_t powj = 1;
-	uint8_t precision = 2;
 	char buf[64];
 
-	if (!human_readable || count < base) {
+	if (!human_readable) {
 		fprintf(fp, "%-*"PRIu64" ", width, count);
 		return;
 	}
 
-	/* increase value by a factor of 1000/1024 and print
-	 * if result is something a human can read */
-	for(;;) {
-		powi *= base;
-		if (count / base < powi)
-			break;
-
-		if (!prefix[1])
-			break;
-		++prefix;
-	}
-
-	/* try to guess a good number of digits for precision */
-	for (; precision > 0; precision--) {
-		powj *= 10;
-		if (count / powi < powj)
-			break;
-	}
-
-	snprintf(buf, sizeof(buf), "%.*f%c%s", precision,
-		(double) count / powi, *prefix, use_iec ? "i" : "");
-
-	fprintf(fp, "%-*s ", width, buf);
+	fprintf(fp, "%-*s ", width, sprint_num(buf, count, use_iec));
 }
 
 static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
diff --git a/lib/utils.c b/lib/utils.c
index f65ceaa..21e2e78 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -28,7 +28,7 @@
 #include <time.h>
 #include <sys/time.h>
 #include <errno.h>
-
+#include <inttypes.h>
 
 #include "utils.h"
 
@@ -878,3 +878,41 @@ void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n)
 	tstr[strlen(tstr)-1] = 0;
 	fprintf(fp, "Timestamp: %s %lu us\n", tstr, usecs);
 }
+
+char *sprint_num(char *str, uint64_t num, bool use_iec)
+{
+	const char *prefix = "kMGTPE";
+	const unsigned int base = use_iec ? 1024 : 1000;
+	uint64_t powi = 1;
+	uint16_t powj = 1;
+	uint8_t precision = 2;
+
+	if (num < base) {
+		sprintf(str, "%"PRIu64, num);
+		return str;
+	}
+
+	/* increase value by a factor of 1000/1024 and print
+	 * if result is something a human can read */
+	for(;;) {
+		powi *= base;
+		if (num / base < powi)
+			break;
+
+		if (!prefix[1])
+			break;
+		++prefix;
+	}
+
+	/* try to guess a good number of digits for precision */
+	for (; precision > 0; precision--) {
+		powj *= 10;
+		if (num / powi < powj)
+			break;
+	}
+
+	sprintf(str, "%.*f%c%s", precision,
+		(double) num / powi, *prefix, use_iec ? "i" : "");
+
+	return str;
+}
diff --git a/misc/ss.c b/misc/ss.c
index f434f57..c5995ab 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -26,6 +26,7 @@
 #include <fnmatch.h>
 #include <getopt.h>
 #include <stdbool.h>
+#include <inttypes.h>
 
 #include "utils.h"
 #include "rt_names.h"
@@ -96,6 +97,7 @@ int show_tcpinfo = 0;
 int show_bpf = 0;
 int show_proc_ctx = 0;
 int show_sock_ctx = 0;
+bool show_human = 0;
 /* If show_users & show_proc_ctx only do user_ent_hash_build() once */
 int user_ent_hash_build_init = 0;
 
@@ -1598,37 +1600,56 @@ outerr:
 	return ferror(fp) ? -1 : 0;
 }
 
-static char *sprint_bw(char *buf, double bw)
+static char *sprint_bandw(char *buf, double bw)
 {
-	if (bw > 1000000.)
-		sprintf(buf,"%.1fM", bw / 1000000.);
-	else if (bw > 1000.)
-		sprintf(buf,"%.1fK", bw / 1000.);
-	else
-		sprintf(buf, "%g", bw);
+	return sprint_num(buf, bw, false);
+}
 
-	return buf;
+static char *sprint_bytes(char *buf, uint64_t bytes)
+{
+	if (!show_human) {
+		sprintf(buf, "%"PRIu64, bytes);
+		return buf;
+	}
+
+	return sprint_num(buf, bytes, false);
 }
 
 static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 {
+	char buf[64];
 	const __u32 *skmeminfo;
-	if (!tb[attrtype])
+
+	if (!tb[attrtype]) {
+		if (attrtype == INET_DIAG_SKMEMINFO) {
+			if (!tb[INET_DIAG_MEMINFO])
+				return;
+
+			const struct inet_diag_meminfo *minfo =
+				RTA_DATA(tb[INET_DIAG_MEMINFO]);
+
+			printf(" mem:(rd=%s,", sprint_bytes(buf, minfo->idiag_rmem));
+			printf("wr=%s,", sprint_bytes(buf, minfo->idiag_tmem));
+			printf("fwd=%s,", sprint_bytes(buf, minfo->idiag_fmem));
+			printf("wr_queue=%s)", sprint_bytes(buf, minfo->idiag_wmem));
+		}
 		return;
+	}
+
 	skmeminfo = RTA_DATA(tb[attrtype]);
 
-	printf(" skmem:(r%u,rb%u,t%u,tb%u,f%u,w%u,o%u",
-	       skmeminfo[SK_MEMINFO_RMEM_ALLOC],
-	       skmeminfo[SK_MEMINFO_RCVBUF],
-	       skmeminfo[SK_MEMINFO_WMEM_ALLOC],
-	       skmeminfo[SK_MEMINFO_SNDBUF],
-	       skmeminfo[SK_MEMINFO_FWD_ALLOC],
-	       skmeminfo[SK_MEMINFO_WMEM_QUEUED],
-	       skmeminfo[SK_MEMINFO_OPTMEM]);
+	printf(" skmem:(");
+	printf("rd=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_RMEM_ALLOC]));
+	printf("rcv_buf=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_RCVBUF]));
+	printf("wr=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_WMEM_ALLOC]));
+	printf("snd_buf=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_SNDBUF]));
+	printf("fwd=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_FWD_ALLOC]));
+	printf("wr_queue=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_WMEM_QUEUED]));
+	printf("oth=%s", sprint_bytes(buf, skmeminfo[SK_MEMINFO_OPTMEM]));
 
 	if (RTA_PAYLOAD(tb[attrtype]) >=
 		(SK_MEMINFO_BACKLOG + 1) * sizeof(__u32))
-		printf(",bl%u", skmeminfo[SK_MEMINFO_BACKLOG]);
+		printf(",bklog=%s", sprint_bytes(buf, skmeminfo[SK_MEMINFO_BACKLOG]));
 
 	printf(")");
 }
@@ -1639,17 +1660,7 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 	char b1[64];
 	double rtt = 0;
 
-	if (tb[INET_DIAG_SKMEMINFO]) {
-		print_skmeminfo(tb, INET_DIAG_SKMEMINFO);
-	} else if (tb[INET_DIAG_MEMINFO]) {
-		const struct inet_diag_meminfo *minfo
-			= RTA_DATA(tb[INET_DIAG_MEMINFO]);
-		printf(" mem:(r%u,w%u,f%u,t%u)",
-		       minfo->idiag_rmem,
-		       minfo->idiag_wmem,
-		       minfo->idiag_fmem,
-		       minfo->idiag_tmem);
-	}
+	print_skmeminfo(tb, INET_DIAG_SKMEMINFO);
 
 	if (tb[INET_DIAG_INFO]) {
 		struct tcp_info *info;
@@ -1723,7 +1734,7 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 
 		if (rtt > 0 && info->tcpi_snd_mss && info->tcpi_snd_cwnd) {
 			printf(" send %sbps",
-			       sprint_bw(b1, (double) info->tcpi_snd_cwnd *
+			       sprint_bandw(b1, (double) info->tcpi_snd_cwnd *
 					 (double) info->tcpi_snd_mss * 8000000.
 					 / rtt));
 		}
@@ -1740,12 +1751,12 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 		if (info->tcpi_pacing_rate &&
 		    info->tcpi_pacing_rate != ~0ULL) {
 			printf(" pacing_rate %sbps",
-				sprint_bw(b1, info->tcpi_pacing_rate * 8.));
+				sprint_bandw(b1, info->tcpi_pacing_rate * 8.));
 
 			if (info->tcpi_max_pacing_rate &&
 			    info->tcpi_max_pacing_rate != ~0ULL)
 				printf("/%sbps",
-					sprint_bw(b1, info->tcpi_max_pacing_rate * 8.));
+					sprint_bandw(b1, info->tcpi_max_pacing_rate * 8.));
 		}
 		if (info->tcpi_unacked)
 			printf(" unacked:%u", info->tcpi_unacked);
@@ -3207,6 +3218,7 @@ static void _usage(FILE *dest)
 "   -b, --bpf           show bpf filter socket information\n"
 "   -Z, --context       display process SELinux security contexts\n"
 "   -z, --contexts      display process and socket SELinux security contexts\n"
+"   -H, --human         display info in human readable format\n"
 "\n"
 "   -4, --ipv4          display only IP version 4 sockets\n"
 "   -6, --ipv6          display only IP version 6 sockets\n"
@@ -3306,6 +3318,7 @@ static const struct option long_opts[] = {
 	{ "help", 0, 0, 'h' },
 	{ "context", 0, 0, 'Z' },
 	{ "contexts", 0, 0, 'z' },
+	{ "human", 0, 0, 'H' },
 	{ 0 }
 
 };
@@ -3321,7 +3334,7 @@ int main(int argc, char *argv[])
 	struct filter dbs_filter = {};
 	int state_filter = 0;
 
-	while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbf:miA:D:F:vVzZ",
+	while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbf:miA:D:F:vVzZH",
 				 long_opts, NULL)) != EOF) {
 		switch(ch) {
 		case 'n':
@@ -3493,6 +3506,9 @@ int main(int argc, char *argv[])
 			show_proc_ctx++;
 			user_ent_hash_build();
 			break;
+		case 'H':
+			show_human = true;
+			break;
 		case 'h':
 		case '?':
 			help();
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 2/3] ss: Unify inet sockets output
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1421613815-6635-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 misc/ss.c | 355 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 144 insertions(+), 211 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c5995ab..40439b3 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -708,6 +708,7 @@ struct tcpstat
 	int		refcnt;
 	unsigned long long sk;
 	int		rto, ato, qack, cwnd, ssthresh;
+	unsigned int	iface;
 };
 
 static const char *tmr_name[] = {
@@ -746,12 +747,6 @@ static const char *print_ms_timer(int timeout)
 	return buf;
 }
 
-static const char *print_hz_timer(int timeout)
-{
-	int hz = get_user_hz();
-	return print_ms_timer(((timeout*1000) + hz-1)/hz);
-}
-
 struct scache
 {
 	struct scache *next;
@@ -1441,55 +1436,124 @@ out:
 	return res;
 }
 
-static int tcp_show_line(char *line, const struct filter *f, int family)
+static char *proto_name(int protocol)
+{
+	switch (protocol) {
+	case IPPROTO_UDP:
+		return "udp";
+	case IPPROTO_TCP:
+		return "tcp";
+	case IPPROTO_DCCP:
+		return "dccp";
+	}
+
+	return "???";
+}
+
+static void inet_stats_print(struct tcpstat *s, int protocol)
+{
+	char *buf = NULL;
+
+	if (netid_width)
+		printf("%-*s ", netid_width, proto_name(protocol));
+	if (state_width)
+		printf("%-*s ", state_width, sstate_name[s->state]);
+
+	printf("%-6d %-6d ", s->rq, s->wq);
+
+	formatted_print(&s->local, s->lport, s->iface);
+	formatted_print(&s->remote, s->rport, 0);
+
+	if (show_options) {
+		if (s->timer) {
+			if (s->timer > 4)
+				s->timer = 5;
+			printf(" timer:(%s,%s,%d)",
+			       tmr_name[s->timer],
+			       print_ms_timer(s->timeout),
+			       s->retrs);
+		}
+	}
+
+	if (show_proc_ctx || show_sock_ctx) {
+		if (find_entry(s->ino, &buf,
+				(show_proc_ctx & show_sock_ctx) ?
+				PROC_SOCK_CTX : PROC_CTX) > 0) {
+			printf(" users:(%s)", buf);
+			free(buf);
+		}
+	} else if (show_users) {
+		if (find_entry(s->ino, &buf, USERS) > 0) {
+			printf(" users:(%s)", buf);
+			free(buf);
+		}
+	}
+}
+
+static int proc_parse_inet_addr(char *loc, char *rem, int family, struct tcpstat *s)
+{
+	s->local.family = s->remote.family = family;
+	if (family == AF_INET) {
+		sscanf(loc, "%x:%x", s->local.data, (unsigned*)&s->lport);
+		sscanf(rem, "%x:%x", s->remote.data, (unsigned*)&s->rport);
+		s->local.bytelen = s->remote.bytelen = 4;
+		return 0;
+	} else {
+		sscanf(loc, "%08x%08x%08x%08x:%x",
+		       s->local.data,
+		       s->local.data + 1,
+		       s->local.data + 2,
+		       s->local.data + 3,
+		       &s->lport);
+		sscanf(rem, "%08x%08x%08x%08x:%x",
+		       s->remote.data,
+		       s->remote.data + 1,
+		       s->remote.data + 2,
+		       s->remote.data + 3,
+		       &s->rport);
+		s->local.bytelen = s->remote.bytelen = 16;
+		return 0;
+	}
+	return -1;
+}
+
+static int proc_inet_split_line(char *line, char **loc, char **rem, char **data)
 {
-	struct tcpstat s;
-	char *loc, *rem, *data;
-	char opt[256];
-	int n;
 	char *p;
 
 	if ((p = strchr(line, ':')) == NULL)
 		return -1;
-	loc = p+2;
 
-	if ((p = strchr(loc, ':')) == NULL)
+	*loc = p+2;
+	if ((p = strchr(*loc, ':')) == NULL)
 		return -1;
-	p[5] = 0;
-	rem = p+6;
 
-	if ((p = strchr(rem, ':')) == NULL)
+	p[5] = 0;
+	*rem = p+6;
+	if ((p = strchr(*rem, ':')) == NULL)
 		return -1;
+
 	p[5] = 0;
-	data = p+6;
+	*data = p+6;
+	return 0;
+}
 
-	do {
-		int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
+static int tcp_show_line(char *line, const struct filter *f, int family)
+{
+	struct tcpstat s = {};
+	char *loc, *rem, *data;
+	char opt[256];
+	int n;
+	int hz = get_user_hz();
 
-		if (!(f->states & (1<<state)))
-			return 0;
-	} while (0);
+	if (proc_inet_split_line(line, &loc, &rem, &data))
+		return -1;
 
-	s.local.family = s.remote.family = family;
-	if (family == AF_INET) {
-		sscanf(loc, "%x:%x", s.local.data, (unsigned*)&s.lport);
-		sscanf(rem, "%x:%x", s.remote.data, (unsigned*)&s.rport);
-		s.local.bytelen = s.remote.bytelen = 4;
-	} else {
-		sscanf(loc, "%08x%08x%08x%08x:%x",
-		       s.local.data,
-		       s.local.data+1,
-		       s.local.data+2,
-		       s.local.data+3,
-		       &s.lport);
-		sscanf(rem, "%08x%08x%08x%08x:%x",
-		       s.remote.data,
-		       s.remote.data+1,
-		       s.remote.data+2,
-		       s.remote.data+3,
-		       &s.rport);
-		s.local.bytelen = s.remote.bytelen = 16;
-	}
+	int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
+	if (!(f->states & (1 << state)))
+		return 0;
+
+	proc_parse_inet_addr(loc, rem, family, &s);
 
 	if (f->f && run_ssfilter(f->f, &s) == 0)
 		return 0;
@@ -1511,66 +1575,36 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 		s.ato = s.qack = 0;
 	}
 
-	if (netid_width)
-		printf("%-*s ", netid_width, "tcp");
-	if (state_width)
-		printf("%-*s ", state_width, sstate_name[s.state]);
-
-	printf("%-6d %-6d ", s.rq, s.wq);
+	s.retrs = s.timer != 1 ? s.probes : s.retrs;
+	s.timeout = (s.timeout * 1000 + hz - 1) / hz;
 
-	formatted_print(&s.local, s.lport, 0);
-	formatted_print(&s.remote, s.rport, 0);
+	inet_stats_print(&s, IPPROTO_TCP);
 
-	if (show_options) {
-		if (s.timer) {
-			if (s.timer > 4)
-				s.timer = 5;
-			printf(" timer:(%s,%s,%d)",
-			       tmr_name[s.timer],
-			       print_hz_timer(s.timeout),
-			       s.timer != 1 ? s.probes : s.retrs);
-		}
+	if (show_details) {
+		if (s.uid)
+			printf(" uid:%u", (unsigned)s.uid);
+		printf(" ino:%u", s.ino);
+		printf(" sk:%llx", s.sk);
+		if (opt[0])
+			printf(" opt:\"%s\"", opt);
 	}
+
 	if (show_tcpinfo) {
-		int hz = get_user_hz();
-		if (s.rto && s.rto != 3*hz)
-			printf(" rto:%g", (double)s.rto/hz);
+		if (s.rto && s.rto != 3 * hz)
+			printf(" rto:%g", (double)s.rto / hz);
 		if (s.ato)
-			printf(" ato:%g", (double)s.ato/hz);
+			printf(" ato:%g", (double)s.ato / hz);
 		if (s.cwnd != 2)
 			printf(" cwnd:%d", s.cwnd);
 		if (s.ssthresh != -1)
 			printf(" ssthresh:%d", s.ssthresh);
-		if (s.qack/2)
-			printf(" qack:%d", s.qack/2);
-		if (s.qack&1)
+		if (s.qack / 2)
+			printf(" qack:%d", s.qack / 2);
+		if (s.qack & 1)
 			printf(" bidir");
 	}
-	char *buf = NULL;
-	if (show_proc_ctx || show_sock_ctx) {
-		if (find_entry(s.ino, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	} else if (show_users) {
-		if (find_entry(s.ino, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	}
 
-	if (show_details) {
-		if (s.uid)
-			printf(" uid:%u", (unsigned)s.uid);
-		printf(" ino:%u", s.ino);
-		printf(" sk:%llx", s.sk);
-		if (opt[0])
-			printf(" opt:\"%s\"", opt);
-	}
 	printf("\n");
-
 	return 0;
 }
 
@@ -1779,25 +1813,11 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 	}
 }
 
-static char *proto_name(int protocol)
-{
-	switch (protocol) {
-	case IPPROTO_UDP:
-		return "udp";
-	case IPPROTO_TCP:
-		return "tcp";
-	case IPPROTO_DCCP:
-		return "dccp";
-	}
-
-	return "???";
-}
-
 static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 {
 	struct rtattr * tb[INET_DIAG_MAX+1];
 	struct inet_diag_msg *r = NLMSG_DATA(nlh);
-	struct tcpstat s;
+	struct tcpstat s = {};
 
 	parse_rtattr(tb, INET_DIAG_MAX, (struct rtattr*)(r+1),
 		     nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
@@ -1806,52 +1826,28 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 	s.local.family = s.remote.family = r->idiag_family;
 	s.lport = ntohs(r->id.idiag_sport);
 	s.rport = ntohs(r->id.idiag_dport);
+	s.wq = r->idiag_wqueue;
+	s.rq = r->idiag_rqueue;
+	s.timer = r->idiag_timer;
+	s.timeout = r->idiag_expires;
+	s.retrs = r->idiag_retrans;
+	s.ino = r->idiag_inode;
+	s.uid = r->idiag_uid;
+	s.iface = r->id.idiag_if;
+
 	if (s.local.family == AF_INET) {
 		s.local.bytelen = s.remote.bytelen = 4;
 	} else {
 		s.local.bytelen = s.remote.bytelen = 16;
 	}
+
 	memcpy(s.local.data, r->id.idiag_src, s.local.bytelen);
 	memcpy(s.remote.data, r->id.idiag_dst, s.local.bytelen);
 
 	if (f && f->f && run_ssfilter(f->f, &s) == 0)
 		return 0;
 
-	if (netid_width)
-		printf("%-*s ", netid_width, proto_name(protocol));
-	if (state_width)
-		printf("%-*s ", state_width, sstate_name[s.state]);
-
-	printf("%-6d %-6d ", r->idiag_rqueue, r->idiag_wqueue);
-
-	formatted_print(&s.local, s.lport, r->id.idiag_if);
-	formatted_print(&s.remote, s.rport, 0);
-
-	if (show_options) {
-		if (r->idiag_timer) {
-			if (r->idiag_timer > 4)
-				r->idiag_timer = 5;
-			printf(" timer:(%s,%s,%d)",
-			       tmr_name[r->idiag_timer],
-			       print_ms_timer(r->idiag_expires),
-			       r->idiag_retrans);
-		}
-	}
-	char *buf = NULL;
-
-	if (show_proc_ctx || show_sock_ctx) {
-		if (find_entry(r->idiag_inode, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	} else if (show_users) {
-		if (find_entry(r->idiag_inode, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	}
+	inet_stats_print(&s, protocol);
 
 	if (show_details) {
 		if (r->idiag_uid)
@@ -1867,13 +1863,13 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 			printf(" %c-%c", mask & 1 ? '-' : '<', mask & 2 ? '-' : '>');
 		}
 	}
+
 	if (show_mem || show_tcpinfo) {
 		printf("\n\t");
 		tcp_show_info(nlh, r, tb);
 	}
 
 	printf("\n");
-
 	return 0;
 }
 
@@ -2194,53 +2190,19 @@ outerr:
 
 static int dgram_show_line(char *line, const struct filter *f, int family)
 {
-	struct tcpstat s;
+	struct tcpstat s = {};
 	char *loc, *rem, *data;
 	char opt[256];
 	int n;
-	char *p;
-
-	if ((p = strchr(line, ':')) == NULL)
-		return -1;
-	loc = p+2;
-
-	if ((p = strchr(loc, ':')) == NULL)
-		return -1;
-	p[5] = 0;
-	rem = p+6;
 
-	if ((p = strchr(rem, ':')) == NULL)
+	if (proc_inet_split_line(line, &loc, &rem, &data))
 		return -1;
-	p[5] = 0;
-	data = p+6;
-
-	do {
-		int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
 
-		if (!(f->states & (1<<state)))
-			return 0;
-	} while (0);
+	int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
+	if (!(f->states & (1 << state)))
+		return 0;
 
-	s.local.family = s.remote.family = family;
-	if (family == AF_INET) {
-		sscanf(loc, "%x:%x", s.local.data, (unsigned*)&s.lport);
-		sscanf(rem, "%x:%x", s.remote.data, (unsigned*)&s.rport);
-		s.local.bytelen = s.remote.bytelen = 4;
-	} else {
-		sscanf(loc, "%08x%08x%08x%08x:%x",
-		       s.local.data,
-		       s.local.data+1,
-		       s.local.data+2,
-		       s.local.data+3,
-		       &s.lport);
-		sscanf(rem, "%08x%08x%08x%08x:%x",
-		       s.remote.data,
-		       s.remote.data+1,
-		       s.remote.data+2,
-		       s.remote.data+3,
-		       &s.rport);
-		s.local.bytelen = s.remote.bytelen = 16;
-	}
+	proc_parse_inet_addr(loc, rem, family, &s);
 
 	if (f->f && run_ssfilter(f->f, &s) == 0)
 		return 0;
@@ -2254,31 +2216,7 @@ static int dgram_show_line(char *line, const struct filter *f, int family)
 	if (n < 9)
 		opt[0] = 0;
 
-	if (netid_width)
-		printf("%-*s ", netid_width, dg_proto);
-	if (state_width)
-		printf("%-*s ", state_width, sstate_name[s.state]);
-
-	printf("%-6d %-6d ", s.rq, s.wq);
-
-	formatted_print(&s.local, s.lport, 0);
-	formatted_print(&s.remote, s.rport, 0);
-
-	char *buf = NULL;
-
-	if (show_proc_ctx || show_sock_ctx) {
-		if (find_entry(s.ino, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	} else if (show_users) {
-		if (find_entry(s.ino, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	}
+	inet_stats_print(&s, IPPROTO_UDP);
 
 	if (show_details) {
 		if (s.uid)
@@ -2288,12 +2226,11 @@ static int dgram_show_line(char *line, const struct filter *f, int family)
 		if (opt[0])
 			printf(" opt:\"%s\"", opt);
 	}
-	printf("\n");
 
+	printf("\n");
 	return 0;
 }
 
-
 static int udp_show(struct filter *f)
 {
 	FILE *fp = NULL;
@@ -2362,7 +2299,6 @@ outerr:
 	} while (0);
 }
 
-
 struct unixstat
 {
 	struct unixstat *next;
@@ -2376,12 +2312,9 @@ struct unixstat
 	char *name;
 };
 
-
-
 int unix_state_map[] = { SS_CLOSE, SS_SYN_SENT,
 			 SS_ESTABLISHED, SS_CLOSING };
 
-
 #define MAX_UNIX_REMEMBER (1024*1024/sizeof(struct unixstat))
 
 static void unix_list_free(struct unixstat *list)
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 3/3] ss: Unify tcp stats output
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1421613815-6635-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 misc/ss.c | 362 +++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 231 insertions(+), 131 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 40439b3..73097b2 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -691,24 +691,59 @@ static const char *sstate_namel[] = {
 	[SS_CLOSING] = "closing",
 };
 
+struct dctcpstat
+{
+	unsigned int	ce_state;
+	unsigned int	alpha;
+	unsigned int	ab_ecn;
+	unsigned int	ab_tot;
+	bool		enabled;
+};
+
 struct tcpstat
 {
-	inet_prefix	local;
-	inet_prefix	remote;
-	int		lport;
-	int		rport;
-	int		state;
-	int		rq, wq;
-	int		timer;
-	int		timeout;
-	int		retrs;
-	unsigned	ino;
-	int		probes;
-	unsigned	uid;
-	int		refcnt;
-	unsigned long long sk;
-	int		rto, ato, qack, cwnd, ssthresh;
-	unsigned int	iface;
+	inet_prefix	    local;
+	inet_prefix	    remote;
+	int		    lport;
+	int		    rport;
+	int		    state;
+	int		    rq, wq;
+	unsigned	    ino;
+	unsigned	    uid;
+	int		    refcnt;
+	unsigned int	    iface;
+	unsigned long long  sk;
+	int		    timer;
+	int		    timeout;
+	int		    probes;
+	char		    *cong_alg;
+	double		    rto, ato, rtt, rttvar;
+	int		    qack, cwnd, ssthresh, backoff;
+	double		    send_bps;
+	int		    snd_wscale;
+	int		    rcv_wscale;
+	int		    mss;
+	unsigned int	    lastsnd;
+	unsigned int	    lastrcv;
+	unsigned int	    lastack;
+	double		    pacing_rate;
+	double		    pacing_rate_max;
+	unsigned int	    unacked;
+	unsigned int	    retrans;
+	unsigned int	    retrans_total;
+	unsigned int	    lost;
+	unsigned int	    sacked;
+	unsigned int	    fackets;
+	unsigned int	    reordering;
+	double		    rcv_rtt;
+	int		    rcv_space;
+	bool		    has_ts_opt;
+	bool		    has_sack_opt;
+	bool		    has_ecn_opt;
+	bool		    has_ecnseen_opt;
+	bool		    has_fastopen_opt;
+	bool		    has_wscale_opt;
+	struct dctcpstat    *dctcp;
 };
 
 static const char *tmr_name[] = {
@@ -1471,7 +1506,7 @@ static void inet_stats_print(struct tcpstat *s, int protocol)
 			printf(" timer:(%s,%s,%d)",
 			       tmr_name[s->timer],
 			       print_ms_timer(s->timeout),
-			       s->retrs);
+			       s->retrans);
 		}
 	}
 
@@ -1538,8 +1573,107 @@ static int proc_inet_split_line(char *line, char **loc, char **rem, char **data)
 	return 0;
 }
 
+static char *sprint_bandw(char *buf, double bw)
+{
+	return sprint_num(buf, bw, false);
+}
+
+static char *sprint_bytes(char *buf, uint64_t bytes)
+{
+	if (!show_human) {
+		sprintf(buf, "%"PRIu64, bytes);
+		return buf;
+	}
+
+	return sprint_num(buf, bytes, false);
+}
+
+static void tcp_stats_print(struct tcpstat *s)
+{
+	char b1[64];
+
+	if (s->has_ts_opt)
+		printf(" ts");
+	if (s->has_sack_opt)
+		printf(" sack");
+	if (s->has_ecn_opt)
+		printf(" ecn");
+	if (s->has_ecnseen_opt)
+		printf(" ecnseen");
+	if (s->has_fastopen_opt)
+		printf(" fastopen");
+	if (s->cong_alg)
+		printf(" %s", s->cong_alg);
+	if (s->has_wscale_opt)
+		printf(" wscale:%d,%d", s->snd_wscale, s->rcv_wscale);
+	if (s->rto)
+		printf(" rto:%g", s->rto);
+	if (s->backoff)
+		printf(" backoff:%u", s->backoff);
+	if (s->rtt)
+		printf(" rtt:%g/%g", s->rtt, s->rttvar);
+	if (s->ato)
+		printf(" ato:%g", s->ato);
+
+	if (s->qack)
+		printf(" qack:%d", s->qack);
+	if (s->qack & 1)
+		printf(" bidir");
+
+	if (s->mss)
+		printf(" mss:%d", s->mss);
+	if (s->cwnd && s->cwnd != 2)
+		printf(" cwnd:%d", s->cwnd);
+	if (s->ssthresh)
+		printf(" ssthresh:%d", s->ssthresh);
+
+	if (s->dctcp && s->dctcp->enabled) {
+		struct dctcpstat *dctcp = s->dctcp;
+
+		printf(" ce_state %u alpha %u ab_ecn %u ab_tot %u",
+				dctcp->ce_state, dctcp->alpha, dctcp->ab_ecn,
+				dctcp->ab_tot);
+	} else if (s->dctcp) {
+		printf(" fallback_mode");
+	}
+
+	if (s->send_bps)
+		printf(" send %sbps", sprint_bandw(b1, s->send_bps));
+	if (s->lastsnd)
+		printf(" lastsnd:%u", s->lastsnd);
+	if (s->lastrcv)
+		printf(" lastrcv:%u", s->lastrcv);
+	if (s->lastack)
+		printf(" lastack:%u", s->lastack);
+
+	if (s->pacing_rate) {
+		printf(" pacing_rate %sbps", sprint_bandw(b1, s->pacing_rate));
+		if (s->pacing_rate_max)
+				printf("/%sbps", sprint_bandw(b1,
+							s->pacing_rate_max));
+	}
+
+	if (s->unacked)
+		printf(" unacked:%u", s->unacked);
+	if (s->retrans || s->retrans_total)
+		printf(" retrans:%u/%u", s->retrans, s->retrans_total);
+	if (s->lost)
+		printf(" lost:%u", s->lost);
+	if (s->sacked && s->state != SS_LISTEN)
+		printf(" sacked:%u", s->sacked);
+	if (s->fackets)
+		printf(" fackets:%u", s->fackets);
+	if (s->reordering != 3)
+		printf(" reordering:%d", s->reordering);
+	if (s->rcv_rtt)
+		printf(" rcv_rtt:%g", s->rcv_rtt);
+	if (s->rcv_space)
+		printf(" rcv_space:%d", s->rcv_space);
+}
+
 static int tcp_show_line(char *line, const struct filter *f, int family)
 {
+	int rto = 0, ato = 0;
 	struct tcpstat s = {};
 	char *loc, *rem, *data;
 	char opt[256];
@@ -1561,22 +1695,27 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 	opt[0] = 0;
 	n = sscanf(data, "%x %x:%x %x:%x %x %d %d %u %d %llx %d %d %d %d %d %[^\n]\n",
 		   &s.state, &s.wq, &s.rq,
-		   &s.timer, &s.timeout, &s.retrs, &s.uid, &s.probes, &s.ino,
-		   &s.refcnt, &s.sk, &s.rto, &s.ato, &s.qack,
+		   &s.timer, &s.timeout, &s.retrans, &s.uid, &s.probes, &s.ino,
+		   &s.refcnt, &s.sk, &rto, &ato, &s.qack,
 		   &s.cwnd, &s.ssthresh, opt);
 
 	if (n < 17)
 		opt[0] = 0;
 
 	if (n < 12) {
-		s.rto = 0;
+		rto = 0;
 		s.cwnd = 2;
 		s.ssthresh = -1;
-		s.ato = s.qack = 0;
+		ato = s.qack = 0;
 	}
 
-	s.retrs = s.timer != 1 ? s.probes : s.retrs;
-	s.timeout = (s.timeout * 1000 + hz - 1) / hz;
+	s.retrans   = s.timer != 1 ? s.probes : s.retrans;
+	s.timeout   = (s.timeout * 1000 + hz - 1) / hz;
+	s.ato	    = (double)ato / hz;
+	s.qack	   /= 2;
+	s.rto	    = (double)rto;
+	s.ssthresh  = s.ssthresh == -1 ? 0 : s.ssthresh;
+	s.rto	    = s.rto != 3 * hz  ? s.rto / hz : 0;
 
 	inet_stats_print(&s, IPPROTO_TCP);
 
@@ -1589,20 +1728,8 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 			printf(" opt:\"%s\"", opt);
 	}
 
-	if (show_tcpinfo) {
-		if (s.rto && s.rto != 3 * hz)
-			printf(" rto:%g", (double)s.rto / hz);
-		if (s.ato)
-			printf(" ato:%g", (double)s.ato / hz);
-		if (s.cwnd != 2)
-			printf(" cwnd:%d", s.cwnd);
-		if (s.ssthresh != -1)
-			printf(" ssthresh:%d", s.ssthresh);
-		if (s.qack / 2)
-			printf(" qack:%d", s.qack / 2);
-		if (s.qack & 1)
-			printf(" bidir");
-	}
+	if (show_tcpinfo)
+		tcp_stats_print(&s);
 
 	printf("\n");
 	return 0;
@@ -1634,21 +1761,6 @@ outerr:
 	return ferror(fp) ? -1 : 0;
 }
 
-static char *sprint_bandw(char *buf, double bw)
-{
-	return sprint_num(buf, bw, false);
-}
-
-static char *sprint_bytes(char *buf, uint64_t bytes)
-{
-	if (!show_human) {
-		sprintf(buf, "%"PRIu64, bytes);
-		return buf;
-	}
-
-	return sprint_num(buf, bytes, false);
-}
-
 static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 {
 	char buf[64];
@@ -1688,11 +1800,13 @@ static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 	printf(")");
 }
 
+#define TCPI_HAS_OPT(info, opt) !!(info->tcpi_options & (opt))
+
 static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 		struct rtattr *tb[])
 {
-	char b1[64];
 	double rtt = 0;
+	struct tcpstat s = {};
 
 	print_skmeminfo(tb, INET_DIAG_SKMEMINFO);
 
@@ -1709,39 +1823,49 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 			info = RTA_DATA(tb[INET_DIAG_INFO]);
 
 		if (show_options) {
-			if (info->tcpi_options & TCPI_OPT_TIMESTAMPS)
-				printf(" ts");
-			if (info->tcpi_options & TCPI_OPT_SACK)
-				printf(" sack");
-			if (info->tcpi_options & TCPI_OPT_ECN)
-				printf(" ecn");
-			if (info->tcpi_options & TCPI_OPT_ECN_SEEN)
-				printf(" ecnseen");
-			if (info->tcpi_options & TCPI_OPT_SYN_DATA)
-				printf(" fastopen");
-		}
-
-		if (tb[INET_DIAG_CONG])
-			printf(" %s", rta_getattr_str(tb[INET_DIAG_CONG]));
-
-		if (info->tcpi_options & TCPI_OPT_WSCALE)
-			printf(" wscale:%d,%d", info->tcpi_snd_wscale,
-			       info->tcpi_rcv_wscale);
+			s.has_ts_opt	   = TCPI_HAS_OPT(info, TCPI_OPT_TIMESTAMPS);
+			s.has_sack_opt	   = TCPI_HAS_OPT(info, TCPI_OPT_SACK);
+			s.has_ecn_opt	   = TCPI_HAS_OPT(info, TCPI_OPT_ECN);
+			s.has_ecnseen_opt  = TCPI_HAS_OPT(info, TCPI_OPT_ECN_SEEN);
+			s.has_fastopen_opt = TCPI_HAS_OPT(info, TCPI_OPT_SYN_DATA);
+		}
+
+		if (tb[INET_DIAG_CONG]) {
+			const char *cong_attr = rta_getattr_str(tb[INET_DIAG_CONG]);
+			s.cong_alg = malloc(strlen(cong_attr + 1));
+			strcpy(s.cong_alg, cong_attr);
+		}
+
+		if (TCPI_HAS_OPT(info, TCPI_OPT_WSCALE)) {
+			s.has_wscale_opt  = true;
+			s.snd_wscale	  = info->tcpi_snd_wscale;
+			s.rcv_wscale	  = info->tcpi_rcv_wscale;
+		}
+
 		if (info->tcpi_rto && info->tcpi_rto != 3000000)
-			printf(" rto:%g", (double)info->tcpi_rto/1000);
-		if (info->tcpi_backoff)
-			printf(" backoff:%u", info->tcpi_backoff);
-		if (info->tcpi_rtt)
-			printf(" rtt:%g/%g", (double)info->tcpi_rtt/1000,
-			       (double)info->tcpi_rttvar/1000);
-		if (info->tcpi_ato)
-			printf(" ato:%g", (double)info->tcpi_ato/1000);
-		if (info->tcpi_snd_mss)
-			printf(" mss:%d", info->tcpi_snd_mss);
-		if (info->tcpi_snd_cwnd != 2)
-			printf(" cwnd:%d", info->tcpi_snd_cwnd);
+			s.rto = (double)info->tcpi_rto / 1000;
+
+		s.backoff	 = info->tcpi_backoff;
+		s.rtt		 = (double)info->tcpi_rtt / 1000;
+		s.rttvar	 = (double)info->tcpi_rttvar / 1000;
+		s.ato		 = (double)info->tcpi_rttvar / 1000;
+		s.mss		 = info->tcpi_snd_mss;
+		s.rcv_space	 = info->tcpi_rcv_space;
+		s.rcv_rtt	 = (double)info->tcpi_rcv_rtt / 1000;
+		s.lastsnd	 = info->tcpi_last_data_sent;
+		s.lastrcv	 = info->tcpi_last_data_recv;
+		s.lastack	 = info->tcpi_last_ack_recv;
+		s.unacked	 = info->tcpi_unacked;
+		s.retrans	 = info->tcpi_retrans;
+		s.retrans_total  = info->tcpi_total_retrans;
+		s.lost		 = info->tcpi_lost;
+		s.sacked	 = info->tcpi_sacked;
+		s.reordering	 = info->tcpi_reordering;
+		s.rcv_space	 = info->tcpi_rcv_space;
+		s.cwnd		 = info->tcpi_snd_cwnd;
+
 		if (info->tcpi_snd_ssthresh < 0xFFFF)
-			printf(" ssthresh:%d", info->tcpi_snd_ssthresh);
+			s.ssthresh = info->tcpi_snd_ssthresh;
 
 		rtt = (double) info->tcpi_rtt;
 		if (tb[INET_DIAG_VEGASINFO]) {
@@ -1749,67 +1873,43 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 				= RTA_DATA(tb[INET_DIAG_VEGASINFO]);
 
 			if (vinfo->tcpv_enabled &&
-			    vinfo->tcpv_rtt && vinfo->tcpv_rtt != 0x7fffffff)
+					vinfo->tcpv_rtt && vinfo->tcpv_rtt != 0x7fffffff)
 				rtt =  vinfo->tcpv_rtt;
 		}
 
 		if (tb[INET_DIAG_DCTCPINFO]) {
+			struct dctcpstat *dctcp = malloc(sizeof(struct
+						dctcpstat));
+
 			const struct tcp_dctcp_info *dinfo
 				= RTA_DATA(tb[INET_DIAG_DCTCPINFO]);
 
-			if (dinfo->dctcp_enabled) {
-				printf(" ce_state %u alpha %u ab_ecn %u ab_tot %u",
-				       dinfo->dctcp_ce_state, dinfo->dctcp_alpha,
-				       dinfo->dctcp_ab_ecn, dinfo->dctcp_ab_tot);
-			} else {
-				printf(" fallback_mode");
-			}
+			dctcp->enabled	= !!dinfo->dctcp_enabled;
+			dctcp->ce_state = dinfo->dctcp_ce_state;
+			dctcp->alpha	= dinfo->dctcp_alpha;
+			dctcp->ab_ecn	= dinfo->dctcp_ab_ecn;
+			dctcp->ab_tot	= dinfo->dctcp_ab_tot;
+			s.dctcp		= dctcp;
 		}
 
 		if (rtt > 0 && info->tcpi_snd_mss && info->tcpi_snd_cwnd) {
-			printf(" send %sbps",
-			       sprint_bandw(b1, (double) info->tcpi_snd_cwnd *
-					 (double) info->tcpi_snd_mss * 8000000.
-					 / rtt));
+			s.send_bps = (double) info->tcpi_snd_cwnd *
+				(double)info->tcpi_snd_mss * 8000000. / rtt;
 		}
 
-		if (info->tcpi_last_data_sent)
-			printf(" lastsnd:%u", info->tcpi_last_data_sent);
-
-		if (info->tcpi_last_data_recv)
-			printf(" lastrcv:%u", info->tcpi_last_data_recv);
-
-		if (info->tcpi_last_ack_recv)
-			printf(" lastack:%u", info->tcpi_last_ack_recv);
-
 		if (info->tcpi_pacing_rate &&
-		    info->tcpi_pacing_rate != ~0ULL) {
-			printf(" pacing_rate %sbps",
-				sprint_bandw(b1, info->tcpi_pacing_rate * 8.));
+				info->tcpi_pacing_rate != ~0ULL) {
+			s.pacing_rate = info->tcpi_pacing_rate * 8.;
 
 			if (info->tcpi_max_pacing_rate &&
-			    info->tcpi_max_pacing_rate != ~0ULL)
-				printf("/%sbps",
-					sprint_bandw(b1, info->tcpi_max_pacing_rate * 8.));
-		}
-		if (info->tcpi_unacked)
-			printf(" unacked:%u", info->tcpi_unacked);
-		if (info->tcpi_retrans || info->tcpi_total_retrans)
-			printf(" retrans:%u/%u", info->tcpi_retrans,
-			       info->tcpi_total_retrans);
-		if (info->tcpi_lost)
-			printf(" lost:%u", info->tcpi_lost);
-		if (info->tcpi_sacked && r->idiag_state != SS_LISTEN)
-			printf(" sacked:%u", info->tcpi_sacked);
-		if (info->tcpi_fackets)
-			printf(" fackets:%u", info->tcpi_fackets);
-		if (info->tcpi_reordering != 3)
-			printf(" reordering:%d", info->tcpi_reordering);
-		if (info->tcpi_rcv_rtt)
-			printf(" rcv_rtt:%g", (double) info->tcpi_rcv_rtt/1000);
-		if (info->tcpi_rcv_space)
-			printf(" rcv_space:%d", info->tcpi_rcv_space);
-
+					info->tcpi_max_pacing_rate != ~0ULL)
+				s.pacing_rate_max = info->tcpi_max_pacing_rate * 8.;
+		}
+		tcp_stats_print(&s);
+		if (s.dctcp)
+			free(s.dctcp);
+		if (s.cong_alg)
+			free(s.cong_alg);
 	}
 }
 
@@ -1830,7 +1930,7 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 	s.rq = r->idiag_rqueue;
 	s.timer = r->idiag_timer;
 	s.timeout = r->idiag_expires;
-	s.retrs = r->idiag_retrans;
+	s.retrans = r->idiag_retrans;
 	s.ino = r->idiag_inode;
 	s.uid = r->idiag_uid;
 	s.iface = r->id.idiag_if;
-- 
2.1.3

^ permalink raw reply related

* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
From: roopa @ 2015-01-18 20:55 UTC (permalink / raw)
  To: Scott Feldman
  Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
	Jiří Pírko, Arad, Ronen, Thomas Graf,
	john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
	Andy Gospodarek
In-Reply-To: <54BB7874.90201@cumulusnetworks.com>

On 1/18/15, 1:10 AM, roopa wrote:
> On 1/17/15, 5:05 PM, Scott Feldman wrote:
>> On Fri, Jan 16, 2015 at 11:32 PM, <roopa@cumulusnetworks.com> wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>>> there is a need to not re-forward the frames that come up to the
>>> kernel in software.
>>>
>>> Typically these are broadcast or multicast packets forwarded by the
>>> hardware to multiple destination ports including sending a copy of
>>> the packet to the kernel (e.g. an arp broadcast).
>>> The bridge driver will try to forward the packet again, resulting in
>>> two copies of the same packet.
>>>
>>> These packets can also come up to the kernel for logging when they hit
>>> a LOG acl in hardware.
>>>
>>> This patch makes forwarding a flag on the port similar to
>>> learn and flood and drops the packet just before forwarding.
>>> (The forwarding disable on a bridge is tested to work on our boxes.
>>> The bridge port flag addition is only compile tested.
>>> This will need to be further refined to cover cases where a 
>>> non-switch port
>>> is bridged to a switch port etc. We will submit more patches to cover
>>> all cases if we agree on this approach).
>> Good topic to bring up, thanks for proposing a patch.  There is indeed
>> duplicate pkts sent out in the case where both the bridge and the
>> offloaded device are flooding these non-unicast pkts, such as ARP
>> requests.  We do have per-port control today over unicast flooding
>> using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
>>
>> As you point out, this doesn't solve the case for non-offloaded ports
>> bridged with switch ports.  If this port setting is enabled on an
>> offloaded switch port, for example, the non-offloaded port can't get
>> an ARP request resolved, if the MAC is behind the offloaded switch
>> port.  But do we care?  Is there a use-case for this one, mixing
>> offloaded and non-offloaded ports in a bridge?
>
> Not sure. I don't know the use case, but I think I might have heard 
> that there could be a case
>  where a switch port could be bridged with a vm's port running on the 
> switch. (?)

Ignoring the above usecase for a bit. And thinking through this again, 
It appears that this check should
be only on the ingress port, no ?

If the ingress bridge port is an offloaded port, don't flood or forward 
because hardware has already done it.
And this is best done with the offload feature flag on the bridge port.

But, If we bring in the usecase, where a bridge has offloaded and 
non-offloaded ports mixed,
there should be an option to disable this check and to achieve that its 
better for this to be a
flag on the port maintained by the bridge driver like the one proposed 
by this patch (BR_FORWARD).

Thanks,
Roopa

^ permalink raw reply

* Re: [net-next PATCH v2 1/1] net: sched: Introduce connmark action
From: Cong Wang @ 2015-01-18 21:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, netdev, nbd, pablo, Florian Westphal,
	Jiří Pírko
In-Reply-To: <1421611245-18492-1-git-send-email-jhs@emojatatu.com>

On Sun, Jan 18, 2015 at 12:00 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> +
> +MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
> +MODULE_DESCRIPTION("Connection tracking mark restoring");
> +MODULE_LICENSE("GPL");


Please move these to the bottom.

> +
> +static int __init connmark_init_module(void)
> +{
> +       int ret;
> +
> +       ret = tcf_hashinfo_init(&connmark_hash_info, CONNMARK_TAB_MASK);
> +       if (ret)
> +               return ret;
> +

Is this against latest net-next? We don't need to init the hashinfo anymore,
tcf_register_action() already does that.

> +       return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
> +}
> +
> +static void __exit connmark_cleanup_module(void)
> +{
> +       tcf_unregister_action(&act_connmark_ops);
> +}
> +

Even if we really needed, you forgot to call tcf_hashinfo_destroy()?

Thanks.

^ permalink raw reply

* Re: [net-next PATCH v2 1/1] net: sched: Introduce connmark action
From: Jamal Hadi Salim @ 2015-01-18 21:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, netdev, nbd, pablo, Florian Westphal,
	Jiří Pírko
In-Reply-To: <CAHA+R7Muz_d47NX01tKO7v7V7m=5mKf+BPP1jo9dEH==8R6bgw@mail.gmail.com>

On 01/18/15 16:00, Cong Wang wrote:
> On Sun, Jan 18, 2015 at 12:00 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> +
>> +MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
>> +MODULE_DESCRIPTION("Connection tracking mark restoring");
>> +MODULE_LICENSE("GPL");
>
>
> Please move these to the bottom.
>

Done.

>> +
>> +static int __init connmark_init_module(void)
>> +{
>> +       int ret;
>> +
>> +       ret = tcf_hashinfo_init(&connmark_hash_info, CONNMARK_TAB_MASK);
>> +       if (ret)
>> +               return ret;
>> +
>
> Is this against latest net-next? We don't need to init the hashinfo anymore,
> tcf_register_action() already does that.
>

The code itself has been living outside the tree - so i am just doing
only necessary transforms. The above was needed because the action
was maintaining its own hash.
I will convert to the new mode.

>> +       return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
>> +}
>> +
>> +static void __exit connmark_cleanup_module(void)
>> +{
>> +       tcf_unregister_action(&act_connmark_ops);
>> +}
>> +
>
> Even if we really needed, you forgot to call tcf_hashinfo_destroy()?
>

Didnt follow - why do you need tcf_hashinfo_destroy()?

Will send v3 shortly

cheers,
jamal

> Thanks.
>

^ permalink raw reply

* [net-next PATCH v3 1/1] net: sched: Introduce connmark action
From: Jamal Hadi Salim @ 2015-01-18 21:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, nbd, pablo, fw, jiri, cwang, Jamal Hadi Salim

From: Felix Fietkau <nbd@openwrt.org>

This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.

There are known limitations currently:

doesn't work for initial packets, since we only query the ct table.
  Fine given use case is for returning packets

no implicit defrag.
  frags should be rare so fix later..

won't work for more complex tasks, e.g. lookup of other extensions
  since we have no means to store results

we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.

V2:
remove unnecessary braces (Jiri)
change the action identifier to 14 (Jiri)
Fix some stylistic issues caught by checkpatch
V3:
Move module params to bottom (Cong)
Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/net/tc_act/tc_connmark.h        |   14 +++
 include/uapi/linux/tc_act/tc_connmark.h |   22 ++++
 net/sched/Kconfig                       |   11 ++
 net/sched/Makefile                      |    1 +
 net/sched/act_connmark.c                |  192 +++++++++++++++++++++++++++++++
 5 files changed, 240 insertions(+)
 create mode 100644 include/net/tc_act/tc_connmark.h
 create mode 100644 include/uapi/linux/tc_act/tc_connmark.h
 create mode 100644 net/sched/act_connmark.c

diff --git a/include/net/tc_act/tc_connmark.h b/include/net/tc_act/tc_connmark.h
new file mode 100644
index 0000000..5c1104c
--- /dev/null
+++ b/include/net/tc_act/tc_connmark.h
@@ -0,0 +1,14 @@
+#ifndef __NET_TC_CONNMARK_H
+#define __NET_TC_CONNMARK_H
+
+#include <net/act_api.h>
+
+struct tcf_connmark_info {
+	struct tcf_common common;
+	u16 zone;
+};
+
+#define to_connmark(a) \
+	container_of(a->priv, struct tcf_connmark_info, common)
+
+#endif /* __NET_TC_CONNMARK_H */
diff --git a/include/uapi/linux/tc_act/tc_connmark.h b/include/uapi/linux/tc_act/tc_connmark.h
new file mode 100644
index 0000000..994b097
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_connmark.h
@@ -0,0 +1,22 @@
+#ifndef __UAPI_TC_CONNMARK_H
+#define __UAPI_TC_CONNMARK_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_CONNMARK 14
+
+struct tc_connmark {
+	tc_gen;
+	__u16 zone;
+};
+
+enum {
+	TCA_CONNMARK_UNSPEC,
+	TCA_CONNMARK_PARMS,
+	TCA_CONNMARK_TM,
+	__TCA_CONNMARK_MAX
+};
+#define TCA_CONNMARK_MAX (__TCA_CONNMARK_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c54c9d9..db20cae 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -698,6 +698,17 @@ config NET_ACT_VLAN
 	  To compile this code as a module, choose M here: the
 	  module will be called act_vlan.
 
+config NET_ACT_CONNMARK
+        tristate "Netfilter Connection Mark Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        ---help---
+	  Say Y here to allow retrieving of conn mark
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_connmark.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 679f24a..47304cd 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
+obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
new file mode 100644
index 0000000..8e47251
--- /dev/null
+++ b/net/sched/act_connmark.c
@@ -0,0 +1,192 @@
+/*
+ * net/sched/act_connmark.c  netfilter connmark retriever action
+ * skb mark is over-written
+ *
+ * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+*/
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <uapi/linux/tc_act/tc_connmark.h>
+#include <net/tc_act/tc_connmark.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+#define CONNMARK_TAB_MASK     3
+
+static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
+			struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_connmark_info *ca = a->priv;
+	struct nf_conn *c;
+	int proto;
+
+	spin_lock(&ca->tcf_lock);
+	ca->tcf_tm.lastuse = jiffies;
+	bstats_update(&ca->tcf_bstats, skb);
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	c = nf_ct_get(skb, &ctinfo);
+	if (c) {
+		skb->mark = c->mark;
+		/* using overlimits stats to count how many packets marked */
+		ca->tcf_qstats.overlimits++;
+		nf_ct_put(c);
+		goto out;
+	}
+
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+			       proto, &tuple))
+		goto out;
+
+	thash = nf_conntrack_find_get(dev_net(skb->dev), ca->zone, &tuple);
+	if (!thash)
+		goto out;
+
+	c = nf_ct_tuplehash_to_ctrack(thash);
+	/* using overlimits stats to count how many packets marked */
+	ca->tcf_qstats.overlimits++;
+	skb->mark = c->mark;
+	nf_ct_put(c);
+
+out:
+	skb->nfct = NULL;
+	spin_unlock(&ca->tcf_lock);
+	return ca->tcf_action;
+}
+
+static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = {
+	[TCA_CONNMARK_PARMS] = { .len = sizeof(struct tc_connmark) },
+};
+
+static int tcf_connmark_init(struct net *net, struct nlattr *nla,
+			     struct nlattr *est, struct tc_action *a,
+			     int ovr, int bind)
+{
+	struct nlattr *tb[TCA_CONNMARK_MAX + 1];
+	struct tcf_connmark_info *ci;
+	struct tc_connmark *parm;
+	int ret = 0;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_CONNMARK_MAX, nla, connmark_policy);
+	if (ret < 0)
+		return ret;
+
+	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
+
+	if (!tcf_hash_check(parm->index, a, bind)) {
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*ci), bind);
+		if (ret)
+			return ret;
+
+		ci = to_connmark(a);
+		ci->tcf_action = parm->action;
+		ci->zone = parm->zone;
+
+		tcf_hash_insert(a);
+		ret = ACT_P_CREATED;
+	} else {
+		ci = to_connmark(a);
+		if (bind)
+			return 0;
+		tcf_hash_release(a, bind);
+		if (!ovr)
+			return -EEXIST;
+		/* replacing action and zone */
+		ci->tcf_action = parm->action;
+		ci->zone = parm->zone;
+	}
+
+	return ret;
+}
+
+static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
+				    int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_connmark_info *ci = a->priv;
+
+	struct tc_connmark opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = ci->tcf_refcnt - ref,
+		.bindcnt = ci->tcf_bindcnt - bind,
+		.action  = ci->tcf_action,
+		.zone   = ci->zone,
+	};
+	struct tcf_t t;
+
+	if (nla_put(skb, TCA_CONNMARK_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	t.install = jiffies_to_clock_t(jiffies - ci->tcf_tm.install);
+	t.lastuse = jiffies_to_clock_t(jiffies - ci->tcf_tm.lastuse);
+	t.expires = jiffies_to_clock_t(ci->tcf_tm.expires);
+	if (nla_put(skb, TCA_CONNMARK_TM, sizeof(t), &t))
+		goto nla_put_failure;
+
+	return skb->len;
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static struct tc_action_ops act_connmark_ops = {
+	.kind		=	"connmark",
+	.type		=	TCA_ACT_CONNMARK,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_connmark,
+	.dump		=	tcf_connmark_dump,
+	.init		=	tcf_connmark_init,
+};
+
+static int __init connmark_init_module(void)
+{
+	return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
+}
+
+static void __exit connmark_cleanup_module(void)
+{
+	tcf_unregister_action(&act_connmark_ops);
+}
+
+module_init(connmark_init_module);
+module_exit(connmark_cleanup_module);
+MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
+MODULE_DESCRIPTION("Connection tracking mark restoring");
+MODULE_LICENSE("GPL");
+
-- 
1.7.9.5

^ permalink raw reply related

* Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
From: Neil Horman @ 2015-01-18 22:02 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, netdev, danny.zhou, dborkman, john.ronciak,
	hannes, brouer
In-Reply-To: <20150114.153509.1264618607573705890.davem@davemloft.net>

On Wed, Jan 14, 2015 at 03:35:09PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Mon, 12 Jan 2015 20:35:11 -0800
> 
> > +		if ((region.direction != DMA_BIDIRECTIONAL) &&
> > +		    (region.direction != DMA_TO_DEVICE) &&
> > +		    (region.direction != DMA_FROM_DEVICE))
> > +			return -EFAULT;
>  ...
> > +		if ((umem->nmap == npages) &&
> > +		    (0 != dma_map_sg(dev->dev.parent, umem->sglist,
> > +				     umem->nmap, region.direction))) {
> > +			region.iova = sg_dma_address(umem->sglist) + offset;
> 
> I am having trouble seeing how this can work.
> 
> dma_map_{single,sg}() mappings need synchronization after a DMA
> transfer takes place.
> 
> For example if the DMA occurs to the device, then that region can
> be cached in the PCI controller's internal caches and thus future
> cpu writes into that memory region will not be seen, until a
> dma_sync_*() is invoked.
> 
> That isn't going to happen when the device transmit queue is
> being completely managed in userspace.
> 
> And this takes us back to the issue of protection, I don't think
> it is addressed properly yet.
> 
> CAP_NET_ADMIN privileges do not mean "can crap all over memory"
> yet with this feature that can still happen.
> 
> If we are dealing with a device which cannot provide strict protection
> to only the process's locked local pages, you have to do something
> to implement that protection.
> 
> And you have _exactly_ one option to do that, abstracting the page
> addresses and eating a system call to trigger the sends, so that you
> can read from the user's (fake) descriptors and write into the real
> descriptors (translating the DMA addresses along the way) and
> triggering the TX doorbell.
> 
> I am not going to consider seriously an implementation that says "yeah
> sometimes the user can crap onto other people's memory", this isn't
> MS-DOS, it's a system where proper memory protections are mandatory
> rather than optional.
> 
This is probably a stupid question, but can you not dynamically mark the address
range that gets mapped for dma as uncacheable? i.e. Something simmilar to
ioremap_noncache, but to mark the region as uncacheable within the pci
controller?  Would doing so not obviate the need for sync operations
(potentially at the cost of some performance, though perhaps not as much as
incurring a system call)
Neil

^ permalink raw reply

* Re: BW regression after "tcp: refine TSO autosizing"
From: Eyal Perry @ 2015-01-18 21:40 UTC (permalink / raw)
  To: Eric Dumazet, Eyal Perry
  Cc: Or Gerlitz, Linux Netdev List, Amir Vadai, Yevgeny Petrilin,
	Saeed Mahameed, Ido Shamay, Amir Ancel
In-Reply-To: <1421603317.11734.154.camel@edumazet-glaptop2.roam.corp.google.com>


On 1/18/2015 19:48 PM, Eric Dumazet wrote:
> On Sun, 2015-01-18 at 18:22 +0200, Eyal Perry wrote:
>
>> Please let me know if you see something in the results.
> Getting high throughput on a single flow means lot of tweaking.
>
> For a start, mlx4 is known to have interrupt mitigation that can hurt,
> as the TX interrupt timer is restarted for every packet that is
> delivered to the NIC.
>
> ethtool -c ethX
> ..
> tx-usecs: 16
> tx-frames: 16
> tx-usecs-irq: 0
> tx-frames-irq: 256
> ...
>
> -> TX IRQ can be delayed by 16*16 = 256 usec.
>
> Can you try :
>
> ethtool -C ethX tx-usecs 2 tx-frames 2
>
> Or even
>
> ethtool -C ethX tx-usecs 1 tx-frames 1
So indeed, interrupt mitigation (tx-usecs 1 tx-frames 1) improves things up
for the "refined TSO autosizing" kernel (from 18.4Gbps to 19.7Gbps). but
in the
other kernel, the BW is remains the same with and without the coalescing.
> Interrupt mitigation is a trade-off.
>
> If one customer wants high throughput on a single flow, then you might
> remove interrupt mitigation.
>
> If another customer wants cpu efficiency with thousand of flows, I guess
> current mlx4 defaults are pretty good.
>

^ permalink raw reply

* Re: [PATCH net-next 1/2] udp: Do not require sock in udp_tunnel_xmit_skb
From: Or Gerlitz @ 2015-01-18 22:43 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Thomas Graf, Linux Netdev List
In-Reply-To: <1421518700-22460-2-git-send-email-therbert@google.com>

On Sat, Jan 17, 2015 at 8:18 PM, Tom Herbert <therbert@google.com> wrote:
> The UDP tunnel transmit functions udp_tunnel_xmit_skb and
> udp_tunnel6_xmit_skb include a socket argument. The socket being
> passed to the functions (from VXLAN) is a UDP created for receive
> side. The only thing that the socket is used for in the transmit
> functions is to get the setting for checksum (enabled or zero).

Tom, just to clarify - re the sockets usage in the transmit side,
somewhere bind or alike is done on them such that we have multiple
source UDP ports for given host VXLAN traffic. Here for example the
sender host is 192.168.31.17 and two ports are seen here 54206 and
50795.

Just wanted to make sure this series doesn't change that, since if
this is the case, we introduce here a regression w.r.t RSS hash
spreading from the outer UDP header data at the receiver side (which
is the right thing to do, per your LKS session...)

IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62
IP 192.168.31.18.45515 > 192.168.31.17.4789: UDP, length 62
IP 192.168.31.17.54206 > 192.168.31.18.4789: UDP, length 26814
IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62
IP 192.168.31.18.45515 > 192.168.31.17.4789: UDP, length 62
IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 25498
IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 64922
IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 64922
IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62
IP 192.168.31.17.54206 > 192.168.31.18.4789: UDP, length 38170

> This patch removes the argument and and adds a nocheck argument
> for checksum setting. This eliminates the unnecessary dependency
> on a UDP socket for UDP tunnel transmit.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  drivers/net/vxlan.c       | 10 ++++++----
>  include/net/udp_tunnel.h  | 16 ++++++++--------
>  net/ipv4/udp_tunnel.c     | 12 ++++++------
>  net/ipv6/ip6_udp_tunnel.c | 12 ++++++------
>  4 files changed, 26 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 6b6b456..4fb4205 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1763,8 +1763,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>
>         skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
> -       udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
> -                            ttl, src_port, dst_port);
> +       udp_tunnel6_xmit_skb(dst, skb, dev, saddr, daddr, prio,
> +                            ttl, src_port, dst_port,
> +                            udp_get_no_check6_tx(vs->sock->sk));
>         return 0;
>  err:
>         dst_release(dst);
> @@ -1842,8 +1843,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>
>         skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
> -       return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
> -                                  ttl, df, src_port, dst_port, xnet);
> +       return udp_tunnel_xmit_skb(rt, skb, src, dst, tos,
> +                                  ttl, df, src_port, dst_port, xnet,
> +                                  vs->sock->sk->sk_no_check_tx);
>  }
>  EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
>
> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index 2a50a70..1a20d33 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -77,17 +77,17 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
>                            struct udp_tunnel_sock_cfg *sock_cfg);
>
>  /* Transmit the skb using UDP encapsulation. */
> -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
> -                       struct sk_buff *skb, __be32 src, __be32 dst,
> -                       __u8 tos, __u8 ttl, __be16 df, __be16 src_port,
> -                       __be16 dst_port, bool xnet);
> +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb,
> +                       __be32 src, __be32 dst, __u8 tos, __u8 ttl,
> +                       __be16 df, __be16 src_port, __be16 dst_port,
> +                       bool xnet, bool nocheck);
>
>  #if IS_ENABLED(CONFIG_IPV6)
> -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
> -                        struct sk_buff *skb, struct net_device *dev,
> -                        struct in6_addr *saddr, struct in6_addr *daddr,
> +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb,
> +                        struct net_device *dev, struct in6_addr *saddr,
> +                        struct in6_addr *daddr,
>                          __u8 prio, __u8 ttl, __be16 src_port,
> -                        __be16 dst_port);
> +                        __be16 dst_port, bool nocheck);
>  #endif
>
>  void udp_tunnel_sock_release(struct socket *sock);
> diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
> index 9996e63..c83b354 100644
> --- a/net/ipv4/udp_tunnel.c
> +++ b/net/ipv4/udp_tunnel.c
> @@ -75,10 +75,10 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
>  }
>  EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock);
>
> -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
> -                       struct sk_buff *skb, __be32 src, __be32 dst,
> -                       __u8 tos, __u8 ttl, __be16 df, __be16 src_port,
> -                       __be16 dst_port, bool xnet)
> +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb,
> +                       __be32 src, __be32 dst, __u8 tos, __u8 ttl,
> +                       __be16 df, __be16 src_port, __be16 dst_port,
> +                       bool xnet, bool nocheck)
>  {
>         struct udphdr *uh;
>
> @@ -90,9 +90,9 @@ int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
>         uh->source = src_port;
>         uh->len = htons(skb->len);
>
> -       udp_set_csum(sock->sk->sk_no_check_tx, skb, src, dst, skb->len);
> +       udp_set_csum(nocheck, skb, src, dst, skb->len);
>
> -       return iptunnel_xmit(sock->sk, rt, skb, src, dst, IPPROTO_UDP,
> +       return iptunnel_xmit(skb->sk, rt, skb, src, dst, IPPROTO_UDP,
>                              tos, ttl, df, xnet);
>  }
>  EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
> diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
> index 8db6c98..32d9b26 100644
> --- a/net/ipv6/ip6_udp_tunnel.c
> +++ b/net/ipv6/ip6_udp_tunnel.c
> @@ -62,14 +62,14 @@ error:
>  }
>  EXPORT_SYMBOL_GPL(udp_sock_create6);
>
> -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
> -                        struct sk_buff *skb, struct net_device *dev,
> -                        struct in6_addr *saddr, struct in6_addr *daddr,
> -                        __u8 prio, __u8 ttl, __be16 src_port, __be16 dst_port)
> +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb,
> +                        struct net_device *dev, struct in6_addr *saddr,
> +                        struct in6_addr *daddr,
> +                        __u8 prio, __u8 ttl, __be16 src_port,
> +                        __be16 dst_port, bool nocheck)
>  {
>         struct udphdr *uh;
>         struct ipv6hdr *ip6h;
> -       struct sock *sk = sock->sk;
>
>         __skb_push(skb, sizeof(*uh));
>         skb_reset_transport_header(skb);
> @@ -85,7 +85,7 @@ int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
>                             | IPSKB_REROUTED);
>         skb_dst_set(skb, dst);
>
> -       udp6_set_csum(udp_get_no_check6_tx(sk), skb, saddr, daddr, skb->len);
> +       udp6_set_csum(nocheck, skb, saddr, daddr, skb->len);
>
>         __skb_push(skb, sizeof(*ip6h));
>         skb_reset_network_header(skb);
> --
> 2.2.0.rc0.207.ga3a616c
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: net: prevent of emerging cross-namespace symlinks patches for 3.14?
From: Miquel van Smoorenburg @ 2015-01-18 23:17 UTC (permalink / raw)
  To: Alexander Y. Fomichev; +Cc: netdev, stable, David Miller
In-Reply-To: <CAChDUfTktt7DAg+Mi0+PWcykhHWvE9wg5FNieWUuoBjZGsU8jQ@mail.gmail.com>

On 15/01/15 19:39, Alexander Y. Fomichev wrote:
> On Thu, Jan 15, 2015 at 12:45 AM, Miquel van Smoorenburg mikevs@xs4all.net> wrote:
>> [first sent to lkml, now to netdev and the original patch author]
>>
>> When running 'lxc' on the latest -stable kernel, 3.14.28, I'm seeing these
>> errors:
>>
>> Jan 14 17:47:16 lxc2 kernel: [   10.704892] sysfs: cannot create duplicate
>> filename '/devices/virtual/net/eth0.104/upper_eth0'
>> I did not see these errors in 3.12. This was fixed in 3.17
>
> Hi,
>
> no objections of course,
> actually it was written and tested with 3.14 in mind.

David, could you have a quick look and ack this if you agree this should 
go in 3.14-stable ?

Thanks

Mike.

^ permalink raw reply

* RE: [net-next v2 13/15] i40e: limit WoL and link settings to partition 1
From: Nelson, Shannon @ 2015-01-18 23:26 UTC (permalink / raw)
  To: Yuval Mintz, Kirsher, Jeffrey T, David Miller
  Cc: netdev, nhorman@redhat.com, sassmann@redhat.com,
	jogreene@redhat.com
In-Reply-To: <B5657A6538887040AD3A81F1008BEC63BC8933@avmb3.qlogic.org>

> From: Yuval Mintz [mailto:Yuval.Mintz@qlogic.com]
> Sent: Saturday, January 17, 2015 10:10 PM
> 
> >From: Shannon Nelson <shannon.nelson@intel.com>
> 
> >When in multi-function mode, e.g. Dell's NPAR, only partition 1
> >of each MAC is allowed to set WoL, speed, and flow control.
> 
> Isn't it problematic? I mean, in bnx2x we address ~same issue -
> but due to symmetry we prevent ALL interfaces from changing
> the link configuration.

Yes, this multi-function hardware port makes for "interesting opportunities" in hardware configuration.  In our case, our requirements are to still allow for configuration of the hardware port.  This is not unlike how a PF controls the physical port for its VF devices.

> 
> How can user be aware of this 'private' behavior? via system logs?
> documentation?

In this case, note that we have added a log message that says these are dependent on the initial partition of the port for setting control.  Also, the particular vendor using the device in this configuration will supply and support the user documentation.

> 
> How does it interact with Physical Device Assignment of the first
> partition?

I'm not sure I understand this question, but I'll answer a different question: these multifunction ports show up as additional PCI functions, again, not unlike VF devices.

> 
> Cheers,
> Yuval

sln

^ permalink raw reply

* Re: Problem with  patch "make nlmsg_end() and genlmsg_end() void"
From: Marcel Holtmann @ 2015-01-18 23:44 UTC (permalink / raw)
  To: Network Development, Johannes Berg; +Cc: David S. Miller, Tom Gundersen
In-Reply-To: <0397034D-1DF4-409F-B335-FCD0D7EAB940@holtmann.org>

Hi Johannes,

> your commit 053c095a82cf773075e83d7233b5cc19a1f73ece is causing problems with systemd-networkd.
> 
> I have an up-to-date Arch Linux installation in a KVM and your change causes massive problems. It makes systemd-networkd to run out of memory.
> 
> systemd-fsck[84]: /dev/vda1: clean, 53283/131072 files, 409813/524032 blocks
> Out of memory: Kill process 142 (systemd-network) score 923 or sacrifice child
> Killed process 142 (systemd-network) total-vm:478416kB, anon-rss:463472kB, file-rss:460kB
> [FAILED] Failed to start Network Service.
> See "systemctl status systemd-networkd.service" for details.
>         Stopping Network Service...
> [  OK  ] Stopped Network Service.
>         Starting Network Service...
> 
> Arch Linux 3.19.0-rc4-devel+ (ttyS0)
> 
> marcel login: Out of memory: Kill process 154 (systemd-network) score 932 or sacrifice child
> Killed process 154 (systemd-network) total-vm:540784kB, anon-rss:468380kB, file-rss:132kB
> Out of memory: Kill process 158 (systemd-network) score 932 or sacrifice child
> Killed process 158 (systemd-network) total-vm:540388kB, anon-rss:468528kB, file-rss:48kB
> Out of memory: Kill process 160 (systemd-network) score 932 or sacrifice child
> Killed process 160 (systemd-network) total-vm:540916kB, anon-rss:468528kB, file-rss:4kB
> Out of memory: Kill process 162 (systemd-network) score 931 or sacrifice child
> Killed process 162 (systemd-network) total-vm:540916kB, anon-rss:468104kB, file-rss:76kB

so this was freaking nasty to find since I had to dig into every single RTNL location that might have an affect on this. I think that I tracked this down to these two locations:

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e13b9dbdf154..0e26b9f66cad 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1327,7 +1327,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
                         */
                        WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
 
-                       if (err <= 0)
+                       if (err < 0)
                                goto out;
 
                        nl_dump_check_consistent(cb, nlmsg_hdr(skb));
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8975d9501d50..d6b4f5d08014 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4213,7 +4213,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
                                goto cont;
 
                        if (in6_dump_addrs(idev, skb, cb, type,
-                                          s_ip_idx, &ip_idx) <= 0)
+                                          s_ip_idx, &ip_idx) < 0)
                                goto done;
 cont:
                        idx++;

However I am not sure that these are the only ones. We might have additional issues in functionality that systemd-networkd actually does not use at the moment. These two changes make my KVM image boot properly again.

And actually I am not even sure that these two changes are correct. My KVM image is a dead simple image with no IPv6 support. This change might actually just broke IPv6 and I would not notice.

Tom, do you know if we can do anything in systemd-networkd in regards to RTNL and netlink handling to throw a big warning when something comes back from the kernel that would cause massive memory allocation.

Regards

Marcel

^ permalink raw reply related

* Re: [PATCH 1/7] net: wireless: wcn36xx: add wcn3620 chip type definition
From: Andy Green @ 2015-01-19  0:24 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Eugene Krasnikov, wcn36xx, linux-wireless, netdev
In-Reply-To: <878uh06ulp.fsf@kamboji.qca.qualcomm.com>

On 18 January 2015 at 22:17, Kalle Valo <kvalo@codeaurora.org> wrote:
> Andy Green <andy.green@linaro.org> writes:
>
>> Convert the list of chip types to an enum, add the default
>> UNKNOWN type and a type for WCN3620 chip
>>
>> Signed-off-by: Andy Green <andy.green@linaro.org>
>
> Please just use "wcn36xx: ", you should drop "net: wireless: " entirely.

OK.

Can you help me understand what you'd like to see happen with the chip
variant detection stuff?

There's a comment sent to one list only saying it might be preferable
to keep the old detection code as the default.  But there are no
in-tree users of wcn36xx (mainly due to PIL not being in mainline, I
guess).

The old test's equivalence that AC == 3680 seems kind of weak to me
and establishing the type must be passed in from platform code
reflects the situation that there's no public way to detect the chip
type from Qualcomm.  In the second not-for-upstream series I use that
to pass it in from DT, which is how it'd be normally used.

-Andy

> --
> Kalle Valo

^ permalink raw reply

* Re: [PATCH 1/6] selftests: Introduce minimal shared logic for running tests
From: Michael Ellerman @ 2015-01-19  0:35 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-kernel, mmarek, gregkh, akpm, rostedt, mingo, davem,
	keescook, tranmanphong, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i, linux-kbuild,
	linux-api, netdev
In-Reply-To: <54B95006.3080502@osg.samsung.com>

On Fri, 2015-01-16 at 10:53 -0700, Shuah Khan wrote:
> On 01/09/2015 02:06 AM, Michael Ellerman wrote:
> > This adds a Make include file which most selftests can then include to
> > get the run_tests logic.
> > 
> > On its own this has the advantage of some reduction in repetition, and
> > also means the pass/fail message is defined in fewer places.
> > 
> > However the key advantage is it will allow us to implement install very
> > simply in a subsequent patch.
> > 
> > The default implementation just executes each program in $(TEST_PROGS).
> > 
> > We use a variable to hold the default implementation of $(RUN_TESTS)
> > because that gives us a clean way to override it if necessary, ie. using
> > override. The mount, memory-hotplug and mqueue tests use that to provide
> > a different implementation.
> > 
> > Tests are not run via /bin/bash, so if they are scripts they must be
> > executable, we add u+x to several.
> > 
> > Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> 
> I like the shared logic approach in general provided it leaves the
> flexibility to not use the shared logic if a test have the need to
> do so.

Yes of course it does, it's entirely optional to include lib.mk.

> This series requires some patch planning. shared logic patch
> followed by individual test patches as opposed a single patch.

It could be a single patch too, but there's no reason to do it that way. The
series works fine as I sent it.

> I would like to see the shared logic work done on top of my patch v4
> series.

That's a waste of time. This series replaces your v4. Doing this "on top" of
your v4 would just mean reverting your v4 series and then applying this.

cheers

^ permalink raw reply

* Re: [PATCH 4/6] kbuild: add a new kselftest_install make target to install selftests
From: Michael Ellerman @ 2015-01-19  0:35 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-kernel, mmarek, gregkh, akpm, rostedt, mingo, davem,
	keescook, tranmanphong, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i, linux-kbuild,
	linux-api, netdev
In-Reply-To: <54B93DA1.2010601@osg.samsung.com>

On Fri, 2015-01-16 at 09:34 -0700, Shuah Khan wrote:
> On 01/09/2015 02:06 AM, Michael Ellerman wrote:
> > Add a new make target to install kernel selftests. This new target will
> > build and install selftests.
> > 
> > The default is just $(objtree)/selftests. This is preferable to
> > something based on $(INSTALL_MOD_PATH) (which defaults to /), as it
> > allows a normal user to install the tests. This is similar to the
> > default behaviour of make headers_install.
> 
> A normal user can install tests at any location they choose by
> overriding the default path. For example:
> 
> INSTALL_MOD_PATH=/tmp make kselftest_install
> 
> will install under tmp.

Why default to a directory that most users can't write to? That's not helpful.

Users who are root can override the path, for example:

INSTALL_MOD_PATH=/ make kselftest_install

> The approach I used also ties test installs to kernel release.
> This addresses an important use-case for kernel developers
> that want to compare results from release to release.

Sure, I'm happy to add the kernel release, so the default would be
$(objtree)/selftests/$(kernel-release)/.

> The use-case for any user to be able to install tests at
> any location is addressed by the above example.

The default should work for most users most of the time, / does not achieve
that.

> I would like these two above use-cases continued to be supported,
> especially the one that tries the test installs to kernel release.

That's fine, I'm happy to update this to use kernel release. But defaulting to
/ doesn't make sense.

> Another goal is to keep changes to the main Makefile minimal and
> the rest of the install support belongs under selftests/Makefile
> and any other include file (like the one you proposed).

Yes, this patch does just that.

cheers

^ permalink raw reply

* Re: [PATCH 2/6] selftests: Add install target
From: Michael Ellerman @ 2015-01-19  0:35 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-kernel, mmarek, gregkh, akpm, rostedt, mingo, davem,
	keescook, tranmanphong, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i, linux-kbuild,
	linux-api, netdev
In-Reply-To: <54B94E75.8030504@osg.samsung.com>

On Fri, 2015-01-16 at 10:46 -0700, Shuah Khan wrote:
> On 01/09/2015 02:06 AM, Michael Ellerman wrote:
> > This adds make install support to selftests. The basic usage is:
> > 
> > $ cd tools/testing/selftests
> > $ make install
> > 
> > That installs into tools/testing/selftests/install, which can then be
> > copied where ever necessary.
> > 
> > The install destination is also configurable using eg:
> > 
> > $ INSTALL_PATH=/mnt/selftests make install
> 
> Please see my response to [PATCH 4/6] kbuild: add a new
> kselftest_install make target to install selftests
> 
> These are addressed by the current approach to use existing
> INSTALL_MOD_PATH.

No that's a separate issue.

This patch adds install support for tools/testing/selftests, *completely
separate* from the kbuild infrastructure. 

That is an existing use case, ie. using selftests on its own, which this patch
easily continues to support.

cheers

^ permalink raw reply

* Re: Problem with patch "make nlmsg_end() and genlmsg_end() void"
From: Scott Feldman @ 2015-01-19  1:53 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Network Development, Johannes Berg, David S. Miller,
	Tom Gundersen
In-Reply-To: <E586D25A-9DAF-4535-A282-120F0C6EAE19@holtmann.org>

This patch needs to be reverted ASAP.  git bisect landed me here also;
my processes are getting the OOM msgs.  What testing was done?

Seems someone does care that nlmsg_end() returns skb->len.

-scott

On Sun, Jan 18, 2015 at 3:44 PM, Marcel Holtmann <marcel@holtmann.org> wrote:
> Hi Johannes,
>
>> your commit 053c095a82cf773075e83d7233b5cc19a1f73ece is causing problems with systemd-networkd.
>>
>> I have an up-to-date Arch Linux installation in a KVM and your change causes massive problems. It makes systemd-networkd to run out of memory.
>>
>> systemd-fsck[84]: /dev/vda1: clean, 53283/131072 files, 409813/524032 blocks
>> Out of memory: Kill process 142 (systemd-network) score 923 or sacrifice child
>> Killed process 142 (systemd-network) total-vm:478416kB, anon-rss:463472kB, file-rss:460kB
>> [FAILED] Failed to start Network Service.
>> See "systemctl status systemd-networkd.service" for details.
>>         Stopping Network Service...
>> [  OK  ] Stopped Network Service.
>>         Starting Network Service...
>>
>> Arch Linux 3.19.0-rc4-devel+ (ttyS0)
>>
>> marcel login: Out of memory: Kill process 154 (systemd-network) score 932 or sacrifice child
>> Killed process 154 (systemd-network) total-vm:540784kB, anon-rss:468380kB, file-rss:132kB
>> Out of memory: Kill process 158 (systemd-network) score 932 or sacrifice child
>> Killed process 158 (systemd-network) total-vm:540388kB, anon-rss:468528kB, file-rss:48kB
>> Out of memory: Kill process 160 (systemd-network) score 932 or sacrifice child
>> Killed process 160 (systemd-network) total-vm:540916kB, anon-rss:468528kB, file-rss:4kB
>> Out of memory: Kill process 162 (systemd-network) score 931 or sacrifice child
>> Killed process 162 (systemd-network) total-vm:540916kB, anon-rss:468104kB, file-rss:76kB
>
> so this was freaking nasty to find since I had to dig into every single RTNL location that might have an affect on this. I think that I tracked this down to these two locations:
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index e13b9dbdf154..0e26b9f66cad 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1327,7 +1327,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
>                          */
>                         WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
>
> -                       if (err <= 0)
> +                       if (err < 0)
>                                 goto out;
>
>                         nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 8975d9501d50..d6b4f5d08014 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4213,7 +4213,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
>                                 goto cont;
>
>                         if (in6_dump_addrs(idev, skb, cb, type,
> -                                          s_ip_idx, &ip_idx) <= 0)
> +                                          s_ip_idx, &ip_idx) < 0)
>                                 goto done;
>  cont:
>                         idx++;
>
> However I am not sure that these are the only ones. We might have additional issues in functionality that systemd-networkd actually does not use at the moment. These two changes make my KVM image boot properly again.
>
> And actually I am not even sure that these two changes are correct. My KVM image is a dead simple image with no IPv6 support. This change might actually just broke IPv6 and I would not notice.
>
> Tom, do you know if we can do anything in systemd-networkd in regards to RTNL and netlink handling to throw a big warning when something comes back from the kernel that would cause massive memory allocation.
>
> Regards
>
> Marcel
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Problem with patch "make nlmsg_end() and genlmsg_end() void"
From: Marcel Holtmann @ 2015-01-19  2:10 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Network Development, Johannes Berg, David S. Miller,
	Tom Gundersen
In-Reply-To: <CAE4R7bCWSM+SyDR830N1GmukyaMYE_DT0fd4L_x6Py5PEqMe_A@mail.gmail.com>

Hi Scott,

> This patch needs to be reverted ASAP.  git bisect landed me here also;
> my processes are getting the OOM msgs.  What testing was done?
> 
> Seems someone does care that nlmsg_end() returns skb->len.

I still wonder how this affects userspace. I have not figured that out. Something goes wrong pretty badly somewhere.

Have you tried the small diff with the two locations that were problematic for me?

Regards

Marcel

^ permalink raw reply

* linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2015-01-19  3:06 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Jiri Pirko, Scott Feldman,
	Gustavo Padovan

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

Hi all,

After merging the net-next tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

In file included from net/bridge/br.c:22:0:
include/net/switchdev.h:73:1: error: expected identifier or '(' before '{' token
 {
 ^
include/net/switchdev.h:71:19: warning: 'call_netdev_switch_notifiers' declared 'static' but never defined [-Wunused-function]
 static inline int call_netdev_switch_notifiers(unsigned long val, struct net_device *dev,
                   ^

Caused by commit 03bf0c281234 ("switchdev: introduce switchdev notifier").

I have used the net-next tree from next-20150116 for today.  I also had
to use the bluetooth tree from next-20150116 as it was fast forwarded
to be the same as the net-next tree today.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] iproute2: bridge: support vlan range
From: roopa @ 2015-01-19  3:06 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Netdev, shemminger, vyasevic@redhat.com, Wilson Kok
In-Reply-To: <CAE4R7bA6f-f9zTL4iKn11xAWD7evbEGMb+b8zgrm780Hx0yg4w@mail.gmail.com>

On 1/18/15, 9:44 AM, Scott Feldman wrote:
> On Sun, Jan 18, 2015 at 1:11 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 1/17/15, 5:35 PM, Scott Feldman wrote:
>>> On Thu, Jan 15, 2015 at 10:52 PM,  <roopa@cumulusnetworks.com> wrote:
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> This patch adds vlan range support to bridge command
>>>> using the newly added vinfo flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
>>>> BRIDGE_VLAN_INFO_RANGE_END.
>>>>
>>>> +                               vinfo.flags |=
>>>> BRIDGE_VLAN_INFO_RANGE_BEGIN;
>>>> +                       } else {
>>>> +                               vinfo.vid = atoi(*argv);
>>>> +                       }
>>>>                   } else if (strcmp(*argv, "self") == 0) {
>>>>                           flags |= BRIDGE_FLAGS_SELF;
>>>>                   } else if (strcmp(*argv, "master") == 0) {
>>>> @@ -67,7 +78,7 @@ static int vlan_modify(int cmd, int argc, char **argv)
>>>>                   argc--; argv++;
>>>>           }
>>>>
>>>> -       if (d == NULL || vid == -1) {
>>>> +       if (d == NULL || vinfo.vid == -1) {
>>> Where was vinfo.vid initialized to -1?  Maybe use vid rather than
>>> vinfo.vid in the code above where parsing the arg, and continue using
>>> vid and vid_end until final put of vinfo.
>>>
>> There is already a "memset(&vinfo, 0, sizeof(vinfo));"  in the code in the
>> beginning of the function.
> That's the problem...vinfo.vid is initialized to 0, not -1, so
> checking if vinfo.vid == -1 is always false.

ack, v2 coming ... thanks

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Marcel Holtmann @ 2015-01-19  3:13 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David S. Miller, netdev, linux-next, linux-kernel, Jiri Pirko,
	Scott Feldman, Gustavo F. Padovan
In-Reply-To: <20150119140610.6b41b5ef@canb.auug.org.au>

Hi Stephen,

> After merging the net-next tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> In file included from net/bridge/br.c:22:0:
> include/net/switchdev.h:73:1: error: expected identifier or '(' before '{' token
> {
> ^
> include/net/switchdev.h:71:19: warning: 'call_netdev_switch_notifiers' declared 'static' but never defined [-Wunused-function]
> static inline int call_netdev_switch_notifiers(unsigned long val, struct net_device *dev,
>                   ^
> 
> Caused by commit 03bf0c281234 ("switchdev: introduce switchdev notifier").
> 
> I have used the net-next tree from next-20150116 for today.  I also had
> to use the bluetooth tree from next-20150116 as it was fast forwarded
> to be the same as the net-next tree today.

it seems Dave forgot to push out his net-next tree, but in theory it should contain a fix for it.

Regards

Marcel

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2015-01-19  3:30 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: David S. Miller, netdev, linux-next, linux-kernel, Jiri Pirko,
	Scott Feldman, Gustavo F. Padovan
In-Reply-To: <289C7DCC-D2B3-43B6-B661-EE19371D8B17@holtmann.org>

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

Hi Marcel,

On Sun, 18 Jan 2015 19:13:17 -0800 Marcel Holtmann <marcel@holtmann.org> wrote:
>
> > After merging the net-next tree, today's linux-next build (powerpc
> > ppc64_defconfig) failed like this:
> > 
> > In file included from net/bridge/br.c:22:0:
> > include/net/switchdev.h:73:1: error: expected identifier or '(' before '{' token
> > {
> > ^
> > include/net/switchdev.h:71:19: warning: 'call_netdev_switch_notifiers' declared 'static' but never defined [-Wunused-function]
> > static inline int call_netdev_switch_notifiers(unsigned long val, struct net_device *dev,
> >                   ^
> > 
> > Caused by commit 03bf0c281234 ("switchdev: introduce switchdev notifier").
> > 
> > I have used the net-next tree from next-20150116 for today.  I also had
> > to use the bluetooth tree from next-20150116 as it was fast forwarded
> > to be the same as the net-next tree today.
> 
> it seems Dave forgot to push out his net-next tree, but in theory it should contain a fix for it.

Or he did so after I fetched his tree this morning.  Either way, I
assume it will be fixed tomorrow.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox