Netdev List
 help / color / mirror / Atom feed
* Re: [iproute2][PATCH] tc: mirred target: do not report non-existing devices
From: Dan Kenigsberg @ 2012-09-02 10:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1346338914.2586.14.camel@edumazet-glaptop>

On Thu, Aug 30, 2012 at 08:01:54AM -0700, Eric Dumazet wrote:
> On Thu, 2012-08-30 at 17:51 +0300, Dan Kenigsberg wrote:
> > Currently, if a mirred target device is removed, `tc filter show`
> > does not reveal the fact. Instead, it replaces the original name of the
> > device with the default output of ll_map:ll_idx_n2a().
> > 
> > This is unfortunate, since one cannot differ between this case and a valid
> > mirroring target device named 'if17'.
> > 
> > It seems that the original code meant to report an error message in this
> > case, but it does not, since ll_index_to_name() never returns 0. I would
> > not like to bail out in case of an error, since the user would still be
> > interested to know what are the other details of the action.
> > 
> > Signed-off-by: Dan Kenigsberg <danken@redhat.com>
> > ---
> >  lib/ll_map.c  |   13 +++++++++++++
> >  tc/m_mirred.c |   10 ++++------
> >  2 files changed, 17 insertions(+), 6 deletions(-)
> > 
> > diff --git a/lib/ll_map.c b/lib/ll_map.c
> > index 1ca781e..8ceef41 100644
> > --- a/lib/ll_map.c
> > +++ b/lib/ll_map.c
> > @@ -108,6 +108,19 @@ const char *ll_idx_n2a(unsigned idx, char *buf)
> >  	return buf;
> >  }
> >  
> > +char *ll_index_exists(unsigned idx)
> > +{
> > +	const struct ll_cache *im;
> > +
> > +	if (idx == 0)
> > +		return 0;
> > +
> > +	for (im = idxhead(idx); im; im = im->idx_next)
> > +		if (im->index == idx)
> > +			return 1;
> > +
> > +	return 0;
> > +}
> >  
> 
> I am curious to know what compiler accepted this.

It was gcc. gcc 2> /dev/null || :, to be exact.

Sorry, fixed patch follows.

^ permalink raw reply

* [PATCH] i825xx: fix paging fault on znet_probe()
From: Fengguang Wu @ 2012-09-02  7:25 UTC (permalink / raw)
  To: David Miller; +Cc: Jeff Kirsher, netdev, LKML

In znet_probe(), strncmp() may access beyond 0x100000 and
trigger the below oops in kvm.  Fix it by limiting the loop
under 0x100000-8. I suspect the limit could be further decreased
to 0x100000-sizeof(struct netidblk), however no datasheet at hand..

[    3.744312] BUG: unable to handle kernel paging request at 80100000
[    3.746145] IP: [<8119d12a>] strncmp+0xc/0x20
[    3.747446] *pde = 01d10067 *pte = 00100160 
[    3.747493] Oops: 0000 [#1] DEBUG_PAGEALLOC
[    3.747493] Pid: 1, comm: swapper Not tainted 3.6.0-rc1-00018-g57bfc0a #73 Bochs Bochs
[    3.747493] EIP: 0060:[<8119d12a>] EFLAGS: 00010206 CPU: 0
[    3.747493] EIP is at strncmp+0xc/0x20
[    3.747493] EAX: 800fff4e EBX: 00000006 ECX: 00000006 EDX: 814d2bb9
[    3.747493] ESI: 80100000 EDI: 814d2bba EBP: 8e03dfa0 ESP: 8e03df98
[    3.747493]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[    3.747493] CR0: 8005003b CR2: 80100000 CR3: 016f7000 CR4: 00000690
[    3.747493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    3.747493] DR6: ffff0ff0 DR7: 00000400
[    3.747493] Process swapper (pid: 1, ti=8e03c000 task=8e040000 task.ti=8e03c000)
[    3.747493] Stack:
[    3.747493]  800fffff 00000000 8e03dfb4 816a1376 00000006 816a134a 00000000 8e03dfd0
[    3.747493]  816819b5 816ed1c0 8e03dfe4 00000006 00000123 816ed604 8e03dfe4 81681b29
[    3.747493]  00000000 81681a5b 00000000 00000000 8134e542 00000000 00000000 00000000
[    3.747493] Call Trace:
[    3.747493]  [<816a1376>] znet_probe+0x2c/0x26b                             
[    3.747493]  [<816a134a>] ? dnet_driver_init+0xf/0xf                        
[    3.747493]  [<816819b5>] do_one_initcall+0x6a/0x110                        
[    3.747493]  [<81681b29>] kernel_init+0xce/0x14b                            

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 drivers/net/ethernet/i825xx/znet.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- linux.orig/drivers/net/ethernet/i825xx/znet.c	2012-05-24 19:03:06.928430941 +0800
+++ linux/drivers/net/ethernet/i825xx/znet.c	2012-09-02 15:14:24.943249546 +0800
@@ -139,8 +139,11 @@ struct znet_private {
 /* Only one can be built-in;-> */
 static struct net_device *znet_dev;
 
+#define NETIDBLK_MAGIC		"NETIDBLK"
+#define NETIDBLK_MAGIC_SIZE	8
+
 struct netidblk {
-	char magic[8];		/* The magic number (string) "NETIDBLK" */
+	char magic[NETIDBLK_MAGIC_SIZE];	/* The magic number (string) "NETIDBLK" */
 	unsigned char netid[8]; /* The physical station address */
 	char nettype, globalopt;
 	char vendor[8];		/* The machine vendor and product name. */
@@ -373,14 +376,16 @@ static int __init znet_probe (void)
 	struct znet_private *znet;
 	struct net_device *dev;
 	char *p;
+	char *plast = phys_to_virt(0x100000 - NETIDBLK_MAGIC_SIZE);
 	int err = -ENOMEM;
 
 	/* This code scans the region 0xf0000 to 0xfffff for a "NETIDBLK". */
-	for(p = (char *)phys_to_virt(0xf0000); p < (char *)phys_to_virt(0x100000); p++)
-		if (*p == 'N'  &&  strncmp(p, "NETIDBLK", 8) == 0)
+	for(p = (char *)phys_to_virt(0xf0000); p <= plast; p++)
+		if (*p == 'N' &&
+		    strncmp(p, NETIDBLK_MAGIC, NETIDBLK_MAGIC_SIZE) == 0)
 			break;
 
-	if (p >= (char *)phys_to_virt(0x100000)) {
+	if (p > plast) {
 		if (znet_debug > 1)
 			printk(KERN_INFO "No Z-Note ethernet adaptor found.\n");
 		return -ENODEV;

^ permalink raw reply

* Re: [PATCH] net: Avoid spinlock recursion
From: Fubo Chen @ 2012-09-02  6:55 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev@vger.kernel.org, David S. Miller
In-Reply-To: <1346418123.20739.4.camel@cr0>

On Fri, Aug 31, 2012 at 1:02 PM, Cong Wang <amwang@redhat.com> wrote:
> On Mon, 2012-08-27 at 14:52 +0000, Fubo Chen wrote:
>> Fixes a bug introduced by commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d.
>
> I already sent a patch:
> http://patchwork.ozlabs.org/patch/179954/

Sorry, I had missed your patch. Thanks for reply.

Fubo.

^ permalink raw reply

* [PATCH v2 2/2] iproute2: add support for tcp_metrics
From: Julian Anastasov @ 2012-09-02  5:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1346564178-1794-1-git-send-email-ja@ssi.bg>

	ip tcp_metrics/tcpmetrics

v2:
- On flush we should provide the address in req2
- "flush all" should use single del, just like when "all" is not provided
- Explain printed information in man page

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

diff -urpN iproute2-3.5.1/include/linux/tcp_metrics.h iproute2-3.5.1-tcp_metrics/include/linux/tcp_metrics.h
--- iproute2-3.5.1/include/linux/tcp_metrics.h	1970-01-01 02:00:00.000000000 +0200
+++ iproute2-3.5.1-tcp_metrics/include/linux/tcp_metrics.h	2012-08-23 09:50:54.385569009 +0300
@@ -0,0 +1,54 @@
+/* tcp_metrics.h - TCP Metrics Interface */
+
+#ifndef _LINUX_TCP_METRICS_H
+#define _LINUX_TCP_METRICS_H
+
+#include <linux/types.h>
+
+/* NETLINK_GENERIC related info
+ */
+#define TCP_METRICS_GENL_NAME		"tcp_metrics"
+#define TCP_METRICS_GENL_VERSION	0x1
+
+enum tcp_metric_index {
+	TCP_METRIC_RTT,
+	TCP_METRIC_RTTVAR,
+	TCP_METRIC_SSTHRESH,
+	TCP_METRIC_CWND,
+	TCP_METRIC_REORDERING,
+
+	/* Always last.  */
+	__TCP_METRIC_MAX,
+};
+
+#define TCP_METRIC_MAX	(__TCP_METRIC_MAX - 1)
+
+enum {
+	TCP_METRICS_ATTR_UNSPEC,
+	TCP_METRICS_ATTR_ADDR_IPV4,		/* u32 */
+	TCP_METRICS_ATTR_ADDR_IPV6,		/* binary */
+	TCP_METRICS_ATTR_AGE,			/* msecs */
+	TCP_METRICS_ATTR_TW_TSVAL,		/* u32, raw, rcv tsval */
+	TCP_METRICS_ATTR_TW_TS_STAMP,		/* s32, sec age */
+	TCP_METRICS_ATTR_VALS,			/* nested +1, u32 */
+	TCP_METRICS_ATTR_FOPEN_MSS,		/* u16 */
+	TCP_METRICS_ATTR_FOPEN_SYN_DROPS,	/* u16, count of drops */
+	TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS,	/* msecs age */
+	TCP_METRICS_ATTR_FOPEN_COOKIE,		/* binary */
+
+	__TCP_METRICS_ATTR_MAX,
+};
+
+#define TCP_METRICS_ATTR_MAX	(__TCP_METRICS_ATTR_MAX - 1)
+
+enum {
+	TCP_METRICS_CMD_UNSPEC,
+	TCP_METRICS_CMD_GET,
+	TCP_METRICS_CMD_DEL,
+
+	__TCP_METRICS_CMD_MAX,
+};
+
+#define TCP_METRICS_CMD_MAX	(__TCP_METRICS_CMD_MAX - 1)
+
+#endif /* _LINUX_TCP_METRICS_H */
diff -urpN iproute2-3.5.1/ip/Makefile iproute2-3.5.1-tcp_metrics/ip/Makefile
--- iproute2-3.5.1/ip/Makefile	2012-08-13 18:13:58.000000000 +0300
+++ iproute2-3.5.1-tcp_metrics/ip/Makefile	2012-08-23 10:24:01.561660891 +0300
@@ -3,7 +3,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o ipr
     ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
     iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
-    iplink_macvlan.o iplink_macvtap.o ipl2tp.o
+    iplink_macvlan.o iplink_macvtap.o ipl2tp.o tcp_metrics.o
 
 RTMONOBJ=rtmon.o
 
diff -urpN iproute2-3.5.1/ip/ip.c iproute2-3.5.1-tcp_metrics/ip/ip.c
--- iproute2-3.5.1/ip/ip.c	2012-08-13 18:13:58.000000000 +0300
+++ iproute2-3.5.1-tcp_metrics/ip/ip.c	2012-08-23 10:21:20.917653464 +0300
@@ -45,7 +45,7 @@ static void usage(void)
 "       ip [ -force ] -batch filename\n"
 "where  OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable |\n"
 "                   tunnel | tuntap | maddr | mroute | mrule | monitor | xfrm |\n"
-"                   netns | l2tp }\n"
+"                   netns | l2tp | tcp_metrics }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "                    -f[amily] { inet | inet6 | ipx | dnet | link } |\n"
 "                    -l[oops] { maximum-addr-flush-attempts } |\n"
@@ -78,6 +78,8 @@ static const struct cmd {
 	{ "tunl",	do_iptunnel },
 	{ "tuntap",	do_iptuntap },
 	{ "tap",	do_iptuntap },
+	{ "tcpmetrics",	do_tcp_metrics },
+	{ "tcp_metrics",do_tcp_metrics },
 	{ "monitor",	do_ipmonitor },
 	{ "xfrm",	do_xfrm },
 	{ "mroute",	do_multiroute },
diff -urpN iproute2-3.5.1/ip/ip_common.h iproute2-3.5.1-tcp_metrics/ip/ip_common.h
--- iproute2-3.5.1/ip/ip_common.h	2012-08-13 18:13:58.000000000 +0300
+++ iproute2-3.5.1-tcp_metrics/ip/ip_common.h	2012-08-23 10:19:11.005647457 +0300
@@ -42,6 +42,7 @@ extern int do_multirule(int argc, char *
 extern int do_netns(int argc, char **argv);
 extern int do_xfrm(int argc, char **argv);
 extern int do_ipl2tp(int argc, char **argv);
+extern int do_tcp_metrics(int argc, char **argv);
 
 static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
 {
diff -urpN iproute2-3.5.1/ip/tcp_metrics.c iproute2-3.5.1-tcp_metrics/ip/tcp_metrics.c
--- iproute2-3.5.1/ip/tcp_metrics.c	1970-01-01 02:00:00.000000000 +0200
+++ iproute2-3.5.1-tcp_metrics/ip/tcp_metrics.c	2012-08-25 15:18:31.695944914 +0300
@@ -0,0 +1,504 @@
+/*
+ * tcp_metrics.c	"ip tcp_metrics/tcpmetrics"
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		version 2 as published by the Free Software Foundation;
+ *
+ * Authors:	Julian Anastasov <ja@ssi.bg>, August 2012
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <arpa/inet.h>
+#include <sys/ioctl.h>
+#include <linux/if.h>
+
+#include <linux/genetlink.h>
+#include <linux/tcp_metrics.h>
+
+#include "utils.h"
+#include "ip_common.h"
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: ip tcp_metrics/tcpmetrics { COMMAND | help }\n");
+	fprintf(stderr, "       ip tcp_metrics { show | flush } SELECTOR\n");
+	fprintf(stderr, "       ip tcp_metrics delete [ address ] ADDRESS\n");
+	fprintf(stderr, "SELECTOR := [ [ address ] PREFIX ]\n");
+	exit(-1);
+}
+
+/* netlink socket */
+static struct rtnl_handle grth = { .fd = -1 };
+static int genl_family = -1;
+
+
+#define GENL_DEFINE_REQUEST(req, hdrsize, bufsiz)			\
+struct {								\
+	struct nlmsghdr		n;					\
+	struct genlmsghdr	g;					\
+	char			buf[NLMSG_ALIGN(hdrsize) + bufsiz];	\
+} req
+
+#define GENL_INIT_REQUEST(req, family, hdrsize, ver, cmd_, flags)	\
+	do {								\
+		memset(&req, 0, sizeof(req));				\
+		req.n.nlmsg_type = family;				\
+		req.n.nlmsg_flags = flags;				\
+		req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN + hdrsize);	\
+		req.g.cmd = cmd_;					\
+		req.g.version = ver;					\
+	} while (0)
+
+#define INIT_GENL_ACTION(req, cmd, ack)					\
+	GENL_INIT_REQUEST(req, genl_family, 0,				\
+			  TCP_METRICS_GENL_VERSION, cmd,		\
+			  NLM_F_REQUEST | ((ack) ? NLM_F_ACK : 0))
+
+#define INIT_GENL_DUMP(req)						\
+	GENL_INIT_REQUEST(req, genl_family, 0,				\
+			  TCP_METRICS_GENL_VERSION,			\
+			  TCP_METRICS_CMD_GET,				\
+			  NLM_F_DUMP | NLM_F_REQUEST)
+
+#define CMD_LIST	0x0001	/* list, lst, show		*/
+#define CMD_DEL		0x0002	/* delete, remove		*/
+#define CMD_FLUSH	0x0004	/* flush			*/
+
+static struct {
+	char	*name;
+	int	code;
+} cmds[] = {
+	{	"list",		CMD_LIST	},
+	{	"lst",		CMD_LIST	},
+	{	"show",		CMD_LIST	},
+	{	"delete",	CMD_DEL		},
+	{	"remove",	CMD_DEL		},
+	{	"flush",	CMD_FLUSH	},
+};
+
+static char *metric_name[TCP_METRIC_MAX + 1] = {
+	[TCP_METRIC_RTT]		= "rtt",
+	[TCP_METRIC_RTTVAR]		= "rttvar",
+	[TCP_METRIC_SSTHRESH]		= "ssthresh",
+	[TCP_METRIC_CWND]		= "cwnd",
+	[TCP_METRIC_REORDERING]		= "reordering",
+};
+
+static struct
+{
+	int flushed;
+	char *flushb;
+	int flushp;
+	int flushe;
+	int cmd;
+	inet_prefix addr;
+} f;
+
+static int flush_update(void)
+{
+	if (rtnl_send_check(&grth, f.flushb, f.flushp) < 0) {
+		perror("Failed to send flush request\n");
+		return -1;
+	}
+	f.flushp = 0;
+	return 0;
+}
+
+static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
+		       void *arg)
+{
+	FILE *fp = (FILE *) arg;
+	struct genlmsghdr *ghdr;
+	struct rtattr *attrs[TCP_METRICS_ATTR_MAX + 1], *a;
+	int len = n->nlmsg_len;
+	char abuf[256];
+	inet_prefix addr;
+	int family, i, atype;
+
+	if (n->nlmsg_type != genl_family)
+		return -1;
+
+	len -= NLMSG_LENGTH(GENL_HDRLEN);
+	if (len < 0)
+		return -1;
+
+	ghdr = NLMSG_DATA(n);
+	if (ghdr->cmd != TCP_METRICS_CMD_GET)
+		return 0;
+
+	parse_rtattr(attrs, TCP_METRICS_ATTR_MAX, (void *) ghdr + GENL_HDRLEN,
+		     len);
+
+	a = attrs[TCP_METRICS_ATTR_ADDR_IPV4];
+	if (a) {
+		if (f.addr.family && f.addr.family != AF_INET)
+			return 0;
+		memcpy(&addr.data, RTA_DATA(a), 4);
+		addr.bytelen = 4;
+		family = AF_INET;
+		atype = TCP_METRICS_ATTR_ADDR_IPV4;
+	} else {
+		a = attrs[TCP_METRICS_ATTR_ADDR_IPV6];
+		if (a) {
+			if (f.addr.family && f.addr.family != AF_INET6)
+				return 0;
+			memcpy(&addr.data, RTA_DATA(a), 16);
+			addr.bytelen = 16;
+			family = AF_INET6;
+			atype = TCP_METRICS_ATTR_ADDR_IPV6;
+		} else
+			return 0;
+	}
+
+	if (f.addr.family && f.addr.bitlen >= 0 &&
+	    inet_addr_match(&addr, &f.addr, f.addr.bitlen))
+		return 0;
+
+	if (f.flushb) {
+		struct nlmsghdr *fn;
+		GENL_DEFINE_REQUEST(req2, 0, 128);
+
+		INIT_GENL_ACTION(req2, TCP_METRICS_CMD_DEL, 0);
+		addattr_l(&req2.n, sizeof(req2), atype, &addr.data,
+			  addr.bytelen);
+
+		if (NLMSG_ALIGN(f.flushp) + req2.n.nlmsg_len > f.flushe) {
+			if (flush_update())
+				return -1;
+		}
+		fn = (struct nlmsghdr *) (f.flushb + NLMSG_ALIGN(f.flushp));
+		memcpy(fn, &req2.n, req2.n.nlmsg_len);
+		fn->nlmsg_seq = ++grth.seq;
+		f.flushp = (((char *) fn) + req2.n.nlmsg_len) - f.flushb;
+		f.flushed++;
+		if (show_stats < 2)
+			return 0;
+	}
+
+	if (f.cmd & (CMD_DEL | CMD_FLUSH))
+		fprintf(fp, "Deleted ");
+
+	fprintf(fp, "%s",
+		format_host(family, RTA_PAYLOAD(a), &addr.data,
+			    abuf, sizeof(abuf)));
+
+	a = attrs[TCP_METRICS_ATTR_AGE];
+	if (a) {
+		__u64 val = rta_getattr_u64(a);
+
+		fprintf(fp, " age %llu.%03llusec",
+			val / 1000, val % 1000);
+	}
+
+	a = attrs[TCP_METRICS_ATTR_TW_TS_STAMP];
+	if (a) {
+		__s32 val = (__s32) rta_getattr_u32(a);
+		__u32 tsval;
+
+		a = attrs[TCP_METRICS_ATTR_TW_TSVAL];
+		tsval = a ? rta_getattr_u32(a) : 0;
+		fprintf(fp, " tw_ts %u/%dsec ago", tsval, val);
+	}
+
+	a = attrs[TCP_METRICS_ATTR_VALS];
+	if (a) {
+		struct rtattr *m[TCP_METRIC_MAX + 1 + 1];
+
+		parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);
+
+		for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
+			__u32 val;
+
+			a = m[i + 1];
+			if (!a)
+				continue;
+			if (metric_name[i])
+				fprintf(fp, " %s ", metric_name[i]);
+			else
+				fprintf(fp, " metric_%d ", i);
+			val = rta_getattr_u32(a);
+			switch (i) {
+			case TCP_METRIC_RTT:
+			case TCP_METRIC_RTTVAR:
+				fprintf(fp, "%ums", val);
+				break;
+			case TCP_METRIC_SSTHRESH:
+			case TCP_METRIC_CWND:
+			case TCP_METRIC_REORDERING:
+			default:
+				fprintf(fp, "%u", val);
+				break;
+			}
+		}
+	}
+
+	a = attrs[TCP_METRICS_ATTR_FOPEN_MSS];
+	if (a)
+		fprintf(fp, " fo_mss %u", rta_getattr_u16(a));
+
+	a = attrs[TCP_METRICS_ATTR_FOPEN_SYN_DROPS];
+	if (a) {
+		__u16 syn_loss = rta_getattr_u16(a);
+		__u64 ts;
+
+		a = attrs[TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS];
+		ts = a ? rta_getattr_u64(a) : 0;
+
+		fprintf(fp, " fo_syn_drops %u/%llu.%03llusec ago",
+			syn_loss, ts / 1000, ts % 1000);
+	}
+
+	a = attrs[TCP_METRICS_ATTR_FOPEN_COOKIE];
+	if (a) {
+		char cookie[32 + 1];
+		unsigned char *ptr = RTA_DATA(a);
+		int i, max = RTA_PAYLOAD(a);
+
+		if (max > 16)
+			max = 16;
+		cookie[0] = 0;
+		for (i = 0; i < max; i++)
+			sprintf(cookie + i + i, "%02x", ptr[i]);
+		fprintf(fp, " fo_cookie %s", cookie);
+	}
+
+	fprintf(fp, "\n");
+
+	fflush(fp);
+	return 0;
+}
+
+static int genl_parse_getfamily(struct nlmsghdr *nlh)
+{
+	struct rtattr *tb[CTRL_ATTR_MAX + 1];
+	struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
+	int len = nlh->nlmsg_len;
+	struct rtattr *attrs;
+
+	if (nlh->nlmsg_type != GENL_ID_CTRL) {
+		fprintf(stderr, "Not a controller message, nlmsg_len=%d "
+			"nlmsg_type=0x%x\n", nlh->nlmsg_len, nlh->nlmsg_type);
+		return -1;
+	}
+
+	if (ghdr->cmd != CTRL_CMD_NEWFAMILY) {
+		fprintf(stderr, "Unknown controller command %d\n", ghdr->cmd);
+		return -1;
+	}
+
+	len -= NLMSG_LENGTH(GENL_HDRLEN);
+
+	if (len < 0) {
+		fprintf(stderr, "wrong controller message len %d\n", len);
+		return -1;
+	}
+
+	attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
+	parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
+
+	if (tb[CTRL_ATTR_FAMILY_ID] == NULL) {
+		fprintf(stderr, "Missing family id TLV\n");
+		return -1;
+	}
+
+	return rta_getattr_u16(tb[CTRL_ATTR_FAMILY_ID]);
+}
+
+static int genl_ctrl_resolve_family(const char *family)
+{
+	GENL_DEFINE_REQUEST(req, 0, 1024);
+
+	GENL_INIT_REQUEST(req, GENL_ID_CTRL, 0, 0, CTRL_CMD_GETFAMILY,
+			  NLM_F_REQUEST);
+
+	addattr_l(&req.n, 1024, CTRL_ATTR_FAMILY_NAME,
+		  family, strlen(family) + 1);
+
+	if (rtnl_talk(&grth, &req.n, 0, 0, &req.n) < 0) {
+		fprintf(stderr, "Error talking to the kernel\n");
+		return -2;
+	}
+
+	return genl_parse_getfamily(&req.n);
+}
+
+static int tcpm_do_cmd(int cmd, int argc, char **argv)
+{
+	GENL_DEFINE_REQUEST(req, 0, 1024);
+	int atype = -1;
+	int code, ack;
+
+	memset(&f, 0, sizeof(f));
+	f.addr.bitlen = -1;
+	f.addr.family = preferred_family;
+
+	switch (preferred_family) {
+	case AF_UNSPEC:
+	case AF_INET:
+	case AF_INET6:
+		break;
+	default:
+		fprintf(stderr, "Unsupported family:%d\n", preferred_family);
+		return -1;
+	}
+
+	for (; argc > 0; argc--, argv++) {
+		char *who = "address";
+
+		if (strcmp(*argv, "addr") == 0 ||
+		    strcmp(*argv, "address") == 0) {
+			who = *argv;
+			NEXT_ARG();
+		}
+		if (matches(*argv, "help") == 0)
+			usage();
+		if (f.addr.bitlen >= 0)
+			duparg2(who, *argv);
+
+		get_prefix(&f.addr, *argv, preferred_family);
+		if (f.addr.bytelen && f.addr.bytelen * 8 == f.addr.bitlen) {
+			if (f.addr.family == AF_INET)
+				atype = TCP_METRICS_ATTR_ADDR_IPV4;
+			else if (f.addr.family == AF_INET6)
+				atype = TCP_METRICS_ATTR_ADDR_IPV6;
+		}
+		if ((CMD_DEL & cmd) && atype < 0) {
+			fprintf(stderr, "Error: a specific IP address is expected rather than \"%s\"\n",
+				*argv);
+			return -1;
+		}
+
+		argc--; argv++;
+	}
+
+	if (cmd == CMD_DEL && atype < 0)
+		missarg("address");
+
+	/* flush for exact address ? Single del */
+	if (cmd == CMD_FLUSH && atype >= 0)
+		cmd = CMD_DEL;
+	/* flush for all addresses ? Single del without address */
+	if (cmd == CMD_FLUSH && f.addr.bitlen <= 0 &&
+	    preferred_family == AF_UNSPEC) {
+		cmd = CMD_DEL;
+		code = TCP_METRICS_CMD_DEL;
+		ack = 1;
+	} else if (cmd == CMD_DEL) {
+		code = TCP_METRICS_CMD_DEL;
+		ack = 1;
+	} else {	/* CMD_FLUSH, CMD_LIST */
+		code = TCP_METRICS_CMD_GET;
+		ack = 0;
+	}
+
+	if (genl_family < 0) {
+		if (rtnl_open_byproto(&grth, 0, NETLINK_GENERIC) < 0) {
+			fprintf(stderr, "Cannot open generic netlink socket\n");
+			exit(1);
+		}
+		genl_family = genl_ctrl_resolve_family(TCP_METRICS_GENL_NAME);
+		if (genl_family < 0)
+			exit(1);
+	}
+
+	if (!(cmd & CMD_FLUSH) && (atype >= 0 || (cmd & CMD_DEL))) {
+		INIT_GENL_ACTION(req, code, ack);
+		if (atype >= 0)
+			addattr_l(&req.n, sizeof(req), atype, &f.addr.data,
+				  f.addr.bytelen);
+	} else {
+		INIT_GENL_DUMP(req);
+	}
+
+	f.cmd = cmd;
+	if (cmd & CMD_FLUSH) {
+		int round = 0;
+		char flushb[4096-512];
+
+		f.flushb = flushb;
+		f.flushp = 0;
+		f.flushe = sizeof(flushb);
+
+		for (;;) {
+			req.n.nlmsg_seq = grth.dump = ++grth.seq;
+			if (rtnl_send(&grth, &req, req.n.nlmsg_len) < 0) {
+				perror("Failed to send flush request");
+				exit(1);
+			}
+			f.flushed = 0;
+			if (rtnl_dump_filter(&grth, process_msg, stdout) < 0) {
+				fprintf(stderr, "Flush terminated\n");
+				exit(1);
+			}
+			if (f.flushed == 0) {
+				if (round == 0) {
+					fprintf(stderr, "Nothing to flush.\n");
+				} else if (show_stats)
+					printf("*** Flush is complete after %d round%s ***\n",
+					       round, round > 1 ? "s" : "");
+				fflush(stdout);
+				return 0;
+			}
+			round++;
+			if (flush_update() < 0)
+				exit(1);
+			if (show_stats) {
+				printf("\n*** Round %d, deleting %d entries ***\n",
+				       round, f.flushed);
+				fflush(stdout);
+			}
+		}
+		return 0;
+	}
+
+	req.n.nlmsg_seq = ++grth.seq;
+	if (ack) {
+		if (rtnl_talk(&grth, &req.n, 0, 0, NULL) < 0)
+			return -2;
+	} else if (atype >= 0) {
+		if (rtnl_talk(&grth, &req.n, 0, 0, &req.n) < 0)
+			return -2;
+		if (process_msg(NULL, &req.n, stdout) < 0) {
+			fprintf(stderr, "Dump terminated\n");
+			exit(1);
+		}
+	} else {
+		grth.dump = grth.seq;
+		if (rtnl_send(&grth, &req, req.n.nlmsg_len) < 0) {
+			perror("Failed to send dump request");
+			exit(1);
+		}
+
+		if (rtnl_dump_filter(&grth, process_msg, stdout) < 0) {
+			fprintf(stderr, "Dump terminated\n");
+			exit(1);
+		}
+	}
+	return 0;
+}
+
+int do_tcp_metrics(int argc, char **argv)
+{
+	int i;
+
+	if (argc < 1)
+		return tcpm_do_cmd(CMD_LIST, 0, NULL);
+	for (i = 0; i < ARRAY_SIZE(cmds); i++) {
+		if (matches(argv[0], cmds[i].name) == 0)
+			return tcpm_do_cmd(cmds[i].code, argc-1, argv+1);
+	}
+	if (matches(argv[0], "help") == 0)
+		usage();
+
+	fprintf(stderr, "Command \"%s\" is unknown, "
+			"try \"ip tcp_metrics help\".\n", *argv);
+	exit(-1);
+}
+
diff -urpN iproute2-3.5.1/man/man8/Makefile iproute2-3.5.1-tcp_metrics/man/man8/Makefile
--- iproute2-3.5.1/man/man8/Makefile	2012-08-13 18:13:58.000000000 +0300
+++ iproute2-3.5.1-tcp_metrics/man/man8/Makefile	2012-08-23 14:45:12.506385476 +0300
@@ -7,7 +7,8 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnsta
 	bridge.8 rtstat.8 ctstat.8 nstat.8 routef.8 \
 	ip-tunnel.8 ip-rule.8 ip-ntable.8 \
 	ip-monitor.8 tc-stab.8 tc-hfsc.8 ip-xfrm.8 ip-netns.8 \
-	ip-neighbour.8 ip-mroute.8 ip-maddress.8 ip-addrlabel.8 
+	ip-neighbour.8 ip-mroute.8 ip-maddress.8 ip-addrlabel.8 \
+	ip-tcp_metrics.8
 
 
 all: $(TARGETS)
diff -urpN iproute2-3.5.1/man/man8/ip-tcp_metrics.8 iproute2-3.5.1-tcp_metrics/man/man8/ip-tcp_metrics.8
--- iproute2-3.5.1/man/man8/ip-tcp_metrics.8	1970-01-01 02:00:00.000000000 +0200
+++ iproute2-3.5.1-tcp_metrics/man/man8/ip-tcp_metrics.8	2012-08-25 17:37:30.868321617 +0300
@@ -0,0 +1,143 @@
+.TH "IP\-TCP_METRICS" 8 "23 Aug 2012" "iproute2" "Linux"
+.SH "NAME"
+ip-tcp_metrics \- management for TCP Metrics
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.B ip
+.RI "[ " OPTIONS " ]"
+.B tcp_metrics
+.RI "{ " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.BR "ip tcp_metrics" " { " show " | " flush " }
+.IR SELECTOR
+
+.ti -8
+.BR "ip tcp_metrics delete " [ " address " ]
+.IR ADDRESS
+
+.ti -8
+.IR SELECTOR " := " 
+.RB "[ [ " address " ] "
+.IR PREFIX " ]"
+
+.SH "DESCRIPTION"
+.B ip tcp_metrics
+is used to manipulate entries in the kernel that keep TCP information
+for IPv4 and IPv6 destinations. The entries are created when
+TCP sockets want to share information for destinations and are
+stored in a cache keyed by the destination address. The saved
+information may include values for metrics (initially obtained from
+routes), recent TSVAL for TIME-WAIT recycling purposes, state for the
+Fast Open feature, etc.
+For performance reasons the cache can not grow above configured limit
+and the older entries are replaced with fresh information, sometimes
+reclaimed and used for new destinations. The kernel never removes
+entries, they can be flushed only with this tool.
+
+.SS ip tcp_metrics show - show cached entries
+
+.TP
+.BI address " PREFIX " (default)
+IPv4/IPv6 prefix or address. If no prefix is provided all entries are shown.
+
+.LP
+The output may contain the following information:
+
+.BI age " <S.MMM>" sec
+- time after the entry was created, reset or updated with metrics
+from sockets. The entry is reset and refreshed on use with metrics from
+route if the metrics are not updated in last hour. Not all cached values
+reset the age on update.
+
+.BI cwnd " <N>"
+- CWND metric value
+
+.BI fo_cookie " <HEX-STRING>"
+- Cookie value received in SYN-ACK to be used by Fast Open for next SYNs
+
+.BI fo_mss " <N>"
+- MSS value received in SYN-ACK to be used by Fast Open for next SYNs
+
+.BI fo_syn_drops " <N>/<S.MMM>" "sec ago"
+- Number of drops of initial outgoing Fast Open SYNs with data
+detected by monitoring the received SYN-ACK after SYN retransmission.
+The seconds show the time after last SYN drop and together with
+the drop count can be used to disable Fast Open for some time.
+
+.BI reordering " <N>"
+- Reordering metric value
+
+.BI rtt " <N>" ms
+- RTT metric value
+
+.BI rttvar " <N>" ms
+- RTTVAR metric value
+
+.BI ssthresh " <SSTHRESH>"
+- SSTHRESH metric value
+
+.BI tw_ts " <TSVAL>/<SEC>" "sec ago"
+- recent TSVAL and the seconds after saving it into TIME-WAIT socket
+
+.SS ip tcp_metrics delete - delete single entry
+
+.TP
+.BI address " ADDRESS " (default)
+IPv4/IPv6 address. The address is a required argument.
+
+.SS ip tcp_metrics flush - flush entries
+This command flushes the entries selected by some criteria.
+
+.PP
+This command has the same arguments as
+.B show.
+
+.SH "EXAMPLES"
+.PP
+ip tcp_metrics show address 192.168.0.0/24
+.RS 4
+Shows the entries for destinations from subnet
+.RE
+.PP
+ip tcp_metrics show 192.168.0.0/24
+.RS 4
+The same but address keyword is optional
+.RE
+.PP
+ip tcp_metrics
+.RS 4
+Show all is the default action
+.RE
+.PP
+ip tcp_metrics delete 192.168.0.1
+.RS 4
+Removes the entry for 192.168.0.1 from cache.
+.RE
+.PP
+ip tcp_metrics flush 192.168.0.0/24
+.RS 4
+Removes entries for destinations from subnet
+.RE
+.PP
+ip tcp_metrics flush all
+.RS 4
+Removes all entries from cache
+.RE
+.PP
+ip -6 tcp_metrics flush all
+.RS 4
+Removes all IPv6 entries from cache keeping the IPv4 entries.
+.RE
+
+.SH SEE ALSO
+.br
+.BR ip (8)
+
+.SH AUTHOR
+Original Manpage by Julian Anastasov <ja@ssi.bg>
diff -urpN iproute2-3.5.1/man/man8/ip.8 iproute2-3.5.1-tcp_metrics/man/man8/ip.8
--- iproute2-3.5.1/man/man8/ip.8	2012-08-13 18:13:58.000000000 +0300
+++ iproute2-3.5.1-tcp_metrics/man/man8/ip.8	2012-08-23 10:28:12.541672495 +0300
@@ -15,7 +15,7 @@ ip \- show / manipulate routing, devices
 .IR OBJECT " := { "
 .BR link " | " addr " | " addrlabel " | " route " | " rule " | " neigh " | "\
  ntable " | " tunnel " | " tuntap " | " maddr " | "  mroute " | " mrule " | "\
- monitor " | " xfrm " | " netns " | "  l2tp " }"
+ monitor " | " xfrm " | " netns " | "  l2tp " | "  tcp_metrics " }"
 .sp
 
 .ti -8
@@ -156,6 +156,10 @@ host addresses.
 - rule in routing policy database.
 
 .TP
+.B tcp_metrics/tcpmetrics
+- manage TCP Metrics
+
+.TP
 .B tunnel
 - tunnel over IP.
 
@@ -215,6 +219,7 @@ was written by Alexey N. Kuznetsov and a
 .BR ip-ntable (8),
 .BR ip-route (8),
 .BR ip-rule (8),
+.BR ip-tcp_metrics (8),
 .BR ip-tunnel (8),
 .BR ip-xfrm (8)
 .br

^ permalink raw reply

* [PATCH v2 0/2] Interface for TCP Metrics
From: Julian Anastasov @ 2012-09-02  5:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger

	This patchset contains 2 patches, one for kernel
and one for iproute2. We add DUMP/GET/DEL support for
genl "tcp_metrics" and minimal support for filtering
by address prefix and family in user space.

	I tested show/del/flush, filtering by family, IPv4 prefix,
output for metrics such as rtt, rttvar, cwnd, ssthresh (after
modifying route). I didn't tested output for fast open.

	May be some corrections in output can be desired.

	The kernel patch has some parts to check:

- in tcp_metrics_nl_cmd_del I'm trying to flush all addresses
with single request, eg. when no family, address or other
selectors are provided. I use some arbitrary counter sync_count
to force memory to be freed in parts by using synchronize_rcu,
note that we are under genl_mutex, so may be this is bad idea
to delay other genl users? My idea was to avoid many commands
on "flush all", may be we should trigger flush by using some
other mechanism that is pernet? And I don't know how to test
it properly without large cache.

- I enabled lockdep in config and don't see any warnings.
make C=2 net/ipv4/tcp_metrics.o is silent too.

v2:
- patch 1: remove rcu_assign_pointer, add tcp_metrics_flush_all
- patch 2: properly flush by specifying address, improve man page

^ permalink raw reply

* [PATCH v2 1/2] tcp: add generic netlink support for tcp_metrics
From: Julian Anastasov @ 2012-09-02  5:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1346564178-1794-1-git-send-email-ja@ssi.bg>

	Add support for genl "tcp_metrics". No locking
is changed, only that now we can unlink and delete
entries after grace period. We implement get/del for
single entry and dump to support show/flush filtering
in user space. Del without address attribute causes
flush for all addresses, sadly under genl_mutex.

v2:
- remove rcu_assign_pointer as suggested by Eric Dumazet,
it is not needed because there are no other writes under lock
- move the flushing code in tcp_metrics_flush_all

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/linux/Kbuild        |    1 +
 include/linux/tcp_metrics.h |   54 +++++++
 net/ipv4/tcp_metrics.c      |  357 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 399 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/tcp_metrics.h

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 1f2c1c7..90da0af 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -363,6 +363,7 @@ header-y += sysctl.h
 header-y += sysinfo.h
 header-y += taskstats.h
 header-y += tcp.h
+header-y += tcp_metrics.h
 header-y += telephony.h
 header-y += termios.h
 header-y += time.h
diff --git a/include/linux/tcp_metrics.h b/include/linux/tcp_metrics.h
new file mode 100644
index 0000000..cb5157b
--- /dev/null
+++ b/include/linux/tcp_metrics.h
@@ -0,0 +1,54 @@
+/* tcp_metrics.h - TCP Metrics Interface */
+
+#ifndef _LINUX_TCP_METRICS_H
+#define _LINUX_TCP_METRICS_H
+
+#include <linux/types.h>
+
+/* NETLINK_GENERIC related info
+ */
+#define TCP_METRICS_GENL_NAME		"tcp_metrics"
+#define TCP_METRICS_GENL_VERSION	0x1
+
+enum tcp_metric_index {
+	TCP_METRIC_RTT,
+	TCP_METRIC_RTTVAR,
+	TCP_METRIC_SSTHRESH,
+	TCP_METRIC_CWND,
+	TCP_METRIC_REORDERING,
+
+	/* Always last.  */
+	__TCP_METRIC_MAX,
+};
+
+#define TCP_METRIC_MAX	(__TCP_METRIC_MAX - 1)
+
+enum {
+	TCP_METRICS_ATTR_UNSPEC,
+	TCP_METRICS_ATTR_ADDR_IPV4,		/* u32 */
+	TCP_METRICS_ATTR_ADDR_IPV6,		/* binary */
+	TCP_METRICS_ATTR_AGE,			/* msecs */
+	TCP_METRICS_ATTR_TW_TSVAL,		/* u32, raw, rcv tsval */
+	TCP_METRICS_ATTR_TW_TS_STAMP,		/* s32, sec age */
+	TCP_METRICS_ATTR_VALS,			/* nested +1, u32 */
+	TCP_METRICS_ATTR_FOPEN_MSS,		/* u16 */
+	TCP_METRICS_ATTR_FOPEN_SYN_DROPS,	/* u16, count of drops */
+	TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS,	/* msecs age */
+	TCP_METRICS_ATTR_FOPEN_COOKIE,		/* binary */
+
+	__TCP_METRICS_ATTR_MAX,
+};
+
+#define TCP_METRICS_ATTR_MAX	(__TCP_METRICS_ATTR_MAX - 1)
+
+enum {
+	TCP_METRICS_CMD_UNSPEC,
+	TCP_METRICS_CMD_GET,
+	TCP_METRICS_CMD_DEL,
+
+	__TCP_METRICS_CMD_MAX,
+};
+
+#define TCP_METRICS_CMD_MAX	(__TCP_METRICS_CMD_MAX - 1)
+
+#endif /* _LINUX_TCP_METRICS_H */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 0abe67b..b020135 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/tcp.h>
 #include <linux/hash.h>
+#include <linux/tcp_metrics.h>
 
 #include <net/inet_connection_sock.h>
 #include <net/net_namespace.h>
@@ -17,20 +18,10 @@
 #include <net/ipv6.h>
 #include <net/dst.h>
 #include <net/tcp.h>
+#include <net/genetlink.h>
 
 int sysctl_tcp_nometrics_save __read_mostly;
 
-enum tcp_metric_index {
-	TCP_METRIC_RTT,
-	TCP_METRIC_RTTVAR,
-	TCP_METRIC_SSTHRESH,
-	TCP_METRIC_CWND,
-	TCP_METRIC_REORDERING,
-
-	/* Always last.  */
-	TCP_METRIC_MAX,
-};
-
 struct tcp_fastopen_metrics {
 	u16	mss;
 	u16	syn_loss:10;		/* Recurring Fast Open SYN losses */
@@ -45,8 +36,10 @@ struct tcp_metrics_block {
 	u32				tcpm_ts;
 	u32				tcpm_ts_stamp;
 	u32				tcpm_lock;
-	u32				tcpm_vals[TCP_METRIC_MAX];
+	u32				tcpm_vals[TCP_METRIC_MAX + 1];
 	struct tcp_fastopen_metrics	tcpm_fastopen;
+
+	struct rcu_head			rcu_head;
 };
 
 static bool tcp_metric_locked(struct tcp_metrics_block *tm,
@@ -690,6 +683,328 @@ void tcp_fastopen_cache_set(struct sock *sk, u16 mss,
 	rcu_read_unlock();
 }
 
+static struct genl_family tcp_metrics_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.hdrsize	= 0,
+	.name		= TCP_METRICS_GENL_NAME,
+	.version	= TCP_METRICS_GENL_VERSION,
+	.maxattr	= TCP_METRICS_ATTR_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy tcp_metrics_nl_policy[TCP_METRICS_ATTR_MAX + 1] = {
+	[TCP_METRICS_ATTR_ADDR_IPV4]	= { .type = NLA_U32, },
+	[TCP_METRICS_ATTR_ADDR_IPV6]	= { .type = NLA_BINARY,
+					    .len = sizeof(struct in6_addr), },
+	/* Following attributes are not received for GET/DEL,
+	 * we keep them for reference
+	 */
+#if 0
+	[TCP_METRICS_ATTR_AGE]		= { .type = NLA_MSECS, },
+	[TCP_METRICS_ATTR_TW_TSVAL]	= { .type = NLA_U32, },
+	[TCP_METRICS_ATTR_TW_TS_STAMP]	= { .type = NLA_S32, },
+	[TCP_METRICS_ATTR_VALS]		= { .type = NLA_NESTED, },
+	[TCP_METRICS_ATTR_FOPEN_MSS]	= { .type = NLA_U16, },
+	[TCP_METRICS_ATTR_FOPEN_SYN_DROPS]	= { .type = NLA_U16, },
+	[TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS]	= { .type = NLA_MSECS, },
+	[TCP_METRICS_ATTR_FOPEN_COOKIE]	= { .type = NLA_BINARY,
+					    .len = TCP_FASTOPEN_COOKIE_MAX, },
+#endif
+};
+
+/* Add attributes, caller cancels its header on failure */
+static int tcp_metrics_fill_info(struct sk_buff *msg,
+				 struct tcp_metrics_block *tm)
+{
+	struct nlattr *nest;
+	int i;
+
+	switch (tm->tcpm_addr.family) {
+	case AF_INET:
+		if (nla_put_be32(msg, TCP_METRICS_ATTR_ADDR_IPV4,
+				tm->tcpm_addr.addr.a4) < 0)
+			goto nla_put_failure;
+		break;
+	case AF_INET6:
+		if (nla_put(msg, TCP_METRICS_ATTR_ADDR_IPV6, 16,
+			    tm->tcpm_addr.addr.a6) < 0)
+			goto nla_put_failure;
+		break;
+	default:
+		return -EAFNOSUPPORT;
+	}
+
+	if (nla_put_msecs(msg, TCP_METRICS_ATTR_AGE,
+			  jiffies - tm->tcpm_stamp) < 0)
+		goto nla_put_failure;
+	if (tm->tcpm_ts_stamp) {
+		if (nla_put_s32(msg, TCP_METRICS_ATTR_TW_TS_STAMP,
+				(s32) (get_seconds() - tm->tcpm_ts_stamp)) < 0)
+			goto nla_put_failure;
+		if (nla_put_u32(msg, TCP_METRICS_ATTR_TW_TSVAL,
+				tm->tcpm_ts) < 0)
+			goto nla_put_failure;
+	}
+
+	{
+		int n = 0;
+
+		nest = nla_nest_start(msg, TCP_METRICS_ATTR_VALS);
+		if (!nest)
+			goto nla_put_failure;
+		for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
+			if (!tm->tcpm_vals[i])
+				continue;
+			if (nla_put_u32(msg, i + 1, tm->tcpm_vals[i]) < 0)
+				goto nla_put_failure;
+			n++;
+		}
+		if (n)
+			nla_nest_end(msg, nest);
+		else
+			nla_nest_cancel(msg, nest);
+	}
+
+	{
+		struct tcp_fastopen_metrics tfom_copy[1], *tfom;
+		unsigned int seq;
+
+		do {
+			seq = read_seqbegin(&fastopen_seqlock);
+			tfom_copy[0] = tm->tcpm_fastopen;
+		} while (read_seqretry(&fastopen_seqlock, seq));
+
+		tfom = tfom_copy;
+		if (tfom->mss &&
+		    nla_put_u16(msg, TCP_METRICS_ATTR_FOPEN_MSS,
+				tfom->mss) < 0)
+			goto nla_put_failure;
+		if (tfom->syn_loss &&
+		    (nla_put_u16(msg, TCP_METRICS_ATTR_FOPEN_SYN_DROPS,
+				tfom->syn_loss) < 0 ||
+		     nla_put_msecs(msg, TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS,
+				jiffies - tfom->last_syn_loss) < 0))
+			goto nla_put_failure;
+		if (tfom->cookie.len > 0 &&
+		    nla_put(msg, TCP_METRICS_ATTR_FOPEN_COOKIE,
+			    tfom->cookie.len, tfom->cookie.val) < 0)
+			goto nla_put_failure;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int tcp_metrics_dump_info(struct sk_buff *skb,
+				 struct netlink_callback *cb,
+				 struct tcp_metrics_block *tm)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq,
+			  &tcp_metrics_nl_family, NLM_F_MULTI,
+			  TCP_METRICS_CMD_GET);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (tcp_metrics_fill_info(skb, tm) < 0)
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int tcp_metrics_nl_dump(struct sk_buff *skb,
+			       struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	unsigned int max_rows = 1U << net->ipv4.tcp_metrics_hash_log;
+	unsigned int row, s_row = cb->args[0];
+	int s_col = cb->args[1], col = s_col;
+
+	for (row = s_row; row < max_rows; row++, s_col = 0) {
+		struct tcp_metrics_block *tm;
+		struct tcpm_hash_bucket *hb = net->ipv4.tcp_metrics_hash + row;
+
+		rcu_read_lock();
+		for (col = 0, tm = rcu_dereference(hb->chain); tm;
+		     tm = rcu_dereference(tm->tcpm_next), col++) {
+			if (col < s_col)
+				continue;
+			if (tcp_metrics_dump_info(skb, cb, tm) < 0) {
+				rcu_read_unlock();
+				goto done;
+			}
+		}
+		rcu_read_unlock();
+	}
+
+done:
+	cb->args[0] = row;
+	cb->args[1] = col;
+	return skb->len;
+}
+
+static int parse_nl_addr(struct genl_info *info, struct inetpeer_addr *addr,
+			 unsigned int *hash, int optional)
+{
+	struct nlattr *a;
+
+	a = info->attrs[TCP_METRICS_ATTR_ADDR_IPV4];
+	if (a) {
+		addr->family = AF_INET;
+		addr->addr.a4 = nla_get_be32(a);
+		*hash = (__force unsigned int) addr->addr.a4;
+		return 0;
+	}
+	a = info->attrs[TCP_METRICS_ATTR_ADDR_IPV6];
+	if (a) {
+		if (nla_len(a) != sizeof(sizeof(struct in6_addr)))
+			return -EINVAL;
+		addr->family = AF_INET6;
+		memcpy(addr->addr.a6, nla_data(a), sizeof(addr->addr.a6));
+		*hash = ipv6_addr_hash((struct in6_addr *) addr->addr.a6);
+		return 0;
+	}
+	return optional ? 1 : -EAFNOSUPPORT;
+}
+
+static int tcp_metrics_nl_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct tcp_metrics_block *tm;
+	struct inetpeer_addr addr;
+	unsigned int hash;
+	struct sk_buff *msg;
+	struct net *net = genl_info_net(info);
+	void *reply;
+	int ret;
+
+	ret = parse_nl_addr(info, &addr, &hash, 0);
+	if (ret < 0)
+		return ret;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	reply = genlmsg_put_reply(msg, info, &tcp_metrics_nl_family, 0,
+				  info->genlhdr->cmd);
+	if (!reply)
+		goto nla_put_failure;
+
+	hash = hash_32(hash, net->ipv4.tcp_metrics_hash_log);
+	ret = -ESRCH;
+	rcu_read_lock();
+	for (tm = rcu_dereference(net->ipv4.tcp_metrics_hash[hash].chain); tm;
+	     tm = rcu_dereference(tm->tcpm_next)) {
+		if (addr_same(&tm->tcpm_addr, &addr)) {
+			ret = tcp_metrics_fill_info(msg, tm);
+			break;
+		}
+	}
+	rcu_read_unlock();
+	if (ret < 0)
+		goto out_free;
+
+	genlmsg_end(msg, reply);
+	return genlmsg_reply(msg, info);
+
+nla_put_failure:
+	ret = -EMSGSIZE;
+
+out_free:
+	nlmsg_free(msg);
+	return ret;
+}
+
+#define deref_locked_genl(p)	\
+	rcu_dereference_protected(p, lockdep_genl_is_held() && \
+				     lockdep_is_held(&tcp_metrics_lock))
+
+#define deref_genl(p)	rcu_dereference_protected(p, lockdep_genl_is_held())
+
+static int tcp_metrics_flush_all(struct net *net)
+{
+	unsigned int max_rows = 1U << net->ipv4.tcp_metrics_hash_log;
+	struct tcpm_hash_bucket *hb = net->ipv4.tcp_metrics_hash;
+	struct tcp_metrics_block *tm;
+	unsigned int sync_count = 0;
+	unsigned int row;
+
+	for (row = 0; row < max_rows; row++, hb++) {
+		spin_lock_bh(&tcp_metrics_lock);
+		tm = deref_locked_genl(hb->chain);
+		if (tm)
+			hb->chain = NULL;
+		spin_unlock_bh(&tcp_metrics_lock);
+		while (tm) {
+			struct tcp_metrics_block *next;
+
+			next = deref_genl(tm->tcpm_next);
+			kfree_rcu(tm, rcu_head);
+			if (!((++sync_count) & 2047))
+				synchronize_rcu();
+			tm = next;
+		}
+	}
+	return 0;
+}
+
+static int tcp_metrics_nl_cmd_del(struct sk_buff *skb, struct genl_info *info)
+{
+	struct tcpm_hash_bucket *hb;
+	struct tcp_metrics_block *tm;
+	struct tcp_metrics_block __rcu **pp;
+	struct inetpeer_addr addr;
+	unsigned int hash;
+	struct net *net = genl_info_net(info);
+	int ret;
+
+	ret = parse_nl_addr(info, &addr, &hash, 1);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		return tcp_metrics_flush_all(net);
+
+	hash = hash_32(hash, net->ipv4.tcp_metrics_hash_log);
+	hb = net->ipv4.tcp_metrics_hash + hash;
+	pp = &hb->chain;
+	spin_lock_bh(&tcp_metrics_lock);
+	for (tm = deref_locked_genl(*pp); tm;
+	     pp = &tm->tcpm_next, tm = deref_locked_genl(*pp)) {
+		if (addr_same(&tm->tcpm_addr, &addr)) {
+			*pp = tm->tcpm_next;
+			break;
+		}
+	}
+	spin_unlock_bh(&tcp_metrics_lock);
+	if (!tm)
+		return -ESRCH;
+	kfree_rcu(tm, rcu_head);
+	return 0;
+}
+
+static struct genl_ops tcp_metrics_nl_ops[] = {
+	{
+		.cmd = TCP_METRICS_CMD_GET,
+		.doit = tcp_metrics_nl_cmd_get,
+		.dumpit = tcp_metrics_nl_dump,
+		.policy = tcp_metrics_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = TCP_METRICS_CMD_DEL,
+		.doit = tcp_metrics_nl_cmd_del,
+		.policy = tcp_metrics_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
 static unsigned int tcpmhash_entries;
 static int __init set_tcpmhash_entries(char *str)
 {
@@ -753,5 +1068,21 @@ static __net_initdata struct pernet_operations tcp_net_metrics_ops = {
 
 void __init tcp_metrics_init(void)
 {
-	register_pernet_subsys(&tcp_net_metrics_ops);
+	int ret;
+
+	ret = register_pernet_subsys(&tcp_net_metrics_ops);
+	if (ret < 0)
+		goto cleanup;
+	ret = genl_register_family_with_ops(&tcp_metrics_nl_family,
+					    tcp_metrics_nl_ops,
+					    ARRAY_SIZE(tcp_metrics_nl_ops));
+	if (ret < 0)
+		goto cleanup_subsys;
+	return;
+
+cleanup_subsys:
+	unregister_pernet_subsys(&tcp_net_metrics_ops);
+
+cleanup:
+	return;
 }
-- 
1.7.3.4

^ permalink raw reply related

* [GIT] Networking
From: David Miller @ 2012-09-02  4:34 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) NLA_PUT* --> nla_put_* conversion got one case wrong in
   nfnetlink_log, fix from Patrick McHardy.

2) Missed error return check in ipw2100 driver, from Julia Lawall.

3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix
   from Eric Dumazet.

4) SFC driver erroneously reversed src and dst when reporting filters
   via ethtool.

5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in
   sja1000 can platform driver, from Alexey Khoroshilov and Sven
   Schmitt.

6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only
   take the lock when we really need to.  From Eric Dumazet.

7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso.

8) CWND reduction in TCP is done incorrectly during non-SACK recovery,
   fix from Yuchung Cheng.

9) Revert netpoll change, and fix what was actually a driver specific
   problem.  From Amerigo Wang.  This should cure bootup hangs with
   netconsole some people reported.

10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page
    pointer.  From Ian Campbell.

11) SIP NAT fix for expectiontation creation, from Pablo Neira Ayuso.

12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet.

13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this
    situation.  From Oliver Neukum.

14) The davinci ethernet driver triggers an OOPS on removal because it
    frees an MDIO object before unregistering it.  Fix from Bin Liu.

Please pull, thanks a lot!

The following changes since commit fea7a08acb13524b47711625eebea40a0ede69a0:

  Linux 3.6-rc3 (2012-08-22 13:29:06 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 5002200599429e83fc13e0d9a2d4788b79515b0c:

  net: qmi_wwan: add several new Gobi devices (2012-09-01 22:49:34 -0400)

----------------------------------------------------------------
Aleksander Morgado (1):
      net: qmi_wwan: new device: Foxconn/Novatel E396

Alexey Khoroshilov (1):
      can: softing: Fix potential memory leak in softing_load_fw()

Amerigo Wang (1):
      netpoll: revert 6bdb7fe3104 and fix be_poll() instead

Ben Hutchings (1):
      sfc: Fix reporting of IPv4 full filters through ethtool

Bin Liu (1):
      net: ethernet: fix kernel OOPS when remove davinci_mdio module

Bjørn Mork (1):
      net: qmi_wwan: add several new Gobi devices

Bruce Allan (1):
      e1000e: DoS while TSO enabled caused by link partner with small MSS

Claudiu Manoil (1):
      gianfar: fix default tx vlan offload feature flag

Dan Carpenter (1):
      fddi: 64 bit bug in smt_add_para()

David S. Miller (4):
      Merge branch 'for-davem' of git://git.kernel.org/.../linville/wireless
      Merge branch 'fixes-for-3.6' of git://gitorious.org/linux-can/linux-can
      Merge branch 'sfc-3.6' of git://git.kernel.org/.../bwh/sfc
      Merge branch 'master' of git://1984.lsi.us.es/nf

Eric Dumazet (3):
      ipv4: properly update pmtu
      ipv4: take rt_uncached_lock only if needed
      ipv4: must use rcu protection while calling fib_lookup

Fengguang Wu (1):
      af_packet: match_fanout_group() can be static

Francesco Ruggeri (1):
      net: ipv4: ipmr_expire_timer causes crash when removing net namespace

Giuseppe CAVALLARO (2):
      stmmac: fix GMAC syn ID
      stmmac: fix a typo in the macro used to mask the mmc irq

Ian Campbell (1):
      xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX

Jaccon Bastiaansen (1):
      cs89x0 : packet reception not working

Johannes Berg (2):
      iwlwifi: fix flow handler debug code
      iwlwifi: protect SRAM debugfs

John W. Linville (2):
      Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Julia Lawall (6):
      drivers/net/wireless/ipw2x00/ipw2100.c: introduce missing initialization
      ipvs: fix error return code
      netfilter: ctnetlink: fix error return code in init path
      netfilter: nfnetlink_log: fix error return code in init path
      net: ipv6: fix error return code
      net/xfrm/xfrm_state.c: fix error return code

Merav Sicron (2):
      bnx2x: Move netif_napi_add to the open call
      bnx2x: Correct the ndo_poll_controller call

Oliver Neukum (1):
      usbnet: fix deadlock in resume

Pablo Neira Ayuso (3):
      netlink: fix possible spoofing from non-root processes
      netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
      netfilter: nf_conntrack: fix racy timer handling with reliable events

Patrick McHardy (1):
      netfilter: nfnetlink_log: fix NLA_PUT macro removal bug

Rayagond Kokatanur (1):
      stmmac: add header inclusion protection

Sven Schmitt (1):
      can: sja1000_platform: fix wrong flag IRQF_SHARED for interrupt sharing

Thomas Huehn (1):
      ath5k: fix wrong max power per rate eeprom reads for 802.11a

Thomas Pedersen (1):
      mac80211: fix DS to MBSS address translation

Vladimir Zapolskiy (1):
      brcm80211: smac: set interface down on reset

Yuchung Cheng (1):
      tcp: fix cwnd reduction for non-sack recovery

Yuval Mintz (1):
      bnx2x: fix 57840_MF pci id

xeb@mail.ru (1):
      l2tp: avoid to use synchronize_rcu in tunnel free function

 drivers/net/can/sja1000/sja1000_platform.c            |  4 +++-
 drivers/net/can/softing/softing_fw.c                  |  7 ++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h           |  3 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c       |  4 ++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h       |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c   |  2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c      | 18 ++++++++---------
 drivers/net/ethernet/cirrus/cs89x0.c                  | 10 +++++-----
 drivers/net/ethernet/emulex/benet/be_cmds.c           |  6 ++++--
 drivers/net/ethernet/emulex/benet/be_main.c           |  2 ++
 drivers/net/ethernet/freescale/gianfar.c              |  2 +-
 drivers/net/ethernet/intel/e1000e/e1000.h             |  1 +
 drivers/net/ethernet/intel/e1000e/netdev.c            | 48 ++++++++++++++++++++++------------------------
 drivers/net/ethernet/sfc/ethtool.c                    |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/common.h          |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/descs.h           |  6 ++++++
 drivers/net/ethernet/stmicro/stmmac/descs_com.h       |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/dwmac100.h        |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/dwmac1000.h       |  5 ++++-
 drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h       |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/mmc.h             |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/mmc_core.c        |  6 +++---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h          |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_timer.h    |  4 ++++
 drivers/net/ethernet/ti/davinci_mdio.c                |  4 +++-
 drivers/net/fddi/skfp/pmf.c                           |  2 +-
 drivers/net/usb/qmi_wwan.c                            |  4 ++++
 drivers/net/usb/usbnet.c                              |  2 +-
 drivers/net/wireless/ath/ath5k/eeprom.c               |  2 +-
 drivers/net/wireless/ath/ath5k/eeprom.h               |  1 +
 drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c |  3 +++
 drivers/net/wireless/ipw2x00/ipw2100.c                |  3 ++-
 drivers/net/wireless/iwlwifi/dvm/debugfs.c            |  3 +++
 drivers/net/wireless/iwlwifi/pcie/internal.h          |  2 +-
 drivers/net/wireless/iwlwifi/pcie/rx.c                |  2 +-
 drivers/net/wireless/iwlwifi/pcie/trans.c             | 30 +++++++++++++++--------------
 drivers/net/xen-netfront.c                            | 39 ++++++++++---------------------------
 include/linux/pci_ids.h                               |  2 +-
 include/net/netfilter/nf_conntrack_ecache.h           |  1 +
 net/core/netpoll.c                                    | 10 +---------
 net/ipv4/ipmr.c                                       | 14 ++++++++++++--
 net/ipv4/netfilter/nf_nat_sip.c                       |  5 ++++-
 net/ipv4/route.c                                      |  6 ++++--
 net/ipv4/tcp_input.c                                  | 15 +++++++--------
 net/ipv6/esp6.c                                       |  6 +++---
 net/l2tp/l2tp_core.c                                  |  3 +--
 net/l2tp/l2tp_core.h                                  |  1 +
 net/mac80211/tx.c                                     | 38 ++++++++++++++++--------------------
 net/netfilter/ipvs/ip_vs_ctl.c                        |  4 +++-
 net/netfilter/nf_conntrack_core.c                     | 16 +++++++++++-----
 net/netfilter/nf_conntrack_netlink.c                  |  3 ++-
 net/netfilter/nfnetlink_log.c                         |  6 ++++--
 net/netlink/af_netlink.c                              |  4 +++-
 net/packet/af_packet.c                                |  2 +-
 net/xfrm/xfrm_state.c                                 |  4 +++-
 55 files changed, 232 insertions(+), 171 deletions(-)

^ permalink raw reply

* Re: [PATCH] net: qmi_wwan: add several new Gobi devices
From: David Miller @ 2012-09-02  2:49 UTC (permalink / raw)
  To: bjorn; +Cc: netdev, linux-usb, aleksander, ttuttle
In-Reply-To: <1346507246-9029-1-git-send-email-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Sat,  1 Sep 2012 15:47:26 +0200

> Gobi devices are composite, needing both the qcserial and
> qmi_wwan drivers to support all functions.  Re-syncing the
> list of supported devices with qcserial.
> 
> Cc: Aleksander Morgado <aleksander@lanedo.com>
> Cc: Thomas Tuttle <ttuttle@chromium.org>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v3 0/2] 6lowpan fixes
From: David Miller @ 2012-09-02  2:48 UTC (permalink / raw)
  To: alan
  Cc: alex.bluesman.smirnov, dbaryshkov, tony.cheneau,
	linux-zigbee-devel, netdev, linux-kernel
In-Reply-To: <1346515027-5237-1-git-send-email-alan@signal11.us>

From: Alan Ott <alan@signal11.us>
Date: Sat,  1 Sep 2012 11:57:05 -0400

> v3 of this patch changes skb_copy() to skb_clone() in patch #1 at the
> recommendation of Eric Dumazet
> 
> Alan Ott (2):
>   6lowpan: Make a copy of skb's delivered to 6lowpan
>   6lowpan: handle NETDEV_UNREGISTER event

All applied to net-next, thanks.

^ permalink raw reply

* Re: [patch] fddi: 64 bit bug in smt_add_para()
From: David Miller @ 2012-09-02  2:46 UTC (permalink / raw)
  To: dan.carpenter; +Cc: netdev, kernel-janitors
In-Reply-To: <20120901195740.GG20741@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Sat, 1 Sep 2012 12:57:40 -0700

> The intent was to set 4 bytes of data so that's why the sp_len is set
> to 4 on the next line.  The cast to u_long pointer clears 8 bytes
> on 64 bit arches.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Obvious enough, applied, thanks Dan.

^ permalink raw reply

* [patch] fddi: 64 bit bug in smt_add_para()
From: Dan Carpenter @ 2012-09-01 19:57 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, kernel-janitors

The intent was to set 4 bytes of data so that's why the sp_len is set
to 4 on the next line.  The cast to u_long pointer clears 8 bytes
on 64 bit arches.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
Code auditing work.  I don't have this hardware.  This bug is old.

diff --git a/drivers/net/fddi/skfp/pmf.c b/drivers/net/fddi/skfp/pmf.c
index 24d8566..441b4dc 100644
--- a/drivers/net/fddi/skfp/pmf.c
+++ b/drivers/net/fddi/skfp/pmf.c
@@ -673,7 +673,7 @@ void smt_add_para(struct s_smc *smc, struct s_pcon *pcon, u_short para,
 			sm_pm_get_ls(smc,port_to_mib(smc,port))) ;
 		break ;
 	case SMT_P_REASON :
-		* (u_long *) to = 0 ;
+		*(u32 *)to = 0 ;
 		sp_len = 4 ;
 		goto sp_done ;
 	case SMT_P1033 :			/* time stamp */

^ permalink raw reply related

* [PATCH] codel: reduce default maxpacket and reinitialize upon re-entering drop state
From: Dave Täht @ 2012-09-01 18:50 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: Dave Taht

From: Dave Taht <dave.taht@bufferbloat.net>

The minimum size packet in a queue is 64, not 256.

Also, as ethtool can be used to remove tso/gso/ufo, the last seen
maxpacket can vary considerably in size (e.g. 64k). So reset it
on re-entering the drop scheduler to the most recent packet size,
and let it grow again naturally.

Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
---
 include/net/codel.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/codel.h b/include/net/codel.h
index 389cf62..0b0ac5e 100644
--- a/include/net/codel.h
+++ b/include/net/codel.h
@@ -170,7 +170,7 @@ static void codel_vars_init(struct codel_vars *vars)
 
 static void codel_stats_init(struct codel_stats *stats)
 {
-	stats->maxpacket = 256;
+	stats->maxpacket = 64;
 }
 
 /*
@@ -333,6 +333,7 @@ static struct sk_buff *codel_dequeue(struct Qdisc *sch,
 			 */
 			codel_Newton_step(vars);
 		} else {
+			stats->maxpacket = skb ? qdisc_pkt_len(skb) : 64;
 			vars->count = 1;
 			vars->rec_inv_sqrt = ~0U >> REC_INV_SQRT_SHIFT;
 		}
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] fq_codel: dont reinit flow state
From: Dave Taht @ 2012-09-01 18:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1346505597.7996.76.camel@edumazet-glaptop>

partial NAK.

There is no need to reset dropped either, and resetting quantum or not
is an interesting question.

(in both cases they can be initted at init time)

Lastly, stats->maxpacket is presently a shared resource in fq_codel,
and the initial setting of 256 was an artifact some test ns2 code
(according to kathie). In the case of ack traffic owning a queue, this
can be 64 or 66 bytes (or larger on ipv6 acks).

It's totally unclear if this distinction is important in fq_codel at
present. :/. Obviously in codel which is a pure FIFO, maxpacket will
hit the max size really fast.

A related problem on maxpacket is if someone turns off tso/gso/ufo, it
remains set to the largest packet seen forever until the qdisc is
re-initialized.
This leads to really puzzling behavior... (I'll submit a patch for
this shortly but not change fq_codel's
centralized stats)

I'd be happy (for now) if dropped was preserved,
and to fiddle with the other ideas a while longer.

On Sat, Sep 1, 2012 at 6:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> When fq_codel builds a new flow, it should not reset codel state.
>
> Codel algo needs to get previous values (lastcount, drop_next) to get
> proper behavior.
>
> Signed-off-by: Dave Taht <dave.taht@gmail.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/sched/sch_fq_codel.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 9fc1c62..4e606fc 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -191,7 +191,6 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>
>         if (list_empty(&flow->flowchain)) {
>                 list_add_tail(&flow->flowchain, &q->new_flows);
> -               codel_vars_init(&flow->cvars);
>                 q->new_flow_count++;
>                 flow->deficit = q->quantum;
>                 flow->dropped = 0;
> @@ -418,6 +417,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
>                         struct fq_codel_flow *flow = q->flows + i;
>
>                         INIT_LIST_HEAD(&flow->flowchain);
> +                       codel_vars_init(&flow->cvars);
>                 }
>         }
>         if (sch->limit >= 1)
>
>



-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"

^ permalink raw reply

* [PATCH v3 2/2] 6lowpan: handle NETDEV_UNREGISTER event
From: Alan Ott @ 2012-09-01 15:57 UTC (permalink / raw)
  To: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	Tony Cheneau
  Cc: Alan Ott, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1346294341-26808-1-git-send-email-alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

Before, it was impossible to remove a wpan device which had lowpan
attached to it.

Signed-off-by: Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>
---
 net/ieee802154/6lowpan.c |   44 +++++++++++++++++++++++++++++++++++++-------
 1 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index b28ec79..d529111 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -1063,12 +1063,6 @@ out:
 	return (err < 0 ? NETDEV_TX_BUSY : NETDEV_TX_OK);
 }
 
-static void lowpan_dev_free(struct net_device *dev)
-{
-	dev_put(lowpan_dev_info(dev)->real_dev);
-	free_netdev(dev);
-}
-
 static struct wpan_phy *lowpan_get_phy(const struct net_device *dev)
 {
 	struct net_device *real_dev = lowpan_dev_info(dev)->real_dev;
@@ -1118,7 +1112,7 @@ static void lowpan_setup(struct net_device *dev)
 	dev->netdev_ops		= &lowpan_netdev_ops;
 	dev->header_ops		= &lowpan_header_ops;
 	dev->ml_priv		= &lowpan_mlme;
-	dev->destructor		= lowpan_dev_free;
+	dev->destructor		= free_netdev;
 }
 
 static int lowpan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -1244,6 +1238,34 @@ static inline void __init lowpan_netlink_fini(void)
 	rtnl_link_unregister(&lowpan_link_ops);
 }
 
+static int lowpan_device_event(struct notifier_block *unused,
+				unsigned long event,
+				void *ptr)
+{
+	struct net_device *dev = ptr;
+	LIST_HEAD(del_list);
+	struct lowpan_dev_record *entry, *tmp;
+
+	if (dev->type != ARPHRD_IEEE802154)
+		goto out;
+
+	if (event == NETDEV_UNREGISTER) {
+		list_for_each_entry_safe(entry, tmp, &lowpan_devices, list) {
+			if (lowpan_dev_info(entry->ldev)->real_dev == dev)
+				lowpan_dellink(entry->ldev, &del_list);
+		}
+
+		unregister_netdevice_many(&del_list);
+	};
+
+out:
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block lowpan_dev_notifier = {
+	.notifier_call = lowpan_device_event,
+};
+
 static struct packet_type lowpan_packet_type = {
 	.type = __constant_htons(ETH_P_IEEE802154),
 	.func = lowpan_rcv,
@@ -1258,6 +1280,12 @@ static int __init lowpan_init_module(void)
 		goto out;
 
 	dev_add_pack(&lowpan_packet_type);
+
+	err = register_netdevice_notifier(&lowpan_dev_notifier);
+	if (err < 0) {
+		dev_remove_pack(&lowpan_packet_type);
+		lowpan_netlink_fini();
+	}
 out:
 	return err;
 }
@@ -1270,6 +1298,8 @@ static void __exit lowpan_cleanup_module(void)
 
 	dev_remove_pack(&lowpan_packet_type);
 
+	unregister_netdevice_notifier(&lowpan_dev_notifier);
+
 	/* Now 6lowpan packet_type is removed, so no new fragments are
 	 * expected on RX, therefore that's the time to clean incomplete
 	 * fragments.
-- 
1.7.0.4


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

^ permalink raw reply related

* [PATCH v3 1/2] 6lowpan: Make a copy of skb's delivered to 6lowpan
From: Alan Ott @ 2012-09-01 15:57 UTC (permalink / raw)
  To: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	Tony Cheneau
  Cc: Alan Ott, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1346294341-26808-1-git-send-email-alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

Since lowpan_process_data() modifies the skb (by calling skb_pull()), we
need our own copy so that it doesn't affect the data received by other
protcols (in this case, af_ieee802154).

Signed-off-by: Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>
---
 net/ieee802154/6lowpan.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 6a09522..b28ec79 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -1133,6 +1133,8 @@ static int lowpan_validate(struct nlattr *tb[], struct nlattr *data[])
 static int lowpan_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct packet_type *pt, struct net_device *orig_dev)
 {
+	struct sk_buff *local_skb;
+
 	if (!netif_running(dev))
 		goto drop;
 
@@ -1144,7 +1146,12 @@ static int lowpan_rcv(struct sk_buff *skb, struct net_device *dev,
 	case LOWPAN_DISPATCH_IPHC:	/* ipv6 datagram */
 	case LOWPAN_DISPATCH_FRAG1:	/* first fragment header */
 	case LOWPAN_DISPATCH_FRAGN:	/* next fragments headers */
-		lowpan_process_data(skb);
+		local_skb = skb_clone(skb, GFP_ATOMIC);
+		if (!local_skb)
+			goto drop;
+		lowpan_process_data(local_skb);
+
+		kfree_skb(skb);
 		break;
 	default:
 		break;
-- 
1.7.0.4


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

^ permalink raw reply related

* [PATCH v3 0/2] 6lowpan fixes
From: Alan Ott @ 2012-09-01 15:57 UTC (permalink / raw)
  To: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	Tony Cheneau
  Cc: Alan Ott, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1346294341-26808-1-git-send-email-alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

v3 of this patch changes skb_copy() to skb_clone() in patch #1 at the
recommendation of Eric Dumazet

Alan Ott (2):
  6lowpan: Make a copy of skb's delivered to 6lowpan
  6lowpan: handle NETDEV_UNREGISTER event

 net/ieee802154/6lowpan.c |   53 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 45 insertions(+), 8 deletions(-)


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

^ permalink raw reply

* [PATCH] net: qmi_wwan: add several new Gobi devices
From: Bjørn Mork @ 2012-09-01 13:47 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, Bjørn Mork,
	Aleksander Morgado, Thomas Tuttle

Gobi devices are composite, needing both the qcserial and
qmi_wwan drivers to support all functions.  Re-syncing the
list of supported devices with qcserial.

Cc: Aleksander Morgado <aleksander-bhGbAngMcJvQT0dZR+AlfA@public.gmane.org>
Cc: Thomas Tuttle <ttuttle-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Signed-off-by: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
---
 drivers/net/usb/qmi_wwan.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 189e52d..adfab3f 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -413,7 +413,9 @@ static const struct usb_device_id products[] = {
 
 	/* 5. Gobi 2000 and 3000 devices */
 	{QMI_GOBI_DEVICE(0x413c, 0x8186)},	/* Dell Gobi 2000 Modem device (N0218, VU936) */
+	{QMI_GOBI_DEVICE(0x413c, 0x8194)},	/* Dell Gobi 3000 Composite */
 	{QMI_GOBI_DEVICE(0x05c6, 0x920b)},	/* Generic Gobi 2000 Modem device */
+	{QMI_GOBI_DEVICE(0x05c6, 0x920d)},	/* Gobi 3000 Composite */
 	{QMI_GOBI_DEVICE(0x05c6, 0x9225)},	/* Sony Gobi 2000 Modem device (N0279, VU730) */
 	{QMI_GOBI_DEVICE(0x05c6, 0x9245)},	/* Samsung Gobi 2000 Modem device (VL176) */
 	{QMI_GOBI_DEVICE(0x03f0, 0x251d)},	/* HP Gobi 2000 Modem device (VP412) */
@@ -441,6 +443,7 @@ static const struct usb_device_id products[] = {
 	{QMI_GOBI_DEVICE(0x1199, 0x9015)},	/* Sierra Wireless Gobi 3000 Modem device */
 	{QMI_GOBI_DEVICE(0x1199, 0x9019)},	/* Sierra Wireless Gobi 3000 Modem device */
 	{QMI_GOBI_DEVICE(0x1199, 0x901b)},	/* Sierra Wireless MC7770 */
+	{QMI_GOBI_DEVICE(0x12d1, 0x14f1)},	/* Sony Gobi 3000 Composite */
 	{QMI_GOBI_DEVICE(0x1410, 0xa021)},	/* Foxconn Gobi 3000 Modem device (Novatel E396) */
 
 	{ }					/* END */
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH] fq_codel: dont reinit flow state
From: Eric Dumazet @ 2012-09-01 13:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dave Taht

When fq_codel builds a new flow, it should not reset codel state.

Codel algo needs to get previous values (lastcount, drop_next) to get
proper behavior.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq_codel.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..4e606fc 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -191,7 +191,6 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	if (list_empty(&flow->flowchain)) {
 		list_add_tail(&flow->flowchain, &q->new_flows);
-		codel_vars_init(&flow->cvars);
 		q->new_flow_count++;
 		flow->deficit = q->quantum;
 		flow->dropped = 0;
@@ -418,6 +417,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 			struct fq_codel_flow *flow = q->flows + i;
 
 			INIT_LIST_HEAD(&flow->flowchain);
+			codel_vars_init(&flow->cvars);
 		}
 	}
 	if (sch->limit >= 1)

^ permalink raw reply related

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-09-01 12:51 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Tomas Hruby, Nandita Dukkipati, netdev, codel
In-Reply-To: <CAK6E8=cwpq0MVK6fcrOa=wTX237xQiVP2xzh0aX-AtuSmzt-sw@mail.gmail.com>

On Fri, 2012-08-31 at 18:37 -0700, Yuchung Cheng wrote:

> Just curious: tp->srtt is a very rough estimator, e.g., Delayed-ACks
> can easily add 40 - 200ms fuzziness. Will this affect short flows?

Good point

Delayed acks shouldnt matter, because they happen when flow had been
idle for a while.

I guess we should clamp the srtt to the default interval

if (srtt)
	q->cparams.interval = min(tcp_srtt_to_codel(srtt),
				  q->default_interval);

^ permalink raw reply

* Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
From: Francois Romieu @ 2012-09-01  8:20 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: David Madore, netdev, linux-kernel, Eric Dumazet
In-Reply-To: <alpine.LSU.2.00.1208311900330.1936@eggly.anvils>

Hugh Dickins <hughd@google.com> :
> [ Cc'ing original mail to netdev as the problem may be recognized there ]
[...]
> Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> is failing, which can easily happen, and cause your "failed to reallocate
> TX buffer" errors; but it's well worth looking up what's actually on
> lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
> 
> 	if (order >= MAX_ORDER) {
> 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

You are right. The wifi Tx path error is not related.

I overlooked it was using SLUB btw.

> That was probably not a sane allocation request, it has gone out of range:
> maybe the skb header is even corrupted.  If you're lucky, it might be
> something that netdev will recognize as already fixed.

Afaics nothing related to mv643xx_eth.c, nor pskb_expand_head nor
ip6_forward, at least nothing trivially close. Eric may provide a
better answer.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH -next v1] wireless: ath9k-htc: only load firmware in need
From: Ming Lei @ 2012-09-01  5:11 UTC (permalink / raw)
  To: linux-wireless-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Ming Lei,
	ath9k-devel-xDcbHBWguxHbcTqmT+pZeQ, Luis R. Rodriguez,
	Jouni Malinen, Vasanthakumar Thiagarajan, Senthil Balasubramanian,
	John W. Linville
In-Reply-To: <1345536267-9449-1-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

On Tue, Aug 21, 2012 at 4:04 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> It is not necessary to hold the firmware memory during the whole
> driver lifetime, and obviously it does waste memory. Suppose there
> are 4 ath9k-htc usb dongles working, kernel has to consume about
> 4*50KBytes RAM to cache firmware for all dongles. After applying the
> patch, kernel only caches one single firmware image in RAM for
> all ath9k-htc devices just during system suspend/resume cycle.
>
> When system is ready for loading firmware, ath9k-htc can request
> the loading from usersapce. During system resume, ath9k-htc still
> can load the firmware which was cached in kernel memory before
> system suspend.
>
> Cc: ath9k-devel-xDcbHBWguxHbcTqmT+pZeQ@public.gmane.org
> Cc: "Luis R. Rodriguez" <mcgrof-A+ZNKFmMK5xy9aJCnZT0Uw@public.gmane.org>
> Cc: Jouni Malinen <jouni-A+ZNKFmMK5xy9aJCnZT0Uw@public.gmane.org>
> Cc: Vasanthakumar Thiagarajan <vthiagar-A+ZNKFmMK5xy9aJCnZT0Uw@public.gmane.org>
> Cc: Senthil Balasubramanian <senthilb-A+ZNKFmMK5xy9aJCnZT0Uw@public.gmane.org>
> Cc: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> Signed-off-by: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> ---
> v1:
>         fix double free of firmware in failue path of
>         ath9k_hif_usb_firmware_cb

Gentle ping, :-)


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
From: Hugh Dickins @ 2012-09-01  2:21 UTC (permalink / raw)
  To: David Madore; +Cc: Francois Romieu, netdev, linux-kernel
In-Reply-To: <20120829002548.GA7063@aldebaran.gro-tsen.net>

[ Cc'ing original mail to netdev as the problem may be recognized there ]

On Wed, 29 Aug 2012, David Madore wrote:
> Dear all,
> 
> I hope this is the right place to send this sort of backtrace dump.
> 
> I'm getting the following sort of dumps (below) on a 3.2.27 kernel on
> an arm/kirkwood (actually DreamPlug) machine that's used as a router.
> 
> I imagine it being somehow related to the fact that it operates a
> network bridge (I imagine this because I have another identical
> machine with exactly the same kernel and a very similar config but not
> running a bridge, and the warning never pops up).
> 
> Is this worth investigating?  (I will, of course, provide the config
> file and any other relevant data if the answer is "yes".)  Is this
> potentially serious?  (I'm getting hard lockups on this machine which
> I suspect are due to hardware and unrelated to this, but if someone
> tells me it could be the cause, I'd be more than happy to believe it.)
> 
> [24711.204492] ------------[ cut here ]------------
> [24711.209151] WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
> [24711.216667] Modules linked in: 8021q ath9k_htc mac80211 ath9k_common ath9k_hw ath cfg80211 bnep rfcomm sit tunnel4 sch_ingress cls_fw cls_u32 sch_sfq sch_htb pppoe pppox ppp_generic slhc bridge stp llc ip6t_REJECT ip6table_filter ip6table_mangle xt_NOTRACK ip6table_raw ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp ipt_REJECT xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat xt_TCPMSS xt_tcpudp xt_mark iptable_mangle ip_tables x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 orion_wdt ipv6 snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi btmrvl_sdio btmrvl snd_seq snd_timer snd_seq_device snd bluetooth soundcore
> [24711.280663] [<c000d728>] (unwind_backtrace+0x0/0xf0) from [<c0022f74>] (warn_slowpath_common+0x50/0x68)
> [24711.290124] [<c0022f74>] (warn_slowpath_common+0x50/0x68) from [<c0022fa8>] (warn_slowpath_null+0x1c/0x24)
> [24711.299845] [<c0022fa8>] (warn_slowpath_null+0x1c/0x24) from [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c)
> [24711.309914] [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c) from [<c009cfb4>] (__get_free_pages+0x10/0x3c)
> [24711.319805] [<c009cfb4>] (__get_free_pages+0x10/0x3c) from [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc)
> [24711.329269] [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc) from [<c038d638>] (pskb_expand_head+0x68/0x298)
> [24711.338901] [<c038d638>] (pskb_expand_head+0x68/0x298) from [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6])
> [24711.348638] [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6]) from [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6])
> [24711.358333] [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6]) from [<c0394870>] (__netif_receive_skb+0x544/0x66c)
> [24711.368106] [<c0394870>] (__netif_receive_skb+0x544/0x66c) from [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge])
> [24711.379899] [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge]) from [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge])
> [24711.392271] [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge]) from [<c03bd2a4>] (nf_iterate+0x8c/0xb4)
> [24711.401988] [<c03bd2a4>] (nf_iterate+0x8c/0xb4) from [<c03bd328>] (nf_hook_slow+0x5c/0x118)
> [24711.410540] [<c03bd328>] (nf_hook_slow+0x5c/0x118) from [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge])
> [24711.420367] [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge]) from [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c)
> [24711.430872] [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c) from [<c031e254>] (mv643xx_eth_poll+0x540/0x734)
> [24711.440680] [<c031e254>] (mv643xx_eth_poll+0x540/0x734) from [<c0397390>] (net_rx_action+0x118/0x314)
> [24711.449970] [<c0397390>] (net_rx_action+0x118/0x314) from [<c0029924>] (__do_softirq+0xac/0x234)
> [24711.458817] [<c0029924>] (__do_softirq+0xac/0x234) from [<c0029f00>] (irq_exit+0x94/0x9c)
> [24711.467046] [<c0029f00>] (irq_exit+0x94/0x9c) from [<c00094b0>] (handle_IRQ+0x34/0x84)
> [24711.475007] [<c00094b0>] (handle_IRQ+0x34/0x84) from [<c04398d4>] (__irq_svc+0x34/0x98)
> [24711.483068] [<c04398d4>] (__irq_svc+0x34/0x98) from [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94)
> [24711.491908] [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94) from [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c)
> [24711.501532] [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c) from [<c0009764>] (cpu_idle+0x88/0xdc)
> [24711.510201] [<c0009764>] (cpu_idle+0x88/0xdc) from [<c05d8720>] (start_kernel+0x2a0/0x2f0)
> [24711.518512] ---[ end trace e1776fbe32468909 ]---

Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
is failing, which can easily happen, and cause your "failed to reallocate
TX buffer" errors; but it's well worth looking up what's actually on
lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:

	if (order >= MAX_ORDER) {
		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

That was probably not a sane allocation request, it has gone out of range:
maybe the skb header is even corrupted.  If you're lucky, it might be
something that netdev will recognize as already fixed.

Hugh

^ permalink raw reply

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Yuchung Cheng @ 2012-09-01  1:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tomas Hruby, Nandita Dukkipati, netdev, codel
In-Reply-To: <1346421466.2591.38.camel@edumazet-glaptop>

On Fri, Aug 31, 2012 at 6:57 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-08-31 at 06:50 -0700, Eric Dumazet wrote:
>> On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
>> > On locally generated TCP traffic (host), we can override the 100 ms
>> > interval value using the more accurate RTT estimation maintained by TCP
>> > stack (tp->srtt)
>> >
>> > Datacenter workload benefits using shorter feedback (say if RTT is below
>> > 1 ms, we can react 100 times faster to a congestion)
>> >
>> > Idea from Yuchung Cheng.
>> >
>>
>> Linux patch would be the following :
>>
>> I'll do tests next week, but I am sending a raw patch right now if
>> anybody wants to try it.
>>
>> Presumably we also want to adjust target as well.
>>
>> To get more precise srtt values in the datacenter, we might avoid the
>> 'one jiffie slack' on small values in tcp_rtt_estimator(), as we force
>> m to be 1 before the scaling by 8 :
>>
>> if (m == 0)
>>       m = 1;
>>
>> We only need to force the least significant bit of srtt to be set.
>>
Just curious: tp->srtt is a very rough estimator, e.g., Delayed-ACks
can easily add 40 - 200ms fuzziness. Will this affect short flows?


>
> Hmm, I also need to properly init default_interval after
> codel_params_init(&q->cparams) :
>
>  net/sched/sch_fq_codel.c |   24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 9fc1c62..f04ff6a 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -25,6 +25,7 @@
>  #include <net/pkt_sched.h>
>  #include <net/flow_keys.h>
>  #include <net/codel.h>
> +#include <linux/tcp.h>
>
>  /*     Fair Queue CoDel.
>   *
> @@ -59,6 +60,7 @@ struct fq_codel_sched_data {
>         u32             perturbation;   /* hash perturbation */
>         u32             quantum;        /* psched_mtu(qdisc_dev(sch)); */
>         struct codel_params cparams;
> +       codel_time_t    default_interval;
>         struct codel_stats cstats;
>         u32             drop_overlimit;
>         u32             new_flow_count;
> @@ -211,6 +213,14 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>         return NET_XMIT_SUCCESS;
>  }
>
> +/* Given TCP srtt evaluation, return codel interval.
> + * srtt is given in jiffies, scaled by 8.
> + */
> +static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
> +{
> +       return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
> +}
> +
>  /* This is the specific function called from codel_dequeue()
>   * to dequeue a packet from queue. Note: backlog is handled in
>   * codel, we dont need to reduce it here.
> @@ -220,12 +230,21 @@ static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
>         struct fq_codel_sched_data *q = qdisc_priv(sch);
>         struct fq_codel_flow *flow;
>         struct sk_buff *skb = NULL;
> +       struct sock *sk;
>
>         flow = container_of(vars, struct fq_codel_flow, cvars);
>         if (flow->head) {
>                 skb = dequeue_head(flow);
>                 q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
>                 sch->q.qlen--;
> +               sk = skb->sk;
> +               q->cparams.interval = q->default_interval;
> +               if (sk && sk->sk_protocol == IPPROTO_TCP) {
> +                       u32 srtt = tcp_sk(sk)->srtt;
> +
> +                       if (srtt)
> +                               q->cparams.interval = tcp_srtt_to_codel(srtt);
> +               }
>         }
>         return skb;
>  }
> @@ -330,7 +349,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
>         if (tb[TCA_FQ_CODEL_INTERVAL]) {
>                 u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
>
> -               q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
> +               q->default_interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
>         }
>
>         if (tb[TCA_FQ_CODEL_LIMIT])
> @@ -395,6 +414,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
>         INIT_LIST_HEAD(&q->new_flows);
>         INIT_LIST_HEAD(&q->old_flows);
>         codel_params_init(&q->cparams);
> +       q->default_interval = q->cparams.interval;
>         codel_stats_init(&q->cstats);
>         q->cparams.ecn = true;
>
> @@ -441,7 +461,7 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
>             nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
>                         sch->limit) ||
>             nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
> -                       codel_time_to_us(q->cparams.interval)) ||
> +                       codel_time_to_us(q->default_interval)) ||
>             nla_put_u32(skb, TCA_FQ_CODEL_ECN,
>                         q->cparams.ecn) ||
>             nla_put_u32(skb, TCA_FQ_CODEL_QUANTUM,
>
>

^ permalink raw reply

* RE: [PATCH net-next 0/12] qlcnic: patches for new adapter - Qlogic 83XX CNA
From: Sony Chacko @ 2012-09-01  1:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept-NX Linux NIC Driver
In-Reply-To: <20120831.205308.538476490984218378.davem@davemloft.net>

Sorry about that. I will address all the issues in the next set of patches.
I am waiting for feedback from others as well.

Thanks,
Sony

-----Original Message-----
From: David Miller [mailto:davem@davemloft.net]
Sent: Friday, August 31, 2012 5:53 PM
To: Sony Chacko
Cc: netdev; Dept-NX Linux NIC Driver
Subject: Re: [PATCH net-next 0/12] qlcnic: patches for new adapter - Qlogic 83XX CNA

From: David Miller <davem@davemloft.net>
Date: Fri, 31 Aug 2012 20:00:35 -0400 (EDT)

> From: Sony Chacko <sony.chacko@qlogic.com>
> Date: Fri, 31 Aug 2012 18:36:47 -0400
>
>> Please apply the updated patch series to net-next.
>
> You posted too soon, you received other feedback in the mean-time
> which you should address as well.

Also in patch #5:

-               qlcnic_config_ipaddr(adapter, ifa->ifa_address, QLCNIC_IP_UP);
+               qlcnic_config_ipaddr(adapter, ifa->ifa_address,
+                                                QLCNIC_IP_UP);

This is not the correct way to format multi-line function calls, the correct way is:

                qlcnic_config_ipaddr(adapter, ifa->ifa_address,
                                     QLCNIC_IP_UP);

That is, you line up the first character of the argument to the column right after the openning parenthesis on the pevious line, using the appropriate number of TAB and SPACE characters necessary to achieve that.


This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.

^ permalink raw reply

* Re: [PATCH net-next 0/12] qlcnic: patches for new adapter - Qlogic 83XX CNA
From: David Miller @ 2012-09-01  0:53 UTC (permalink / raw)
  To: sony.chacko; +Cc: netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <20120831.200035.576516635394055754.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 31 Aug 2012 20:00:35 -0400 (EDT)

> From: Sony Chacko <sony.chacko@qlogic.com>
> Date: Fri, 31 Aug 2012 18:36:47 -0400
> 
>> Please apply the updated patch series to net-next.
> 
> You posted too soon, you received other feedback in the
> mean-time which you should address as well.

Also in patch #5:

-		qlcnic_config_ipaddr(adapter, ifa->ifa_address, QLCNIC_IP_UP);
+		qlcnic_config_ipaddr(adapter, ifa->ifa_address,
+						 QLCNIC_IP_UP);

This is not the correct way to format multi-line function
calls, the correct way is:

		qlcnic_config_ipaddr(adapter, ifa->ifa_address,
				     QLCNIC_IP_UP);

That is, you line up the first character of the argument to
the column right after the openning parenthesis on the pevious
line, using the appropriate number of TAB and SPACE characters
necessary to achieve that.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox